TimeBase Storage 5.0
Time Slice Files
Messages are recorded in and read from TimeBase in chronological order, according to their timestamps. At the stream level, TimeBase distributes messages by timestamp into a number of compact and often compressed files, each storing messages for a specific time range. These files are called Time Slice Files, or TSFs. The number of TSFs in a single stream can be very large.
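As an illustration of the idea that each TSF covers a specific time range, the sketch below locates the TSF for a given timestamp with a binary search. It assumes, purely for illustration, that each TSF is identified by the start of its time range and that ranges are contiguous; `find_tsf` and `start_times` are hypothetical names, not part of TimeBase.

```python
import bisect

def find_tsf(start_times: list[int], timestamp: int) -> int:
    """Return the index of the TSF whose time range contains `timestamp`.

    `start_times` holds the (assumed) start timestamp of every TSF,
    sorted in ascending order.
    """
    # bisect_right gives the insertion point after any equal start time,
    # so subtracting 1 yields the last TSF that starts at or before `timestamp`.
    i = bisect.bisect_right(start_times, timestamp) - 1
    if i < 0:
        raise ValueError("timestamp precedes the first TSF")
    return i
```

With start times of 0, 1000 and 5000, a message stamped 999 falls into the first TSF and a message stamped 1000 into the second.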
The size of each file can be configured at creation time, and TimeBase ensures that all TSFs are of a reasonable size: not too small, not too large. A TSF should not be too small because there is non-trivial overhead in handling and maintaining individual files. Experiments on modern hardware show that when the TSF size is around 10MB or more, TimeBase can store data at speeds close to 50% of the sustained throughput of the underlying storage. If the TSF size is set to about 1MB (bad), the speed falls dramatically.
Reasons why TSFs should not be too large:
- You have to rewrite an entire TSF to edit even one record. The larger the file, the more work needs to be done.
- The more important consideration in sizing a TSF is that its size limits the amount of memory consumed by readers and writers. If no filter applies, a reader needs to retrieve all TSFs from disk, one at a time, so each reader holds up to one TSF in memory.
- If the maximum TSF size is 10MB, then we can be certain that, for example, 1GB of RAM is sufficient to support 100 readers.
- If the maximum TSF size is 100MB, then only 10 simultaneous readers will fit in 1GB of RAM.
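The arithmetic behind these two figures can be written down as a small sketch. It assumes that each reader holds at most one TSF in memory at a time and uses decimal units (1GB = 1000MB) to match the round numbers above; `max_readers` is a hypothetical helper, not a TimeBase function.

```python
MB = 10**6   # decimal megabyte, assumed to match the round numbers in the text
GB = 10**9   # decimal gigabyte

def max_readers(ram_bytes: int, max_tsf_bytes: int) -> int:
    """Number of simultaneous readers that fit in `ram_bytes`,
    assuming each reader buffers at most one TSF of `max_tsf_bytes`."""
    if max_tsf_bytes <= 0:
        raise ValueError("max_tsf_bytes must be positive")
    return ram_bytes // max_tsf_bytes
```

Calling `max_readers(GB, 10 * MB)` reproduces the 100-reader figure, and `max_readers(GB, 100 * MB)` the 10-reader figure.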
A TSF's time range varies widely, because a single TSF holds the messages of all entities that fall into the given time range. In extreme cases, a TSF may contain several messages that all have the same timestamp, or it may contain a wide internal time gap.
When data is written to TimeBase, loaders buffer the most recent TSFs. The buffer is a dynamic data structure. Once the cumulative size of the buffered TSFs exceeds the defined threshold, TSFs are scheduled for commit (written to disk), and the loader switches over to the next TSF.
Note that the last TSF is not committed to disk until the threshold is reached or the writer is closed. Otherwise, the commit process would lock the last TSF and delay writes to it. As a consequence of this architecture, if TimeBase terminates abnormally, or the system crashes or loses power, the content of all uncommitted TSFs is lost.
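The buffering behaviour described above can be modelled with a minimal sketch. This is a simplified model under stated assumptions, not TimeBase's loader: it buffers a single open TSF, treats a commit as appending to an in-memory list, and all names (`LoaderSketch`, `threshold_bytes`, `committed`) are hypothetical.

```python
class LoaderSketch:
    """Toy model of a loader that commits a TSF once it exceeds a size threshold."""

    def __init__(self, threshold_bytes: int) -> None:
        self.threshold_bytes = threshold_bytes
        self.current: list[bytes] = []          # messages in the open, uncommitted TSF
        self.current_size = 0
        self.committed: list[list[bytes]] = []  # stands in for TSFs written to disk

    def write(self, message: bytes) -> None:
        self.current.append(message)
        self.current_size += len(message)
        # Once the buffered data exceeds the threshold, commit and start a new TSF.
        if self.current_size > self.threshold_bytes:
            self._commit()

    def _commit(self) -> None:
        if self.current:
            self.committed.append(self.current)
        self.current = []
        self.current_size = 0

    def close(self) -> None:
        # Closing the writer commits whatever is left in the last TSF.
        self._commit()
```

In this model, anything still held in `current` when the process dies without calling `close()` is exactly the data that would be lost.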
Another reason for limiting the maximum TSF size is that recent TSFs are kept in the Data Cache. When the Data Cache gets full, the latest committed TSFs are evicted from it.
The Index Block records where the Data Blocks are located within the TSF. Each Data Block contains the chronologically ordered messages of a single entity within the time range of the corresponding TSF. Data Blocks can be accessed randomly by readers (cursors). For example, to read data for a specific entity, a cursor reads the Index Block and can then directly access the required Data Block.
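The following sketch illustrates index-then-seek access on a toy file layout. The layout is an assumption made for this example (a 4-byte length, a JSON index mapping each entity to an offset and length, then the data blocks); the source does not describe the actual TSF binary format, and `build_tsf` and `read_entity` are hypothetical names.

```python
import io
import json
import struct
from typing import BinaryIO

def build_tsf(blocks: dict[str, bytes]) -> bytes:
    """Lay out a toy TSF: [4-byte index length][JSON index][data blocks]."""
    index = {}
    body = io.BytesIO()
    for entity, data in blocks.items():
        index[entity] = (body.tell(), len(data))  # offset and length within the body
        body.write(data)
    index_bytes = json.dumps(index).encode("utf-8")
    return struct.pack(">I", len(index_bytes)) + index_bytes + body.getvalue()

def read_entity(tsf: BinaryIO, entity: str) -> bytes:
    """Read the index block, then seek straight to one entity's data block."""
    (index_len,) = struct.unpack(">I", tsf.read(4))
    index = json.loads(tsf.read(index_len))
    offset, length = index[entity]
    tsf.seek(4 + index_len + offset)  # skip the header and all preceding blocks
    return tsf.read(length)
```

Only the index and the requested block are read; the blocks of other entities are skipped by the seek.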
There is little overhead in terms of the amount of extra data read from the disk. Typically, Data Blocks are lightly compressed on write and decompressed on read. The disk space reduction (sometimes 4x) comes at the expense of relatively small additional CPU consumption. When data needs to be transmitted over the network, for example when using Hadoop, light compression speeds up reading.
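To make the compress-on-write, decompress-on-read trade-off concrete, here is a small sketch that uses zlib at its fastest level as a stand-in. The source does not name the codec TimeBase uses, so zlib is an assumption, and `write_block` and `read_block` are hypothetical helpers.

```python
import zlib

def write_block(messages: bytes) -> bytes:
    """Compress a data block lightly (zlib level 1 favours speed over ratio)."""
    return zlib.compress(messages, 1)

def read_block(stored: bytes) -> bytes:
    """Restore the original bytes of a data block."""
    return zlib.decompress(stored)
```

The achieved ratio depends entirely on the data; repetitive market-data style records tend to compress well, while already-compressed payloads do not.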
TSFs are organized into a balanced tree of folders in the file system.
Every stream is maintained in a separate root folder. Under the root, there may be zero or more levels of intermediate folders. TSFs are at the bottom of the hierarchy.
TimeBase keeps the tree balanced by ensuring that every folder has between N/2 and N files, where N is a configurable Maximum Folder Size. When a folder fills up (has N files or child folders), it is split into two folders, each having N/2 children. This split increases the number of child nodes in the parent folder and may cause the parent to split in turn.
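The split rule can be sketched on a two-level model in which a parent is a list of folders and each folder is a sorted list of child names. This is an illustrative simplification, not TimeBase's implementation: `add_child` is a hypothetical helper, and the recursive split of the parent itself is left out.

```python
def add_child(parent: list[list[str]], folder_idx: int, name: str, n: int) -> None:
    """Add `name` to one folder; split the folder in two once it holds n children."""
    folder = parent[folder_idx]
    folder.append(name)
    folder.sort()  # keep children ordered (zero-padded names sort chronologically)
    if len(folder) >= n:
        mid = len(folder) // 2
        # Replace the full folder with two half-full folders in the parent,
        # which grows the parent's own child count by one.
        parent[folder_idx:folder_idx + 1] = [folder[:mid], folder[mid:]]
```

With N = 4, adding a fourth file to a folder leaves the parent with two folders of two files each; a full implementation would then apply the same check to the parent.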
In addition to the basic tree structure, folders keep additional information about their children. Every folder remembers its first and last TSF. Using this index, TimeBase can very quickly find the first and last TSFs for each entity, and navigate between TSFs.