Data Architecture
TimeBase Storage 5.0
Time Slice Files
Messages are recorded to and read from TimeBase in chronological order, according to their timestamps. At the stream level, TimeBase distributes messages by timestamp into a number of compact and often compressed files, each storing the messages for a specific time range. These files are called Time Slice Files, or TSFs. The number of TSFs in a single stream can be very large.
The TSF size is configurable. TimeBase ensures that all TSFs are of a reasonable size - not too small, not too large.
A TSF should not be too small because there is a non-trivial overhead in handling and maintaining individual files. Experiments on modern hardware show that when the TSF size is around 10MB (or more), TimeBase can store data at speeds close to 50% of the sustained throughput of the underlying storage. If the TSF size is set to about 1MB (too small), the speed falls dramatically.
The reasons why TSFs should not be too large include:
- You have to re-write an entire TSF to edit even one record. The larger the file, the more work needs to be done.
- The maximum TSF size limits the amount of memory consumed by readers and writers. A reader retrieves entire TSFs from disk, one at a time (when no filter applies), so the TSF size bounds the per-reader memory footprint:
  - If the maximum TSF size is 10MB, 1GB of RAM is guaranteed to be sufficient for 100 simultaneous readers.
  - If the maximum TSF size is 100MB, only 10 simultaneous readers fit in 1GB of RAM.
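To make the rule of thumb explicit, here are the figures from the list above as a few lines of Java:

```java
public class ReaderBudget {
    public static void main(String[] args) {
        long maxTsfBytes = 10L << 20;                 // 10MB maximum TSF size
        long ramBudget   = 1L << 30;                  // 1GB available for reader buffers
        // Each active reader may hold one full TSF in memory at a time:
        System.out.println(ramBudget / maxTsfBytes);  // prints 102, i.e. ~100 readers
    }
}
```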
A TSF's time range varies widely because a single TSF holds messages for all entities that fall into the given time range. In extreme cases, a TSF may contain several messages that all share the same timestamp, or it may contain a wide internal time gap.
When data is written to TimeBase, loaders buffer the most recent TSFs. The buffer is a dynamic data structure: when the cumulative size of the buffered TSFs exceeds the defined threshold, TSFs are scheduled for commit (written to disk), and the loader switches over to the next TSF.
Note that the last TSF is not committed to disk until the threshold is reached or the writer is closed; otherwise, the commit process would lock the last TSF and delay writes into it. As a consequence, if TimeBase terminates abnormally (for example, due to a system crash or a power outage), the content of all uncommitted TSFs is lost.
Another reason for limiting the TSF maximum size is that recent TSFs are kept in a data cache. When the data cache gets full, the latest committed TSFs are evicted from it.
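The commit policy described above can be captured in a short sketch. This is an illustration only: Message, TimeSliceFile, and scheduleCommit are hypothetical stand-ins, not TimeBase API.

```java
class Message { long timestamp; byte[] body = new byte[0]; }

class TimeSliceFile {
    private long size;
    void append(Message m) { size += 8 + m.body.length; }  // timestamp + payload
    long sizeBytes() { return size; }
}

class LoaderBuffer {
    private final long maxTsfBytes;            // threshold that triggers a commit
    private TimeSliceFile current = new TimeSliceFile();

    LoaderBuffer(long maxTsfBytes) { this.maxTsfBytes = maxTsfBytes; }

    void write(Message msg) {
        current.append(msg);
        if (current.sizeBytes() >= maxTsfBytes) {
            scheduleCommit(current);           // full slice goes to disk asynchronously
            current = new TimeSliceFile();     // switch over to the next TSF
        }
        // 'current' stays uncommitted until it fills up or close() is called, so a
        // crash or power loss costs at most the slices not yet written to disk.
    }

    void close() { scheduleCommit(current); }  // writer closed: flush the last TSF

    private void scheduleCommit(TimeSliceFile tsf) { /* enqueue an asynchronous write */ }
}
```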
TSF Structure
The Index Block records where the Data Blocks are located within the TSF. Each Data Block contains chronologically ordered messages for a single entity within the time range of the TSF. Data Blocks can be randomly accessed by readers (cursors). For example, to read data for a specific entity, a cursor reads the Index Block, after which it can directly access the required Data Block.
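A sketch of this read path is shown below. The index-entry layout (entity id, Data Block offset and length) is an assumption made for illustration; the actual TSF format is not documented here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

final class TsfReadPath {
    // Assumed index entry: which entity a Data Block belongs to, and where it is.
    record IndexEntry(int entityId, long dataBlockOffset, int dataBlockLength) {}

    /** Reads only the Data Block of one entity, using the index for random access. */
    static ByteBuffer readEntityBlock(Path tsf, List<IndexEntry> index, int entityId)
            throws IOException {
        try (FileChannel ch = FileChannel.open(tsf, StandardOpenOption.READ)) {
            for (IndexEntry e : index)                    // the parsed Index Block
                if (e.entityId() == entityId) {
                    ByteBuffer block = ByteBuffer.allocate(e.dataBlockLength());
                    ch.read(block, e.dataBlockOffset());  // seek straight to the block
                    return block.flip();                  // this entity's messages only
                }
            return null;                                  // entity absent from this slice
        }
    }
}
```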
As a result, there is little overhead in the amount of extra data read from the disk. Typically, Data Blocks are lightly compressed on write and decompressed on read. The disk space reduction (sometimes 4x) comes at the cost of a relatively small additional CPU consumption. When data has to be transmitted over the network, for example when using Hadoop, light compression also speeds up reading.
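The codec itself is not specified here; the sketch below uses java.util.zip at its fastest setting as a stand-in for "light" compression, to show the speed-versus-ratio trade.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class LightCompression {
    static byte[] compress(byte[] dataBlock) throws IOException {
        Deflater light = new Deflater(Deflater.BEST_SPEED);  // favor speed over ratio
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf, light)) {
            out.write(dataBlock);                            // compress on write
        }
        light.end();
        return buf.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws IOException {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();                        // decompress on read
        }
    }
}
```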
TSF System
TSFs are organized into a balanced tree of folders in the file system.
Every stream is maintained in a separate root folder. Under the root, there may be zero or more levels of intermediate folders. TSFs are at the bottom of the hierarchy.
TimeBase keeps balance in the tree by ensuring that every folder has between N/2 and N files, where N is a configurable Maximum Folder Size. When a folder fills up (has N files or child folders), it is split into two folders, each having N/2 children. This process increases the number of child nodes in the parent folder, and may cause the parent to split.
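A minimal sketch of the split rule, assuming a Node type that stands in for a folder holding TSFs or child folders (the root split, which adds a level to the tree, is omitted for brevity):

```java
import java.util.ArrayList;
import java.util.List;

final class FolderTree {
    static final int N = 64;                       // configurable Maximum Folder Size

    static class Node {
        final List<Node> children = new ArrayList<>();
        Node parent;
    }

    /** Called after adding a child; keeps every folder between N/2 and N entries. */
    static void splitIfFull(Node folder) {
        if (folder.children.size() < N)
            return;                                // still within bounds
        Node sibling = new Node();
        sibling.parent = folder.parent;
        List<Node> upper = folder.children.subList(N / 2, N);
        sibling.children.addAll(upper);            // move the upper half of the children
        upper.clear();                             // the original keeps the lower N/2
        for (Node c : sibling.children)
            c.parent = sibling;
        if (folder.parent != null) {
            folder.parent.children.add(sibling);   // the parent gains a child...
            splitIfFull(folder.parent);            // ...so the split may cascade upward
        }
    }
}
```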
Along with the basic tree structure, every folder keeps additional information about its children, including its first and last TSF. Based on this index, TimeBase can find the first and last TSFs for each entity very quickly, as well as navigate between TSFs.
TimeBase Storage 4.0
M-Files
Messages are stored in M-Files inside streams.
M-Files are critical for TimeBase administration. Each M-File is represented by a data file and an index file stored on disk. Message data is stored in the data file, while the index file allows the server to quickly find messages inside the data file by their timestamps. Both files grow as messages are added to the M-File.
Note that messages are placed chronologically in M-Files, and each message may occupy a different amount of space.
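A minimal sketch of how the index file enables timestamp lookups, assuming it reduces to a sorted array of (timestamp, data-file offset) pairs; the real on-disk format is not documented here.

```java
import java.util.Arrays;

final class MFileIndex {
    private final long[] timestamps;  // sorted ascending, one entry per indexed message
    private final long[] offsets;     // byte offset of that message in the data file

    MFileIndex(long[] timestamps, long[] offsets) {
        this.timestamps = timestamps;
        this.offsets = offsets;
    }

    /** Returns the data-file offset of the first message at or after 'time', or -1. */
    long seek(long time) {
        int i = Arrays.binarySearch(timestamps, time);
        if (i < 0)
            i = -i - 1;               // missed key: take the insertion point
        return i < offsets.length ? offsets[i] : -1;
    }
}
```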
Distribution Factor
Another important stream parameter is called the Distribution Factor (DF). The DF is specified when a new stream is created. It determines the number of M-Files in a stream and how messages are distributed among these files.
DF = Max (dedicated M-Files)
DF is a positive number that can also take a special out-of-band maximum (Max) value, which is the default setting. When DF is set to Max, the messages for each symbol are stored in a separate, dedicated M-File. For example, all messages related to AAPL are written to one M-File, and all messages related to GOOG are written to another M-File.
Distributing messages between M-Files is very important for various reasons:
- Each symbol can be treated independently. For example, you can re-load AAPL data without wiping out, or affecting in any way, the GOOG data. This is why the Market Data Aggregator requires the destination stream to be set up with DF = Max.
- Very fast retrieval of small symbol subsets. Consider an example where you have a stream with tick data for 1000 stocks and you are backtesting an order execution strategy. In order to simulate order execution, the strategy needs tick data, but only for symbols it has outstanding orders for. This is likely to be a very small subset of the original 1000-symbol universe. The TimeBase server needs to read only a few files in order to supply the data to the backtested strategy.
- Since TimeBase knows that all messages in any given M-File are related to a specific symbol, it is not necessary to store the symbol in every message, resulting in space saving compared to alternative configurations.
DF = N (shared M-Files)
There is one important disadvantage in configuring a stream with dedicated M-Files: when selecting data simultaneously for more than a few hundred symbols, retrieval throughput begins to fall sharply.
For example, backtesting a typical equity alpha-generation or portfolio strategy on intraday data (such as one-minute bars) requires retrieving a fairly large volume of data for a large number of symbols. If you selected all data from a stream with one-minute bars for 10,000 equities stored in dedicated M-Files, TimeBase would have to join messages from 10,000 M-Files on timestamp value. TimeBase uses a highly optimized join algorithm, but at a certain point this algorithm begins to run out of Level 2 (L2) CPU cache.
The drop in retrieval performance is non-linear, and begins abruptly at the point where the join algorithm starts to run out of the L2 cache. The exact number of symbols after which performance starts to fall depends on the CPU architecture; as a rule of thumb, it lies somewhere between the high hundreds and low thousands. To overcome this join performance limitation in cases where mass retrieval of data is important, the DF has to be set to a finite number.
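A textbook priority-queue merge illustrates the mechanics (TimeBase's actual join is far more optimized than this sketch). The point is that the merge keeps one heap entry plus a read buffer per open M-File, and it is this per-source state that eventually overflows the L2 cache.

```java
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

final class TimestampJoin {
    record Msg(long timestamp, String symbol) {}

    interface MsgSource {      // one source per M-File
        Msg peek();            // next message, or null when exhausted
        Msg poll();            // consume the next message
    }

    static void merge(List<MsgSource> sources, Consumer<Msg> sink) {
        // One heap entry per open M-File: with 10,000 dedicated M-Files, the heap
        // and the per-source read buffers no longer fit in the L2 cache.
        PriorityQueue<MsgSource> heap = new PriorityQueue<>(
            (a, b) -> Long.compare(a.peek().timestamp(), b.peek().timestamp()));
        for (MsgSource s : sources)
            if (s.peek() != null)
                heap.add(s);
        while (!heap.isEmpty()) {
            MsgSource next = heap.poll();
            sink.accept(next.poll());              // emit the globally next message
            if (next.peek() != null)
                heap.add(next);                    // re-insert if more data remains
        }
    }
}
```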
When the DF is set to a positive number, between 1 and, typically, a few hundred, messages are distributed proportionally among the specified number of M-Files. When DF = 1, all messages in a stream are simply placed in a single M-File. When the DF is greater than 1, every symbol is assigned to a specific M-File, but multiple symbols share the same M-File (see the sketch after this list). Limiting the number of M-Files solves the CPU cache miss problem, but entails the following costs:
- The data for each symbol can no longer be manipulated independently. The Market Data Aggregator cannot work with streams with a finite DF.
- Depending on the DF value, small symbol subsets may or may not take longer to retrieve. When the DF is relatively high, the impact is small.
- TimeBase has to store the symbol in every message, resulting in slightly increased disk space consumption.
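The symbol-to-M-File assignment can be pictured with the sketch below. The routing shown (a dedicated file per symbol for DF = Max, a stable hash for a finite DF) is illustrative only, not the actual TimeBase implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class MFileRouter {
    static final int MAX = Integer.MAX_VALUE;  // stands in for the out-of-band "Max"

    private final int df;
    private final Map<String, Integer> dedicated = new HashMap<>();

    MFileRouter(int df) { this.df = df; }

    /** Maps a symbol to the id of the M-File that stores its messages. */
    int route(String symbol) {
        if (df == MAX)                         // DF = Max: one M-File per symbol
            return dedicated.computeIfAbsent(symbol, s -> dedicated.size());
        return Math.floorMod(symbol.hashCode(), df);  // DF = N: N shared M-Files
    }
}
```

For example, new MFileRouter(MFileRouter.MAX) gives AAPL and GOOG different files, while new MFileRouter(4) folds the entire symbol universe into four shared files.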
Hybrid Solutions
Often, users require the ability to both mass-select data and run selective retrieval for a large number of symbols.
TimeBase provides several tools for these purposes:
- Different datasets can be stored in distinct streams, each with a properly configured DF. For example, a hybrid alpha/execution strategy is frequently designed to perform the slower but broader alpha phase on one-minute price bars and then execute algo-orders on tick data. In this case, store the one-minute bars in a stream with DF = 1, and store the ticks in a stream with DF = Max.
- If the same dataset must be used in both modes, set the DF to a relatively high value. The exact DF value depends on the hardware, the type of queries you run, and the acceptable trade-off between mass and selective query performance. Start with a number in the low hundreds, such as 200, and adjust based on experimental results.
- For ultimate performance, you can have two versions of the same stream, one with DF = Max and one with DF = 1. It is possible to set up replication so that one stream is automatically produced from another.
M-File Truncation
TimeBase always stores messages in M-Files chronologically. This ensures that when a client is reading messages chronologically, the data file is also read strictly chronologically. There is no need to jump from one section of the data file to another.
This design ensures maximum performance on a defragmented file system without caching a substantial amount of data in memory, which is impractical for very large datasets, such as large collections of tick data.
Strict adherence to this policy excludes the possibility of inserting a message in the middle of an M-File without re-writing the entire section of the file that follows the modification point.
In fact, TimeBase does strictly adhere to the sequential ordering policy: an attempt to insert data in the middle of any M-File causes the immediate truncation of the affected M-File at the insertion point.
Note that only the M-File being inserted into is truncated; the rest of the M-Files in the stream remain unaffected. Therefore, what happens when data is inserted depends critically on the Distribution Factor.
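The effect can be demonstrated with plain file operations; the sketch below illustrates the truncation semantics only and is not TimeBase code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MFileTruncation {
    /** Inserting mid-file first discards everything after the insertion point. */
    static void insertAt(Path dataFile, long insertionOffset, byte[] newData)
            throws IOException {
        try (FileChannel ch = FileChannel.open(dataFile, StandardOpenOption.WRITE)) {
            ch.truncate(insertionOffset);        // cut the M-File at the insertion point
            ch.position(insertionOffset);
            ch.write(ByteBuffer.wrap(newData));  // writing resumes sequentially from here
        }
    }
}
```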
Physical Layout
TimeBase Folders
A TimeBase instance occupies one or more folders on the hard disk. When multiple folders are in use, TimeBase files can be distributed among them using a round-robin approach to selecting storage locations. This is possible because TimeBase makes no assumptions about which folders contain which files: all files have unique names within the instance and can be placed in any of the configured folders. On startup, TimeBase catalogs the folders and can find any required file, even if files have been moved around.
By using multiple folders you can:
- Spread data among multiple storage devices not connected into a disk array to increase the amount of stored data.
- Spread data among multiple storage devices not connected into a disk array to increase the data retrieval performance.
- Overcome file system shortcomings related to a large number of files placed in a single folder. For example, placing more than a few thousand files in a single NTFS folder is not recommended: NTFS performance begins to fall dramatically once a folder exceeds about 4000 files. In this case, using multiple folders, even on the same physical device, is beneficial.
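Round-robin placement can be sketched as follows. FolderRoundRobin and locationFor are hypothetical names; the sketch relies only on the instance-wide uniqueness of file names noted above.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class FolderRoundRobin {
    private final List<Path> folders;              // configured storage locations
    private final AtomicLong next = new AtomicLong();

    FolderRoundRobin(List<Path> folders) { this.folders = List.copyOf(folders); }

    /** Picks the next folder in turn; any folder is valid because names are unique. */
    Path locationFor(String uniqueFileName) {
        int i = (int) (next.getAndIncrement() % folders.size());
        return folders.get(i).resolve(uniqueFileName);
    }
}
```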
Comparison
TimeBase Storage 4.0
Advantages:
- Low latencies, minimal GC at runtime
- Fast appends of historical data for new symbols
- Low memory consumption (can work with a 1GB heap)
Disadvantages:
- No data compression at the storage level
- File-per-symbol storage: a large number of IOPS for large databases
- "append" and "rewrite" modes only
- Blocking "purge"/"truncate" operations cause large latencies when executed in parallel
TimeBase Storage 5.0
Advantages:
- Optimized for live reading/writing
- Support for "insert" mode: data can be replaced and inserted
- Data compression at the storage level
- Low IOPS at the storage level
- Cluster support
- Non-blocking "purge"/"truncate" operations
- "Spaces" (data partitioning) support on stream level
Disadvantages:
- More memory allocations at runtime
- Slow inserts of historical data
- Higher memory consumption (heap sizes starting from 4GB)