Streams
Durable Streams
The content of durable streams is persisted on disk.
Durable messages are stored in Time Slice Files (TSFs) of a predefined size. A specific of this storage model is that data in the latest TSF is kept in system memory until it is persisted; until the latest TSF is flushed to disk, there is a risk of losing this data.
This is especially important for rarely updated streams, where data in the latest TSF may remain unpersisted for a long time because the file capacity has not been reached. To mitigate the risk of data loss, durable streams can be configured to flush data to disk at a predefined interval, and the TSF size can be adjusted as well. Data can be flushed by TimeBase automatically or via the API on the data loader side.
```java
// Create a durable stream
RecordClassDescriptor classDescriptor = Introspector.createMessageIntrospector().introspectRecordClass(TestMessage.class);
...
String streamName = "durable";
StreamOptions options = StreamOptions.fixedType(StreamScope.DURABLE, streamName, streamName, 0, classDescriptor);
DXTickStream stream = db.createStream(streamName, options);
...
```
Info: You can also create a durable stream by running a Data Definition Language (DDL) command.
Transient Streams
Transient TimeBase streams are used for message brokering. Transient messages are stored in a circular memory buffer; once a message in a transient stream has been read by all readers (cursors), it is discarded.
Example
Suppose that Cursors 1-3 are the only registered readers, and Cursor 3 is the slowest one (behind the other two). When Cursor 3 reads the next message (on the far left of the diagram), that message "falls off the queue" and is discarded. If Cursor 4 were to subscribe to this stream at that moment, it would never see the discarded message. In fact, if no cursors are subscribed to a transient stream, all messages written to it are discarded immediately.

API-wise, a transient stream is almost indistinguishable from a durable one: multiple clients subscribe to either kind of stream in exactly the same way. A client can even open a single cursor that is subscribed to a number of streams at the same time, some durable and some transient.
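As an illustration of the last point, here is a minimal sketch of one cursor reading a durable and a transient stream together (the stream names are hypothetical, and the db.select(...) overload taking a start time, options, and a list of streams is assumed):

```java
// One cursor subscribed to a durable and a transient stream at the same time
DXTickStream durableStream = db.getStream("durable");
DXTickStream transientStream = db.getStream("transient");

SelectionOptions options = new SelectionOptions();
options.live = true; // keep receiving new messages as they arrive

try (TickCursor cursor = db.select(System.currentTimeMillis(), options, durableStream, transientStream)) {
    while (cursor.next()) {
        // messages from both streams arrive as a single time-ordered sequence
        InstrumentMessage message = cursor.getMessage();
    }
}
```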
```java
// Create a transient stream
RecordClassDescriptor classDescriptor = Introspector.createMessageIntrospector().introspectRecordClass(TestMessage.class);
...
String streamName = "transient";
StreamOptions options = StreamOptions.fixedType(StreamScope.TRANSIENT, streamName, streamName, 0, classDescriptor);
DXTickStream stream = db.createStream(streamName, options);
...
```
Info: You can also create a transient stream by running a Data Definition Language (DDL) command.
Memory Management for Transient Streams
As stated earlier, transient messages are stored in a circular memory buffer. If any cursor (reader) falls behind in consuming stream data, memory pressure starts growing, and the buffer expands within two predefined limits:
- Maximum amount of memory the buffer may use, in bytes. Set this limit to a reasonable number, depending on the amount of available memory.
- Maximum time difference between the buffer head and tail. This is important when buffering live data and you do not need data older than a specified time range (e.g. 30 seconds).
The two limits can be used independently or together, as sketched below. What happens when a limit is exceeded and the buffer fills up depends on the type of the transient stream.
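Both limits are configured through BufferOptions before the stream is created. A minimal sketch, assuming the maxBufferSize and maxBufferTimeDepth fields (the values are illustrative; check the exact field names against your TimeBase version):

```java
// Limit the transient stream buffer by size and by time depth
BufferOptions bufferOptions = new BufferOptions();
bufferOptions.maxBufferSize = 64 * 1024 * 1024; // at most 64 MB of buffered data
bufferOptions.maxBufferTimeDepth = 30_000;      // at most 30 seconds between head and tail (ms)
options.bufferOptions = bufferOptions;          // set on StreamOptions before db.createStream(...)
```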
There are two types of transient streams:
- Lossy: the most common transient stream type. When the buffer gets full, writers (loaders) continue to write messages (on the right side of the diagram), which causes the oldest messages (on the left side of the diagram) to fall off and be discarded even though they have not been read. One or more slow readers may thus miss some messages and jump to the next available one; TimeBase diligently notifies slow readers that they are jumping over a gap.
- Lossless: the less common transient stream type. When the buffer is full, writers (loaders) are blocked until the slowest reader frees up room by reading more messages. A lossless stream may be the right option when data loss is not tolerable. However, in a lossless stream, even a single slow reader can slow down or completely stop message dispatch for everyone, even if all other readers can handle the flow. The lossless configuration is rarely used in practice, but TimeBase makes this option available.
```java
// Create a lossless transient stream
RecordClassDescriptor classDescriptor = Introspector.createMessageIntrospector().introspectRecordClass(TestMessage.class);
...
String streamName = "transient-lossless";
StreamOptions options = StreamOptions.fixedType(StreamScope.TRANSIENT, streamName, streamName, 0, classDescriptor);
options.bufferOptions = new BufferOptions();
options.bufferOptions.lossless = true;
DXTickStream stream = db.createStream(streamName, options);
...
```
Latest Events Snapshot
Consider a case where trading portfolio events are published into TimeBase: every time a certain trading instrument changes, a message is sent. Some business cases require a snapshot of the last values for each trading instrument in the portfolio. One solution is to use a reverse cursor to scan the entire stream backwards in search of the required messages. Because TimeBase stores data in chronological order, this approach may be inefficient and time-consuming: it can mean walking through all historical messages one by one, looking for the right ones.
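For reference, the reverse-scan workaround looks roughly like this (a sketch, assuming the SelectionOptions.reversed flag; the symbol set is hypothetical):

```java
// Naive approach: scan the stream backwards until the latest message
// for every symbol of interest has been seen
SelectionOptions options = new SelectionOptions();
options.reversed = true; // iterate from the newest message to the oldest

Set<String> pending = new HashSet<>(Arrays.asList("AAPL", "MSFT"));
try (TickCursor cursor = stream.select(System.currentTimeMillis(), options)) {
    while (!pending.isEmpty() && cursor.next()) {
        InstrumentMessage message = cursor.getMessage();
        if (pending.remove(message.getSymbol().toString())) {
            // latest message for this symbol found
        }
    }
}
```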
TimeBase offers a better solution to this problem with a unique stream option (see the example below: options.unique = true;), which can be applied to both durable and transient streams. This option gives you a snapshot of the latest stream messages out of the box. When activated, the stream keeps a cache of the last message for each stream symbol on the TimeBase server. Any new subscriber (cursor) receives a snapshot of all known positions immediately, before any live messages; you can inspect timestamps to distinguish live data from historical snapshot data.
The cache key is the message primary key (pk), which is defined in the class schema. If the schema does not contain any fields marked as pk, the symbol name is used as the primary key. When a user creates a cursor on a unique stream, the stream additionally pushes the entire cache ahead of the stream data. The cache is periodically flushed to disk, so after a system restart it still contains the latest message of each class per key.
Unique streams behave exactly like regular durable or transient streams and additionally maintain an internal cache of messages. No special actions or API changes are required to publish data into a unique stream, and none are required for a subscriber (cursor) to read from one.
```java
// Create a durable stream with a unique cache
RecordClassDescriptor classDescriptor = Introspector.createMessageIntrospector().introspectRecordClass(TestMessage.class);
...
String streamName = "durable-unique";
StreamOptions options = StreamOptions.fixedType(StreamScope.DURABLE, streamName, streamName, 0, classDescriptor);
options.unique = true;
DXTickStream stream = db.createStream(streamName, options);
...
```
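Publishing into a unique stream uses the same loader API as into any other stream. A minimal sketch (the TestMessage setters are illustrative):

```java
// Publish into the unique stream exactly as into a regular stream
try (TickLoader loader = stream.createLoader()) {
    TestMessage message = new TestMessage();
    message.setSymbol("AAPL");
    message.setTimeStampMs(System.currentTimeMillis());
    loader.send(message);
}
```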
```java
...
// Consume data from a durable stream with the unique stream option
SelectionOptions options = new SelectionOptions();
options.rebroadcast = true; // deliver the cached snapshot to this cursor before live data
long startTime = System.currentTimeMillis();
try (TickCursor cursor = stream.select(startTime, options)) {
    while (cursor.next()) {
        InstrumentMessage message = cursor.getMessage();
        if (message.getTimeStampMs() < startTime) {
            // snapshot message
        } else {
            // live message
        }
    }
}
...
```
More Info
Data Sizing
This technical note provides TimeBase disk space utilization statistics.
RECOMMENDATIONS
- Use the 5.X storage format (compression yields 8-10x space savings compared to 4.X).
- If you must use the 4.X format, consider using the MAX distribution factor for streams containing a small number of contracts (e.g. fewer than 500). For ~8x space savings, use OS-level compression to store large data volumes. TimeBase allows placing individual streams in dedicated volumes.
- Make sure the TimeBase stream schema defines all unused fields as static, rather than filling them with NULL in each message; see the sketch after this list.
- In some cases, you may want to use the classic Level 1/Level 2 market data format rather than the newer Universal Market Data Format.
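As a sketch of the static-field recommendation, a constant field can be declared once in the schema with StaticDataField instead of being carried in every message. The class and field names below are hypothetical, and the exact metadata constructors should be checked against your TimeBase version:

```java
// "price" is stored in every message; "currencyCode" is stored once in the schema
RecordClassDescriptor descriptor = new RecordClassDescriptor(
    "com.example.PriceMessage", null, false, null,
    new NonStaticDataField("price", "Price",
        new FloatDataType(FloatDataType.ENCODING_FIXED_DOUBLE, true)),
    new StaticDataField("currencyCode", "Currency Code",
        new VarcharDataType(VarcharDataType.ENCODING_INLINE_VARSIZE, true, false), "USD")
);
```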
| Market Data Type | 5.x Storage Format | 4.x Storage Format |
|---|---|---|
| Level 1 (BBO + Trades) | 6.8 Mb / 1 million messages | 32 Mb / 1 million messages |
| Level 2 (Market by Level, 10 levels) | 12.7 Mb / 1 million messages | 90 Mb / 1 million messages |
5.X Format Storage
LEVEL 1 MARKET DATA
Best-Bid-Offer and Trades data sample:
- CME (NYMEX) Crude Oil (CL) and Natural Gas (NG) FUTURE contracts.
| Parameter | Value |
|---|---|
| Sample time range | 6 market days (4/23/2020-5/1/2020) |
| Number of Level 1 messages stored | 26,896,346 |
| Disk space | 184,147,968 bytes (6.84 Mb per million L1 messages) |
LEVEL 2 MARKET DATA
Market-by-Level and Trades data sample:
- CME (NYMEX) Crude Oil (CL) and Natural Gas (NG) FUTURE contracts.
- Market depth: 10 levels
| Parameter | Value |
|---|---|
| Sample time range | 30 market days (April 2020) |
| Number of Level 2 messages stored | 459 million |
| Disk space | 5.57 Gb (12.75 Mb per million L2 messages, or ~13 bytes per message) |
4.X Format Storage
LEVEL 1 MARKET DATA
Best-Bid-Offer and Trades data sample:
- CME (NYMEX) Crude Oil (CL) and Natural Gas (NG) FUTURE contracts.
| Parameter | Value |
|---|---|
| Sample time range | 6 market days (4/23/2020-5/1/2020) |
| Number of Level 1 messages stored | 25,873,909 |
| Disk space | 807 Mb (32 Mb per million L1 messages) |
LEVEL 2 MARKET DATA
Market-by-Level and Trades data sample:
- CME (NYMEX) Crude Oil (CL) and Natural Gas (NG) FUTURE contracts.
- Market depth: 10 levels
| Parameter | Value |
|---|---|
| Sample time range | 6 market days (4/23/2020-5/1/2020) |
| Number of Level 2 messages stored | 43,617,735 |
| Disk space | 4,046,630,912 bytes (541,920,890 bytes gzipped; 90 Mb per million L2 messages, or ~93 bytes per message) |
4.3 Classic Message Format
LEVEL 2 MARKET DATA
- FX market from 10 LPs for 27 currency pairs
- Market depth: 10 levels
| Parameter | Value |
|---|---|
| Sample time range | 1 market day (10/21/2014) |
| Number of Level 2 messages stored | 834,503,715 |
| Disk space | 85,523,931,136 bytes (100 Mb per million L2 messages, or ~102 bytes per message) |
Annexes
LEVEL 2 STREAM SAMPLES
Snapshot message sample:
Incremental update message sample:
Working with Streams
To learn more about how to use streams, visit the How To page's Working with Streams section.