TimeBase structure highlights: messages, streams, topics.

Messages

The main purpose of TimeBase is to store messages.

In TimeBase, messages are (usually) small data records, always associated with a symbol, and almost always tagged with a timestamp.

Message Structure

Every message in TimeBase is automatically associated with a moment in absolute time and an entity.

An entity is an abstract concept that identifies the object to which the event most closely relates.

An entity consists of a symbol and a type.

The type of each message is defined by the schema.

Each message has:

  • Timestamp
  • Symbol
  • A number of custom fields (simple or composite), as defined by the schema

Class Definition

The structure (schema) of a message is defined by a class definition.

TimeBase uses Object-Oriented Design (OOD) terminology because it fully supports OOD in modeling events.

Message classes can be organized into hierarchies.

Each class definition describes the fields (equivalent to attributes or data items in other terminologies) that an instance of this class may contain.

A class can be tagged as NON INSTANTIABLE (equivalent to abstract in other terminologies) if it is introduced only to define common properties of other classes, and no concrete message can be an instance of it.

Example

For example, suppose we want to define a hierarchy of classes to describe financial events, such as trade, top bid change, and top offer change.

The schema will contain at least three message classes. Let us call them Trade, TopBid, and TopOffer.

Trade will contain fields price and size, TopBid will contain bidPrice and bidSize, and TopOffer will contain offerPrice and offerSize.

Additionally, we would like to work with data from multiple exchanges.

We may later want to perform common operations, such as filtering on the exchange code, regardless of the specific message type.

The best way to prepare for that is to organize the three classes into a hierarchy with a common parent.

We may call the parent class MarketMessage, and define the field exchangeCode at this level.

We end up with the following class hierarchy:
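The hierarchy can be sketched in Java along the following lines. This is an illustration only: the exact base class, field types, and binding annotations depend on the TimeBase API version, and InstrumentMessage (the root class that carries the symbol and timestamp, mentioned under Stream Schema below) is assumed here.

```java
import deltix.qsrv.hf.pub.InstrumentMessage; // assumed root class carrying symbol and timestamp

// Illustrative sketch of the schema hierarchy, not an exact TimeBase binding.

// NON INSTANTIABLE (abstract) parent that defines the common field.
abstract class MarketMessage extends InstrumentMessage {
    public String exchangeCode;
}

class Trade extends MarketMessage {
    public double price;
    public double size;
}

class TopBid extends MarketMessage {
    public double bidPrice;
    public double bidSize;
}

class TopOffer extends MarketMessage {
    public double offerPrice;
    public double offerSize;
}
```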

Enum Definition

TBD

Streams

Individual messages are organized into sequences, called Streams in TimeBase terminology.

A stream is a self-contained repository of data.

Each stream has its own schema (message class definitions), not shared with other streams.

A TimeBase instance is a collection of streams. Streams do not share anything but server resources.

When a message is added to TimeBase, the user explicitly specifies the destination stream.

When messages are selected from TimeBase, the user specifies one or more source streams.

If several source streams are specified, TimeBase efficiently merges the data from all of them and returns it ordered by time.

Just as you can dynamically add or remove symbol subscriptions, you can as easily add or remove entire streams from the subscription list of a cursor.
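The following minimal sketch reads two streams through one cursor, using the Java client classes referenced elsewhere on this page (deltix.qsrv.hf.tickdb.pub). The connection URL and stream names are made up, and the select/addStream/removeStream signatures are assumptions that may differ between TimeBase versions.

```java
import deltix.qsrv.hf.pub.InstrumentMessage;
import deltix.qsrv.hf.tickdb.pub.*;

public class MultiStreamReadSketch {
    public static void main(String[] args) {
        // Assumed connection URL; adjust to your deployment.
        DXTickDB db = TickDBFactory.createFromUrl("dxtick://localhost:8011");
        db.open(true); // open read-only
        try {
            DXTickStream trades = db.getStream("trades");   // hypothetical stream keys
            DXTickStream quotes = db.getStream("quotes");

            // Read from the beginning of time; TimeBase merges all subscribed streams by timestamp.
            TickCursor cursor = trades.select(Long.MIN_VALUE, new SelectionOptions());
            cursor.addStream(quotes);       // assumed call: add a second stream to the subscription

            while (cursor.next()) {
                InstrumentMessage msg = cursor.getMessage();
                System.out.println(msg);    // messages arrive ordered by time across both streams
            }

            cursor.removeStream(quotes);    // assumed call: subscriptions can change on the fly
            cursor.close();
        } finally {
            db.close();
        }
    }
}
```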

A parallel can be drawn between TimeBase streams and messages on one side, and relational tables and rows on the other side.

Both are durable collections of related, relatively small, flat-structured records of data (called messages and rows, accordingly).

However, streams support polymorphic messages, may be non-durable (see below), and differ from classic tables in so many other ways that TimeBase terminology never uses the words table and row when referring to streams and messages.

Another parallel can be drawn between TimeBase streams and what is referred to as topic in standard messaging middleware terminology.

You can have one, or sometimes more, publishers sending messages to a topic, and any number of subscribers reading messages from a topic.

When message X is read by one subscriber, this does not affect other topic subscribers - they will still get copies of message X (the possibility of filtering is beside the point).

However, topics in a typical messaging system are not durable, or not designed to hold terabytes of searchable data.

Because durability and messaging are so tightly integrated in TimeBase, we use the term stream rather than topic, and typically refer to publishers as writers, and to subscribers as readers.

Durable and Transient Streams

Streams can be Durable or Transient in scope.

In both cases, the writer adds messages to the tail of the stream (on the right of the diagram), while readers independently read the stream from left to right.

The diagrams below illustrate the difference between a durable stream and a transient one.

When data is written into a durable stream, it is initially retained in memory, and eventually flushed to durable storage:

In reality, this process is a bit more robust than depicted above, in that even messages not yet read by all readers will be flushed to persistent storage.

Otherwise, in the event of abrupt termination (e.g., system crash) a lot of data could end up not stored on disk.

But for the purposes of the transient vs. durable discussion, we will stick with the above approximation.

The content of a transient stream is not persisted anywhere.

Transient messages are stored in a memory buffer, which acts as a queue with one tail and multiple heads. When data in a transient stream is read by all readers, it is discarded.

Example

Suppose that Readers 1-3 are the only registered readers.

Reader 3 is the slowest one (behind the other two).

When Reader 3 reads the next message (on the far left of the box), that message will “fall off the queue” and be discarded.

If at this time Reader 4 were to subscribe for this stream, it would never get the discarded message.

In fact, if no readers are subscribed to a transient stream, then all messages written to it are immediately discarded.
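The behavior described above can be captured by the following toy model of the transient buffer: one tail, one head per reader, and discarding of any message that every registered reader has passed. This is an illustration of the concept only, not TimeBase's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a transient stream buffer: one tail, multiple independent heads.
class TransientBufferModel<T> {
    private final List<T> buffer = new ArrayList<>();
    private long base = 0;       // absolute index of the oldest buffered message
    private long tail = 0;       // absolute index of the next message to be written
    private final long[] heads;  // absolute read position of each registered reader

    TransientBufferModel(int readers) {
        heads = new long[readers];
    }

    // The writer appends at the tail; with no registered readers, messages are discarded immediately.
    void write(T msg) {
        buffer.add(msg);
        tail++;
        discardFullyRead();
    }

    // Each reader advances its own head independently; returns null if the reader has caught up.
    T read(int reader) {
        if (heads[reader] >= tail) return null;
        T msg = buffer.get((int) (heads[reader] - base));
        heads[reader]++;
        discardFullyRead();   // advancing the slowest reader may free buffer space
        return msg;
    }

    // Drop messages that every registered reader has already read ("fall off the queue").
    private void discardFullyRead() {
        long slowest = tail;  // no readers => everything written is discarded
        for (long h : heads) slowest = Math.min(slowest, h);
        while (base < slowest && !buffer.isEmpty()) {
            buffer.remove(0);
            base++;
        }
    }
}
```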

In TimeBase, a transient stream is almost indistinguishable from a durable stream API-wise.

Just as multiple clients may subscribe to a transient stream, they can subscribe to a durable stream in exactly the same way.

A client can even open a cursor that is subscribed to a number of streams at the same time, some durable and some transient.
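Creating a durable and a transient stream looks essentially the same; only the scope differs. The sketch below assumes the StreamOptions and StreamScope classes from the deltix.qsrv.hf.tickdb.pub package referenced on this page, with public option fields and made-up stream keys; the schema setup is omitted.

```java
import deltix.qsrv.hf.tickdb.pub.*;

public class CreateStreamsSketch {
    public static void main(String[] args) {
        DXTickDB db = TickDBFactory.createFromUrl("dxtick://localhost:8011"); // assumed URL
        db.open(false); // open read-write
        try {
            // Durable stream: messages are eventually flushed to persistent storage.
            StreamOptions durable = new StreamOptions();
            durable.name = "trades";
            durable.scope = StreamScope.DURABLE;
            // ... define the message class descriptors (schema) here ...
            db.createStream("trades", durable);

            // Transient stream: messages live only in the in-memory buffer.
            StreamOptions transientOpts = new StreamOptions();
            transientOpts.name = "live-quotes";
            transientOpts.scope = StreamScope.TRANSIENT;
            // ... define the message class descriptors (schema) here ...
            db.createStream("live-quotes", transientOpts);
        } finally {
            db.close();
        }
    }
}
```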

Buffer Size and Lossy vs. Lossless

If even one transient stream reader gets too slow while data is being added, the buffer begins to grow.

How much memory a buffer can consume can be limited in two ways:

  • It is possible to specify the maximum amount of memory to use, in bytes. Set this limit to a reasonable number, depending on the amount of available memory.
  • It is possible to specify the maximum time difference between head and tail. This is useful, for example, when buffering live market data and you do not need data older than 30 seconds.

The two ways of constraining buffer size can be used separately or together.

What happens when the limit is exceeded, and the buffer fills up, depends on the type of the transient stream.

There are two types of transient streams in TimeBase: lossy and lossless.

  • The most common is the lossy stream type. In a lossy stream, if the buffer gets full, the writer continues to write messages (on the right side of the diagram). This causes messages on the left side of the diagram to fall off and be discarded, even though they have not been read. In this case, one or more readers will miss the chance to read some messages and will jump to the next available message when they go to retrieve the next one. TimeBase diligently notifies the slow readers that they are jumping over a gap.
  • In a lossless stream, when the buffer is full, the writer is stopped (blocked) until the slowest reader frees up some room by reading more messages. A lossless stream may be the right option when data loss is not tolerable. However, in a lossless stream even a single slow reader can slow down, or completely stop, the dispatching of messages, even if all other readers are able to handle the flow. The lossless configuration is rarely used in practice, but TimeBase makes this option available.
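Continuing the stream-creation sketch above, the limits and the loss behavior might be configured as follows. BufferOptions and its field names (maxBufferSize, maxBufferTimeDepth, lossless), as well as the time unit, are assumptions about the deltix.qsrv.hf.tickdb.pub package and should be checked against your API version.

```java
// Assumed sketch: constrain the transient buffer and choose lossy vs. lossless behavior.
StreamOptions options = new StreamOptions();
options.name = "live-quotes";
options.scope = StreamScope.TRANSIENT;

BufferOptions buffer = new BufferOptions();
buffer.maxBufferSize = 64 * 1024 * 1024;  // cap the buffer at 64 MB
buffer.maxBufferTimeDepth = 30_000;       // and/or keep at most ~30 seconds between head and tail (assumed ms)
buffer.lossless = false;                  // lossy: slow readers may jump over a gap
options.bufferOptions = buffer;

// db.createStream("live-quotes", options);  // create the stream as in the earlier sketch
```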

Unique Streams

TimeBase provides the ability to deliver a copy of the last known values to new stream subscribers.

Streams that use this feature are called unique streams.

No special action or API changes are required to produce data into a unique stream.

No special action is required from a consumer to subscribe to a unique stream.

Note that the initial set of messages you receive is a snapshot of the last known messages.

You can inspect timestamps to distinguish live data from the historical snapshot.

Unique Stream Behavior

Unique streams have the same behavior as regular (DURABLE or TRANSIENT) streams and additionally maintain an internal cache of messages.

The cache is updated for each message written into the stream.

The cache key is the message's primary key, which is defined in the class schema by the pk attribute of the deltix.qsrv.hf.pub.md.NonStaticDataField class.

If the schema does not contain any fields marked as pk, the symbol name is used as the primary key.

When a user creates a cursor on such a stream, the entire cache is pushed ahead of the stream data.

Example

Consider a case where somebody publishes portfolio positions into TimeBase.

Every time a certain instrument position changes, we send a new message into TimeBase.

For rarely traded instruments there may be days or months between changes.

If we want to know the positions for all instruments, it may be very expensive to scan the entire stream backwards until we see the last message for each instrument.

Unique streams solve this problem by keeping the last known position for each instrument in a TimeBase server-side cache.

Any new subscriber will receive a snapshot of all known positions immediately before any live messages.

Usage Example

1. Create a TRANSIENT stream with deltix.qsrv.hf.tickdb.pub.StreamOptions#unique=true.

2. Start a data producer.

3. To get a snapshot of the latest messages, create a cursor starting from the current time with the option deltix.qsrv.hf.tickdb.pub.SelectionOptions#rebroadcast=true.

4. To get a snapshot of the latest messages only, create a cursor starting from a future time with the same option.
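A minimal Java sketch of these steps, using the option names listed above. The connection URL and stream key are made up, the schema setup is omitted, and treating unique, rebroadcast, and live as public fields is an assumption about the deltix.qsrv.hf.tickdb.pub API that may differ between versions.

```java
import deltix.qsrv.hf.pub.InstrumentMessage;
import deltix.qsrv.hf.tickdb.pub.*;

public class UniqueStreamSketch {
    public static void main(String[] args) {
        DXTickDB db = TickDBFactory.createFromUrl("dxtick://localhost:8011"); // assumed URL
        db.open(false);
        try {
            // 1. Create a TRANSIENT stream with the unique flag enabled.
            StreamOptions options = new StreamOptions();
            options.name = "positions";
            options.scope = StreamScope.TRANSIENT;
            options.unique = true;                 // enables the last-value cache
            // ... define the message class descriptors (schema) here ...
            DXTickStream positions = db.createStream("positions", options);

            // 2. A data producer writes position messages into the stream (not shown).

            // 3. To get a snapshot of the latest messages, start a cursor from the current time.
            SelectionOptions so = new SelectionOptions();
            so.rebroadcast = true;                 // replay the cached last-known values first
            so.live = true;                        // assumed flag for a live subscription
            TickCursor cursor = positions.select(System.currentTimeMillis(), so);

            while (cursor.next()) {
                InstrumentMessage msg = cursor.getMessage();
                // The first messages are the snapshot; inspect their timestamps to tell
                // the historical snapshot from live updates.
                System.out.println(msg);
            }
            cursor.close();

            // 4. Snapshot only: start the cursor from a time in the future instead, e.g.
            // positions.select(System.currentTimeMillis() + 60_000, so);
        } finally {
            db.close();
        }
    }
}
```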

Topics

A topic is a high-performance messaging channel.

Topics use UDP or IPC transport and are based on Aeron.

A topic connects a producer and consumers directly (data bypasses the TimeBase server).

There is an option to write a carbon copy of the transmitted data to a durable TimeBase stream (if throughput permits).

Topic Throughput

  • 1 Producer, 1 Consumer: 9M msg/s
  • 1 Producer, 4 Consumers: 6M msg/s each, 24M msg/s total (100 bytes per message)

TimeBase Topics Features

  • Low latency (down to 0.4 us on average, 35 us for 99.9999% of messages)
  • Higher throughput (up to 10M msg/s for a single consumer)
  • Better scaling with the number of consumers of the same topic
  • No persistence
  • No data filtering (no selective subscription)
  • High CPU consumption (even when idle)
  • Only IPC or multicast UDP (no simple one-to-one connections)

Stream Schema

TBD

Stream schemas are formed by classes and enums.

Schemas are self-contained: all classes and enums are described inside a specific schema and cannot be referenced from outside.

All classes that are elements of a stream inherit from an instrument message.

Instrument Message - TBD

All other classes do not have to inherit from an instrument message.

Abstract and non-abstract classes - describe.

Types of messages that can form a stream.

Polymorphic streams are supported.

API for working with schemas - TBD

Schema migration - TBD

Using Web Admin, you can include descriptors of specific classes in a stream schema as main classes; classes that are not selected are complementary.