TimeBase performance characteristics and measurements.

Performance

Data Cache

TimeBase can optionally retain clean time slices in memory after they have been read from, or written to, disk. The amount of data kept in memory is controlled by the Data Cache setting, specified in megabytes.

  • First, let us make it clear that TimeBase works quite efficiently even with the Data Cache size set to a very small value. Even without OS-level data caching, TimeBase can drive typical hard drives at about 50% of their maximum speed.
  • Second, when using local storage, the operating system caches recently used files in available free memory, so some caching takes place even without any configuration on the TimeBase side. The more free RAM the server has, the more data the operating system can cache. When using local storage, it is therefore a good idea to configure the TimeBase server with a generous amount of seemingly unused free memory. Depending on a multitude of factors, explicit Data Cache configuration may then not even be necessary for excellent performance.

However, there are several reasons to configure caching at the TimeBase level:

  • The operating system makes caching decisions using heuristics, and it cannot be configured to give priority to TimeBase.
  • Reading data from the OS-level disk cache still goes through the disk I/O API and incurs more overhead than having the data ready in the process’s own virtual memory space (see the sketch after this list).
  • Index Blocks in TimeBase’s Data Cache are kept in efficient data structures and do not require repeated parsing, as is the case when reading from the disk cache.
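
To make the second point concrete, the following naive micro-benchmark sketch contrasts a read that goes through the file I/O API (a system call plus a copy, even when the OS page cache already holds the data) with a plain in-process memory copy, which is what a Data Cache hit amounts to. The file name slice.dat is hypothetical, and the timings are illustrative only, not a rigorous benchmark:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class CacheOverheadSketch {
        public static void main(String[] args) throws IOException {
            Path file = Path.of("slice.dat");        // hypothetical data file
            ByteBuffer buf = ByteBuffer.allocate(1 << 20);

            // Read through the I/O API: even on an OS page-cache hit,
            // each read() is a system call plus a copy into our buffer.
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                long t0 = System.nanoTime();
                while (ch.read(buf) > 0) buf.clear();
                System.out.println("via I/O API: " + (System.nanoTime() - t0) + " ns");
            }

            // Read from the process's own memory: a plain array copy,
            // no system call involved.
            byte[] inProcess = new byte[1 << 20];
            byte[] dst = new byte[1 << 20];
            long t1 = System.nanoTime();
            System.arraycopy(inProcess, 0, dst, 0, dst.length);
            System.out.println("in-process:  " + (System.nanoTime() - t1) + " ns");
        }
    }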

With the Data Cache enabled, TimeBase retains in memory as much recently used data as configured. When the cache is full and more data needs to be added, the least recently used clean pages are evicted from the cache, i.e. erased from memory, to make room for new data.
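
The eviction policy is a classic least-recently-used scheme. As a minimal sketch (not TimeBase’s actual implementation, which is sized in megabytes and evicts only clean pages), Java’s LinkedHashMap can express LRU eviction directly:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal LRU cache: with accessOrder = true, LinkedHashMap keeps
    // entries ordered by most recent use, and removeEldestEntry lets us
    // evict the least recently used entry once capacity is exceeded.
    public class LruDataCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public LruDataCache(int maxEntries) {
            super(16, 0.75f, /* accessOrder = */ true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;   // evict the LRU entry to make room
        }
    }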

Read Ahead

All modern operating systems support read-ahead for large files. When the operating system detects that an application is actively reading a file of non-trivial size, it assumes the application is likely to keep reading forward. If sufficient memory is available for disk caching, and the disk is not heavily loaded, the OS will often proactively cache file content ahead of the current read position.

The way data is laid out in TimeBase is particularly compatible with the read-ahead heuristic: the data is indeed stored in a number of large files that are typically read sequentially. As a result, the operating system cooperates with TimeBase, giving it a performance boost.
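
For illustration, the following sketch shows the kind of access pattern that triggers OS read-ahead: large, strictly sequential reads of a big file. After a few such reads, the OS typically starts fetching the next stretch of the file before the application asks for it. The file name timeslice.dat is hypothetical, and this code is not part of TimeBase:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class SequentialScan {
        public static void main(String[] args) throws IOException {
            // Large sequential reads are exactly the pattern the
            // OS read-ahead heuristic rewards.
            ByteBuffer buf = ByteBuffer.allocateDirect(8 << 20);   // 8 MB chunks
            try (FileChannel ch = FileChannel.open(Path.of("timeslice.dat"),
                                                   StandardOpenOption.READ)) {
                long total = 0;
                while (ch.read(buf) > 0) {
                    buf.flip();
                    total += buf.remaining();   // consume the chunk here
                    buf.clear();
                }
                System.out.println("read " + total + " bytes sequentially");
            }
        }
    }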

Performance Benchmarks

Throughput

  • TimeBase Streams (TCP)
    • 1 Producer x 1 Consumer = 1.8 M messages/sec
    • 1 Producer x 4 Consumers = 5.5 M messages/sec
  • TimeBase Topics (100 bytes payload)
    • 1 Producer x 1 Consumer = 9 M messages/sec
    • 1 Producer x 4 Consumers = 24 M messages/sec (6 M per consumer)

Latency (one way, microseconds)

  • The default TimeBase communication mode, called Normal Buffering, accumulates messages in a buffer for a short period of time, then sends the buffer to the recipient as a whole. In this mode, the delay in sending messages is determined by how precisely TimeBase can ask the operating system to wake the buffer-flushing thread after a short sleep. Even if TimeBase asks for a wake-up call in fractions of a millisecond, the operating system will often delay the wake-up by up to about one millisecond. As a result, messages may be delayed by up to one millisecond, but computing resources are used in the most efficient way possible. The default mode is good for querying historical data, or for transmitting live data when latency on the order of one millisecond is tolerable.
  • The second communication mode, called Low-Latency Buffering, also buffers messages, but does not rely on the operating system for wake-up alarms. Instead, the buffer-flushing thread spins in place, waiting a very short amount of time before flushing accumulated messages. Spinning consumes one CPU core in each participating process that sends messages, but worst-case latency drops to about 20 microseconds, and messages are still sent in efficiently buffered form. Both strategies are sketched after this list.
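
The following sketch contrasts the two flushing strategies in ordinary Java. The class, method names, and the flush callback are illustrative assumptions, not the TimeBase API:

    import java.util.concurrent.locks.LockSupport;

    public class FlushLoops {

        // Normal Buffering: park the flushing thread. The OS is free to
        // round the wake-up to its scheduling granularity, so the actual
        // delay can approach a millisecond.
        static void normalBufferingLoop(Runnable flush) {
            while (!Thread.currentThread().isInterrupted()) {
                LockSupport.parkNanos(100_000);   // ask for ~100 us; may get much more
                flush.run();
            }
        }

        // Low-Latency Buffering: never sleep; spin until the flush
        // interval elapses. Burns one CPU core but keeps worst-case
        // latency in the tens of microseconds.
        static void lowLatencyLoop(Runnable flush, long intervalNanos) {
            long deadline = System.nanoTime() + intervalNanos;
            while (!Thread.currentThread().isInterrupted()) {
                while (System.nanoTime() < deadline) {
                    Thread.onSpinWait();   // hint to the CPU that we are busy-waiting
                }
                flush.run();
                deadline = System.nanoTime() + intervalNanos;
            }
        }
    }

Measured one-way latencies, in microseconds: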

  • Streams
    Mean = 49.2 (StdDev = 120)
    Percentiles:
    50% = 46
    90% = 56
    99% = 62
    99.9% = 66
    99.99% = 7065
    99.999% = 8781
    Max = 9822

  • Topics
    Mean = 0.372 (StdDev = 0.186)
    Percentiles:
    50% = 0.356
    90% = 0.373
    99% = 1.404
    99.9% = 1.484
    99.99% = 1.643
    99.999% = 4.735
    99.9999% = 34.239
    Max = 130.943