TimeBase Read Throughput
This article examines the read performance of TimeBase, focusing on how overall server throughput scales as the number of concurrent consumers increases when reading a single historical data stream. The second part of this article explores how READ throughput scales for multiple streams.
Summary
Our measurements show that TimeBase can deliver historical data from a single stream to multiple concurrent consumers at a combined rate exceeding 7 million messages per second.
When reading multiple independent streams in parallel, combined throughput scaled up to 48.9 million messages per second with 16 consumers.
Per-consumer throughput decreased as more consumers were added, from about 4.6 million messages per second with 4 consumers to about 3.0 million with 16.
This shows that while overall throughput grows with concurrency, efficiency per consumer declines.
Experiment 1 - Parallel READs of single stream
Environment
For this experiment, results were obtained on a developer workstation with the following configuration:
- CPU: i7-13850HX, 2100 MHz, 20 cores
- RAM: 32 GB
- SSD: Intel 660p NVMe
- OS: Windows 11 PRO
- OpenJDK 17.0.12
- TimeBase 5.6.161 (default settings)
TimeBase reported the default data cache configuration at startup:
INFO: Initializing RAMDisk. Data cache size = 128MB.
INFO: Initializing Data Cache [PageSize = 65,536 bytes; Pages = 2044; MaxFileLength = 134,217,727 MB]
note
Disclaimer: This experiment was conducted on a developer laptop. Network throughput is explored in Experiment 2, described later in this document.
TimeBase and all consumers were running on the same machine. The TickDB Shell (TimeBase CLI) was used to conduct the test (see the Appendix for logs).
Specifically, the tptimea command was executed to read a given stream with a specified number of concurrent consumers. Each consumer read the entire stream independently. Once all consumers completed, the tool reported the combined average data rate, which is shown in the results.
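Outside the shell, the same access pattern can be reproduced directly with the TimeBase Java client. The sketch below is illustrative only: it assumes the client API exposed by the deltix.qsrv.hf.tickdb.pub package (TickDBFactory, DXTickDB, DXTickStream, SelectionOptions, TickCursor), and exact class or method names may differ between TimeBase releases; the URL and stream name are taken from the Appendix logs.

```java
import deltix.qsrv.hf.tickdb.pub.DXTickDB;
import deltix.qsrv.hf.tickdb.pub.DXTickStream;
import deltix.qsrv.hf.tickdb.pub.SelectionOptions;
import deltix.qsrv.hf.tickdb.pub.TickCursor;
import deltix.qsrv.hf.tickdb.pub.TickDBFactory;

public class ParallelStreamReaders {

    public static void main(String[] args) throws InterruptedException {
        int consumers = 10;                                  // concurrent readers of the same stream
        Thread[] readers = new Thread[consumers];
        for (int i = 0; i < consumers; i++) {
            readers[i] = new Thread(ParallelStreamReaders::readEntireStream, "reader-" + i);
            readers[i].start();
        }
        for (Thread reader : readers)
            reader.join();                                   // wait for every consumer to finish its full scan
    }

    private static void readEntireStream() {
        // Each consumer uses its own connection and cursor, so the scans are fully independent.
        DXTickDB db = TickDBFactory.createFromUrl("dxtick://localhost:8011");
        db.open(true);                                       // read-only connection
        try {
            DXTickStream stream = db.getStream("BLOOMBERG_TICKS");
            SelectionOptions options = new SelectionOptions(true, false);  // raw (undecoded), historical
            TickCursor cursor = stream.select(Long.MIN_VALUE, options);    // read from the beginning of the stream
            long count = 0;
            try {
                while (cursor.next())
                    count++;                                 // payload is available via cursor.getMessage()
            } finally {
                cursor.close();
            }
            System.out.println(Thread.currentThread().getName() + " read " + count + " messages");
        } finally {
            db.close();
        }
    }
}
```

Each reader holds its own connection and cursor, so the server is serving N independent sequential scans of the same stream, which is the access pattern measured in this experiment.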
The following diagram illustrates the test setup:
Historical Data
A 2 GB sample of compressed Level 1 market data from Bloomberg (Trades and BBO) was used as the test dataset.
The average encoded message size was 44 bytes.
Results
The TimeBase client library was configured with the default number of transport channels (2).
Consumers | Data Rate (msg/s) |
---|---|
1 | 1,031,602 |
2 | 1,959,044 |
3 | 2,779,573 |
4 | 3,567,595 |
5 | 5,155,398 |
6 | 5,984,051 |
7 | 6,298,562 |
8 | 6,497,084 |
9 | 6,574,442 |
10 | 6,730,122 |
Analysis:
- Throughput scales almost linearly up to 5–6 consumers (≈6 M msg/s).
- Beyond that, gains flatten: 7–10 consumers only add ~0.4 M msg/s total.
- Efficiency per consumer drops from ~1.0 M msg/s (1–5 consumers) to ~0.67 M msg/s at 10 consumers.
note
Observation: Overall CPU utilization increased from ~25% to ~48% as the number of consumers grew across tests.
Parallel Reads of a Single Stream with Multiple Channels
In this test, the same dataset is consumed by 10 concurrent consumers, but the number of transport channels is increased beyond the default setting.
Consumers | Channels | Data Rate (msg/s) |
---|---|---|
10 | 2 | 6,730,122 |
10 | 4 | 6,927,077 |
10 | 6 | 7,095,391 |
In this experiment, throughput varied between runs. For each series, the median rate was used in the results table above.
Increasing the number of channels beyond 8 proved counter-productive.
Experiment 2 - AWS, Linux, Multiple streams, remote consumers
Environment
- CPU: Intel(R) Xeon(R) Platinum 8488C, 64 vCPUs (m7i.16xlarge)
- RAM: 256 GB
- Disk: GP3 EBS (3000 IOPS)
- Network: 25 Gigabit
- OS: Amazon Linux 2023
- OpenJDK 17.0.16
- TimeBase 5.6.161 (default settings)
- TimeBase.network.socket.receiveBufferSize=4194304
- TimeBase.ramCacheSize=10737418240 (10 GB)
- Storage format: 5.0
The following diagram illustrates this test setup:
All consumers run remotely; each consumer runs in its own JVM process and uses a separate instance of the TimeBase client connection.
TimeBase Server startup parameters:
export DELTIX_HOME=/home/ec2-user/timebase-home
export TIMEBASE_SERIAL=***
java -Xms60G -Xmx60G -XX:+AlwaysPreTouch $ADD_OPENS -cp "/home/ec2-user/timebase-home/lib/*" deltix.qsrv.comm.cat.TBServerCmd -tb -home /home/ec2-user/QSHome -port 8011
TimeBase clients startup parameters:
java -DTimeBase.network.socket.bufferSize=4194304 $ADD_OPENS -cp "/home/ec2-user/jars-bench/*" deltix.qsrv.hf.tickdb.benchmark.Benchmark_HistoricThroughput -url dxtick://172.31.17.217:8011 -producers 4 -consumers 1 -channelType streams -messageCount 200000000 -stream thr_4 -skipGeneration
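The -DTimeBase.network.socket.bufferSize override shown above can also be applied by an application that embeds the client, as long as the property is set before the first TimeBase connection is opened. A minimal sketch, assuming the property is read when the connection is created (the class name is illustrative):

```java
public class ClientTuning {
    public static void main(String[] args) {
        // Equivalent to -DTimeBase.network.socket.bufferSize=4194304 (4 MB socket buffers).
        // Must run before the TimeBase client opens its first connection.
        System.setProperty("TimeBase.network.socket.bufferSize", String.valueOf(4 * 1024 * 1024));

        // ... connect to TimeBase and start consumers here ...
    }
}
```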
Historical Data
The market data in this test consists of historical OHLC bars stored in several TimeBase streams (about 1 GB each). Each bar message is encoded into 36 bytes.
Results
Consumers | Total Throughput (msg/s) | Per-Consumer Throughput (msg/s) |
---|---|---|
4 | 18,544,919 | 4,636,229 |
8 | 34,215,267 | 4,276,908 |
16 | 48,876,041 | 3,054,752 |
With 16 consumers reading data, the combined rate approaches 50 million messages per second; at that point the test consumes about 22 Gigabit per second of network bandwidth (these instances use a 25 Gigabit network). Of that, roughly 14 Gbit/s is encoded message payload (48.9 M msg/s × 36 bytes ≈ 1.76 GB/s); the remainder is transport and protocol overhead.
CPU Usage
These tests were run on large AWS instances with 64 vCPUs, primarily to ensure sufficient network bandwidth and to minimize the impact of “noisy neighbors.” From a CPU standpoint, the servers are very much over-provisioned for the workload under test.
To evaluate CPU scaling effects, we restricted TimeBase to a smaller subset of vCPUs and observed the impact on performance. The test setup was the same as before, with 4 consumers. We first measured throughput with the full CPU set available to TimeBase, then repeated the tests while limiting the server to 16, 8, 4, and 2 vCPUs.
As an example, to restrict TimeBase to 8 vCPUs we prefixed the server command line with:
taskset -c 10-13,42-45 java -XX:ActiveProcessorCount=8 ...
On this instance, there are 32 physical cores with hyper-threading enabled; vCPUs 0 and 32 map to physical core 0. Hence CPU groups 10-13 and 42-45 are HT pairs.
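To confirm that the JVM honors the restriction, it is enough to check the processor count the runtime reports; a trivial check using only standard JDK APIs (the class name is arbitrary):

```java
public class CpuCheck {
    public static void main(String[] args) {
        // When launched as: taskset -c 10-13,42-45 java -XX:ActiveProcessorCount=8 CpuCheck
        // this prints 8 rather than the instance's full 64 vCPUs.
        System.out.println("JVM sees " + Runtime.getRuntime().availableProcessors() + " processors");
    }
}
```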
CPU Allocation | Median Throughput (M msg/s) |
---|---|
All 64 vCPUs | 16.9 |
16 vCPUs | 19.1 |
8 vCPUs | 15.7 |
4 vCPUs | 12.3 |
2 vCPUs | 6.3 |
Analysis
The results show that throughput does not scale linearly with the number of vCPUs allocated to TimeBase. Even when restricted to 16 or 8 vCPUs, performance remains close to the levels observed with the full 64 vCPU server. Only at very low allocations (4 and especially 2 vCPUs) does throughput drop substantially.
This indicates that the 64 vCPU instance type is significantly over-provisioned in terms of raw compute. The main reason for choosing such large instances is network bandwidth and isolation from noisy neighbors, rather than CPU capacity. In practice, TimeBase requires only a fraction of the available cores to sustain tens of millions of messages per second.
Appendix: Sample test logs from the TickDB Shell
The following logs show how the TickDB Shell was used to execute these tests:
#/QuantServer/bin/tickdb.sh
==> set db dxtick://localhost:8011
==> open
==> set stream BLOOMBERG_TICKS
==> tptimea 1
54,986,443 messages in 53.302s; speed: 1,031,602 msg/s
==> tptimea 2
109,972,886 messages in 56.136s; speed: 1,959,044 msg/s
==> tptimea 3
164,959,329 messages in 59.347s; speed: 2,779,573 msg/s
==> tptimea 4
219,945,772 messages in 61.651s; speed: 3,567,595 msg/s
==> tptimea 5
274,932,215 messages in 53.329s; speed: 5,155,398 msg/s
==> tptimea 6
329,918,658 messages in 55.133s; speed: 5,984,051 msg/s
==> tptimea 7
384,905,101 messages in 61.110s; speed: 6,298,562 msg/s
==> tptimea 8
439,891,544 messages in 67.706s; speed: 6,497,084 msg/s
==> tptimea 9
494,877,987 messages in 75.273s; speed: 6,574,442 msg/s
==> tptimea 10
549,864,430 messages in 81.702s; speed: 6,730,122 msg/s
Variation of test results:
==> tptimea 5
274,932,215 messages in 52.905s; speed: 5,196,715 msg/s
==> tptimea 5
274,932,215 messages in 53.738s; speed: 5,116,160 msg/s
==> tptimea 5
274,932,215 messages in 53.329s; speed: 5,155,398 msg/s
CPU Limiting results
Baseline (all 64 vCPUs)
===================
Messages per second (total): 10691935
Messages per second (total): 13834487
Messages per second (total): 16897332
Messages per second (total): 19118745
Messages per second (total): 19728608
16 vCPUs / taskset -c 10-17,42-49 -XX:ActiveProcessorCount=16
===========================================================
Messages per second (total): 12326513
Messages per second (total): 13547267
Messages per second (total): 19073504
Messages per second (total): 19130289
Messages per second (total): 19212987
8 vCPUs / taskset -c 10-13,42-45 -XX:ActiveProcessorCount=8
===========================================================
Messages per second (total): 12520492
Messages per second (total): 11066078
Messages per second (total): 15768828
Messages per second (total): 15740207
Messages per second (total): 15798178
4 vCPUs / taskset -c 10-11,42-43 java -XX:ActiveProcessorCount=4
================================================================
Messages per second (total): 9379350
Messages per second (total): 8794736
Messages per second (total): 12498340
Messages per second (total): 12413300
Messages per second (total): 12313185
2 vCPUs / taskset -c 10,42 java -XX:ActiveProcessorCount=2
==========================================================
Messages per second (total): 4662106
Messages per second (total): 4629602
Messages per second (total): 6314626
Messages per second (total): 6325223
Messages per second (total): 6267848