TimeBase codec performance
In this JMH microbenchmark we measure the performance of TimeBase message codecs.
Test description
We measure the speed of encoders and decoders separately for three kinds of messages:
- Empty message. The JSON equivalent of this message is:
{
  "$type": "deltix.qsrv.hf.pub.InstrumentMessage",
  "symbol": "ETHUSDT",
  "timestamp": "2025-04-02T13:56:00.337Z"
}
note
This empty message is practically a "noop" operation for the codecs, since symbol/timestamp/message length are system-level fields that appear in the message header (outside of the payload codecs).
- Incremental market data update containing a single Trade entry inside the payload package. JSON equivalent:
{
  "$type": "deltix.timebase.api.messages.universal.PackageHeader",
  "symbol": "ETHUSDT",
  "timestamp": "2025-04-02T13:56:00.337Z",
  "originalTimestamp": "2025-04-02T13:56:00.257Z",
  "receiveTimestamp": "2025-04-02T13:56:00.336Z",
  "packageType": "INCREMENTAL_UPDATE",
  "entries": [
    {
      "$type": "deltix.timebase.api.messages.universal.TradeEntry",
      "exchangeId": "BINANCE",
      "matchId": "2293043024",
      "price": "1881.52",
      "side": "SELL",
      "size": "0.003"
    }
  ],
  "receivedTime": "2025-04-02T13:56:00.336Z"
}
- Full MBP order book snapshot (20 bids + 20 offers). JSON equivalent:
{
  "$type": "deltix.timebase.api.messages.universal.PackageHeader",
  "symbol": "BTCUSDT",
  "timestamp": "2025-04-02T13:56:00.300Z",
  "receiveTimestamp": "2025-04-02T13:56:00.247Z",
  "packageType": "VENDOR_SNAPSHOT",
  "entries": [
    {
      "$type": "deltix.timebase.api.messages.universal.L2EntryNew",
      "exchangeId": "BINANCE",
      "price": "85664.11",
      "size": "0.01974",
      "level": 0,
      "side": "BID"
    },
    {
      "$type": "deltix.timebase.api.messages.universal.L2EntryNew",
      "exchangeId": "BINANCE",
      "price": "85664.1",
      "size": "0.00185",
      "level": 1,
      "side": "BID"
    },
    ... 18 more entries ...
  ],
  "firstSequenceNumber": 0,
  "lastSequenceNumber": 0,
  "receivedTime": "2025-04-02T13:56:00.247Z"
}
The source code for this benchmark is available in:
deltix.qsrv.hf.pub.codec.perf2.Benchmark_Codecs1
Environment
Results were measured on a developer workstation:
- CPU: Intel Core i7-13850HX, 2100 MHz, 20 cores
- RAM: 32 GB
- SSD: Intel 660p NVMe
- OS: Windows 11 Pro
- OpenJDK 17.0.12
- TimeBase 5.6.161
Results
Benchmark                 (codecType)  (msgFile)                            Mode  Cnt     Score     Error  Units
Benchmark_Codecs1.decode  compiled     test_message_1_vendor_snapshot.json  avgt   15  1390.452 ±  37.552  ns/op
Benchmark_Codecs1.decode  compiled     test_message_2_inc_update.json       avgt   15   131.405 ±   1.907  ns/op
Benchmark_Codecs1.decode  compiled     test_message_3_minimal.json          avgt   15     2.258 ±   0.068  ns/op
Benchmark_Codecs1.decode  interpreted  test_message_1_vendor_snapshot.json  avgt   15  6650.045 ± 363.036  ns/op
Benchmark_Codecs1.decode  interpreted  test_message_2_inc_update.json       avgt   15   375.875 ±   6.072  ns/op
Benchmark_Codecs1.decode  interpreted  test_message_3_minimal.json          avgt   15     4.174 ±   0.099  ns/op
Benchmark_Codecs1.encode  compiled     test_message_1_vendor_snapshot.json  avgt   15  1335.759 ±  35.540  ns/op
Benchmark_Codecs1.encode  compiled     test_message_2_inc_update.json       avgt   15    58.077 ±   1.200  ns/op
Benchmark_Codecs1.encode  compiled     test_message_3_minimal.json          avgt   15     0.795 ±   0.035  ns/op
Benchmark_Codecs1.encode  interpreted  test_message_1_vendor_snapshot.json  avgt   15  5853.044 ±  97.254  ns/op
Benchmark_Codecs1.encode  interpreted  test_message_2_inc_update.json       avgt   15   284.026 ±   7.198  ns/op
Benchmark_Codecs1.encode  interpreted  test_message_3_minimal.json          avgt   15     2.137 ±   0.045  ns/op
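For readers unfamiliar with JMH output: Mode avgt means the Score column is the average wall-clock time per operation, aggregated over Cnt = 15 measurement samples. As an illustrative sketch only (the real benchmark uses JMH, which additionally handles warmup, forking, and dead-code elimination), the measurement conceptually looks like this; the stand-in operation here is hypothetical, not a TimeBase codec call:

```java
import java.util.function.Supplier;

public class AvgTimeSketch {
    // Measures average ns/op of an operation, roughly like JMH's avgt mode.
    // 'op' is a stand-in for a codec encode/decode call.
    static double avgNanosPerOp(Supplier<?> op, int iterations, int opsPerIteration) {
        double total = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            Object sink = null;
            for (int j = 0; j < opsPerIteration; j++) {
                sink = op.get();                       // the measured operation
            }
            long elapsed = System.nanoTime() - start;
            if (sink == null && opsPerIteration > 0)   // keep 'sink' observable
                throw new AssertionError();
            total += (double) elapsed / opsPerIteration;
        }
        return total / iterations;                     // average ns per single op
    }

    public static void main(String[] args) {
        // Trivial stand-in operation; real codec calls would go here.
        double nsPerOp = avgNanosPerOp(() -> Integer.valueOf(42), 15, 1_000_000);
        System.out.printf("avgt: %.3f ns/op%n", nsPerOp);
    }
}
```

Unlike this sketch, JMH also consumes results through a Blackhole so the JIT cannot eliminate the measured work entirely, which matters at the sub-nanosecond scores seen above.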
Analysis
The benchmark highlights a consistent pattern: compiled codecs are several times faster (roughly 2–5×) than interpreted codecs across all tested message types.
Minimal message (noop codec)
The smallest possible message shows the lowest latency:
- Compiled: ~2.3 ns/op (decode), ~0.8 ns/op (encode)
- Interpreted: ~4.2 ns/op (decode), ~2.1 ns/op (encode)
Given that this kind of message is practically empty, we are essentially measuring encoding framework overhead.
Incremental update (single Trade)
- Compiled: ~131 ns/op (decode), ~58 ns/op (encode)
- Interpreted: ~376 ns/op (decode), ~284 ns/op (encode)
This shows a 3–5× advantage for compiled codecs, with latencies in the sub-100 ns range for encode.
Full order book snapshot (40 entries)
- Compiled: ~1390 ns/op (decode), ~1336 ns/op (encode)
- Interpreted: ~6650 ns/op (decode), ~5853 ns/op (encode)
For large payloads, compiled codecs are still ~4–5× faster. Interpreted latencies approach 6–7 μs per operation.
Overall, both codec types scale with payload size, but the per-field overhead of interpreted codecs is several times higher, so the absolute gap widens as messages grow.
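As a rough check of the scaling claim, dividing the snapshot timings by its 40 entries gives an amortized per-entry cost (a back-of-the-envelope calculation from the results table above; it ignores the fixed per-message header cost, so it slightly overstates the per-entry figure):

```java
public class PerEntryCost {
    public static void main(String[] args) {
        final double ENTRIES = 40.0; // 20 bids + 20 offers

        // Snapshot timings from the results table, in ns/op
        double decodeCompiled    = 1390.452 / ENTRIES; // ~34.8 ns/entry
        double decodeInterpreted = 6650.045 / ENTRIES; // ~166.3 ns/entry
        double encodeCompiled    = 1335.759 / ENTRIES; // ~33.4 ns/entry
        double encodeInterpreted = 5853.044 / ENTRIES; // ~146.3 ns/entry

        System.out.printf("decode: compiled %.1f vs interpreted %.1f ns/entry%n",
                decodeCompiled, decodeInterpreted);
        System.out.printf("encode: compiled %.1f vs interpreted %.1f ns/entry%n",
                encodeCompiled, encodeInterpreted);
    }
}
```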
Conclusion
The fastest result is the minimal message encode with compiled codecs at ~0.8 ns/op.
For practical workloads, compiled codecs remain roughly 3–5× faster than interpreted codecs, making them the clear choice for high-throughput scenarios.
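The speedup factors quoted above can be reproduced directly from the results table (interpreted score divided by compiled score for each case):

```java
public class Speedup {
    static double ratio(double interpretedNs, double compiledNs) {
        return interpretedNs / compiledNs;
    }

    public static void main(String[] args) {
        // interpreted / compiled, values in ns/op from the results table
        System.out.printf("snapshot decode:   %.1fx%n", ratio(6650.045, 1390.452)); // ~4.8x
        System.out.printf("snapshot encode:   %.1fx%n", ratio(5853.044, 1335.759)); // ~4.4x
        System.out.printf("inc update decode: %.1fx%n", ratio(375.875, 131.405));   // ~2.9x
        System.out.printf("inc update encode: %.1fx%n", ratio(284.026, 58.077));    // ~4.9x
        System.out.printf("minimal decode:    %.1fx%n", ratio(4.174, 2.258));       // ~1.8x
        System.out.printf("minimal encode:    %.1fx%n", ratio(2.137, 0.795));       // ~2.7x
    }
}
```

Note the minimal-message ratios are lower because they measure mostly framework overhead rather than field codec work.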