GROUP BY
Use group by to aggregate data by one or more specified values. When group by is included in a query, the output dataset includes just the last records aggregated by the specified value(s). In the examples below, we use the aggregate function count{}() to get the number of records in each aggregated group for demonstration purposes.
tip
QQL supports group by with BOOLEAN, CHAR, DATETIME, TIMEOFDAY, INTEGER, VARCHAR, and ENUM types.
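As a sketch of grouping by one of these types, the query below groups by a BOOLEAN field. The stream binance and the entries array come from the examples in this section; entry.isAggressor is a hypothetical BOOLEAN field used only for illustration and is not part of this guide's schema:

```sql
-- Hypothetical example: group by a BOOLEAN field.
-- entry.isAggressor is an assumed field, shown for illustration only.
SELECT entry.isAggressor, count{}()
FROM binance
ARRAY JOIN entries AS entry
GROUP BY entry.isAggressor
```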
For example, let's group by a specific field:
-- Output dataset includes the last records aggregated by symbol.
SELECT * FROM binance GROUP BY symbol
-- Aggregates the output by entry.level.
-- The count{}() aggregate function provides a record count for each aggregated group.
SELECT entry.*, entry.level, count{}()
FROM binance
ARRAY JOIN entries AS entry
GROUP BY entry.level
You can group by more than one value:
-- Output dataset includes the last records aggregated by entry.level and symbol.
-- The count{}() aggregate function provides a record count for each aggregated group.
SELECT entry.*, entry.level, count{}()
FROM binance
ARRAY JOIN entries AS entry
GROUP BY entry.level, symbol
group by does not have to be performed on a specific column name, as in the previous examples. You can also group by a calculated value:
SELECT THIS.*, avg{}(totalQuantity) FROM infoA GROUP BY (infoIdA % 3)
When the number of groups is very large, there may not be enough server memory to process all of them at once, depending on the server configuration. In this case, groups are cached on disk, which may significantly degrade query performance. The TimeBase server has a Java VM system property called maxGroupsCount, set to one million by default: -DTimeBase.qql.maxGroupsCount=1000000. When the group count reaches the maxGroupsCount value, groups start to be cached on disk. In addition, the system performs a server memory check before each caching procedure once the group count exceeds the maxGroupsCount value by 5%. To release server memory, the query may be performed in several steps, processing maxGroupsCount groups in each step, which may cause delays in query processing. Keep this in mind when creating your queries.
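As a minimal sketch of raising this limit, the property can be passed to the JVM at server startup. This assumes a plain java launch; the actual entry point and startup script for your TimeBase deployment may differ, and timebase-server.jar is a placeholder name:

```shell
# Raise the group limit to 2,000,000 before disk caching kicks in.
# timebase-server.jar stands in for your actual server launch target.
java -DTimeBase.qql.maxGroupsCount=2000000 -jar timebase-server.jar
```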
Group by Timestamp
Group by timestamp is not an efficient operation because it produces a large number of groups. Use over time (1ms) when you need to group by timestamp:
-- Sum total traded size for each timestamp
select sum{}(entry.size) from binance
array join (entries as array(TradeEntry)) as entry
over time (1ms)
where entry is not null