TimescaleDB: Continuous Aggregates
Hey everyone, in this blog we will learn how to create continuous aggregates in timescaledb,
To understand continuous aggregates, we will take a scenario where we have events table that contains response_time for the api hits to a given endpoint
CREATE TABLE events (
id bigserial,
endoint varchar,
response_time smallint,
created_at timestamptz
);
Now we will convert this table to a hypertable
SELECT create_hypertable('events', 'created_at', chunk_time_interval => INTERVAL '1 day'
Now let’s add some randomized data to the table
INSERT INTO events
(id, endpoint, response_time, created_at)
VALUES
(generate_series(1, 10000000), '/health', floor(random() * 900 + 100), timestamptz '2023-08-01 00:00:00 UTC' + random() * Interval '31 day')
Now let’s create a continuous aggregate over this table which will help us fetch average response time and daily hits per endpoint on a day basis
- Create a materialized view (continuous aggregate requires a special parameter time_bucket on the partitioning column of hypertable)
CREATE MATERIALIZED VIEW events_summary_daily
WITH (timescaledb.continuous) AS
SELECT endpoint,
time_bucket(INTERVAL '1 day', created_at) AS day,
AVG(response_time) AS avg_response_time,
COUNT(*) AS total_hits
FROM events
GROUP BY endpoint, day;
2. Create a refresh policy
start_offset -> It determines how much of historical data to consider relative to the time when policy runs
end_offset -> It determines how recent data to be considered
schedule_interval -> Refresh Interval
SELECT add_continuous_aggregate_policy('events_summary_daily',
start_offset => INTERVAL '1 month',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 day');
And voilla, we are done.