PostgreSQL: Automated Partitions using pgPartman
Hey everyone, in this post we’ll see how to use pg_partman to automatically create partitions:
Let’s consider a scenario where in we have an events table which contains some information with the timestamp:
Before we jump on to create partitioned table, let’s actually create a normal table and see how is the query time,
CREATE TABLE events (
id bigserial,
created_at timestamptz
);
Let’s insert 10 Crore records into this table (Took around 4 minutes on my system)
INSERT INTO
events (id, created_at)
VALUES
(
generate_series(1, 100000000),
timestamptz '2023-08-01 00:00:00 UTC' + random() * INTERVAL '31 days'
)
Now let’s see how much time does it take to query count of events logged on a particular day,
EXPLAIN ANALYZE
SELECT count(*)
FROM events
WHERE
created_at >= '2023-08-05 00:00:00'
AND
created_at < '2023-08-06 00:00:00';
Execution Time: 6271.237 ms
Now, let’s create an index and see how much improvement we get on same query,
CREATE INDEX events_created_at_idx ON events (created_at);
Execution Time: 1068.649 ms
Now, let’s see the performance on the table partitioned by created_at, each partition holding data for a day
CREATE TABLE events_v2 (
id bigserial,
created_at timestamptz
) PARTITION BY RANGE(created_at);
SELECT
partman.create_parent('public.events_v2', 'created_at', 'native', 'daily');
Now, let’s copy the data of events table to events_v2 table
INSERT INTO events_v2
SELECT * FROM events;
If we see the description of table now, we will find that there are total 10 partitions, 9 based on range and 1 default partition,
\d+ events_v2;
Now, let’s run partition_data_proc to move data out of the default to their respective partitions as per predefined partition definition
CALL partman.partition_data_proc('public.events_v2');
Now, we will run VACUUM ANALYZE so as to clean up dead rows and refresh the statistics used by the query planner.
Now, let’s see the performance of same query as we did for non-partitioned table
EXPLAIN ANALYZE
SELECT count(*)
FROM events_v2
WHERE
created_at >= '2023-08-05 00:00:00'
AND
created_at < '2023-08-06 00:00:00';
Execution Time: 946.281 ms
You might be thinking no worth creating partitions as we are still getting same performance as we were getting in case of non-partitioned indexed table, but there’s a catch
Let’s now create index in case of partitioned table as well and see the performace improvements
CREATE INDEX events_v2_created_at_idx ON events_v2 (created_at)
We still see the same performance than what’s the catch?
Now, let’s actually set 48M memory limit to container and see how it changes the performance scenarios
Above illustration clearly reflects how a partitioned table outperforms a non-partitioned table in a memory constraint enviroment which is generally the case in real production systems where we might have billions of rows and limited memory.