PostgreSQL: Automated Partitions using pgPartman

Karun Agarwal
3 min readAug 30, 2023

--

Hey everyone, in this post we’ll see how to use pg_partman to automatically create partitions:

Let’s consider a scenario where in we have an events table which contains some information with the timestamp:

Before we jump on to create partitioned table, let’s actually create a normal table and see how is the query time,

CREATE TABLE events (
id bigserial,
created_at timestamptz
);

Let’s insert 10 Crore records into this table (Took around 4 minutes on my system)

INSERT INTO 
events (id, created_at)
VALUES
(
generate_series(1, 100000000),
timestamptz '2023-08-01 00:00:00 UTC' + random() * INTERVAL '31 days'
)

Now let’s see how much time does it take to query count of events logged on a particular day,

EXPLAIN ANALYZE
SELECT count(*)
FROM events
WHERE
created_at >= '2023-08-05 00:00:00'
AND
created_at < '2023-08-06 00:00:00';

Execution Time: 6271.237 ms

Now, let’s create an index and see how much improvement we get on same query,

CREATE INDEX events_created_at_idx ON events (created_at);

Execution Time: 1068.649 ms

Now, let’s see the performance on the table partitioned by created_at, each partition holding data for a day

CREATE TABLE events_v2 (
id bigserial,
created_at timestamptz
) PARTITION BY RANGE(created_at);

SELECT
partman.create_parent('public.events_v2', 'created_at', 'native', 'daily');

Now, let’s copy the data of events table to events_v2 table

INSERT INTO events_v2
SELECT * FROM events;

If we see the description of table now, we will find that there are total 10 partitions, 9 based on range and 1 default partition,

\d+ events_v2;

Now, let’s run partition_data_proc to move data out of the default to their respective partitions as per predefined partition definition

CALL partman.partition_data_proc('public.events_v2');

Now, we will run VACUUM ANALYZE so as to clean up dead rows and refresh the statistics used by the query planner.

Now, let’s see the performance of same query as we did for non-partitioned table

EXPLAIN ANALYZE
SELECT count(*)
FROM events_v2
WHERE
created_at >= '2023-08-05 00:00:00'
AND
created_at < '2023-08-06 00:00:00';

Execution Time: 946.281 ms

You might be thinking no worth creating partitions as we are still getting same performance as we were getting in case of non-partitioned indexed table, but there’s a catch

Let’s now create index in case of partitioned table as well and see the performace improvements

CREATE INDEX events_v2_created_at_idx ON events_v2 (created_at)

We still see the same performance than what’s the catch?

Now, let’s actually set 48M memory limit to container and see how it changes the performance scenarios

Above illustration clearly reflects how a partitioned table outperforms a non-partitioned table in a memory constraint enviroment which is generally the case in real production systems where we might have billions of rows and limited memory.

--

--