Thread: Range partitioning query performance with date_trunc (vs timescaledb)
Hi, I'm trying to implement some range partitioning on timeseries data. But it looks some queries involving date_trunc() doesn't make use of partitioning. BEGIN; CREATE TABLE test ( time TIMESTAMP WITHOUT TIME ZONE NOT NULL, value FLOAT NOT NULL ) PARTITION BY RANGE (time); CREATE INDEX test_time_idx ON test(time DESC); CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'); CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'); CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1; EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01'; ROLLBACK; The plan query all partitions: HashAggregate Group Key: (date_trunc('year'::text, test."time")) -> Append -> Seq Scan on test_y2010 test_1 Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on test_y2011 test_2 Filter: (date_trunc('year'::text, "time") >= '2021-01-01 00:00:00'::timestamp without time zone) The view is there so show the use case, but we get almost similar plan with SELECT * FROM test WHERE DATE_TRUNC('year', time)>= TIMESTAMP '2021-01-01'; I tested a variation with timescaledb which seem using trigger based partitioning: BEGIN; CREATE EXTENSION IF NOT EXISTS timescaledb; CREATE TABLE test ( time TIMESTAMP WITHOUT TIME ZONE NOT NULL, value FLOAT NOT NULL ); SELECT create_hypertable('test', 'time', chunk_time_interval => INTERVAL '1 year'); CREATE VIEW vtest AS SELECT time_bucket('1 year', time) AS time, SUM(value) AS value FROM test GROUP BY 1; -- insert some data as partitions are created on the fly INSERT INTO test VALUES (TIMESTAMP '2020-01-15', 1.0), (TIMESTAMP '2021-12-15', 2.0); \d+ test EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01'; ROLLBACK; The plan query a single partition: GroupAggregate Group Key: (time_bucket('1 year'::interval, _hyper_1_2_chunk."time")) -> Result -> Index Scan Backward using _hyper_1_2_chunk_test_time_idx on _hyper_1_2_chunk Index Cond: ("time" >= '2021-01-01 00:00:00'::timestamp without time zone) Filter: (time_bucket('1 year'::interval, "time") >= '2021-01-01 00:00:00'::timestamp without time zone) Note single partition query only works with time_bucket(), not with date_trunc(), I guess there is some magic regarding this in time_bucket() implementation. I wonder if there is a way with a reasonable amount of SQL code to achieve this with vanilla postgres ? Maybe by taking assumption that DATE_TRUNC(..., time) <= time ? Thanks!
On Tue, 29 Aug 2023 at 19:40, Philippe Pepiot <phil@philpep.org> wrote: > I'm trying to implement some range partitioning on timeseries data. But it > looks some queries involving date_trunc() doesn't make use of partitioning. > > BEGIN; > CREATE TABLE test ( > time TIMESTAMP WITHOUT TIME ZONE NOT NULL, > value FLOAT NOT NULL > ) PARTITION BY RANGE (time); > CREATE INDEX test_time_idx ON test(time DESC); > CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'); > CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'); > CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1; > EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01'; > ROLLBACK; > > The plan query all partitions: > I wonder if there is a way with a reasonable amount of SQL code to achieve this > with vanilla postgres ? The only options I see for you are 1) partition by LIST(date_Trunc('year', time)), or; 2) use a set-returning function instead of a view and pass the date range you want to select from the underlying table via parameters. I imagine you won't want to do #1. However, it would at least also allow the aggregation to be performed before the Append if you SET enable_partitionwise_aggregate=1. #2 isn't as flexible as a view as you'd have to create another function or expand the parameters of the existing one if you want to add items to the WHERE clause. Unfortunately, date_trunc is just a black box to partition pruning, so it's not able to determine that DATE_TRUNC('year', time) >= '2021-01-01' is the same as time >= '2021-01-01'. It would be possible to make PostgreSQL do that, but that's a core code change, not something that you can do from SQL. David
Re: Range partitioning query performance with date_trunc (vs timescaledb)
From
Philippe Pepiot
Date:
On 29/08/2023, David Rowley wrote: > On Tue, 29 Aug 2023 at 19:40, Philippe Pepiot <phil@philpep.org> wrote: > > I'm trying to implement some range partitioning on timeseries data. But it > > looks some queries involving date_trunc() doesn't make use of partitioning. > > > > BEGIN; > > CREATE TABLE test ( > > time TIMESTAMP WITHOUT TIME ZONE NOT NULL, > > value FLOAT NOT NULL > > ) PARTITION BY RANGE (time); > > CREATE INDEX test_time_idx ON test(time DESC); > > CREATE TABLE test_y2010 PARTITION OF test FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'); > > CREATE TABLE test_y2011 PARTITION OF test FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'); > > CREATE VIEW vtest AS SELECT DATE_TRUNC('year', time) AS time, SUM(value) AS value FROM test GROUP BY 1; > > EXPLAIN (COSTS OFF) SELECT * FROM vtest WHERE time >= TIMESTAMP '2021-01-01'; > > ROLLBACK; > > > > The plan query all partitions: > > > I wonder if there is a way with a reasonable amount of SQL code to achieve this > > with vanilla postgres ? > > The only options I see for you are > > 1) partition by LIST(date_Trunc('year', time)), or; > 2) use a set-returning function instead of a view and pass the date > range you want to select from the underlying table via parameters. > > I imagine you won't want to do #1. However, it would at least also > allow the aggregation to be performed before the Append if you SET > enable_partitionwise_aggregate=1. > > #2 isn't as flexible as a view as you'd have to create another > function or expand the parameters of the existing one if you want to > add items to the WHERE clause. > > Unfortunately, date_trunc is just a black box to partition pruning, so > it's not able to determine that DATE_TRUNC('year', time) >= > '2021-01-01' is the same as time >= '2021-01-01'. It would be > possible to make PostgreSQL do that, but that's a core code change, > not something that you can do from SQL. Ok I think I'll go for Set-returning function since LIST or RANGE on (date_trunc('year', time)) will break advantage of partitioning when querying with "time betwen x and y". Thanks!