BUG #15779: Partition elimination doesn't work as expected when using PARTITION BY RANGE - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15779: Partition elimination doesn't work as expected when using PARTITION BY RANGE |
Date | |
Msg-id | 15779-fd2fb982346b7a22@postgresql.org Whole thread Raw |
Responses |
Re: BUG #15779: Partition elimination doesn't work as expected whenusing PARTITION BY RANGE
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 15779 Logged by: Damir Ciganović-Janković Email address: damir.ciganovic.jankovic@gmail.com PostgreSQL version: 11.2 Operating system: Windows, Centos 6/7 Description: We have an issue that our queries are slow due to bad partition elimination when executing queries for our partitioned tables, here is an example, I will put all explain query outputs later for each explain separately: ------------ DROP TABLE IF EXISTS test; DROP TYPE IF EXISTS test_enum; CREATE TYPE test_enum AS ENUM ('FIRST', 'SECOND'); CREATE TABLE test (enum_col test_enum, timestamp_col timestamp, counter int) PARTITION BY RANGE (enum_col, timestamp_col); CREATE TABLE test_FIRST_1 PARTITION OF test FOR VALUES FROM ('FIRST', '2019-01-01') TO ('FIRST', '2019-01-02'); CREATE TABLE test_FIRST_2 PARTITION OF test FOR VALUES FROM ('FIRST', '2019-01-02') TO ('FIRST', '2019-01-03'); CREATE TABLE test_SECOND_1 PARTITION OF test FOR VALUES FROM ('SECOND', '2019-01-01') TO ('SECOND', '2019-01-02'); CREATE TABLE test_SECOND_2 PARTITION OF test FOR VALUES FROM ('SECOND', '2019-01-02') TO ('SECOND', '2019-01-03'); EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01'; EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col = 'FIRST'; EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col IN ('FIRST', 'SECOND'); ------------ This is what we get for first explain: EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01'; Append (cost=0.00..132.68 rows=36 width=16) (actual time=0.044..0.044 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..33.13 rows=9 width=16) (actual time=0.017..0.017 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on test_first_2 (cost=0.00..33.13 rows=9 width=16) (actual time=0.010..0.010 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on test_second_1 (cost=0.00..33.13 rows=9 width=16) (actual time=0.008..0.008 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on test_second_2 (cost=0.00..33.13 rows=9 width=16) (actual time=0.007..0.007 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) As we can see, we got all partitions despite setting timestamp. When we add an enum value, partition pruning is fine: EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col = 'FIRST'; Append (cost=0.00..37.76 rows=1 width=16) (actual time=0.015..0.015 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.014..0.014 rows=0 loops=1) Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) AND (enum_col = 'FIRST'::test_enum)) Another bug here is when we put both enum values in IN clause, we still go through all partitions despite timestamp targeting only _1 partitions: EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col IN ('FIRST', 'SECOND'); Append (cost=0.00..151.02 rows=4 width=16) (actual time=0.083..0.083 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.045..0.045 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone)) -> Seq Scan on test_first_2 (cost=0.00..37.75 rows=1 width=16) (actual time=0.012..0.012 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone)) -> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.012..0.012 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone)) -> Seq Scan on test_second_2 (cost=0.00..37.75 rows=1 width=16) (actual time=0.011..0.011 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone)) We could fix this by adding check constraints. But I think that it should work without adding constraints, that's why it seems like a bug to me. ------------ ALTER TABLE test_FIRST_1 ADD CONSTRAINT ck_enum_col CHECK(enum_col = 'FIRST' AND timestamp_col >='2019-01-01' AND timestamp_col < '2019-01-02'); ALTER TABLE test_FIRST_2 ADD CONSTRAINT ck_enum_col CHECK(enum_col = 'FIRST' AND timestamp_col >='2019-01-02' AND timestamp_col < '2019-01-03'); ALTER TABLE test_SECOND_1 ADD CONSTRAINT ck_enum_col CHECK(enum_col = 'SECOND' AND timestamp_col >='2019-01-01' AND timestamp_col < '2019-01-02'); ALTER TABLE test_SECOND_2 ADD CONSTRAINT ck_enum_col CHECK(enum_col = 'SECOND' AND timestamp_col >='2019-01-02' AND timestamp_col < '2019-01-03'); -- same explains EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01'; EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col = 'FIRST'; EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col IN ('FIRST', 'SECOND'); ------------ EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01'; Append (cost=0.00..66.34 rows=18 width=16) (actual time=0.034..0.034 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..33.13 rows=9 width=16) (actual time=0.018..0.018 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on test_second_1 (cost=0.00..33.13 rows=9 width=16) (actual time=0.014..0.014 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col = 'FIRST'; Append (cost=0.00..37.76 rows=1 width=16) (actual time=0.013..0.013 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.013..0.013 rows=0 loops=1) Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) AND (enum_col = 'FIRST'::test_enum)) EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col IN ('FIRST', 'SECOND'); Append (cost=0.00..75.51 rows=2 width=16) (actual time=0.037..0.037 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.025..0.025 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone)) -> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.010..0.010 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone)) ####################################################### BONUS NOTE: In Postgres version 10.2 (we migrated from 10.2 to 11.2), partition pruning were better in cases where we used no enum_col or multiple enum_col values in where clause: Only 2 partitions, instead of all four. Timestamp clause is still respected EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01'; Append (cost=0.00..66.25 rows=18 width=16) (actual time=0.011..0.011 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..33.12 rows=9 width=16) (actual time=0.007..0.007 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) -> Seq Scan on test_second_1 (cost=0.00..33.12 rows=9 width=16) (actual time=0.001..0.001 rows=0 loops=1) Filter: (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) Two partitions, instead of 1, this was fixed in https://www.postgresql.org/message-id/2b20c220-50cb-3fae-da57-92e8cb3675dd%40lab.ntt.co.jp Timestamp clause is respected in both versions EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col = 'FIRST'; Append (cost=0.00..75.50 rows=2 width=16) (actual time=0.007..0.007 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.004..0.004 rows=0 loops=1) Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) AND (enum_col = 'FIRST'::test_enum)) -> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.002..0.002 rows=0 loops=1) Filter: ((timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone) AND (enum_col = 'FIRST'::test_enum)) Two partitions, instead of 4, timestamp clause is still respected. EXPLAIN ANALYZE SELECT * FROM test WHERE timestamp_col = '2019-01-01' and enum_col IN ('FIRST', 'SECOND'); Append (cost=0.00..75.50 rows=2 width=16) (actual time=0.004..0.004 rows=0 loops=1) -> Seq Scan on test_first_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone)) -> Seq Scan on test_second_1 (cost=0.00..37.75 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=1) Filter: ((enum_col = ANY ('{FIRST,SECOND}'::test_enum[])) AND (timestamp_col = '2019-01-01 00:00:00'::timestamp without time zone))
pgsql-bugs by date: