Bad execution plan when joining partitioned tables - Mailing list pgsql-sql
From | Richard Schmidlechner |
---|---|
Subject | Bad execution plan when joining partitioned tables |
Date | |
Msg-id | b28b04ea-db37-4429-b7df-f1324ce22363@elisanet.fi Whole thread Raw |
List | pgsql-sql |
Hi, I am trying to apply table partitioning in our software solution. When joining two partitioned tables the execution plan calculates an unreasonable high number of estimated rows which results in unsuitable execution plans (running for hours) if more tables are joined in the statement. The problem can be reproduced using the following data CREATE TABLE a ( partition_key int, id int, key text ) PARTITION BY HASH( partition_key ); CREATE TABLE a_0 PARTITION OF a ( PRIMARY KEY( id ) ) FOR VALUES WITH ( MODULUS 2, REMAINDER 0 ); CREATE TABLE a_1 PARTITION OF a ( PRIMARY KEY( id ) ) FOR VALUES WITH ( MODULUS 2, REMAINDER 1 ); CREATE UNIQUE INDEX ON a_0( key ); CREATE UNIQUE INDEX ON a_1( key ); CREATE TABLE b ( partition_key int, id int ) PARTITION BY HASH( partition_key ); CREATE TABLE b_0 PARTITION OF b ( PRIMARY KEY( id ) ) FOR VALUES WITH ( MODULUS 2, REMAINDER 0 ); CREATE TABLE b_1 PARTITION OF b ( PRIMARY KEY( id ) ) FOR VALUES WITH ( MODULUS 2, REMAINDER 1 ); INSERT INTO a select 0, id, MD5( id::text ) from generate_series( 1, 1000000 ) id; INSERT INTO b select 0, id from generate_series( 1, 1000000 ) id; and using the following SQL statement which uses a unique index to lookup a row in table 'a' which is then joined with table 'b' using the primary key of the relevant partition (for the given partition key): EXPLAIN SELECT * FROM a JOIN b ON b.partition_key = a.partition_key AND b.id = a.id WHERE a.partition_key = 0 AND a.key = MD5( '8' ) Nested Loop (cost=0.85..5.30 rows=5000 width=49) -> Index Scan using a_0_key_idx on a_0 a (cost=0.42..2.65 rows=1 width=41) Index Cond: (key = 'c9f0f895fb98ab9159f51fd0297e236d'::text) Filter: (partition_key = 0) -> Index Scan using b_0_pkey on b_0 b (cost=0.42..2.65 rows=1 width=8) Index Cond: (id = a.id) Filter: (partition_key = 0) The estimated number of rows is 5000 when it should be 1. The 'a.partition_key = 0' condition allows PostgreSQL to pick the relevant partitions (a_0 and b_0) in the execution plan and the row estimates for those tables are 1 (there are unique indexes!) If only this statement needed to be executed then it wouldn't be a problem that the estimated number of rows is wrong as the execution plan of this statement is fine. However, in my case the real tables are much bigger (more than 100 million rows) and the number of estimates rows is astronomic high leading to full table scans for additional tables which would be joined. Also adding unique indexes (including also the partition_key column) on the partitioned base tables a and b doesn't change anything. It is also strange that when removing the 'a.partition_key = 0' condition the estimated number of rows is much smaller (50) Nested Loop (cost=0.85..14.02 rows=50 width=48) -> Append (cost=0.42..5.02 rows=2 width=40) -> Index Scan using a_0_key_idx on a_0 a_1 (cost=0.42..2.64 rows=1 width=41) Index Cond: (key = 'c9f0f895fb98ab9159f51fd0297e236d'::text) -> Index Scan using a_1_key_idx on a_1 a_2 (cost=0.15..2.37 rows=1 width=40) Index Cond: (key = 'c9f0f895fb98ab9159f51fd0297e236d'::text) -> Append (cost=0.42..4.48 rows=2 width=8) -> Index Scan using b_0_pkey on b_0 b_1 (cost=0.42..2.65 rows=1 width=8) Index Cond: (id = a.id) Filter: (a.partition_key = partition_key) -> Index Scan using b_1_pkey on b_1 b_2 (cost=0.15..1.83 rows=1 width=8) Index Cond: (id = a.id) Filter: (a.partition_key = partition_key) Any help is very appreciated. Richard Schmidlechner