Home > mailing lists

Bad execution plan when joining partitioned tables - Mailing list pgsql-sql

From	Richard Schmidlechner
Subject	Bad execution plan when joining partitioned tables
Date	November 24, 2023 19:23:44
Msg-id	b28b04ea-db37-4429-b7df-f1324ce22363@elisanet.fi Whole thread Raw
List	pgsql-sql

Tree view

Hi,

I am trying to apply table partitioning in our software solution.
When joining two partitioned tables the execution plan calculates an 
unreasonable high number of estimated rows which results in unsuitable 
execution plans (running for hours) if more tables are joined in the 
statement.

The problem can be reproduced using the following data

CREATE TABLE a ( partition_key int, id int, key text ) PARTITION BY 
HASH( partition_key );
CREATE TABLE a_0 PARTITION OF a ( PRIMARY KEY( id ) ) FOR VALUES WITH ( 
MODULUS 2, REMAINDER 0 );
CREATE TABLE a_1 PARTITION OF a ( PRIMARY KEY( id ) ) FOR VALUES WITH ( 
MODULUS 2, REMAINDER 1 );
CREATE UNIQUE INDEX ON a_0( key );
CREATE UNIQUE INDEX ON a_1( key );

CREATE TABLE b ( partition_key int, id int ) PARTITION BY HASH( 
partition_key );
CREATE TABLE b_0 PARTITION OF b ( PRIMARY KEY( id ) ) FOR VALUES WITH ( 
MODULUS 2, REMAINDER 0 );
CREATE TABLE b_1 PARTITION OF b ( PRIMARY KEY( id ) ) FOR VALUES WITH ( 
MODULUS 2, REMAINDER 1 );

INSERT INTO a select 0, id, MD5( id::text ) from generate_series( 1, 
1000000 ) id;
INSERT INTO b select 0, id from generate_series( 1, 1000000 ) id;

and using the following SQL statement which uses a unique index to 
lookup a row in table 'a' which is then joined with table 'b' using the 
primary key of the relevant partition (for the given partition key):

EXPLAIN
SELECT * FROM a JOIN b ON
   b.partition_key = a.partition_key AND
   b.id = a.id
WHERE
   a.partition_key = 0 AND
   a.key = MD5( '8' )


Nested Loop  (cost=0.85..5.30 rows=5000 width=49)
   ->  Index Scan using a_0_key_idx on a_0 a  (cost=0.42..2.65 rows=1 
width=41)
         Index Cond: (key = 'c9f0f895fb98ab9159f51fd0297e236d'::text)
         Filter: (partition_key = 0)
   ->  Index Scan using b_0_pkey on b_0 b  (cost=0.42..2.65 rows=1 width=8)
         Index Cond: (id = a.id)
         Filter: (partition_key = 0)

The estimated number of rows is 5000 when it should be 1.
The 'a.partition_key = 0' condition allows PostgreSQL to pick the 
relevant partitions (a_0 and b_0) in the execution plan and the row 
estimates for those tables are 1 (there are unique indexes!)

If only this statement needed to be executed then it wouldn't be a 
problem that the estimated number of rows is wrong as the execution plan 
of this statement is fine.
However, in my case the real tables are much bigger (more than 100 
million rows) and the number of estimates rows is astronomic high 
leading to full table scans for additional tables which would be joined.
Also adding unique indexes (including also the partition_key column) on 
the partitioned base tables a and b doesn't change anything.

It is also strange that when removing the 'a.partition_key = 0' 
condition the estimated number of rows is much smaller (50)

Nested Loop  (cost=0.85..14.02 rows=50 width=48)
   ->  Append  (cost=0.42..5.02 rows=2 width=40)
         ->  Index Scan using a_0_key_idx on a_0 a_1  (cost=0.42..2.64 
rows=1 width=41)
               Index Cond: (key = 'c9f0f895fb98ab9159f51fd0297e236d'::text)
         ->  Index Scan using a_1_key_idx on a_1 a_2  (cost=0.15..2.37 
rows=1 width=40)
               Index Cond: (key = 'c9f0f895fb98ab9159f51fd0297e236d'::text)
   ->  Append  (cost=0.42..4.48 rows=2 width=8)
         ->  Index Scan using b_0_pkey on b_0 b_1  (cost=0.42..2.65 
rows=1 width=8)
               Index Cond: (id = a.id)
               Filter: (a.partition_key = partition_key)
         ->  Index Scan using b_1_pkey on b_1 b_2  (cost=0.15..1.83 
rows=1 width=8)
               Index Cond: (id = a.id)
               Filter: (a.partition_key = partition_key)

Any help is very appreciated.

Richard Schmidlechner

pgsql-sql by date:

From: "David G. Johnston"
Date: 08 November 2023, 17:30:23
Subject: Re: DEFAULT in update & prepared statements

From: Richard Schmidlechner
Date: 29 November 2023, 15:16:36
Subject: Re: Bad execution plan when joining partitioned tables

Bad execution plan when joining partitioned tables - Mailing list pgsql-sql

Previous

Next