Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join) - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join) |
Date | |
Msg-id | 786.1565541557@sss.pgh.pa.us Whole thread Raw |
In response to | BUG #15947: Worse plan is chosen after giving the planner more freedom (partitionwise join) (PG Bug reporting form <noreply@postgresql.org>) |
Responses |
Re: BUG #15947: Worse plan is chosen after giving the planner morefreedom (partitionwise join)
|
List | pgsql-bugs |
PG Bug reporting form <noreply@postgresql.org> writes: > After creating of the tables below: > ... > ANALYZE sg_19_01_d, sg_19_02_d, sc_19_01_d, sc_19_02_d; Hm, you made a tactical error there: you should have done ANALYZE sg, sc; ie analyze the parent partitioned tables not the partitions. As you have it, we never compute inherited stats across the whole partitioned tables, which leads to wrong estimates about the join size and hence cost: Aggregate (cost=147.25..147.26 rows=1 width=8) (actual time=9.934..9.934 rows=1 loops=1) -> Hash Join (cost=80.00..146.62 rows=250 width=0) (actual time=6.204..9.632 rows=2000 loops=1) ... After analyzing the parents it's much better: Aggregate (cost=195.00..195.01 rows=1 width=8) (actual time=6.199..6.200 rows=1 loops=1) -> Hash Semi Join (cost=88.00..190.00 rows=2000 width=0) (actual time=2.586..5.885 rows=2000 loops=1) ... The size and cost estimates for partitionwise-join paths are made by adding up the per-partition sizes/costs, so that those are a lot closer to being correct even without any parent-level stats: Aggregate (cost=175.00..175.01 rows=1 width=8) (actual time=4.562..4.563 rows=1 loops=1) -> Append (cost=39.00..170.00 rows=2000 width=0) (actual time=1.105..4.327 rows=2000 loops=1) ... So *if* you have parent-level stats in place, this example works well: the partitionwise join is estimated as cheaper than the other way, and that estimate is correct, and all is great. But when you don't, why doesn't it seize on the incorrectly-estimated- as-cheaper non-partitioned join? The answer seems to be that apply_scanjoin_target_to_paths throws away all the non-partitioned paths, if the join is partitioned. It claims that * If the rel is partitioned, we want to drop its existing paths and * generate new ones. This function would still be correct if we kept the * existing paths: we'd modify them to generate the correct target above * the partitioning Append, and then they'd compete on cost with paths * generating the target below the Append. However, in our current cost * model the latter way is always the same or cheaper cost, so modifying * the existing paths would just be useless work. but evidently that's not really true when considering partitioned vs non-partitioned joins. It holds only if cost estimates that are made in very different ways are comparable. I'm inclined to think that the partitioned estimates are more trustworthy than the non-partitioned ones, so maybe we should leave things as they stand for now. Still, this is another point that's causing me to form an increasingly hardened conviction that apply_scanjoin_target_to_paths needs to be nuked from orbit. It's been nothing but trouble since it was committed, which is unsurprising considering how many planner structural conventions it tramples on. I'm not very sure what a better way would look like though :-(. Another point that this example raises is that letting autoanalyze ignore partition parent tables is not just a minor problem, but a potentially serious bug, causing very poor plan choices to be made for any nontrivial queries involving partitioned tables. I don't see us doing anything about either of these points in the very short term, and certainly not back-patching any changes into released branches. But we ought to think about fixes going forward. regards, tom lane
pgsql-bugs by date: