Re: inconsistent results querying table partitioned by date - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: inconsistent results querying table partitioned by date |
Date | |
Msg-id | 12600.1558034735@sss.pgh.pa.us Whole thread Raw |
In response to | Re: inconsistent results querying table partitioned by date (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: inconsistent results querying table partitioned by date
Re: inconsistent results querying table partitioned by date Re: inconsistent results querying table partitioned by date |
List | pgsql-bugs |
I wrote: > Anyway, I've pushed David's patch now, on to look at Amit's. So, the changes in gen_prune_steps_from_opexps seem okay, but the change in perform_pruning_base_step is just wrong, as can be seen from this extended form of the test case: create table mc3p (a int, b int, c int) partition by range (a, abs(b), c); create table mc3p1 partition of mc3p for values from (1, 1, 1) to (1, 1, 10); create table mc3p2 partition of mc3p for values from (1, 1, 10) to (2, minvalue, minvalue); create table mc3p3 partition of mc3p for values from (2, minvalue, minvalue) to (3, maxvalue, maxvalue); insert into mc3p values (1, 1, 1), (1,1,10), (2, 1, 1); set plan_cache_mode = force_generic_plan; prepare init_exec_prune_q1 as select * from mc3p where a = $1 and abs(b) <= $2 and c < (select 13); explain (analyze, costs off, summary off, timing off) execute init_exec_prune_q1 (1,2); deallocate init_exec_prune_q1; reset plan_cache_mode; drop table mc3p; This should of course find the (1,1,1) row, but it does not because it converts the initial pruning condition to be "a = 1 and abs(b) = 2", whereas the correct thing would be "a = 1 and abs(b) <= 2". It seems to me that the correct fix, rather than forcing equality strategy when you've missed out some of the quals, is to back up and use the strategy of the last clause you did use. But unless I'm missing something, that information is just not available from the PartitionPruneStepOp data structure. We could extend the PartitionPruneStepOp struct to have an array or list of StrategyNumbers (but I'm unsure about how safe that'd be to backpatch into v11). Another idea is that we ought to generate separate partitioning steps lists for the initial-pruning and runtime-pruning cases, ie invoke gen_partprune_steps yet a third time. Ugh (and that still means a data structure change, just in a different place). The bottom line here is that we've got an inflexible data structure that was designed on the assumption that gen_partprune_steps has all the intelligence, and trying to adjust things later is just trouble. I think this is really quite closely tied to my previous complaints about the design for plan-time versus run-time pruning, just mutatis mutandis. The lack of any holistic, principled approach to which-quals-apply-when is what's biting us here. Moreover, now that I understand what's going on here a bit better, I am deathly afraid that there are yet other bugs hiding in this same area. What we've basically got is that analyze_partkey_exprs and perform_pruning_base_step are trying to reverse-engineer what gen_partprune_steps would have done differently if it used only the quals not containing exec params. They don't really have enough information available to do that correctly, and even if they did, the fact that the logic looks nothing like the same is going to be a fertile source of bugs and maintenance gotchas. In other words, as I compose this I'm talking myself into the idea that invoking gen_partprune_steps three times (for planner, executor startup, and executor runtime, with corresponding filters on which clauses can be used) is the only safe near-term fix. In the future we can look at cutting down the repetitive costs that entails. Thoughts? regards, tom lane
pgsql-bugs by date: