Re: inconsistent results querying table partitioned by date - Mailing list pgsql-bugs
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: inconsistent results querying table partitioned by date |
Date | |
Msg-id | 20190510.153337.236910404.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: inconsistent results querying table partitioned by date (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: inconsistent results querying table partitioned by date
|
List | pgsql-bugs |
At Fri, 10 May 2019 14:37:34 +0900, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote in <49cd5235-cbe6-e686-8014-85c1e45efe56@lab.ntt.co.jp> > > because timestamp-against-timestamptz comparison is inherently only > > stable; the pruning code is way exceeding its authority by supposing > > that a comparison that holds at plan time will hold at runtime, > > even with a constant comparison value. > > I looked into it and the problem is not really that plan-time pruning is > comparing stable expressions against partition bounds. If it had, it > wouldn't have pruned dataid_201902 anyway, because its bounding range for > datadatetime is '2019-02-01' to '2019-03-01', which is clearly less than > '2019-05-09' (Tom's localtimestamp). > > The real problem seems to be with the way range partition pruning assumes > an operator strategy to perform pruning with. Quoted query's WHERE clause > looks something like this: (firstkey = CONSTANT AND secondkey < > STABLE_EXPRESSION). From this set of clauses, a (CONSTANT, > STABLE_EXPRESSION) tuple is formed to be compared against partition bounds > using row-comparison-like semantics. As things stand today, the actual > comparison function (partprune.c: get_matching_range_bounds()) receives > only the strategy of the last expression, which in this case is that of a > LESS operator. When the tuple (CONSTANT, STABLE_EXPRESSION) passes > through the last step to extract Datum values to be passed to > get_matching_range_bounds, it's correctly determined that > STABLE_EXPRESSION cannot be computed during planning and so the tuple is > truncated to just (CONSTANT-Datum), but the strategy to assume during > pruning is still that of the LESS operator, whereas now it should really > be EQUAL. With LESS semantics, get_matching_range_bounds() concludes that > no partition bounds are smaller than 1 (extracted from id=1 in the above > query), except the default partition, so it prunes 'dataid_201902'. I concluded the same. > I've attached a patch to fix that. Actually, I've attached two patches -- > the 1st one adds a test for the misbehaving case with *wrong* output > wherein a partition is incorrectly pruned, and the 2nd actually fixes the > bug and updates the output of the test added by the 1st patch. Divided > the patch this way just to show the bug clearly. But this seems a bit wrong. If the two partition keys were in reverse order, pruning still fails. CREATE TABLE dataid2 ( datadatetime timestamp without time zone NOT NULL, id integer not null, CONSTRAINT dataid2_pkey PRIMARY KEY (datadatetime, id) ) PARTITION BY RANGE (datadatetime, id); CREATE TABLE dataid2_201902 PARTITION OF dataid2 FOR VALUES FROM ('2019-02-01 00:00:00', 1) TO ('2019-03-01 00:00:00', 1); CREATE TABLE dataid2_default PARTITION OF dataid2 DEFAULT; insert into dataid2 values ('2019-02-24T00:00:00', 1); select * from dataid2 where id = 1 and datadatetime < (('2019-02-26T00:00:00'::timestamp::timestamp at time zone 'America/New_York'+ '2 days'::interval) at time zone 'UTC'); datadatetime | id --------------+---- (0 rows) This is wrong. The condition is divided into two part (id = 1) and (datadatetime < ..) and the latter reduces to nothing and the former remains unchanged. Pruning continues using id = 1 and (I suppose) but that is not partition_range_datum_bsearch()'s assumption. As the result all partitions (other than default) are gone. In passing I found a typo while looking this issue. | case BTLessStrategyNumber: | | /* | * Look for the greatest bound that is < or <= lookup value and | * set minoff to its offset. I think the "minoff" is typo of "maxoff". > > Seems to be equally broken in v11 and HEAD. I didn't try v10. > > v10 is fine, as it uses constraint exclusion. > > Attached patches apply to both v11 and HEAD. Mmm. This doesn't apply on head on my environment. > patching file src/test/regress/expected/partition_prune.out > Hunk #1 FAILED at 951. git rev-parse --short HEAD d0bbf871ca regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-bugs by date: