Re: Fix missing EvalPlanQual recheck for TID scans - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Fix missing EvalPlanQual recheck for TID scans |
Date | |
Msg-id | CAApHDvoHYhDn2UbwvCbuu_BXUkA2wf3RS+ZNxgzCHXkZ5_07FQ@mail.gmail.com Whole thread Raw |
In response to | Fix missing EvalPlanQual recheck for TID scans ("Sophie Alpert" <pg@sophiebits.com>) |
Responses |
Re: Fix missing EvalPlanQual recheck for TID scans
|
List | pgsql-hackers |
On Mon, 8 Sept 2025 at 10:31, Sophie Alpert <pg@sophiebits.com> wrote: > I learned that the TID Scan node does not properly implement EvalPlanQual recheck, which has already been noted in commentadded in 2002 (commit 6799a6ca21). This does seem to have the potential to manifest in observable and surprising wayswhen queries are filtering on `ctid`, like for optimistic concurrency control as in https://stackoverflow.com/q/78487441. Thanks for the report and the patch. I'll summarise this issue, as you provided an external link which might one day disappear: drop table if exists test_optimistic_lock; create table public.test_optimistic_lock ( id bigserial primary key, name varchar, rest_count integer, update_user varchar, update_time timestamp without time zone, create_time timestamp without time zone ); insert into public.test_optimistic_lock (name, rest_count, update_time, create_time) values ('Product1', 1, current_timestamp, current_timestamp); S1: begin; S1: select ctid, rest_count from public.test_optimistic_lock where name = 'Product1'; S2: begin; S2: select ctid, rest_count from public.test_optimistic_lock where name = 'Product1'; S1: update public.test_optimistic_lock set rest_count = rest_count - 1, update_user = 'A', update_time = current_timestamp where name = 'Product1' and ctid = '(0,1)'; S2: update public.test_optimistic_lock set rest_count = rest_count - 1, update_user = 'A', update_time = current_timestamp where name = 'Product1' and ctid = '(0,1)'; -- waits for S1 S1: commit: S2: Gets UPDATE 1 when it should get UPDATE 0!! I agree that this is a bug and we should fix it. The behaviour should match what you'd get if you ran it with "SET enable_tidscan = 0;". When that's done, the Seq Scan will have the ctid related expressions within the qual and it will correctly filter out the non-matching tuple. > Attached is an attempt from me at fixing this issue for TID Scan as well as TID Range Scan which appears to have the sameissue. I think this is generally pretty good. Here's a quick review: 1. This part is quite annoying: + if (node->tss_TidList == NULL) + TidListEval(node); Of course, it's required since ExecReScanTidScan() wiped out that list. Maybe we can add a boolean variable named tts_inScan to TidScanState after tss_isCurrentOf (so as it consumes the existing padding bytes and does not enlarge that struct), and have ExecReScanTidScan() set that to false rather than wipe out the tss_TidList. Then just reuse the existing list in TidRecheck(). That should save on the extra overhead of having to rebuild the list each time there's an EPQ recheck. 2. For TidRangeRecheck, I don't see why this part is needed: + if (!TidRangeEval(node)) + return false; The TID range is preserved already and shouldn't need to be recalculated. 3. In TidRecheck(), can you make "match" an "ItemPointer" and do: match = (ItemPointer) bsearch(...); 4. Can you put this comment above the "if"? + if (node->tss_isCurrentOf) + /* WHERE CURRENT OF always intends to resolve to the latest tuple */ + return true; What you've got there doesn't meet the style we use. Technically there should be braces because there are multiple lines (per our style (not C)), but it's a bit needless to do that as the comment makes the same amount of sense if it goes before the "if". 5. Can you make TidRangeRecheck() look like this? static bool TidRangeRecheck(TidRangeScanState *node, TupleTableSlot *slot) { /* Recheck the ctid is still within range */ if (ItemPointerCompare(&slot->tts_tid, &node->trss_mintid) < 0 || ItemPointerCompare(&slot->tts_tid, &node->trss_maxtid) > 0) return false; return true; } The reason being that it's closer to how heap_getnextslot_tidrange() does it, and I'd rather the conditions were kept as similar as possible to that function. 6. For the tests. It should be ok to make the Tid range scan test do ctid BETWEEN '(0,1)' AND '(0,1)'. I feel this is more aligned to the TID Range Scan version of what you're doing in the TID Scan test. I'll need to give the tests a closer inspection later. David
pgsql-hackers by date: