Re: BUG #17949: Adding an index introduces serialisation anomalies. - Mailing list pgsql-bugs
From | Dmitry Dolgov |
---|---|
Subject | Re: BUG #17949: Adding an index introduces serialisation anomalies. |
Date | |
Msg-id | 20230624135910.zufuollwp57otawo@ddolgov.remote.csb Whole thread Raw |
In response to | Re: BUG #17949: Adding an index introduces serialisation anomalies. (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses |
Re: BUG #17949: Adding an index introduces serialisation anomalies.
|
List | pgsql-bugs |
> On Fri, Jun 23, 2023 at 04:05:42PM +0200, Dmitry Dolgov wrote: > > I spent some more time trying to grok this today. FTR it reproduces > > faster without the extra tuple that repro I posted inserts after > > TRUNCATE (the point of that was to find out whether it was an > > empty-to-non-empty transition). I still don't know what's wrong but I > > am beginning to suspect the "fast" code. It seems as though, under > > high concurrency, we sometimes don't scan a recently inserted > > (invisible to our snapshot, but needed for SSI checks) tuple, but I > > don't yet know why. > > Yep, it's definitely something in the "fast" path. Testing the same, but > with an index having (fastupdate=off) works just fine for me. I've managed to resolve it, or at least reduce the chances for the issue to appear, via semi-randomly adding more CheckForSerializableConflictIn / PredicateLock around the new sublist that has to be created in ginHeapTupleFastInsert. I haven't seen the reproducer failing with this changeset after running it multiple times for a couple of minutes, where on the main branch, with the two fixes from Thomas included, it was failing within a couple of seconds. diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c @@ -198,6 +199,7 @@ makeSublist(Relation index, IndexTuple *tuples, int32 ntuples, /* * Write last page */ + CheckForSerializableConflictIn(index, NULL, BufferGetBlockNumber(curBuffer)); res->tail = BufferGetBlockNumber(curBuffer); res->tailFreeSize = writeListPage(index, curBuffer, @@ -273,6 +275,11 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector) separateList = true; LockBuffer(metabuffer, GIN_UNLOCK); } + else + { + CheckForSerializableConflictIn(index, NULL, metadata->head); + } } diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c @@ -140,7 +140,7 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack *stack, * Predicate lock entry leaf page, following pages will be locked by * moveRightIfItNeeded() */ - PredicateLockPage(btree->index, stack->buffer, snapshot); + PredicateLockPage(btree->index, BufferGetBlockNumber(stack->buffer), snapshot); for (;;) { @@ -1925,6 +1926,8 @@ gingetbitmap(IndexScanDesc scan, TIDBitmap *tbm) /* * Set up the scan keys, and check for unsatisfiable query. */ + PredicateLockRelation(scan->indexRelation, scan->xs_snapshot); ginFreeScanKeys(so); /* there should be no keys yet, but just to be Now the last PredicateLockRelation does look rather weird. But without it, or with this locking happening in scanPendingInsert (instead of locking the meta page), or without other changes in ginfast.c, the reproducer is still failing. This of course makes it not a proper solution by any mean, but hopefully it will help to understand the problem a bit better.
pgsql-bugs by date: