Re: BUG #18815: Logical replication worker Segmentation fault - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: BUG #18815: Logical replication worker Segmentation fault |
Date | |
Msg-id | 1072645.1739835476@sss.pgh.pa.us |
In response to | Re: BUG #18815: Logical replication worker Segmentation fault (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: BUG #18815: Logical replication worker Segmentation fault
Re: BUG #18815: Logical replication worker Segmentation fault |
List | pgsql-bugs |
I wrote:
> Further to this ... I'd still really like to have a reproducer.
> While brininsertcleanup is clearly being less robust than it should
> be, I now suspect that there is another bug somewhere further down
> the call stack.  We're getting to this point via ExecCloseIndices,
> and that should be paired with ExecOpenIndices, and that would have
> created a fresh IndexInfo.  So it looks a lot like some path in a
> logrep worker is able to call ExecCloseIndices twice on the same
> working data.  That would probably lead to a "releasing a lock you
> don't own" error if we weren't hitting this crash first.

Hmm ... I tried modifying ExecCloseIndices to blow up if called
twice, as in the attached.  This gets through core regression
just fine, but it blows up in three different subscription TAP
tests, all with a stack trace matching Sergey's:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f064bfe3e65 in __GI_abort () at abort.c:79
#2  0x00000000009e9253 in ExceptionalCondition (
    conditionName=conditionName@entry=0xb8717b "indexDescs[i] != NULL",
    fileName=fileName@entry=0xb87139 "execIndexing.c",
    lineNumber=lineNumber@entry=249) at assert.c:66
#3  0x00000000006f0b13 in ExecCloseIndices (
    resultRelInfo=resultRelInfo@entry=0x2f11c18) at execIndexing.c:249
#4  0x00000000006f86d8 in ExecCleanupTupleRouting (mtstate=0x2ef92d8,
    proute=0x2ef94e8) at execPartition.c:1273
#5  0x0000000000848cb6 in finish_edata (edata=0x2ef8f50) at worker.c:717
#6  0x000000000084d0a0 in apply_handle_insert (s=<optimized out>)
    at worker.c:2460
#7  apply_dispatch (s=<optimized out>) at worker.c:3389
#8  0x000000000084e494 in LogicalRepApplyLoop (last_received=25066600)
    at worker.c:3680
#9  start_apply (origin_startpos=0) at worker.c:4507
#10 0x000000000084e711 in run_apply_worker () at worker.c:4629
#11 ApplyWorkerMain (main_arg=<optimized out>) at worker.c:4798
#12 0x00000000008138f9 in BackgroundWorkerMain (
    startup_data=<optimized out>, startup_data_len=<optimized out>)
    at bgworker.c:842

The problem seems to be that apply_handle_insert_internal does
ExecOpenIndices and then ExecCloseIndices, and then
ExecCleanupTupleRouting does ExecCloseIndices again, which nicely
explains why brininsertcleanup blows up if you happen to have a BRIN
index involved.  What it doesn't explain is how come we don't see
other symptoms from the duplicate index_close calls, regardless of
index type.  I'd have expected an assertion failure from
RelationDecrementReferenceCount, and/or an assertion failure for
nonzero rd_refcnt at transaction end, and/or a "you don't own a lock
of type X" gripe from LockRelease.  We aren't getting any of those,
but why not, if this code is as broken as I think it is?

(On closer inspection, we seem to have about 99% broken relcache.c's
ability to notice rd_refcnt being nonzero at transaction end, but
the other two things should still be happening.)

			regards, tom lane

diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 7c87f012c3..a264a2edbc 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -246,14 +246,15 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)
 
     for (i = 0; i < numIndices; i++)
     {
-        if (indexDescs[i] == NULL)
-            continue;           /* shouldn't happen? */
+        Assert(indexDescs[i] != NULL);
 
         /* Give the index a chance to do some post-insert cleanup */
         index_insert_cleanup(indexDescs[i], indexInfos[i]);
 
         /* Drop lock acquired by ExecOpenIndices */
         index_close(indexDescs[i], RowExclusiveLock);
+
+        indexDescs[i] = NULL;
     }
 
     /*
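
For readers who want to see the open/close/close-again pattern in isolation, here is a rough standalone sketch in plain C.  It is not PostgreSQL code: ToyIndex, ToyRelInfo and the toy_* functions are hypothetical stand-ins invented for illustration, and a bare integer reference count stands in for the real relcache and lock bookkeeping.  It only tries to show why a close routine that leaves its descriptor slots populated cannot notice a second call, while one that clears each slot (as the diff above does with indexDescs[i] = NULL) can.  The checked variant returns -1 instead of asserting so that the outcome can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_INDICES 4

/* Hypothetical stand-in for an index relation: tracks only a refcount. */
typedef struct ToyIndex
{
    int         refcount;
} ToyIndex;

/* Hypothetical stand-in for the array of open index descriptors. */
typedef struct ToyRelInfo
{
    int         num_indices;
    ToyIndex   *descs[TOY_MAX_INDICES];
} ToyRelInfo;

/* Take one reference on each index and remember it in the slot array. */
static void
toy_open_indices(ToyRelInfo *rel, ToyIndex *indexes, int n)
{
    assert(n >= 0 && n <= TOY_MAX_INDICES);
    rel->num_indices = n;
    for (int i = 0; i < n; i++)
    {
        indexes[i].refcount++;
        rel->descs[i] = &indexes[i];
    }
}

/*
 * Close without clearing the slots.  The NULL test never fires because
 * nothing ever stores NULL, so a second call silently drops a reference
 * that was never taken.
 */
static void
toy_close_indices_unchecked(ToyRelInfo *rel)
{
    for (int i = 0; i < rel->num_indices; i++)
    {
        if (rel->descs[i] == NULL)
            continue;
        rel->descs[i]->refcount--;
    }
}

/*
 * Close and clear each slot.  Returns 0 on success, or -1 if a slot was
 * already cleared, meaning this is a second close.  (The diff above
 * uses Assert for this; a return code is used here only so a caller
 * can inspect the result.)
 */
static int
toy_close_indices_checked(ToyRelInfo *rel)
{
    for (int i = 0; i < rel->num_indices; i++)
    {
        if (rel->descs[i] == NULL)
            return -1;
        rel->descs[i]->refcount--;
        rel->descs[i] = NULL;
    }
    return 0;
}
```

Calling toy_close_indices_unchecked twice after a single toy_open_indices leaves each refcount at -1 without any complaint, whereas the second call to toy_close_indices_checked reports -1 and leaves the refcounts at zero.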