Re: BUG #18815: Logical replication worker Segmentation fault - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: BUG #18815: Logical replication worker Segmentation fault
Date	February 18 02:37:56
Msg-id	1072645.1739835476@sss.pgh.pa.us
In response to	Re: BUG #18815: Logical replication worker Segmentation fault (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: BUG #18815: Logical replication worker Segmentation fault Re: BUG #18815: Logical replication worker Segmentation fault
List	pgsql-bugs

I wrote:
> Further to this ... I'd still really like to have a reproducer.
> While brininsertcleanup is clearly being less robust than it should
> be, I now suspect that there is another bug somewhere further down
> the call stack.  We're getting to this point via ExecCloseIndices,
> and that should be paired with ExecOpenIndices, and that would have
> created a fresh IndexInfo.  So it looks a lot like some path in a
> logrep worker is able to call ExecCloseIndices twice on the same
> working data.  That would probably lead to a "releasing a lock you
> don't own" error if we weren't hitting this crash first.

Hmm ... I tried modifying ExecCloseIndices to blow up if called
twice, as in the attached.  This gets through core regression
just fine, but it blows up in three different subscription TAP
tests, all with a stack trace matching Sergey's:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f064bfe3e65 in __GI_abort () at abort.c:79
#2  0x00000000009e9253 in ExceptionalCondition (
    conditionName=conditionName@entry=0xb8717b "indexDescs[i] != NULL", 
    fileName=fileName@entry=0xb87139 "execIndexing.c", 
    lineNumber=lineNumber@entry=249) at assert.c:66
#3  0x00000000006f0b13 in ExecCloseIndices (
    resultRelInfo=resultRelInfo@entry=0x2f11c18) at execIndexing.c:249
#4  0x00000000006f86d8 in ExecCleanupTupleRouting (mtstate=0x2ef92d8, 
    proute=0x2ef94e8) at execPartition.c:1273
#5  0x0000000000848cb6 in finish_edata (edata=0x2ef8f50) at worker.c:717
#6  0x000000000084d0a0 in apply_handle_insert (s=<optimized out>)
    at worker.c:2460
#7  apply_dispatch (s=<optimized out>) at worker.c:3389
#8  0x000000000084e494 in LogicalRepApplyLoop (last_received=25066600)
    at worker.c:3680
#9  start_apply (origin_startpos=0) at worker.c:4507
#10 0x000000000084e711 in run_apply_worker () at worker.c:4629
#11 ApplyWorkerMain (main_arg=<optimized out>) at worker.c:4798
#12 0x00000000008138f9 in BackgroundWorkerMain (startup_data=<optimized out>, 
    startup_data_len=<optimized out>) at bgworker.c:842

The problem seems to be that apply_handle_insert_internal does
ExecOpenIndices and then ExecCloseIndices, and then
ExecCleanupTupleRouting does ExecCloseIndices again, which nicely
explains why brininsertcleanup blows up if you happen to have a BRIN
index involved.  What it doesn't explain is how come we don't see
other symptoms from the duplicate index_close calls, regardless of
index type.  I'd have expected an assertion failure from
RelationDecrementReferenceCount, and/or an assertion failure for
nonzero rd_refcnt at transaction end, and/or a "you don't own a lock
of type X" gripe from LockRelease.  We aren't getting any of those,
but why not, if this code is as broken as I think it is?

(On closer inspection, we seem to have broken about 99% of relcache.c's
ability to notice rd_refcnt being nonzero at transaction end, but
the other two things should still be happening.)

            regards, tom lane

diff --git a/src/backend/executor/execIndexing.c b/src/backend/executor/execIndexing.c
index 7c87f012c3..a264a2edbc 100644
--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -246,14 +246,15 @@ ExecCloseIndices(ResultRelInfo *resultRelInfo)

     for (i = 0; i < numIndices; i++)
     {
-        if (indexDescs[i] == NULL)
-            continue;            /* shouldn't happen? */
+        Assert(indexDescs[i] != NULL);

         /* Give the index a chance to do some post-insert cleanup */
         index_insert_cleanup(indexDescs[i], indexInfos[i]);

         /* Drop lock acquired by ExecOpenIndices */
         index_close(indexDescs[i], RowExclusiveLock);
+
+        indexDescs[i] = NULL;
     }

     /*
