Re: LWLock deadlock and gdb advice - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: LWLock deadlock and gdb advice
Msg-id CAMkU=1zUc=h0oCZntaJaqqW7gxxVxCWsYq8DD2t7oHgsgVEsgA@mail.gmail.com
In response to Re: LWLock deadlock and gdb advice  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it is a lot harder and more expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone: you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return, regardless of the variable's value, if the current lock holder has not updated it yet.
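
As a rough illustration of the flag semantics described above, here is a toy, single-threaded model. It is only a sketch, not the patch: every name in it (ToyVarLock, toy_acquire, toy_update_var, toy_release, toy_wait_must_block) is hypothetical, there are no atomics or wait queues, and it assumes that a waiter may stop waiting once the lock is free or once the holder has published a value different from the one the waiter already knows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in PostgreSQL. */
typedef struct ToyVarLock
{
    bool     held;      /* is the lock currently held exclusively? */
    bool     var_set;   /* stands in for LW_FLAG_VAR_SET */
    uint64_t var;       /* the variable associated with the lock */
} ToyVarLock;

/* Acquiring always clears the "variable set" flag. Fails if already held. */
static bool
toy_acquire(ToyVarLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    lock->var_set = false;
    return true;
}

/* The holder publishes a value and marks it as valid. */
static void
toy_update_var(ToyVarLock *lock, uint64_t val)
{
    lock->var = val;
    lock->var_set = true;
}

static void
toy_release(ToyVarLock *lock)
{
    lock->held = false;
}

/*
 * Would a waiter that last saw "oldval" have to keep waiting?
 * While the flag is clear, the stored value is ignored entirely.
 */
static bool
toy_wait_must_block(const ToyVarLock *lock, uint64_t oldval)
{
    if (!lock->held)
        return false;           /* lock is free: nothing to wait for */
    if (!lock->var_set)
        return true;            /* holder has not published a value yet */
    return lock->var == oldval; /* keep waiting while the value is unchanged */
}
```

The point of the model is the middle branch of toy_wait_must_block(): between toy_acquire() and the first toy_update_var(), a waiter keeps waiting even if the stale value left in var happens to differ from oldval.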


I ran this for a while without casserts and it seems to work.  But with casserts, I get failures in the autovac process on the GIN index.

I don't see how this is related to the LWLock issue, but I didn't see it without your patch.  Perhaps the system just didn't survive long enough to uncover it without the patch (although it shows up pretty quickly).  It could just be an overzealous Assert, since running with casserts off didn't show any problems.

bt and bt full are shown below.

Cheers, 

Jeff

#0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2  0x0000000000930b7a in ExceptionalCondition (
    conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc "FailedAssertion",
    fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3  0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at ginvacuum.c:713

It now looks like this *is* unrelated to the LWLock issue.  The assert that it is tripping over was added only recently (commit 302ac7f27197855afa8c), so I had not been testing with it in place until now.  It looks like it is finding all-zero pages (index extended, but then a crash before the page was initialized?) and it doesn't like them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably before this assert, such pages would just be permanently orphaned.
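
To spell out why an all-zero page trips that condition, here is a small self-contained sketch. It assumes a simplified stand-in header (ToyPageHeader) rather than the real PageHeaderData, and the helper names and TOY_BLCKSZ are made up for illustration. On a zeroed page the "special" offset reads as 0, which is less than the offset where line pointers start, so a check of the same shape as the one in the backtrace fails.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192     /* assumed page size, matching the gdb cast above */

/* Simplified stand-in for a page header; NOT PostgreSQL's PageHeaderData. */
typedef struct ToyPageHeader
{
    uint16_t lower;         /* start of free space */
    uint16_t upper;         /* end of free space */
    uint16_t special;       /* offset of the special area */
    uint16_t reserved;
    uint32_t linp[1];       /* line pointers begin here */
} ToyPageHeader;

/* True if every byte of the page is zero, i.e. it was never initialized. */
static bool
toy_page_is_all_zeros(const unsigned char *page)
{
    for (size_t i = 0; i < TOY_BLCKSZ; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/* Same shape as the failing assertion: special >= offsetof(header, linp). */
static bool
toy_special_is_sane(const unsigned char *page)
{
    uint16_t special;

    /* Copy the field out to avoid alignment assumptions about the buffer. */
    memcpy(&special, page + offsetof(ToyPageHeader, special), sizeof(special));
    return (size_t) special >= offsetof(ToyPageHeader, linp);
}
```

A caller that wants to tolerate such pages could test toy_page_is_all_zeros() first and only apply the sanity check to pages that have been initialized; whether the real code should skip, reclaim or reject them is a separate question.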

Cheers,

Jeff
