Re: Condition variable live lock - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Condition variable live lock
Date
Msg-id CAEepm=1_S2Ly3Q53yViq29RVJmvaUw8hXs5_ekg_E1uHrNtXGQ@mail.gmail.com
In response to Condition variable live lock  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Condition variable live lock
List pgsql-hackers
On Fri, Dec 22, 2017 at 4:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>         while (ConditionVariableSignal(cv))
>                 ++nwoken;
>
> The problem is that another backend can be woken up, determine that it
> would like to wait for the condition variable again, and then get
> itself added to the back of the wait queue *before the above loop has
> finished*, so this interprocess ping-pong isn't guaranteed to
> terminate.  It seems that we'll need something slightly smarter than
> the above to avoid that.

Here is one way to fix it: track the wait queue size and use that
number to limit the wakeup loop.  See attached.
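
To make the idea concrete, here is a minimal single-process sketch of a broadcast bounded by the tracked queue size.  It is not the attached patch: ToyCV, toy_cv_enqueue, toy_cv_signal and toy_cv_broadcast_bounded are hypothetical names, the queue is a plain ring buffer rather than the real wait list, and there is no locking.  The signal function models the worst case by putting the woken waiter straight back on the queue, which is the behaviour that keeps an unbounded "while (signal) ++nwoken" loop spinning.

```c
#include <assert.h>

#define MAX_WAITERS 8

/* Toy stand-in for a condition variable (hypothetical, not PostgreSQL's). */
typedef struct ToyCV
{
    int queue[MAX_WAITERS];     /* waiter ids in FIFO order */
    int head;                   /* index of the oldest waiter */
    int nwaiters;               /* tracked wait queue size */
} ToyCV;

static void
toy_cv_init(ToyCV *cv)
{
    cv->head = 0;
    cv->nwaiters = 0;
}

/* Add a waiter to the back of the queue. */
static void
toy_cv_enqueue(ToyCV *cv, int id)
{
    assert(cv->nwaiters < MAX_WAITERS);
    cv->queue[(cv->head + cv->nwaiters) % MAX_WAITERS] = id;
    cv->nwaiters++;
}

/*
 * Wake the oldest waiter.  Returns 1 if someone was woken, 0 if the queue
 * was empty.  Assumption for illustration: the woken waiter immediately
 * decides to wait again and re-adds itself to the back of the queue.
 */
static int
toy_cv_signal(ToyCV *cv)
{
    int id;

    if (cv->nwaiters == 0)
        return 0;
    id = cv->queue[cv->head];
    cv->head = (cv->head + 1) % MAX_WAITERS;
    cv->nwaiters--;
    toy_cv_enqueue(cv, id);     /* worst case: goes right back to waiting */
    return 1;
}

/*
 * Broadcast limited by the queue size seen on entry, so waiters that
 * re-add themselves during the loop cannot extend it.
 */
static int
toy_cv_broadcast_bounded(ToyCV *cv)
{
    int limit = cv->nwaiters;   /* snapshot before waking anyone */
    int nwoken = 0;

    while (nwoken < limit && toy_cv_signal(cv))
        nwoken++;
    return nwoken;
}
```

With three waiters enqueued, toy_cv_broadcast_bounded() returns after three wakeups even though the queue never drains.  In a real multi-process version the snapshot and each dequeue would presumably have to happen under the condition variable's lock.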

That can't be back-patched, though, because it changes the size of struct
ConditionVariable, potentially breaking extensions compiled against an
earlier point release.  Maybe this won't really cause trouble in v10
anyway?  It requires a particular interaction pattern that barrier.c
produces but more typical client code might not: the awoken backends
keep re-adding themselves because they're waiting for everyone
(including the waker) to do something, but the waker is stuck in that
broadcast loop.

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com
