Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
Date
Msg-id CA+TgmobnmFzXGvNXXkmw65z882ppC-rnDAOYQZHy8T2cKzb48Q@mail.gmail.com
In response to Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
List pgsql-hackers
On Mon, Sep 29, 2014 at 3:37 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Yea :(. Note how signals are blocked in all the signal handlers and only
> unblocked for a very short time (the sleep).
>
> (stare at random shit for far too long)
>
> Ah. DetermineSleepTime(), which is called while signals are unblocked!,
> modifies BackgroundWorkerList. Previously that only iterated the list,
> without modifying it. That's already of quite debatable safety, but
> modifying it without having blocked signals is most definitely
> broken. The modification was introduced by 7f7485a0c...

Ouch.  OK, yeah, that's a bug.

> If you can manually run stuff on that machine, it'd be rather helpful if
> you could put a PG_SETMASK(&BlockSig);...PG_SETMASK(&UnBlockSig); in the
> HaveCrashedWorker() loop.

I'd do it the other way around, and adjust ServerLoop to put the
PG_SETMASK calls right around pg_usleep() and select(), so that signals
are unblocked only while we sleep.  But why futz with anole?  Let's just
check in the fix.  It'll either fix anole or not, but we should fix the
bug you found either way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


