Thread: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Andres Freund

Date:

26 January 2015, 21:25:16

Hi,

dbase_redo does:    if (InHotStandby)    {        /*         * Lock database while we resolve conflicts to ensure that
      * InitPostgres() cannot fully re-execute concurrently. This         * avoids backends re-connecting automatically
tosame database,         * which can happen in some cases.         */
LockSharedObjectForSession(DatabaseRelationId,xlrec->db_id, 0, AccessExclusiveLock);
ResolveRecoveryConflictWithDatabase(xlrec->db_id);   }
 

Unfortunately that Assert()s when there's a lock conflict because
e.g. another backend is currently connecting. That's because ProcSleep()
does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and
there's no deadlock timeout (or lock timeout) handler registered.

I'm not sure if this is broken since 8bfd1a884 introducing those session
locks, or if it's caused by the new timeout infrastructure
(f34c68f09f34c68f09).

I guess the easiest way to fix this would be to make this a loop like
ResolveRecoveryConfictWithLock():


LOCKTAG        tag;

SET_LOCKTAG_OBJECT(tag,   InvalidOid,   DatabaseRelationId,   xlrec->db_id,   objsubid);

while (!lock_acquired)
{   while (CountDBBackends(dbid) > 0)   {       CancelDBBackends(dbid, PROCSIG_RECOVERY_CONFLICT_DATABASE, true);
       /*        * Wait awhile for them to die so that we avoid flooding an        * unresponsive backend when system
isheavily loaded.        */       pg_usleep(10000);   }
 

   if (LockAcquireExtended(&locktag, AccessExclusiveLock, true, true, false)           != LOCKACQUIRE_NOT_AVAIL)
  lock_acquired = true;
 
}

afaics, that should work? Not pretty, but probably easier than starting
to reason about the deadlock detector in the startup process.

We probably should also add a Assert(!InRecovery || sessionLock) to
LockAcquireExtended() - these kind of problems are otherwise hard to
find in a developer setting.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Michael Paquier

Date:

27 January 2015, 07:23:58

On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> Unfortunately that Assert()s when there's a lock conflict because
> e.g. another backend is currently connecting. That's because ProcSleep()
> does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and
> there's no deadlock timeout (or lock timeout) handler registered.
Yes, that could logically happen if there is a lock conflicting as
RowExclusiveLock or lower lock can be taken in recovery.

> [...]
> afaics, that should work? Not pretty, but probably easier than starting
> to reason about the deadlock detector in the startup process.
Wouldn't it be cleaner to simply register a dedicated handler in
StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as
it is reserved for backend operations? For back-branches, we may even
consider using DEADLOCK_TIMEOUT..

> We probably should also add a Assert(!InRecovery || sessionLock) to
> LockAcquireExtended() - these kind of problems are otherwise hard to
> find in a developer setting.
So this means that locks other than session ones cannot be taken while
a node is in recovery, but RowExclusiveLock can be taken while in
recovery. Don't we have a problem with this assertion then?
-- 
Michael

Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Andres Freund

Date:

27 January 2015, 08:18:05

On 2015-01-27 16:23:53 +0900, Michael Paquier wrote:
> On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Unfortunately that Assert()s when there's a lock conflict because
> > e.g. another backend is currently connecting. That's because ProcSleep()
> > does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and
> > there's no deadlock timeout (or lock timeout) handler registered.

> Yes, that could logically happen if there is a lock conflicting as
> RowExclusiveLock or lower lock can be taken in recovery.

I don't this specific lock (it's a object, not relation lock) can easily
be taken directly by a user except during authentication.

> > [...]
> > afaics, that should work? Not pretty, but probably easier than starting
> > to reason about the deadlock detector in the startup process.

> Wouldn't it be cleaner to simply register a dedicated handler in
> StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as
> it is reserved for backend operations? For back-branches, we may even
> consider using DEADLOCK_TIMEOUT..

What would that timeout handler actually do? Two problems:
a) We really don't want the startup process be killed/error out as that  likely means a postmaster restart. So the
defaultdeadlock detector  strategy is something that's really useless for us.

b) Pretty much all other (if not actually all other) heavyweight lock  acquisitions in the startup process acquire
locksusing dontWait =  true - which means that the deadlock detector isn't actually  run. That's essentially fine
becausewe simply kill everything in our  way. C.f. StandbyAcquireAccessExclusiveLock() et al.

There's a dedicated 'deadlock detector' like infrastructure around
ResolveRecoveryConflictWithBufferPin(), but it deals with a class of
deadlocks that's not handled in the deadlock detector anyway.

I think this isn't particularly pretty, but it seems to be working well
enough, and changing it would be pretty invasive. So keeping in line
with all that code seems to be easier.

> > We probably should also add a Assert(!InRecovery || sessionLock) to
> > LockAcquireExtended() - these kind of problems are otherwise hard to
> > find in a developer setting.

> So this means that locks other than session ones cannot be taken while
> a node is in recovery, but RowExclusiveLock can be taken while in
> recovery. Don't we have a problem with this assertion then?

Note that InRecovery doesn't mean what you probably think it means:
/** Are we doing recovery from XLOG?** This is only ever true in the startup process; it should be read as meaning*
"thisprocess is replaying WAL records", rather than "the system is in* recovery mode".  It should be examined primarily
byfunctions that need* to act differently when called from a WAL redo function (e.g., to skip WAL* logging).  To check
whetherthe system is in recovery regardless of which* process you're running in, use RecoveryInProgress() but only
aftershared* memory startup and lock initialization.*/

bool        InRecovery = false;

The assertion actually should be even stricter:
Assert(InRecovery || (sessionLock && dontWait));
i.e. we never acquire a heavyweight lock in the startup process unless
it's a session lock (as we don't have resource managers/a xact to track
locks) and we don't wait (as we don't have the deadlock detector
infrastructure set up).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Michael Paquier

Date:

28 January 2015, 07:41:43

Andres Freund wrote:
> I think this isn't particularly pretty, but it seems to be working well
> enough, and changing it would be pretty invasive. So keeping in line
> with all that code seems to be easier.
OK, I'm convinced with this part to remove the call of
LockSharedObjectForSession that uses dontWait and replace it by a loop
in ResolveRecoveryConflictWithDatabase.

> Note that InRecovery doesn't mean what you probably think it means:
> [stuff]
> bool            InRecovery = false;
Yes, right. I misunderstood with RecoveryInProgress().

> The assertion actually should be even stricter:
> Assert(!InRecovery || (sessionLock && dontWait));
> i.e. we never acquire a heavyweight lock in the startup process unless
> it's a session lock (as we don't have resource managers/a xact to track
> locks) and we don't wait (as we don't have the deadlock detector
> infrastructure set up).
No problems with this assertion here.
-- 
Michael

Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Robert Haas

Date:

29 January 2015, 16:02:02

On Wed, Jan 28, 2015 at 2:41 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Andres Freund wrote:
>> I think this isn't particularly pretty, but it seems to be working well
>> enough, and changing it would be pretty invasive. So keeping in line
>> with all that code seems to be easier.
> OK, I'm convinced with this part to remove the call of
> LockSharedObjectForSession that uses dontWait and replace it by a loop
> in ResolveRecoveryConflictWithDatabase.

That seems right to me, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Andres Freund

Date:

30 January 2015, 20:34:53

On 2015-01-29 11:01:51 -0500, Robert Haas wrote:
> On Wed, Jan 28, 2015 at 2:41 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > Andres Freund wrote:
> >> I think this isn't particularly pretty, but it seems to be working well
> >> enough, and changing it would be pretty invasive. So keeping in line
> >> with all that code seems to be easier.
> > OK, I'm convinced with this part to remove the call of
> > LockSharedObjectForSession that uses dontWait and replace it by a loop
> > in ResolveRecoveryConflictWithDatabase.
>
> That seems right to me, too.

It's slightly more complicated than that. The lock conflict should
actually be resolved using ResolveRecoveryConflictWithLock()... That,
combined with the race of connecting a actually already deleted database
(see the XXXs I removed) seem to make the approach in here.

Attached are two patches:
1) Infrastructure for attaching more kinds of locks on the startup
   process.
2) Use that infrastructure for database locks during replay.

I'm not sure 2) alone would be sufficient justification for 1), but the
nearby thread about basebackups also require similar infrastructure...

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Michael Paquier

Date:

03 February 2015, 05:18:09

On Sat, Jan 31, 2015 at 5:34 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-01-29 11:01:51 -0500, Robert Haas wrote:
>> On Wed, Jan 28, 2015 at 2:41 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>> > Andres Freund wrote:
>> >> I think this isn't particularly pretty, but it seems to be working well
>> >> enough, and changing it would be pretty invasive. So keeping in line
>> >> with all that code seems to be easier.
>> > OK, I'm convinced with this part to remove the call of
>> > LockSharedObjectForSession that uses dontWait and replace it by a loop
>> > in ResolveRecoveryConflictWithDatabase.
>>
>> That seems right to me, too.
>
> It's slightly more complicated than that. The lock conflict should
> actually be resolved using ResolveRecoveryConflictWithLock()... That,
> combined with the race of connecting a actually already deleted database
> (see the XXXs I removed) seem to make the approach in here.
>
> Attached are two patches:
> 1) Infrastructure for attaching more kinds of locks on the startup
>    process.
> 2) Use that infrastructure for database locks during replay.
>
> I'm not sure 2) alone would be sufficient justification for 1), but the
> nearby thread about basebackups also require similar infrastructure...

Some comments about patch 1:
-        * No locking is required here because we already acquired
-        * AccessExclusiveLock. Anybody trying to connect while we do this will
-        * block during InitPostgres() and then disconnect when they see the
-        * database has been removed.
+        * No locking is required here because we already acquired a
+        * AccessExclusiveLock on the database in dbase_redo().
Anybody trying to
+        * connect while we do this will block during InitPostgres() and then
+        * disconnect when they see the database has been removed.        */
This change looks unnecessary, I'd rather let it as-is.

-      "RecoveryLockList contains entry for lock no longer recorded by
lock manager: xid %u database %u relation %u",
-      lock->xid, lock->dbOid, lock->relOid);
+    "RecoveryLockList contains entry for lock no longer recorded by
lock manager: xid %u",
+     lock->xid);
This patch is making the information provided less verbose, and I
think that it is useful to have some information not only about the
lock held, but as well about the database and the relation.
Also, ISTM that StandbyAcquireLock should still use a database OID and
a relation OID instead of a only LOCKTAG, and SET_LOCKTAG_RELATION
should be set in StandbyAcquireLock while
ResolveRecoveryConflictWithLock is extended only with the lock mode as
new argument. (Patch 2 adds many calls to SET_LOCKTAG_RELATION btw
justidying to keep he API changes minimal).

There are some typos in the commit message:
s/shanges/changes
s/exlusive/exclusive

In patch 2, isn't it necessary to bump XLOG_PAGE_MAGIC?
-- 
Michael

Re: Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up

From

Andres Freund

Date:

03 February 2015, 09:41:30

On 2015-02-03 14:18:02 +0900, Michael Paquier wrote:
> -      "RecoveryLockList contains entry for lock no longer recorded by
> lock manager: xid %u database %u relation %u",
> -      lock->xid, lock->dbOid, lock->relOid);
> +    "RecoveryLockList contains entry for lock no longer recorded by
> lock manager: xid %u",
> +     lock->xid);
> This patch is making the information provided less verbose, and I
> think that it is useful to have some information not only about the
> lock held, but as well about the database and the relation.

It's debug4 or impossible stuff that lock.c already warned about - I
doubt anybody has ever actually looked at it in a long while, if
ever. If we really want to provide something more we can use something
like LOCK_PRINT() - but I really doubt it's worth neither the
notational, nor the verbosity overhead.

> Also, ISTM that StandbyAcquireLock should still use a database OID and
> a relation OID instead of a only LOCKTAG, and SET_LOCKTAG_RELATION
> should be set in StandbyAcquireLock while
> ResolveRecoveryConflictWithLock is extended only with the lock mode as
> new argument. (Patch 2 adds many calls to SET_LOCKTAG_RELATION btw
> justidying to keep he API changes minimal).

But there's now callers acquiring other locks than relation locks, like
dbase_redo() acquiring a object lock. And we need to acquire those via
the standby mechanism to avoid races around release.  We could add a
separate wrapper for relation locks, but imo the locktag move to the
callers saved about as many lines in some places as it cost in others.

> In patch 2, isn't it necessary to bump XLOG_PAGE_MAGIC?

I don't think so, there's no incompatible change.

Thanks for having a look!

Andres

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services