Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted. - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted. |
Date | |
Msg-id | CAD21AoAt=RJLVU3=DVZJDoC9WYOVErYLc1Ku=nzxx8JsAZXBfA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted. (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.
|
List | pgsql-hackers |
On Fri, Nov 17, 2017 at 11:00 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Nov 16, 2017 at 8:17 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Thu, Nov 16, 2017 at 1:11 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> + /* >>> + * Quick exit if session is not keeping around a non-exclusive backup >>> + * already started. >>> + */ >>> + if (sessionBackupState != SESSION_BACKUP_NON_EXCLUSIVE) >>> + return; >>> I think that it would be more solid to use SESSION_BACKUP_NONE for the >>> comparison, and complete the assertion after the quick exit as follows >>> as this code path should never be taken for an exclusive backup: >> >> Agreed. > > I have spent some time doing an extra lookup with tests involving one > and two sessions doing backups checking for failure code paths while > waiting for archives: > - One session with non-exclusive backup. > - One session with exclusive backup. > - One session is exclusive and the second is non-exclusive > - Both are exclusive. > Also double-checked on the way the logic around the cleanup callback > and sessionBackupState during startup, and those look clean to me. One > thing that was bothering me is that the callback > nonexclusive_base_backup_cleanup is called after do_pg_start_backup() > finishes. But between the moment sessionBackupState is set and the > callback is registered there is no CHECK_FOR_INTERRUPTS or even elog() > calls so even if the process starting a non-exclusive backup is tried > to be terminated between the moment the session lock is set and the > callback is registered things are handled correctly. > > So I think that we are basically safe for backups running with the SQL > interface. Thank you for double checking! > However, things are not completely clean for base backups taken using > the replication protocol. While monitoring more the code, I have > noticed that perform_base_backup() calls do_pg_stop_backup() *without* > taking any cleanup action. So if a base backup is interrupted, say > with SIGTERM, while do_pg_stop_backup() is called and before the > session lock is updated then it is possible to finish with > inconsistent in-memory counters. Oops. Good catch! I'd missed this case. > No need to play with breakpoints and signals in this case, using > something like that is enough to create inconsistent counters. > --- a/src/backend/replication/basebackup.c > +++ b/src/backend/replication/basebackup.c > @@ -327,6 +327,8 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir) > } > PG_END_ENSURE_ERROR_CLEANUP(base_backup_cleanup, (Datum) 0); > > + elog(ERROR, "base backups don't decrement counters here, stupid!"); > + > endptr = do_pg_stop_backup(labelfile->data, !opt->nowait, &endtli); > > A simple fix for this one is to call do_pg_stop_backup() before > PG_END_ENSURE_ERROR_CLEANUP(). By doing so, we also make the callback > cleanup logic consistent with what is done for the SQL equivalent > where the callback is removed after finishing going through > do_pg_stop_backup(). A comment would be adapted here, say something > like "finish the backup while still holding the cleanup callback to > avoid inconsistent in-memory data should the this call fail before > sessionBackupState is updated." > > For the start phase, the current logic is fine, because in the case of > the SQL interface the cleanup callback is registered after finishing > do_pg_start_backup(). > > What do you think? I agree with your approach. It makes sense to me. Attached updated patch. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
pgsql-hackers by date: