Re: pg_basebackup -x stream from the standby gets stuck - Mailing list pgsql-hackers
From | Magnus Hagander |
---|---|
Subject | Re: pg_basebackup -x stream from the standby gets stuck |
Date | |
Msg-id | CABUevEyqSUb4E1RrzJGe7e_M6yoaNg6kN1YBVeV76DX75DP81w@mail.gmail.com |
In response to | Re: pg_basebackup -x stream from the standby gets stuck (Fujii Masao <masao.fujii@gmail.com>) |
Responses |
Re: pg_basebackup -x stream from the standby gets stuck
|
List | pgsql-hackers |
On Tue, Feb 28, 2012 at 09:22, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Feb 23, 2012 at 1:02 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Tue, Feb 7, 2012 at 12:30, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> Hi,
>>>
>>> http://www.depesz.com/2012/02/03/waiting-for-9-2-pg_basebackup-from-slave/
>>>> =$ time pg_basebackup -D /home/pgdba/slave2/ -F p -x stream -c fast -P -v -h 127.0.0.1 -p 5921 -U replication
>>>> xlog start point: 2/AC4E2600
>>>> pg_basebackup: starting background WAL receiver
>>>> 692447/692447 kB (100%), 1/1 tablespace
>>>> xlog end point: 2/AC4E2600
>>>> pg_basebackup: waiting for background process to finish streaming...
>>>> pg_basebackup: base backup completed
>>>>
>>>> real 3m56.237s
>>>> user 0m0.224s
>>>> sys 0m0.936s
>>>>
>>>> (time is long because this is only test database with no traffic, so I had to make some inserts for it to finish)
>>>
>>> The above article points out the problem of pg_basebackup from the standby:
>>> when "-x stream" is specified, pg_basebackup from the standby gets stuck if
>>> there is no traffic in the database.
>>>
>>> When "-x stream" is specified, pg_basebackup forks the background process
>>> for receiving WAL records during backup, takes an online backup and waits for
>>> the background process to end. The forked background process keeps receiving
>>> WAL records, and whenever it reaches end of WAL file, it checks whether it has
>>> already received all WAL files required for the backup, and exits if yes. Which
>>> means that at least one WAL segment switch is required for pg_basebackup with
>>> "-x stream" option to end.
>>>
>>> In the backup from the master, WAL file switch always occurs at both start and
>>> end of backup (i.e., in do_pg_start_backup() and do_pg_stop_backup()), so the
>>> above logic works fine even if there is no traffic. OTOH, in the backup from the
>>> standby, while there is no traffic, WAL file switch is not performed at all. So
>>> in that case, there is no chance that the background process reaches end of WAL
>>> file, check whether all required WAL arrives and exit. At the end, pg_basebackup
>>> gets stuck.
>>>
>>> To fix the problem, I'd propose to change the background process so that it
>>> checks whether all required WAL has arrived, every time data is received, even
>>> if end of WAL file is not reached. Patch attached. Comments?
>>
>> This seems like a good thing in general.
>>
>> Why does it need to modify pg_receivexlog, though? I thought only
>> pg_basebackup had this issue?
>>
>> I guess it is because of the change of the API to
>> stream_continue_callback only?
>
> Yes, that's the reason why I changed continue_streaming() in pg_receivexlog.c.
>
> But the reason why I changed segment_callback() in pg_receivexlog.c is not the
> same. I did that because previously segment_finish_callback is called only at the
> end of WAL segment but in the patch it can be called at the middle of segment.
> OTOH, segment_callback() must emit a verbose message only when current
> WAL segment is complete. So I had to add the check of whether current WAL
> segment is partial or complete into segment_callback().

Yeah, I caught that.

>> Looking at it after your patch,
>> stream_continue_callback and segment_finish_callback are the same.
>> Should we perhaps just fold them into a single
>> stream_continue_callback? Since you had to move the "detect segment
>> end" to the caller anyway?
>
> No. I think we cannot do that because in pg_receivexlog they are not the same.

But couldn't they be made the same by making the same check as you put
in for the verbose message above?

>> Another question related to this - since we clearly don't need the
>> xlog switch in this case, should we make it conditional on the master
>> as well, so we don't switch unnecessarily there as well?
>
> Maybe. At the end of backup, we force WAL segment switch, to ensure all required
> WAL files have been archived. So theoretically if WAL archiving is not enabled,
> we can skip WAL segment switch. But some backup tools might depend on this
> behavior....

I was thinking we could keep doing it in pg_stop_backup(), but avoid
doing it when using pg_basebackup only...

> In standby-only backup, we always skip WAL segment switch. So there is no guarantee
> that all WAL files required for the backup are archived at the end of backup. This
> limitation is documented.

Right.

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
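
For readers following the thread, the difference between the old and the proposed stop check can be illustrated with a small simulation. This is only a sketch in Python, not the C code from the patch: the function `receive`, its parameters, the positions `START` and `STOP`, and the chunk sizes are all hypothetical. It also simplifies "reached end of WAL file" to "the position moved into a different 16 MB segment".

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size, in bytes


def receive(chunks, start, stop, check_every_message):
    """Simulate a background WAL receiver (hypothetical model).

    chunks: sizes of the WAL data messages that arrive, in order.
    start, stop: WAL positions, modelled as plain byte offsets.
    check_every_message: False models the behaviour described in the
        thread (check only at a segment boundary); True models the
        proposed behaviour (check whenever data is received).

    Returns the position at which the receiver decides to exit, or
    None if it consumes every chunk without ever deciding to stop.
    """
    pos = start
    for size in chunks:
        previous_segment = pos // SEGMENT_SIZE
        pos += size
        at_segment_end = (pos // SEGMENT_SIZE) != previous_segment
        if (check_every_message or at_segment_end) and pos >= stop:
            return pos
    return None


# An idle standby: the required WAL ends shortly after the start point
# and nothing forces a segment switch.
START = 5 * SEGMENT_SIZE + 1000
STOP = START + 500

stuck = receive([300, 300], START, STOP, check_every_message=False)
fixed = receive([300, 300], START, STOP, check_every_message=True)
print("segment-end check only:", stuck)
print("check on every message:", fixed)
```

In this model the segment-end-only check never fires while the stream stays inside one segment, which corresponds to the hang described above. Once a segment switch happens, as it does on the master, both variants terminate.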