Thread: Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus@hagander.net> wrote: >> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>>>>>>> Fujii Masao <masao.fujii@gmail.com> writes: >>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from >>>>>>>>>>>> becoming synchronous standby. Thought? >>>>>>>>>>> >>>>>>>>>>> Absolutely. If we have replication clients that are not actually >>>>>>>>>>> capable of being standbys, there *must* be a way for the master >>>>>>>>>>> to know that. >>>>>>>>>> >>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush >>>>>>>>>> location? And that this only applied in 9.2? >>>>>>>>>> >>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log >>>>>>>>>> streaming) as synchronous standby? >>>>>>>>> >>>>>>>>> Yes. >>>>>>>>> >>>>>>>>>> If so then yes, that is >>>>>>>>>> *definitely* a bug that should be fixed. We should never select a >>>>>>>>>> connection that's not even streaming log as standby! >>>>>>>>> >>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync >>>>>>>>> standby. Also this patch fixes another problem: currently only walsender >>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH, >>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one >>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender. >>>>>>>>> So when the standby with higher priority connects to the master, we >>>>>>>>> might have no sync standby until it reaches the STREAMING state. >>>>>>>>> To fix this problem, the patch switches walsender's state from sync to >>>>>>>>> potential *after* walsender with higher priority has reached the >>>>>>>>> STREAMING state. >>>>>>>>> >>>>>>>>> We also should not select (1) background stream process forked from >>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they >>>>>>>>> don't send back replication progress. To address this, I'm thinking to >>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command >>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC. >>>>>>>>> >>>>>>>>> START_REPLICATION XXX/XXX [NOSYNC] >>>>>>>>> >>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync >>>>>>>>> standby even if its name is in synchronous_standby_names. Thought? >>>>>>>> >>>>>>>> The standby which always sends InvalidXLogRecPtr back should not >>>>>>>> become sync one. So instead of NOSYNC option, by checking whether >>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby. >>>>>>> >>>>>>> We should not do this because Magnus is proposing the patch >>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php) >>>>>>> which breaks the above assumption at all. So we should introduce >>>>>>> something like NOSYNC option. >>>>>> >>>>>> Wouldn't the better choice there in that case be to give a switch to >>>>>> pg_receivexlog if you *want* it to be able to become a sync replica, >>>>>> and by default disallow it? And then keep the backend just treating >>>>>> InvalidXlogRecPtr as don't-become-sync-replica. >>>>> >>>>> I don't object to making pg_receivexlog as sync standby at all. So at least >>>>> for me, that switch is not necessary. What I'm worried about is the >>>>> background stream process forked from pg_basebackup. I think that >>>>> it should not run as sync standby but sending back its replication progress >>>>> seems helpful because a user can see the progress from pg_stat_replication. >>>>> So I'm thinking that something like NOSYNC option is required. >>>> >>>> On principle, no. By default, yes. >>>> >>>> How about: >>>> pg_basebackup background: *never* sends flush location, and therefor >>>> won't become sync replica >>>> pg_receivexlog *optionally* sends flush location. by defualt own't >>>> become sync replica, but can be made so with a switch >>> >>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication >>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup >>> background don't flush WAL files at all? >> >> That sounds like a "documentable issue". >> >> But maybe you're right, and we need the "never become sync" as a flag. > > You agreed to add something like NOSYNC option into START_REPLICATION command? I'm on the fence. I was hoping somebody else would chime in with an opinion as well. I just realized this thread is on -admin. Moving it to -hackers so more of the right people will spot it. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Tue, Jun 12, 2012 at 12:47 AM, Magnus Hagander <magnus@hagander.net> wrote: > On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus@hagander.net> wrote: >>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>>>>>>>> Fujii Masao <masao.fujii@gmail.com> writes: >>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from >>>>>>>>>>>>> becoming synchronous standby. Thought? >>>>>>>>>>>> >>>>>>>>>>>> Absolutely. If we have replication clients that are not actually >>>>>>>>>>>> capable of being standbys, there *must* be a way for the master >>>>>>>>>>>> to know that. >>>>>>>>>>> >>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush >>>>>>>>>>> location? And that this only applied in 9.2? >>>>>>>>>>> >>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log >>>>>>>>>>> streaming) as synchronous standby? >>>>>>>>>> >>>>>>>>>> Yes. >>>>>>>>>> >>>>>>>>>>> If so then yes, that is >>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a >>>>>>>>>>> connection that's not even streaming log as standby! >>>>>>>>>> >>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync >>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender >>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH, >>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one >>>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender. >>>>>>>>>> So when the standby with higher priority connects to the master, we >>>>>>>>>> might have no sync standby until it reaches the STREAMING state. >>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to >>>>>>>>>> potential *after* walsender with higher priority has reached the >>>>>>>>>> STREAMING state. >>>>>>>>>> >>>>>>>>>> We also should not select (1) background stream process forked from >>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they >>>>>>>>>> don't send back replication progress. To address this, I'm thinking to >>>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command >>>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC. >>>>>>>>>> >>>>>>>>>> START_REPLICATION XXX/XXX [NOSYNC] >>>>>>>>>> >>>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync >>>>>>>>>> standby even if its name is in synchronous_standby_names. Thought? >>>>>>>>> >>>>>>>>> The standby which always sends InvalidXLogRecPtr back should not >>>>>>>>> become sync one. So instead of NOSYNC option, by checking whether >>>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby. >>>>>>>> >>>>>>>> We should not do this because Magnus is proposing the patch >>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php) >>>>>>>> which breaks the above assumption at all. So we should introduce >>>>>>>> something like NOSYNC option. >>>>>>> >>>>>>> Wouldn't the better choice there in that case be to give a switch to >>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica, >>>>>>> and by default disallow it? And then keep the backend just treating >>>>>>> InvalidXlogRecPtr as don't-become-sync-replica. >>>>>> >>>>>> I don't object to making pg_receivexlog as sync standby at all. So at least >>>>>> for me, that switch is not necessary. What I'm worried about is the >>>>>> background stream process forked from pg_basebackup. I think that >>>>>> it should not run as sync standby but sending back its replication progress >>>>>> seems helpful because a user can see the progress from pg_stat_replication. >>>>>> So I'm thinking that something like NOSYNC option is required. >>>>> >>>>> On principle, no. By default, yes. >>>>> >>>>> How about: >>>>> pg_basebackup background: *never* sends flush location, and therefor >>>>> won't become sync replica >>>>> pg_receivexlog *optionally* sends flush location. by defualt own't >>>>> become sync replica, but can be made so with a switch >>>> >>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication >>>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup >>>> background don't flush WAL files at all? >>> >>> That sounds like a "documentable issue". >>> >>> But maybe you're right, and we need the "never become sync" as a flag. >> >> You agreed to add something like NOSYNC option into START_REPLICATION command? > > I'm on the fence. I was hoping somebody else would chime in with an > opinion as well. +1 Regards, -- Fujii Masao
On Mon, Jun 11, 2012 at 6:06 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Tue, Jun 12, 2012 at 12:47 AM, Magnus Hagander <magnus@hagander.net> wrote: >> On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>>>>>>>>> Fujii Masao <masao.fujii@gmail.com> writes: >>>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from >>>>>>>>>>>>>> becoming synchronous standby. Thought? >>>>>>>>>>>>> >>>>>>>>>>>>> Absolutely. If we have replication clients that are not actually >>>>>>>>>>>>> capable of being standbys, there *must* be a way for the master >>>>>>>>>>>>> to know that. >>>>>>>>>>>> >>>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush >>>>>>>>>>>> location? And that this only applied in 9.2? >>>>>>>>>>>> >>>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log >>>>>>>>>>>> streaming) as synchronous standby? >>>>>>>>>>> >>>>>>>>>>> Yes. >>>>>>>>>>> >>>>>>>>>>>> If so then yes, that is >>>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a >>>>>>>>>>>> connection that's not even streaming log as standby! >>>>>>>>>>> >>>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync >>>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender >>>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH, >>>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one >>>>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender. >>>>>>>>>>> So when the standby with higher priority connects to the master, we >>>>>>>>>>> might have no sync standby until it reaches the STREAMING state. >>>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to >>>>>>>>>>> potential *after* walsender with higher priority has reached the >>>>>>>>>>> STREAMING state. >>>>>>>>>>> >>>>>>>>>>> We also should not select (1) background stream process forked from >>>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they >>>>>>>>>>> don't send back replication progress. To address this, I'm thinking to >>>>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command >>>>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC. >>>>>>>>>>> >>>>>>>>>>> START_REPLICATION XXX/XXX [NOSYNC] >>>>>>>>>>> >>>>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync >>>>>>>>>>> standby even if its name is in synchronous_standby_names. Thought? >>>>>>>>>> >>>>>>>>>> The standby which always sends InvalidXLogRecPtr back should not >>>>>>>>>> become sync one. So instead of NOSYNC option, by checking whether >>>>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby. >>>>>>>>> >>>>>>>>> We should not do this because Magnus is proposing the patch >>>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php) >>>>>>>>> which breaks the above assumption at all. So we should introduce >>>>>>>>> something like NOSYNC option. >>>>>>>> >>>>>>>> Wouldn't the better choice there in that case be to give a switch to >>>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica, >>>>>>>> and by default disallow it? And then keep the backend just treating >>>>>>>> InvalidXlogRecPtr as don't-become-sync-replica. >>>>>>> >>>>>>> I don't object to making pg_receivexlog as sync standby at all. So at least >>>>>>> for me, that switch is not necessary. What I'm worried about is the >>>>>>> background stream process forked from pg_basebackup. I think that >>>>>>> it should not run as sync standby but sending back its replication progress >>>>>>> seems helpful because a user can see the progress from pg_stat_replication. >>>>>>> So I'm thinking that something like NOSYNC option is required. >>>>>> >>>>>> On principle, no. By default, yes. >>>>>> >>>>>> How about: >>>>>> pg_basebackup background: *never* sends flush location, and therefor >>>>>> won't become sync replica >>>>>> pg_receivexlog *optionally* sends flush location. by defualt own't >>>>>> become sync replica, but can be made so with a switch >>>>> >>>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication >>>>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup >>>>> background don't flush WAL files at all? >>>> >>>> That sounds like a "documentable issue". >>>> >>>> But maybe you're right, and we need the "never become sync" as a flag. >>> >>> You agreed to add something like NOSYNC option into START_REPLICATION command? >> >> I'm on the fence. I was hoping somebody else would chime in with an >> opinion as well. > > +1 Nobody else with any opinion on this? :( -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On 11 June 2012 23:47, Magnus Hagander <magnus@hagander.net> wrote >> You agreed to add something like NOSYNC option into START_REPLICATION command? > > I'm on the fence. I was hoping somebody else would chime in with an > opinion as well. Why would you add it to synchronous_standby_names and then explicitly ignore it? I don't see why you'd want this. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>> You agreed to add something like NOSYNC option into START_REPLICATION command? >>> >>> I'm on the fence. I was hoping somebody else would chime in with an >>> opinion as well. >> >> +1 > > Nobody else with any opinion on this? :( I don't think we really need a NOSYNC flag at this point. Just not setting the flush location in clients that make a point of flushing in a timely fashion seems fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>>> You agreed to add something like NOSYNC option into START_REPLICATION command? >>>> >>>> I'm on the fence. I was hoping somebody else would chime in with an >>>> opinion as well. >>> >>> +1 >> >> Nobody else with any opinion on this? :( > > I don't think we really need a NOSYNC flag at this point. Just not > setting the flush location in clients that make a point of flushing in > a timely fashion seems fine. Okay, I'm in the minority, so I'm writing the patch that way. WIP patch attached. In the patch, pg_basebackup background process and pg_receivexlog always return invalid location as flush one, and will never become sync standby even if their name is in synchronous_standby_names. The timing of their sending the reply depends on the standby_message_timeout specified in -s option. So the write position may lag behind the true position. pg_receivexlog accepts new option -S (better option character?). If this option is specified, pg_receivexlog returns true flush position, and can become sync standby. It sends back the reply to the master each time the write position changes or the timeout passes. If synchronous_commit is set to remote_write, synchronous replication to pg_receivexlog would work well. The patch needs more documentation. But I think that it's worth reviewing the code in advance, so I attached the WIP patch. Comments? Objections? The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied, we need to write the backport version of the patch for 9.2. Regards, -- Fujii Masao
Attachment
On 27 June 2012 18:24, Fujii Masao <masao.fujii@gmail.com> wrote: > will never become sync standby even > if their name is in synchronous_standby_names. I don't understand why you'd want that. What is wrong with removing the name from synchronous_standby_names if you don't like it? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 27, 2012 at 7:58 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 27 June 2012 18:24, Fujii Masao <masao.fujii@gmail.com> wrote: > >> will never become sync standby even >> if their name is in synchronous_standby_names. > > I don't understand why you'd want that. > > What is wrong with removing the name from synchronous_standby_names if > you don't like it? I believe that's just fallout -the main point is that we don't want to become a sync standby when synchronous_standby_names is set to '*'. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command? >>>>> >>>>> I'm on the fence. I was hoping somebody else would chime in with an >>>>> opinion as well. >>>> >>>> +1 >>> >>> Nobody else with any opinion on this? :( >> >> I don't think we really need a NOSYNC flag at this point. Just not >> setting the flush location in clients that make a point of flushing in >> a timely fashion seems fine. > > Okay, I'm in the minority, so I'm writing the patch that way. WIP > patch attached. > > In the patch, pg_basebackup background process and pg_receivexlog always > return invalid location as flush one, and will never become sync standby even > if their name is in synchronous_standby_names. The timing of their sending That doesn't match with the patch, afaics. The patch always sets the correct write location, which means it can become a remote_write synchronous standby, no? It will only send it back when timeout expires, but it will be sent back. I wonder if that might actually be a more reasonable mode of operation in general: * always send back the write position, at the write interval * always send back the flush position, when we're flushing (meaning when we switch xlog) have an option that makes it possible to: * always send back the write position as soon as it changes (making for a reasonable remote_write sync standby) * actually flush the log after each write instead of end of file (making for a reasonable full sync standby) meaning you'd have something like "pg_receivexlog --sync=write" and "pg_receivexlog --sync=flush" controlling it instead. And deal with the "user put * in synchronous_standby_names and accidentally got pg_receivexlog as the sync standby" by more clearly warning people not to use * for that parameter... Since it's simply dangerous :) > the reply depends on the standby_message_timeout specified in -s option. So > the write position may lag behind the true position. > > pg_receivexlog accepts new option -S (better option character?). If this option > is specified, pg_receivexlog returns true flush position, and can become sync > standby. It sends back the reply to the master each time the write position > changes or the timeout passes. If synchronous_commit is set to remote_write, > synchronous replication to pg_receivexlog would work well. Yeah, I hadn't considered the remote_write mode, but I guess that's why you have to track the current write position across loads, which first confused me. Looking at some other usecases for this, I wonder if we should also force a status message whenever we switch xlog files, even if we aren't running in sync mode, even if the timeout hasn't expired. I think that would be a reasonable thing to do, since you often want to track things based on files. > The patch needs more documentation. But I think that it's worth reviewing the > code in advance, so I attached the WIP patch. Comments? Objections? Looking at the code, what exactly prompts the changes to the backend side? That seems unrelated? Are we actually considering picking a standby with InvalidXlogRecPtr as a sync standby today? Isn't it enough to just send the proper write and flush locations from the frontend? > The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied, > we need to write the backport version of the patch for 9.2. Oh, conflicts with Heikkis xlog patches, right? Ugh. But yeah. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Thanks for the review! On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote: > On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command? >>>>>> >>>>>> I'm on the fence. I was hoping somebody else would chime in with an >>>>>> opinion as well. >>>>> >>>>> +1 >>>> >>>> Nobody else with any opinion on this? :( >>> >>> I don't think we really need a NOSYNC flag at this point. Just not >>> setting the flush location in clients that make a point of flushing in >>> a timely fashion seems fine. >> >> Okay, I'm in the minority, so I'm writing the patch that way. WIP >> patch attached. >> >> In the patch, pg_basebackup background process and pg_receivexlog always >> return invalid location as flush one, and will never become sync standby even >> if their name is in synchronous_standby_names. The timing of their sending > > That doesn't match with the patch, afaics. The patch always sets the > correct write location, which means it can become a remote_write > synchronous standby, no? It will only send it back when timeout > expires, but it will be sent back. No. Though correct write location is sent back, they don't become sync standby because flush location is always invalid. While flush location is invalid, the master will never regard the remote server as sync one even if synchronous_commit is set to remote_write. > > I wonder if that might actually be a more reasonable mode of operation > in general: > > * always send back the write position, at the write interval > * always send back the flush position, when we're flushing (meaning > when we switch xlog) > > have an option that makes it possible to: > * always send back the write position as soon as it changes (making > for a reasonable remote_write sync standby) > * actually flush the log after each write instead of end of file > (making for a reasonable full sync standby) > > meaning you'd have something like "pg_receivexlog --sync=write" and > "pg_receivexlog --sync=flush" controlling it instead. Yeah, in this way, pg_receivexlog can become sync even if synchronous_commit is on, which seems more useful. But I'm thinking that the synchronous pg_receivexlog stuff should be postponed to 9.3 because its patch seems to become too big to apply at this beta stage. So, in 9.2, to fix the problem, what about just applying the simple patch which prevents pg_basebackup background process and pg_receivexlog from becoming sync standby whatever synchronous_standby_names and synchronous_commit are set to? > And deal with the "user put * in synchronous_standby_names and > accidentally got pg_receivexlog as the sync standby" by more clearly > warning people not to use * for that parameter... Since it's simply > dangerous :) Yep. >> the reply depends on the standby_message_timeout specified in -s option. So >> the write position may lag behind the true position. >> >> pg_receivexlog accepts new option -S (better option character?). If this option >> is specified, pg_receivexlog returns true flush position, and can become sync >> standby. It sends back the reply to the master each time the write position >> changes or the timeout passes. If synchronous_commit is set to remote_write, >> synchronous replication to pg_receivexlog would work well. > > Yeah, I hadn't considered the remote_write mode, but I guess that's > why you have to track the current write position across loads, which > first confused me. The patch has to track the current write location to decide whether to send back the reply to the master, IOW to know whether the write location has changed, IOW to know whether we've already sent the reply about the latest write location. > Looking at some other usecases for this, I wonder if we should also > force a status message whenever we switch xlog files, even if we > aren't running in sync mode, even if the timeout hasn't expired. I > think that would be a reasonable thing to do, since you often want to > track things based on files. You mean that the pg_receivexlog should send back the correct flush location whenever it switches xlog files? >> The patch needs more documentation. But I think that it's worth reviewing the >> code in advance, so I attached the WIP patch. Comments? Objections? > > Looking at the code, what exactly prompts the changes to the backend > side? That seems unrelated? Are we actually considering picking a > standby with InvalidXlogRecPtr as a sync standby today? > > Isn't it enough to just send the proper write and flush locations from > the frontend? No, unless I'm missing something. The problem that we should address first is that the master can pick up pg_basebackup background process and pg_receivexlog as a sync standby even if they always return an invalid write and flush positions. Since they don't send any correct write and flush positions, if they are accidentally regarded as sync standby, all transactions can get blocked infinitely. So the patch needed to change the walsender code so that it doesn't pick up the remote server as sync one while its flush position is invalid. >> The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied, >> we need to write the backport version of the patch for 9.2. > > Oh, conflicts with Heikkis xlog patches, right? Ugh. But yeah. Yep. Regards, -- Fujii Masao
On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > > On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote: >> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command? >>>>>>> >>>>>>> I'm on the fence. I was hoping somebody else would chime in with an >>>>>>> opinion as well. >>>>>> >>>>>> +1 >>>>> >>>>> Nobody else with any opinion on this? :( >>>> >>>> I don't think we really need a NOSYNC flag at this point. Just not >>>> setting the flush location in clients that make a point of flushing in >>>> a timely fashion seems fine. >>> >>> Okay, I'm in the minority, so I'm writing the patch that way. WIP >>> patch attached. >>> >>> In the patch, pg_basebackup background process and pg_receivexlog always >>> return invalid location as flush one, and will never become sync standby even >>> if their name is in synchronous_standby_names. The timing of their sending >> >> That doesn't match with the patch, afaics. The patch always sets the >> correct write location, which means it can become a remote_write >> synchronous standby, no? It will only send it back when timeout >> expires, but it will be sent back. > > No. Though correct write location is sent back, they don't become sync standby > because flush location is always invalid. While flush location is > invalid, the master > will never regard the remote server as sync one even if synchronous_commit is > set to remote_write. Oh. I wasn't aware of that part. >> I wonder if that might actually be a more reasonable mode of operation >> in general: >> >> * always send back the write position, at the write interval >> * always send back the flush position, when we're flushing (meaning >> when we switch xlog) >> >> have an option that makes it possible to: >> * always send back the write position as soon as it changes (making >> for a reasonable remote_write sync standby) >> * actually flush the log after each write instead of end of file >> (making for a reasonable full sync standby) >> >> meaning you'd have something like "pg_receivexlog --sync=write" and >> "pg_receivexlog --sync=flush" controlling it instead. > > Yeah, in this way, pg_receivexlog can become sync even if > synchronous_commit is on, which seems more useful. But > I'm thinking that the synchronous pg_receivexlog stuff should > be postponed to 9.3 because its patch seems to become too > big to apply at this beta stage. So, in 9.2, to fix the problem, > what about just applying the simple patch which prevents > pg_basebackup background process and pg_receivexlog from > becoming sync standby whatever synchronous_standby_names > and synchronous_commit are set to? Agreed. With the addition that we should set the write location, because that's very useful and per what you said above should be perfectly safe. >> And deal with the "user put * in synchronous_standby_names and >> accidentally got pg_receivexlog as the sync standby" by more clearly >> warning people not to use * for that parameter... Since it's simply >> dangerous :) > > Yep. What would be good wording? Something along the line of "Using the * entry is not recommended since it can lead to unexpected results when new standbys are added" or something like that? >>> the reply depends on the standby_message_timeout specified in -s option. So >>> the write position may lag behind the true position. >>> >>> pg_receivexlog accepts new option -S (better option character?). If this option >>> is specified, pg_receivexlog returns true flush position, and can become sync >>> standby. It sends back the reply to the master each time the write position >>> changes or the timeout passes. If synchronous_commit is set to remote_write, >>> synchronous replication to pg_receivexlog would work well. >> >> Yeah, I hadn't considered the remote_write mode, but I guess that's >> why you have to track the current write position across loads, which >> first confused me. > > The patch has to track the current write location to decide whether to send > back the reply to the master, IOW to know whether the write location > has changed, IOW to know whether we've already sent the reply about > the latest write location. Yeha, makes perfect sense. >> Looking at some other usecases for this, I wonder if we should also >> force a status message whenever we switch xlog files, even if we >> aren't running in sync mode, even if the timeout hasn't expired. I >> think that would be a reasonable thing to do, since you often want to >> track things based on files. > > You mean that the pg_receivexlog should send back the correct flush > location whenever it switches xlog files? No, I mean just send back a status message. Meaning that without specifiying the sync modes per above, it would send back the *write* location. This would be useful for tracking xlog filenames between master and pg_receivexlog, without extra delay. >>> The patch needs more documentation. But I think that it's worth reviewing the >>> code in advance, so I attached the WIP patch. Comments? Objections? >> >> Looking at the code, what exactly prompts the changes to the backend >> side? That seems unrelated? Are we actually considering picking a >> standby with InvalidXlogRecPtr as a sync standby today? >> >> Isn't it enough to just send the proper write and flush locations from >> the frontend? > > No, unless I'm missing something. > > The problem that we should address first is that the master can pick up > pg_basebackup background process and pg_receivexlog as a sync > standby even if they always return an invalid write and flush positions. > Since they don't send any correct write and flush positions, if they are > accidentally regarded as sync standby, all transactions can get blocked > infinitely. So the patch needed to change the walsender code so that it > doesn't pick up the remote server as sync one while its flush position > is invalid. Yeah, that is clearly wrong. I think I missed this behaviour, and got confused by the fact that the patch was trying to fix two different things - only one of which I was aware of. So yes, per above, let's isolate out this part as one patch and get that into 9.2, along with the "set the proper write location", but leave everything else for 9.3. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
On Mon, Jul 2, 2012 at 4:01 AM, Magnus Hagander <magnus@hagander.net> wrote: > On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >> >> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote: >>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command? >>>>>>>> >>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an >>>>>>>> opinion as well. >>>>>>> >>>>>>> +1 >>>>>> >>>>>> Nobody else with any opinion on this? :( >>>>> >>>>> I don't think we really need a NOSYNC flag at this point. Just not >>>>> setting the flush location in clients that make a point of flushing in >>>>> a timely fashion seems fine. >>>> >>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP >>>> patch attached. >>>> >>>> In the patch, pg_basebackup background process and pg_receivexlog always >>>> return invalid location as flush one, and will never become sync standby even >>>> if their name is in synchronous_standby_names. The timing of their sending >>> >>> That doesn't match with the patch, afaics. The patch always sets the >>> correct write location, which means it can become a remote_write >>> synchronous standby, no? It will only send it back when timeout >>> expires, but it will be sent back. >> >> No. Though correct write location is sent back, they don't become sync standby >> because flush location is always invalid. While flush location is >> invalid, the master >> will never regard the remote server as sync one even if synchronous_commit is >> set to remote_write. > > Oh. I wasn't aware of that part. > > >>> I wonder if that might actually be a more reasonable mode of operation >>> in general: >>> >>> * always send back the write position, at the write interval >>> * always send back the flush position, when we're flushing (meaning >>> when we switch xlog) >>> >>> have an option that makes it possible to: >>> * always send back the write position as soon as it changes (making >>> for a reasonable remote_write sync standby) >>> * actually flush the log after each write instead of end of file >>> (making for a reasonable full sync standby) >>> >>> meaning you'd have something like "pg_receivexlog --sync=write" and >>> "pg_receivexlog --sync=flush" controlling it instead. >> >> Yeah, in this way, pg_receivexlog can become sync even if >> synchronous_commit is on, which seems more useful. But >> I'm thinking that the synchronous pg_receivexlog stuff should >> be postponed to 9.3 because its patch seems to become too >> big to apply at this beta stage. So, in 9.2, to fix the problem, >> what about just applying the simple patch which prevents >> pg_basebackup background process and pg_receivexlog from >> becoming sync standby whatever synchronous_standby_names >> and synchronous_commit are set to? > > Agreed. > > With the addition that we should set the write location, because > that's very useful and per what you said above should be perfectly > safe. > > >>> And deal with the "user put * in synchronous_standby_names and >>> accidentally got pg_receivexlog as the sync standby" by more clearly >>> warning people not to use * for that parameter... Since it's simply >>> dangerous :) >> >> Yep. > > What would be good wording? Something along the line of "Using the * > entry is not recommended since it can lead to unexpected results when > new standbys are added" or something like that? > > >>>> the reply depends on the standby_message_timeout specified in -s option. So >>>> the write position may lag behind the true position. >>>> >>>> pg_receivexlog accepts new option -S (better option character?). If this option >>>> is specified, pg_receivexlog returns true flush position, and can become sync >>>> standby. It sends back the reply to the master each time the write position >>>> changes or the timeout passes. If synchronous_commit is set to remote_write, >>>> synchronous replication to pg_receivexlog would work well. >>> >>> Yeah, I hadn't considered the remote_write mode, but I guess that's >>> why you have to track the current write position across loads, which >>> first confused me. >> >> The patch has to track the current write location to decide whether to send >> back the reply to the master, IOW to know whether the write location >> has changed, IOW to know whether we've already sent the reply about >> the latest write location. > > Yeha, makes perfect sense. > > >>> Looking at some other usecases for this, I wonder if we should also >>> force a status message whenever we switch xlog files, even if we >>> aren't running in sync mode, even if the timeout hasn't expired. I >>> think that would be a reasonable thing to do, since you often want to >>> track things based on files. >> >> You mean that the pg_receivexlog should send back the correct flush >> location whenever it switches xlog files? > > No, I mean just send back a status message. Meaning that without > specifiying the sync modes per above, it would send back the *write* > location. This would be useful for tracking xlog filenames between > master and pg_receivexlog, without extra delay. > >>>> The patch needs more documentation. But I think that it's worth reviewing the >>>> code in advance, so I attached the WIP patch. Comments? Objections? >>> >>> Looking at the code, what exactly prompts the changes to the backend >>> side? That seems unrelated? Are we actually considering picking a >>> standby with InvalidXlogRecPtr as a sync standby today? >>> >>> Isn't it enough to just send the proper write and flush locations from >>> the frontend? >> >> No, unless I'm missing something. >> >> The problem that we should address first is that the master can pick up >> pg_basebackup background process and pg_receivexlog as a sync >> standby even if they always return an invalid write and flush positions. >> Since they don't send any correct write and flush positions, if they are >> accidentally regarded as sync standby, all transactions can get blocked >> infinitely. So the patch needed to change the walsender code so that it >> doesn't pick up the remote server as sync one while its flush position >> is invalid. > > Yeah, that is clearly wrong. I think I missed this behaviour, and got > confused by the fact that the patch was trying to fix two different > things - only one of which I was aware of. > > So yes, per above, let's isolate out this part as one patch and get > that into 9.2, along with the "set the proper write location", but > leave everything else for 9.3. Agreed. The attached patch always sets the correct write location and prevents the remote server sending back invalid flush location from becoming sync standby. Regards, -- Fujii Masao
Attachment
On Mon, Jul 2, 2012 at 8:17 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Mon, Jul 2, 2012 at 4:01 AM, Magnus Hagander <magnus@hagander.net> wrote: >> On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> >>> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus@hagander.net> wrote: >>>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao.fujii@gmail.com> wrote: >>>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus@hagander.net> wrote: >>>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command? >>>>>>>>> >>>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an >>>>>>>>> opinion as well. >>>>>>>> >>>>>>>> +1 >>>>>>> >>>>>>> Nobody else with any opinion on this? :( >>>>>> >>>>>> I don't think we really need a NOSYNC flag at this point. Just not >>>>>> setting the flush location in clients that make a point of flushing in >>>>>> a timely fashion seems fine. >>>>> >>>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP >>>>> patch attached. >>>>> >>>>> In the patch, pg_basebackup background process and pg_receivexlog always >>>>> return invalid location as flush one, and will never become sync standby even >>>>> if their name is in synchronous_standby_names. The timing of their sending >>>> >>>> That doesn't match with the patch, afaics. The patch always sets the >>>> correct write location, which means it can become a remote_write >>>> synchronous standby, no? It will only send it back when timeout >>>> expires, but it will be sent back. >>> >>> No. Though correct write location is sent back, they don't become sync standby >>> because flush location is always invalid. While flush location is >>> invalid, the master >>> will never regard the remote server as sync one even if synchronous_commit is >>> set to remote_write. >> >> Oh. I wasn't aware of that part. >> >> >>>> I wonder if that might actually be a more reasonable mode of operation >>>> in general: >>>> >>>> * always send back the write position, at the write interval >>>> * always send back the flush position, when we're flushing (meaning >>>> when we switch xlog) >>>> >>>> have an option that makes it possible to: >>>> * always send back the write position as soon as it changes (making >>>> for a reasonable remote_write sync standby) >>>> * actually flush the log after each write instead of end of file >>>> (making for a reasonable full sync standby) >>>> >>>> meaning you'd have something like "pg_receivexlog --sync=write" and >>>> "pg_receivexlog --sync=flush" controlling it instead. >>> >>> Yeah, in this way, pg_receivexlog can become sync even if >>> synchronous_commit is on, which seems more useful. But >>> I'm thinking that the synchronous pg_receivexlog stuff should >>> be postponed to 9.3 because its patch seems to become too >>> big to apply at this beta stage. So, in 9.2, to fix the problem, >>> what about just applying the simple patch which prevents >>> pg_basebackup background process and pg_receivexlog from >>> becoming sync standby whatever synchronous_standby_names >>> and synchronous_commit are set to? >> >> Agreed. >> >> With the addition that we should set the write location, because >> that's very useful and per what you said above should be perfectly >> safe. >> >> >>>> And deal with the "user put * in synchronous_standby_names and >>>> accidentally got pg_receivexlog as the sync standby" by more clearly >>>> warning people not to use * for that parameter... Since it's simply >>>> dangerous :) >>> >>> Yep. >> >> What would be good wording? Something along the line of "Using the * >> entry is not recommended since it can lead to unexpected results when >> new standbys are added" or something like that? >> >> >>>>> the reply depends on the standby_message_timeout specified in -s option. So >>>>> the write position may lag behind the true position. >>>>> >>>>> pg_receivexlog accepts new option -S (better option character?). If this option >>>>> is specified, pg_receivexlog returns true flush position, and can become sync >>>>> standby. It sends back the reply to the master each time the write position >>>>> changes or the timeout passes. If synchronous_commit is set to remote_write, >>>>> synchronous replication to pg_receivexlog would work well. >>>> >>>> Yeah, I hadn't considered the remote_write mode, but I guess that's >>>> why you have to track the current write position across loads, which >>>> first confused me. >>> >>> The patch has to track the current write location to decide whether to send >>> back the reply to the master, IOW to know whether the write location >>> has changed, IOW to know whether we've already sent the reply about >>> the latest write location. >> >> Yeha, makes perfect sense. >> >> >>>> Looking at some other usecases for this, I wonder if we should also >>>> force a status message whenever we switch xlog files, even if we >>>> aren't running in sync mode, even if the timeout hasn't expired. I >>>> think that would be a reasonable thing to do, since you often want to >>>> track things based on files. >>> >>> You mean that the pg_receivexlog should send back the correct flush >>> location whenever it switches xlog files? >> >> No, I mean just send back a status message. Meaning that without >> specifiying the sync modes per above, it would send back the *write* >> location. This would be useful for tracking xlog filenames between >> master and pg_receivexlog, without extra delay. >> >>>>> The patch needs more documentation. But I think that it's worth reviewing the >>>>> code in advance, so I attached the WIP patch. Comments? Objections? >>>> >>>> Looking at the code, what exactly prompts the changes to the backend >>>> side? That seems unrelated? Are we actually considering picking a >>>> standby with InvalidXlogRecPtr as a sync standby today? >>>> >>>> Isn't it enough to just send the proper write and flush locations from >>>> the frontend? >>> >>> No, unless I'm missing something. >>> >>> The problem that we should address first is that the master can pick up >>> pg_basebackup background process and pg_receivexlog as a sync >>> standby even if they always return an invalid write and flush positions. >>> Since they don't send any correct write and flush positions, if they are >>> accidentally regarded as sync standby, all transactions can get blocked >>> infinitely. So the patch needed to change the walsender code so that it >>> doesn't pick up the remote server as sync one while its flush position >>> is invalid. >> >> Yeah, that is clearly wrong. I think I missed this behaviour, and got >> confused by the fact that the patch was trying to fix two different >> things - only one of which I was aware of. >> >> So yes, per above, let's isolate out this part as one patch and get >> that into 9.2, along with the "set the proper write location", but >> leave everything else for 9.3. > > Agreed. The attached patch always sets the correct write location and > prevents the remote server sending back invalid flush location from > becoming sync standby. Thanks, applied. I also put the prevent invalid flush location from becoming a sync standby part on 9.1 (required only a minor change - GetXLogReplayRecPtr() didn't take an argument back then). -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/