Thread: pg_ctl promote wait
It would be nice if pg_ctl promote supported the -w (wait) option. How could pg_ctl determine when the promotion has finished? We could query pg_is_in_recovery(), but that would require database access, which we got rid of in pg_ctl. We could check when recovery.conf is gone or recovery.done appears. This could work, but I think some people are trying to get rid of these files, so building a feature on that might be a problem? (I think the latest news on that was that the files might be different in this hypothetical future but there would still be a file to prevent re-recovery on restart.) Other ideas?
On 18 February 2016 at 02:47, Peter Eisentraut <peter_e@gmx.net> wrote:
--
It would be nice if pg_ctl promote supported the -w (wait) option.
How could pg_ctl determine when the promotion has finished?
We could query pg_is_in_recovery(), but that would require database
access, which we got rid of in pg_ctl.
We could check when recovery.conf is gone or recovery.done appears.
This could work, but I think some people are trying to get rid of these
files, so building a feature on that might be a problem? (I think the
latest news on that was that the files might be different in this
hypothetical future but there would still be a file to prevent
re-recovery on restart.)
Other ideas?
We use a trigger file under the covers, so a promotion complete file seems useful also, but crappy.
Perhaps we should have a Server Status file that shows Standby or Master, obviously not updated on crash.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote: > It would be nice if pg_ctl promote supported the -w (wait) option. > > How could pg_ctl determine when the promotion has finished? How about looking into pg_control? ControlFileData->state ought to have the correct information. Regards, Andres
On 18 February 2016 at 08:33, Andres Freund <andres@anarazel.de> wrote:
--
Hi,
On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote:
> It would be nice if pg_ctl promote supported the -w (wait) option.
>
> How could pg_ctl determine when the promotion has finished?
How about looking into pg_control? ControlFileData->state ought to have
the correct information.
+1
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 18, 2016 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 18 February 2016 at 08:33, Andres Freund <andres@anarazel.de> wrote: >> >> Hi, >> >> On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote: >> > It would be nice if pg_ctl promote supported the -w (wait) option. +1 >> > How could pg_ctl determine when the promotion has finished? >> >> How about looking into pg_control? ControlFileData->state ought to have >> the correct information. > > > +1 One concern is that there can be a "time" after the pg_control's state is changed to DB_IN_PRODUCTION and before the server is able to start accepting normal (not read-only) connections. So if users try to start write transaction just after pg_ctl promote -w ends, they might get an error because the server is still in recovery, i.e., the startup process is running. Regards, -- Fujii Masao
On 2/18/16 3:33 AM, Andres Freund wrote: > Hi, > > On 2016-02-17 21:47:50 -0500, Peter Eisentraut wrote: >> It would be nice if pg_ctl promote supported the -w (wait) option. >> >> How could pg_ctl determine when the promotion has finished? > > How about looking into pg_control? ControlFileData->state ought to have > the correct information. Is it safe to read pg_control externally without a lock? pg_controldata will just report a CRC error and proceed, and if you're not sure you can just run it again. But if a promotion fails every so often because of concurrent pg_control writes, that would make this feature annoying.
On 2/19/16 10:06 AM, Fujii Masao wrote: > One concern is that there can be a "time" after the pg_control's state > is changed to DB_IN_PRODUCTION and before the server is able to > start accepting normal (not read-only) connections. So if users try to > start write transaction just after pg_ctl promote -w ends, they might > get an error because the server is still in recovery, i.e., the startup > process is running. I think that window would be acceptable. If not, then the way to deal with it would seem to be to create an entirely new mechanism to communicate with pg_ctl (e.g., a file) and call that at the very end of StartupXLOG(). I'm not sure that that is worth it.
On 2016-02-19 13:48:52 -0500, Peter Eisentraut wrote: > Is it safe to read pg_control externally without a lock? pg_controldata > will just report a CRC error and proceed, and if you're not sure you can > just run it again. But if a promotion fails every so often because of > concurrent pg_control writes, that would make this feature annoying. Yes, the OS should give sufficient guarantees here: If write() is interrupted by a signal before it writes any data, it shall return −1 with errno set to [EINTR]. If write() is interrupted by a signal after it successfully writes some data, it shall return the number of bytes written. If the value of nbyte is greater than {SSIZE_MAX}, the result is implementation-defined. After a write() to a regular file has successfully returned: * Any successful read() from each byte position in the file that was modified by that write shall return the dataspecified by the write() for that position until such byte positions are again modified. We currently assume that all "small" writes succeed in one go (and throw errors if not). Thus the guarantees by read/write are sufficient. Regards, Andres
On 2016-02-20 00:06:09 +0900, Fujii Masao wrote: > One concern is that there can be a "time" after the pg_control's state > is changed to DB_IN_PRODUCTION and before the server is able to > start accepting normal (not read-only) connections. So if users try to > start write transaction just after pg_ctl promote -w ends, they might > get an error because the server is still in recovery, i.e., the startup > process is running. It might make sense to have one more state type, which is used when ending recovery, but before full access is allowed. - Andres
Peter Eisentraut <peter_e@gmx.net> writes: > Is it safe to read pg_control externally without a lock? pg_controldata > will just report a CRC error and proceed, and if you're not sure you can > just run it again. But if a promotion fails every so often because of > concurrent pg_control writes, that would make this feature annoying. Just retry the read till you don't get a CRC error. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > On 2/19/16 10:06 AM, Fujii Masao wrote: >> One concern is that there can be a "time" after the pg_control's state >> is changed to DB_IN_PRODUCTION and before the server is able to >> start accepting normal (not read-only) connections. So if users try to >> start write transaction just after pg_ctl promote -w ends, they might >> get an error because the server is still in recovery, i.e., the startup >> process is running. > I think that window would be acceptable. > If not, then the way to deal with it would seem to be to create an > entirely new mechanism to communicate with pg_ctl (e.g., a file) and > call that at the very end of StartupXLOG(). I'm not sure that that is > worth it. I see no need for an additional mechanism. Just watch pg_control until you see DB_IN_PRODUCTION state there, then switch over to the same connection probing that "pg_ctl start -w" uses. regards, tom lane
On 2016-02-19 15:09:58 -0500, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > On 2/19/16 10:06 AM, Fujii Masao wrote: > >> One concern is that there can be a "time" after the pg_control's state > >> is changed to DB_IN_PRODUCTION and before the server is able to > >> start accepting normal (not read-only) connections. So if users try to > >> start write transaction just after pg_ctl promote -w ends, they might > >> get an error because the server is still in recovery, i.e., the startup > >> process is running. > > > I think that window would be acceptable. > > > If not, then the way to deal with it would seem to be to create an > > entirely new mechanism to communicate with pg_ctl (e.g., a file) and > > call that at the very end of StartupXLOG(). I'm not sure that that is > > worth it. > > I see no need for an additional mechanism. Just watch pg_control until > you see DB_IN_PRODUCTION state there, then switch over to the same > connection probing that "pg_ctl start -w" uses. That's afaics not sufficient if the standby was using hot standby, as that'll let -w succeed immediately, no? Andres
Andres Freund <andres@anarazel.de> writes: > On 2016-02-19 15:09:58 -0500, Tom Lane wrote: >> I see no need for an additional mechanism. Just watch pg_control until >> you see DB_IN_PRODUCTION state there, then switch over to the same >> connection probing that "pg_ctl start -w" uses. > That's afaics not sufficient if the standby was using hot standby, as > that'll let -w succeed immediately, no? Oh, now I get Fujii-san's point. Yes, that wouldn't prove anything about whether you could do a write transaction immediately. regards, tom lane
On 2/19/16 3:09 PM, Tom Lane wrote: > I see no need for an additional mechanism. Just watch pg_control until > you see DB_IN_PRODUCTION state there, then switch over to the same > connection probing that "pg_ctl start -w" uses. Here is a patch set around that idea. The subsequent discussion mentioned that there might still be a window between end of waiting and when read-write queries would be accepted. I don't know how big that window would be in practice and would be interested in some testing and feedback.
Attachment
On Mon, Feb 29, 2016 at 10:30 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 2/19/16 3:09 PM, Tom Lane wrote: >> I see no need for an additional mechanism. Just watch pg_control until >> you see DB_IN_PRODUCTION state there, then switch over to the same >> connection probing that "pg_ctl start -w" uses. > > Here is a patch set around that idea. > > The subsequent discussion mentioned that there might still be a window > between end of waiting and when read-write queries would be accepted. I > don't know how big that window would be in practice and would be > interested in some testing and feedback. Here is some input for 0001 (useful taken independently): +$node_primary->append_conf( + "postgresql.conf", qq( +wal_level = hot_standby +max_wal_senders = 2 +wal_keep_segments = 20 +hot_standby = on +) + ); That's more or less allows_streaming => 1 in $node_primary->init. +$node_standby->append_conf( + "recovery.conf", qq( +primary_conninfo='$connstr_primary' +standby_mode=on +recovery_target_timeline='latest' +) + ); Here you could just use enable_streaming => 1 when calling init_from_backup. +$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'], $node_standby->psql instead of a direct command? The result is returned directly with the call to the routine. +$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'], + qr/^f$/, + 'promoted standby is not in recovery'); Once again $node_standby->psql? +sleep 3; # needs a moment to react + +$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'], + qr/^f$/, + 'promoted standby is not in recovery'); sleep() is something we should try to avoid as much as possible in our tests. On slow platforms, take hamster, promote is surely going to take longer than that to be processed by the standby node and put it out of recovery. I would suggest using $node_standby->poll_query_until('SELECT pg_is_in_recovery()') to validate the end of the test. -- Michael
On Mon, Feb 29, 2016 at 4:28 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > I would suggest using > $node_standby->poll_query_until('SELECT pg_is_in_recovery()') to > validate the end of the test. Meh. SELECT NOT pg_is_in_recovery(). This will wait until the query returns true. -- Michael
On Mon, Feb 29, 2016 at 4:29 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Feb 29, 2016 at 4:28 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> I would suggest using >> $node_standby->poll_query_until('SELECT pg_is_in_recovery()') to >> validate the end of the test. > > Meh. SELECT NOT pg_is_in_recovery(). This will wait until the query > returns true. Here are some comments about 0002 + if ((fd = open(control_file_path, O_RDONLY | PG_BINARY, 0)) == -1) + { + fprintf(stderr, _("%s: could not open file \"%s\" for reading: %s\n"), + progname, control_file_path, strerror(errno)); + exit(1); + } [...] Most of the logic of get_control_dbstate() is a mimic of the recently-introduced get_controlfile() in controldata_utils.c of dc7d70e. I think that we had better use that, and that we had better emit an error should an incorrect control file be found while running those pg_ctl commands as the control file present had better have a correct CRC all the time or something wrong is going on. So this would lead to this logic for get_control_dbstate(): control_file_data = get_controlfile(pg_data, progname); res = control_file_data->state; pfree(control_file_data); Except that, 0002 is a good thing to have, switching from the presence of recovery.conf to what is in the control data file is definitely more robust, a lot of things happen from when recovery.conf is renamed to recovery.done until WAL is enabled for backends, particularly the end of recovery checkpoint and the cleanup of the WAL segments of the previous timeline. And now for 0003... +$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'], + qr/^t$/, + 'standby is in recovery'); [...] +$node_standby->command_like(['psql', '-X', '-A', '-t', '-c', 'SELECT pg_is_in_recovery()'], + qr/^f$/, + 'promoted standby is not in recovery'); for those two you can use $node_standby->psql_safe to get back a query result. > The subsequent discussion mentioned that there might still be a window > between end of waiting and when read-write queries would be accepted. I > don't know how big that window would be in practice and would be > interested in some testing and feedback. And so... Based on the previous discussion, there is an interval of time between the moment the update of the control file is done and the point where backends are allowed to emit WAL. I am really worrying about this interval of time actually, as once pg_ctl exits client applications should be guaranteed to connect to the server but the current patch would not be failure-proof, and I imagine that particularly on CPU-constrained environments this is going to become unstable. Particularly I expect that slow machines are likely going to fail in the last test of 003_promote.pl as designed (I am away from home now so I have not been able to test that unfortunately on my own stuff but that's possible) because pg_is_in_recovery is controlled by SharedRecoveryInProgress, so it may be possible that pg_is_in_recovery() returns false while the control file status is DB_IN_PRODUCTION. The main factor that can contribute to a larger window is a higher number of 2PC transactions that need to be loaded back to shared memory after scanning pg_twophase. If we are going to have a reliable promote wait mode for pg_ctl, I think that we had better first reduce this window, something that could be done is to update SharedRecoveryInProgress while holding an exclusive lock on ControlFileLock, with this flow for example. See for example the patch attached, we can reduce this window to zero for backends if some of them refer to ControlFile in shared memory thanks to ControlFileLock. For clients, there will still be a small window during which backends could write WAL and the control file status is ARCHIVE_RECOVERY on disk. If we want to have a reliable promote wait mode for pg_ctl, I think that we had better do something like the attached first. Thoughts? Looking at where is used the shared memory structure of ControlFile, one thing to worry about is CreateRestartPoint but its code paths are already using ControlFileLock when looking at the status file, so they are safe with this logic. -- Michael
Attachment
Hi Peter, On 3/9/16 3:08 PM, Michael Paquier wrote: > Here are some comments about 0002 <...> > I think that we had better do something like the attached first. > Thoughts? It's been a week since Michael reviewed this patch. Could you please comment on his proposed changes as soon as possible? Thanks, -- -David david@pgmasters.net
On 3/16/16 12:19 PM, David Steele wrote: > Hi Peter, > > On 3/9/16 3:08 PM, Michael Paquier wrote: > >> Here are some comments about 0002 > <...> >> I think that we had better do something like the attached first. >> Thoughts? > > It's been a week since Michael reviewed this patch. Could you please > comment on his proposed changes as soon as possible? Bump. -- -David david@pgmasters.net
On Wed, Mar 23, 2016 at 1:47 AM, David Steele <david@pgmasters.net> wrote: > On 3/16/16 12:19 PM, David Steele wrote: >> Hi Peter, >> >> On 3/9/16 3:08 PM, Michael Paquier wrote: >> >>> Here are some comments about 0002 >> <...> >>> I think that we had better do something like the attached first. >>> Thoughts? >> >> It's been a week since Michael reviewed this patch. Could you please >> comment on his proposed changes as soon as possible? > > Bump. Feature freeze is getting close by. Once the deadline is reached, I guess that we had better return the patch. -- Michael
On 04/07/2016 01:22 AM, Michael Paquier wrote: > On Wed, Mar 23, 2016 at 1:47 AM, David Steele <david@pgmasters.net> wrote: >> On 3/16/16 12:19 PM, David Steele wrote: >>> Hi Peter, >>> >>> On 3/9/16 3:08 PM, Michael Paquier wrote: >>> >>>> Here are some comments about 0002 >>> <...> >>>> I think that we had better do something like the attached first. >>>> Thoughts? >>> >>> It's been a week since Michael reviewed this patch. Could you please >>> comment on his proposed changes as soon as possible? >> >> Bump. > > Feature freeze is getting close by. Once the deadline is reached, I > guess that we had better return the patch. I have closed the patch as returned with feedback. There were a number of interesting points raised that will need some time for consideration and testing.
Updated patch set for pg_ctl promote -w, incorporating AFAICT all of the previous reviews, in particular Michael Paquier's changes to the StartupXLOG() ordering, and rebasing on top of src/common/controldata_utils.c. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Wed, Aug 3, 2016 at 9:35 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > Updated patch set for pg_ctl promote -w, incorporating AFAICT all of the > previous reviews, in particular Michael Paquier's changes to the > StartupXLOG() ordering, and rebasing on top of > src/common/controldata_utils.c. Glad to see a new set of patches here: 1) From 0001 - ok($result, "@$cmd exit code 0"); - is($stderr, '', "@$cmd no stderr"); + ok($result, "$test_name: exit code 0"); + is($stderr, '', "$test_name: no stderr"); Okay with that, there is anyway a log mentioning the command anyway. Perhaps you'd want to backpatch that? 2) From 0002: +command_fails_like([ 'pg_ctl', 'promote', '-D', "$tempdir/nonexistent" ], Any code using src/port/getopt_long.c (Windows!) is going to fail on that. The argument that is not preceded by an option name needs to be put at the end of the command. So for example this command needs to be changed to that: [ 'pg_ctl', '-D', "$tempdir/nonexistent", 'promote' ] +$node_standby->poll_query_until('postgres', 'SELECT NOT pg_is_in_recovery()'); + +is($node_standby->safe_psql('postgres', 'SELECT pg_is_in_recovery()'), + 'f', 'promoted standby is not in recovery'); We could just check that poll_query_until returns 1 here. This would save running one query. 3) From 0003 In do_stop, this patches makes the wait happen for a maximum of wait_seconds * 2, once when getting the control file information, and once when waiting for the server to shut down. This is not a good idea, and the idea of putting a wait argument in get_controlfile does not seem a good interface to me. I'd rather see get_controlfile be extended with a flag saying no_error_on_failure and keep the wait logic within pg_ctl. The flag would do the following when enabled: - for frontends, do not issue a WARNING message and return NULL instead. - for backends, do not issue an ERROR and return NULL instead. 4) Patch 0004, no real comments :) I am glad you picked up this idea. 5) Patch 0005: Looks good to me. I just noticed that similar to 0002 regarding the ordering of those arguments: +command_ok([ 'pg_ctl', 'promote', '-w', '-D', $node_standby->data_dir ], + 'pg_ctl promote -w of standby runs'); Thanks, -- Michael
On 8/5/16 12:14 AM, Michael Paquier wrote: > In do_stop, this patches makes the wait happen for a maximum of > wait_seconds * 2, once when getting the control file information, and > once when waiting for the server to shut down. That's not how I read it. get_controlfile() will decrease the wait_seconds argument by how much wait time it has used. The wait for shutdown will then only use as much seconds as are left. > This is not a good > idea, and the idea of putting a wait argument in get_controlfile does > not seem a good interface to me. I'd rather see get_controlfile be > extended with a flag saying no_error_on_failure and keep the wait > logic within pg_ctl. I guess we could write a wrapper function in pg_ctl that encapsulated the wait logic. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Aug 6, 2016 at 10:43 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 8/5/16 12:14 AM, Michael Paquier wrote: >> In do_stop, this patches makes the wait happen for a maximum of >> wait_seconds * 2, once when getting the control file information, and >> once when waiting for the server to shut down. > > That's not how I read it. get_controlfile() will decrease the > wait_seconds argument by how much wait time it has used. The wait for > shutdown will then only use as much seconds as are left. Ah, right. The reference to wait_seconds gets decremented. >> This is not a good >> idea, and the idea of putting a wait argument in get_controlfile does >> not seem a good interface to me. I'd rather see get_controlfile be >> extended with a flag saying no_error_on_failure and keep the wait >> logic within pg_ctl. > > I guess we could write a wrapper function in pg_ctl that encapsulated > the wait logic. That's what I would do. -- Michael
On 8/7/16 9:44 PM, Michael Paquier wrote: >>> This is not a good >>> >> idea, and the idea of putting a wait argument in get_controlfile does >>> >> not seem a good interface to me. I'd rather see get_controlfile be >>> >> extended with a flag saying no_error_on_failure and keep the wait >>> >> logic within pg_ctl. >> > >> > I guess we could write a wrapper function in pg_ctl that encapsulated >> > the wait logic. > That's what I would do. New patches, incorporating your suggestions. I moved some of the error handling out of get_controlfile() and back into the callers, because it was getting too weird that that function knew so much about the callers' intentions. That way we don't actually have to change the signature. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Aug 11, 2016 at 3:24 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 8/7/16 9:44 PM, Michael Paquier wrote: >>>> This is not a good >>>> >> idea, and the idea of putting a wait argument in get_controlfile does >>>> >> not seem a good interface to me. I'd rather see get_controlfile be >>>> >> extended with a flag saying no_error_on_failure and keep the wait >>>> >> logic within pg_ctl. >>> > >>> > I guess we could write a wrapper function in pg_ctl that encapsulated >>> > the wait logic. >> That's what I would do. > > New patches, incorporating your suggestions. Thanks for the new set! > I moved some of the error handling out of get_controlfile() and back > into the callers, because it was getting too weird that that function > knew so much about the callers' intentions. That way we don't actually > have to change the signature. I have looked at them and the changes are looking fine for me. So I have switched the patch as ready for committer, aka you. Just a nit: + if (wait_seconds > 0) + { + sleep(1); + wait_seconds--; + continue; + } This may be better this pg_usleep() instead of sleep(). -- Michael
On 8/11/16 9:28 AM, Michael Paquier wrote: > I have looked at them and the changes are looking fine for me. So I > have switched the patch as ready for committer, aka you. > > Just a nit: > + if (wait_seconds > 0) > + { > + sleep(1); > + wait_seconds--; > + continue; > + } > This may be better this pg_usleep() instead of sleep(). Committed with that adjustment. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 22, 2016 at 1:25 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 8/11/16 9:28 AM, Michael Paquier wrote: >> I have looked at them and the changes are looking fine for me. So I >> have switched the patch as ready for committer, aka you. >> >> Just a nit: >> + if (wait_seconds > 0) >> + { >> + sleep(1); >> + wait_seconds--; >> + continue; >> + } >> This may be better this pg_usleep() instead of sleep(). > > Committed with that adjustment. > Commit e7010ce seems to forget to change help message of pg_ctl. Attached patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Fri, Sep 23, 2016 at 12:16 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Sep 22, 2016 at 1:25 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: >> On 8/11/16 9:28 AM, Michael Paquier wrote: >> Committed with that adjustment. Thanks... > Commit e7010ce seems to forget to change help message of pg_ctl. > Attached patch. And right, we missed that bit. -- Michael
On 9/22/16 11:16 AM, Masahiko Sawada wrote: > Commit e7010ce seems to forget to change help message of pg_ctl. > Attached patch. Fixed, thanks. I also added the -t option and reformatted a bit. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services