Thread: Function to promote standby servers
Providing SQL access for administrative tasks seems to be a good thing, see ALTER SYSTEM and pg_reload_conf(). In that vein, I propose a function pg_promote() to promote physical standby servers. If there are no fundamental objections, I'll add it to the next commitfest. Yours, Laurenz Albe
Attachment
On Thu, Sep 20, 2018 at 07:59:02AM +0200, Laurenz Albe wrote: > Providing SQL access for administrative tasks seems to be a > good thing, see ALTER SYSTEM and pg_reload_conf(). > > In that vein, I propose a function pg_promote() to promote > physical standby servers. No fundamental issues from me regarding the concept of being able to trigger a promotion remotely, so +1. Do we want this capability as well for fallback_promote? My gut tells me no, and that we ought to drop this option at some point in the future, still that's worth mentioning. You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c could use it. -- Michael
Attachment
Michael Paquier wrote: > > In that vein, I propose a function pg_promote() to promote > > physical standby servers. > > No fundamental issues from me regarding the concept of being able to > trigger a promotion remotely, so +1. Do we want this capability as well > for fallback_promote? My gut tells me no, and that we ought to drop > this option at some point in the future, still that's worth mentioning. > > You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c > could use it. Good, idea; updated patch attached. Yours, Laurenz Albe
Attachment
On Thu, Oct 4, 2018 at 8:26 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > Michael Paquier wrote: > > > In that vein, I propose a function pg_promote() to promote > > > physical standby servers. +1 > > > > No fundamental issues from me regarding the concept of being able to > > trigger a promotion remotely, so +1. Do we want this capability as well > > for fallback_promote? My gut tells me no, and that we ought to drop > > this option at some point in the future, still that's worth mentioning. > > > > You may want to move PROMOTE_SIGNAL_FILE in a position where xlogfuncs.c > > could use it. > > Good, idea; updated patch attached. > Maybe the patch needs regression tests for the new function. And I'd suggest to make the function name more clear by changing to pg_promote_server(), pg_promote_standby() and so on. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Masahiko Sawada wrote: > Maybe the patch needs regression tests for the new function. And I'd > suggest to make the function name more clear by changing to > pg_promote_server(), pg_promote_standby() and so on. Thanks for the review. The attached patch has regression tests - I though it would be good to change some of the existing tests that run standby promotion to use the SQL function instead of pg_ctl. I have left the name though -- as far as I can tell, "promote" has no other meaning in PostgreSQL than standby promotion, and I believe it is only good to be more verbose if that avoids confusion. Yours, Laurenz Albe
Masahiko Sawada wrote: > Maybe the patch needs regression tests for the new function. And I'd > suggest to make the function name more clear by changing to > pg_promote_server(), pg_promote_standby() and so on. Thanks for the review. The attached patch has regression tests - I though it would be good to change some of the existing tests that run standby promotion to use the SQL function instead of pg_ctl. I have left the name though -- as far as I can tell, "promote" has no other meaning in PostgreSQL than standby promotion, and I believe it is only good to be more verbose if that avoids confusion. Yours, Laurenz Albe
Attachment
On Mon, Oct 08, 2018 at 07:36:50PM +0200, Laurenz Albe wrote: > The attached patch has regression tests - I though it would be good > to change some of the existing tests that run standby promotion to > use the SQL function instead of pg_ctl. > > I have left the name though -- as far as I can tell, "promote" has > no other meaning in PostgreSQL than standby promotion, and I believe > it is only good to be more verbose if that avoids confusion. I am fine with pg_promote. > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > index 7375a78ffc..3a1f49e83a 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version; > /* File path names (all relative to $PGDATA) */ > #define RECOVERY_COMMAND_FILE "recovery.conf" > #define RECOVERY_COMMAND_DONE "recovery.done" > -#define PROMOTE_SIGNAL_FILE "promote" > -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote" Perhaps we could just move the whole set to xlog.h. > +Datum > +pg_promote(PG_FUNCTION_ARGS) > +{ > + FILE *promote_file; > + > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + errmsg("must be superuser to promote standby servers"))); Let's remove this restriction at code level, and instead use function-level ACLs so as it is possible to allow non-superusers to trigger a promotion as well, say a dedicated role. We could add a system role for this purpose, but I am not sure that it is worth the addition yet. > + while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f') > + { > + sleep 1; > + } > + return; This could go on infinitely, locking down buildfarm machines silently if a commit is not careful. What I would suggest to do instead is to not touch PostgresNode.pm at all, and to just modify one test to trigger promotion with the SQL function. Then from the test, you should add a comment that triggerring promotion from SQL is wanted instead of the internal routine, and then please add a call to poll_query_until() in the same way as what 6deb52b2 has removed. As of now, this function returns true if promotion has been triggered, and false if not. However it seems to me that we should have something more consistent with "pg_ctl promote", so there are actually three possible states: 1) Node is in recovery, with promotion triggered. pg_promote() returns true in case of success in creating the trigger file. 2) Node is in recovery, with promotion triggered. pg_promote() returns false in case of failure in creating the trigger file. 3) Node is not in recovery, where I think a call to pg_promote should trigger an error. This is not handled in the patch. An other thing which has value is to implement a "wait" mode for this feature, or actually a nowait mode. You could simply do what pg_ctl does with its logic in get_control_dbstate() by looking at the control file state. I think that waiting for the promotion to happen should be the default. At the end this patch needs a bit more work. -- Michael
Attachment
On Mon, Oct 08, 2018 at 07:36:50PM +0200, Laurenz Albe wrote: > The attached patch has regression tests - I though it would be good > to change some of the existing tests that run standby promotion to > use the SQL function instead of pg_ctl. > > I have left the name though -- as far as I can tell, "promote" has > no other meaning in PostgreSQL than standby promotion, and I believe > it is only good to be more verbose if that avoids confusion. I am fine with pg_promote. > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > index 7375a78ffc..3a1f49e83a 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c > @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version; > /* File path names (all relative to $PGDATA) */ > #define RECOVERY_COMMAND_FILE "recovery.conf" > #define RECOVERY_COMMAND_DONE "recovery.done" > -#define PROMOTE_SIGNAL_FILE "promote" > -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote" Perhaps we could just move the whole set to xlog.h. > +Datum > +pg_promote(PG_FUNCTION_ARGS) > +{ > + FILE *promote_file; > + > + if (!superuser()) > + ereport(ERROR, > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > + errmsg("must be superuser to promote standby servers"))); Let's remove this restriction at code level, and instead use function-level ACLs so as it is possible to allow non-superusers to trigger a promotion as well, say a dedicated role. We could add a system role for this purpose, but I am not sure that it is worth the addition yet. > + while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f') > + { > + sleep 1; > + } > + return; This could go on infinitely, locking down buildfarm machines silently if a commit is not careful. What I would suggest to do instead is to not touch PostgresNode.pm at all, and to just modify one test to trigger promotion with the SQL function. Then from the test, you should add a comment that triggerring promotion from SQL is wanted instead of the internal routine, and then please add a call to poll_query_until() in the same way as what 6deb52b2 has removed. As of now, this function returns true if promotion has been triggered, and false if not. However it seems to me that we should have something more consistent with "pg_ctl promote", so there are actually three possible states: 1) Node is in recovery, with promotion triggered. pg_promote() returns true in case of success in creating the trigger file. 2) Node is in recovery, with promotion triggered. pg_promote() returns false in case of failure in creating the trigger file. 3) Node is not in recovery, where I think a call to pg_promote should trigger an error. This is not handled in the patch. An other thing which has value is to implement a "wait" mode for this feature, or actually a nowait mode. You could simply do what pg_ctl does with its logic in get_control_dbstate() by looking at the control file state. I think that waiting for the promotion to happen should be the default. At the end this patch needs a bit more work. -- Michael
Michael Paquier wrote: > > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c > > index 7375a78ffc..3a1f49e83a 100644 > > --- a/src/backend/access/transam/xlog.c > > +++ b/src/backend/access/transam/xlog.c > > @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version; > > /* File path names (all relative to $PGDATA) */ > > #define RECOVERY_COMMAND_FILE "recovery.conf" > > #define RECOVERY_COMMAND_DONE "recovery.done" > > -#define PROMOTE_SIGNAL_FILE "promote" > > -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote" > > Perhaps we could just move the whole set to xlog.h. Done. > > +Datum > > +pg_promote(PG_FUNCTION_ARGS) > > +{ > > + FILE *promote_file; > > + > > + if (!superuser()) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser to promote standby servers"))); > > Let's remove this restriction at code level, and instead use > function-level ACLs so as it is possible to allow non-superusers to > trigger a promotion as well, say a dedicated role. We could add a > system role for this purpose, but I am not sure that it is worth the > addition yet. Agreed, done. > > + while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f') > > + { > > + sleep 1; > > + } > > + return; > > This could go on infinitely, locking down buildfarm machines silently if > a commit is not careful. What I would suggest to do instead is to not > touch PostgresNode.pm at all, and to just modify one test to trigger > promotion with the SQL function. Then from the test, you should add a > comment that triggerring promotion from SQL is wanted instead of the > internal routine, and then please add a call to poll_query_until() in > the same way as what 6deb52b2 has removed. I have modified recovery/t/004_timeline_switch.pl and recovery/t/009_twophase.pl accordingly: one calls the function with "wait => true", the other uses "wait => false" and waits as you suggested. > As of now, this function returns true if promotion has been triggered, > and false if not. However it seems to me that we should have something > more consistent with "pg_ctl promote", so there are actually three > possible states: > 1) Node is in recovery, with promotion triggered. pg_promote() returns > true in case of success in creating the trigger file. > 2) Node is in recovery, with promotion triggered. pg_promote() returns > false in case of failure in creating the trigger file. > 3) Node is not in recovery, where I think a call to pg_promote should > trigger an error. This is not handled in the patch. So far I had returned "false" in the last case, but I am fine with throwing an error in that case. Done. > An other thing which has value is to implement a "wait" mode for this > feature, or actually a nowait mode. You could simply do what pg_ctl > does with its logic in get_control_dbstate() by looking at the control > file state. I think that waiting for the promotion to happen should be > the default. I have implemented that. > At the end this patch needs a bit more work. Yes, it did. Thanks for the thorough review! Yours, Laurenz Albe
Attachment
Just curious to know, is promotion via function only allowed for hot-standby or works for any standby?
On Mon, Oct 15, 2018 at 7:16 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Michael Paquier wrote:
> > diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> > index 7375a78ffc..3a1f49e83a 100644
> > --- a/src/backend/access/transam/xlog.c
> > +++ b/src/backend/access/transam/xlog.c
> > @@ -81,8 +81,6 @@ extern uint32 bootstrap_data_checksum_version;
> > /* File path names (all relative to $PGDATA) */
> > #define RECOVERY_COMMAND_FILE "recovery.conf"
> > #define RECOVERY_COMMAND_DONE "recovery.done"
> > -#define PROMOTE_SIGNAL_FILE "promote"
> > -#define FALLBACK_PROMOTE_SIGNAL_FILE "fallback_promote"
>
> Perhaps we could just move the whole set to xlog.h.
Done.
> > +Datum
> > +pg_promote(PG_FUNCTION_ARGS)
> > +{
> > + FILE *promote_file;
> > +
> > + if (!superuser())
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > + errmsg("must be superuser to promote standby servers")));
>
> Let's remove this restriction at code level, and instead use
> function-level ACLs so as it is possible to allow non-superusers to
> trigger a promotion as well, say a dedicated role. We could add a
> system role for this purpose, but I am not sure that it is worth the
> addition yet.
Agreed, done.
> > + while ($self->safe_psql('postgres', "SELECT pg_is_in_recovery()") != 'f')
> > + {
> > + sleep 1;
> > + }
> > + return;
>
> This could go on infinitely, locking down buildfarm machines silently if
> a commit is not careful. What I would suggest to do instead is to not
> touch PostgresNode.pm at all, and to just modify one test to trigger
> promotion with the SQL function. Then from the test, you should add a
> comment that triggerring promotion from SQL is wanted instead of the
> internal routine, and then please add a call to poll_query_until() in
> the same way as what 6deb52b2 has removed.
I have modified recovery/t/004_timeline_switch.pl and recovery/t/009_twophase.pl
accordingly: one calls the function with "wait => true", the other
uses "wait => false" and waits as you suggested.
> As of now, this function returns true if promotion has been triggered,
> and false if not. However it seems to me that we should have something
> more consistent with "pg_ctl promote", so there are actually three
> possible states:
> 1) Node is in recovery, with promotion triggered. pg_promote() returns
> true in case of success in creating the trigger file.
> 2) Node is in recovery, with promotion triggered. pg_promote() returns
> false in case of failure in creating the trigger file.
> 3) Node is not in recovery, where I think a call to pg_promote should
> trigger an error. This is not handled in the patch.
So far I had returned "false" in the last case, but I am fine with
throwing an error in that case. Done.
> An other thing which has value is to implement a "wait" mode for this
> feature, or actually a nowait mode. You could simply do what pg_ctl
> does with its logic in get_control_dbstate() by looking at the control
> file state. I think that waiting for the promotion to happen should be
> the default.
I have implemented that.
> At the end this patch needs a bit more work.
Yes, it did. Thanks for the thorough review!
Yours,
Laurenz Albe
Ashwin Agrawal wrote: > Just curious to know, is promotion via function only allowed for > hot-standby or works for any standby? Only for hot standby - how do you want to execute a function on a standby server that doesn't allow connections? Yours, Laurenz Albe
On Tue, Oct 16, 2018 at 09:49:20AM +0200, Laurenz Albe wrote: > Only for hot standby - how do you want to execute a function on a standby > server that doesn't allow connections? In most deployments hot standby will be enabled, and wal_level uses the same value for archive and hot_standby for some time now, so that does not sound like a huge issue to me. -- Michael
Attachment
On Mon, Oct 15, 2018 at 04:16:02PM +0200, Laurenz Albe wrote: > Michael Paquier wrote: >> Let's remove this restriction at code level, and instead use >> function-level ACLs so as it is possible to allow non-superusers to >> trigger a promotion as well, say a dedicated role. We could add a >> system role for this purpose, but I am not sure that it is worth the >> addition yet. > > Agreed, done. >> As of now, this function returns true if promotion has been triggered, >> and false if not. However it seems to me that we should have something >> more consistent with "pg_ctl promote", so there are actually three >> possible states: >> 1) Node is in recovery, with promotion triggered. pg_promote() returns >> true in case of success in creating the trigger file. >> 2) Node is in recovery, with promotion triggered. pg_promote() returns >> false in case of failure in creating the trigger file. >> 3) Node is not in recovery, where I think a call to pg_promote should >> trigger an error. This is not handled in the patch. > > So far I had returned "false" in the last case, but I am fine with > throwing an error in that case. Done. Thanks, that looks correct to me. I think that consistency with pg_ctl is quite important, as this is a feature present for many releases now. >> At the end this patch needs a bit more work. > > Yes, it did. Thanks for the thorough review! + /* wait for up to a minute for promotion */ + for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i) + { + if (!RecoveryInProgress()) + PG_RETURN_BOOL(true); + + pg_usleep(1000000L / WAITS_PER_SECOND); + } I would recommend to avoid pg_usleep and instead use a WaitLatch() or similar to generate a wait event. The wait can then also be seen in pg_stat_activity, which is useful for monitoring purposes. Using RecoveryInProgress is indeed doable, and that's more simple than what I thought first. Something I missed to mention in the previous review: the timeout should be manually enforceable, with a default at 60s. Is the function marked as strict? Per the code it should be, I am not able to test now though. You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in system_views.sql, or any users could trigger a promotion, no? -- Michael
Attachment
Michael Paquier wrote: > + /* wait for up to a minute for promotion */ > + for (i = 0; i < WAITS_PER_SECOND * WAIT_SECONDS; ++i) > + { > + if (!RecoveryInProgress()) > + PG_RETURN_BOOL(true); > + > + pg_usleep(1000000L / WAITS_PER_SECOND); > + } > I would recommend to avoid pg_usleep and instead use a WaitLatch() or > similar to generate a wait event. The wait can then also be seen in > pg_stat_activity, which is useful for monitoring purposes. Using > RecoveryInProgress is indeed doable, and that's more simple than what I > thought first. Agreed, done. I have introduced a new wait event, because I couldn't find one that fit. > Something I missed to mention in the previous review: the timeout should > be manually enforceable, with a default at 60s. Ok, added as a new parameter "wait_seconds". > Is the function marked as strict? Per the code it should be, I am not > able to test now though. Yes, it is. > You are missing REVOKE EXECUTE ON FUNCTION pg_promote() in > system_views.sql, or any users could trigger a promotion, no? You are right *blush*. Fixed. Yours, Laurenz Albe
Attachment
I wrote: > Fixed. Here is another version, with a fix in pg_proc.dat, an improved comment and "wait_seconds" exercised in the regression test. Yours, Laurenz Albe
Attachment
On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote: > Here is another version, with a fix in pg_proc.dat, an improved comment > and "wait_seconds" exercised in the regression test. Thanks for the new version. This looks pretty good to me. I'll see if I can review it once and then commit. > - WAIT_EVENT_SYNC_REP > + WAIT_EVENT_SYNC_REP, > + WAIT_EVENT_PROMOTE > } WaitEventIPC; Those are kept in alphabetical order. Individual wait events are also documented with a description. -- Michael
Attachment
On Sat, Oct 20, 2018 at 01:48:56PM +0900, Michael Paquier wrote: > On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote: >> Here is another version, with a fix in pg_proc.dat, an improved comment >> and "wait_seconds" exercised in the regression test. > > Thanks for the new version. This looks pretty good to me. I'll see if > I can review it once and then commit. > >> - WAIT_EVENT_SYNC_REP >> + WAIT_EVENT_SYNC_REP, >> + WAIT_EVENT_PROMOTE >> } WaitEventIPC; > > Those are kept in alphabetical order. Individual wait events are also > documented with a description. Regarding the documentation, wouldn't it be more adapted to list the new function under the section "Recovery Control Functions"? Not only does the new function signal the postmaster, but it also creates the promotion trigger file. Anyway, I have been looking in depth at the patch, and I finish with the attached. Here are some notes: - pg_ctl returns an error if it cannot create the promote trigger file and if it doesn't close it. pg_promote should do the same. - WL_TIMEOUT could have been reached, leading to no further retries (remove for example the part related to the trigger file creation and try to trigger the promotion, the wait time is incorrect). - Documentation has been reworked. - The new wait event is documented. - a WARNING is returned if the signal to the postmaster is not sent, which is something I think makes most sense as we do the same for other signals. - position including unistd.h was not correct in xlogfuncs.c. - Timeouts for the tests are made a tad longer. Some buildfarm machines don't like a promotion wait of 10s. - a catalog version bump is included, just a personal reminder.. - Indentatio has been run. Laurenz, what do you think? -- Michael
Attachment
On Mon, Oct 22, 2018 at 3:01 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Oct 20, 2018 at 01:48:56PM +0900, Michael Paquier wrote: > > On Sat, Oct 20, 2018 at 06:24:28AM +0200, Laurenz Albe wrote: > >> Here is another version, with a fix in pg_proc.dat, an improved comment > >> and "wait_seconds" exercised in the regression test. > > > > Thanks for the new version. This looks pretty good to me. I'll see if > > I can review it once and then commit. > > > >> - WAIT_EVENT_SYNC_REP > >> + WAIT_EVENT_SYNC_REP, > >> + WAIT_EVENT_PROMOTE > >> } WaitEventIPC; > > > > Those are kept in alphabetical order. Individual wait events are also > > documented with a description. > > Regarding the documentation, wouldn't it be more adapted to list the new > function under the section "Recovery Control Functions"? Not only does > the new function signal the postmaster, but it also creates the > promotion trigger file. > > Anyway, I have been looking in depth at the patch, and I finish with the > attached. Here are some notes: > - pg_ctl returns an error if it cannot create the promote trigger file > and if it doesn't close it. pg_promote should do the same. > - WL_TIMEOUT could have been reached, leading to no further retries > (remove for example the part related to the trigger file creation and > try to trigger the promotion, the wait time is incorrect). > - Documentation has been reworked. > - The new wait event is documented. > - a WARNING is returned if the signal to the postmaster is not sent, > which is something I think makes most sense as we do the same for other > signals. > - position including unistd.h was not correct in xlogfuncs.c. > - Timeouts for the tests are made a tad longer. Some buildfarm machines > don't like a promotion wait of 10s. > - a catalog version bump is included, just a personal reminder.. > - Indentatio has been run. > Thank you for workig on this. There is one review comment for the latest patch. + if (FreeFile(promote_file)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not write file \"%s\": %m", + PROMOTE_SIGNAL_FILE))); Maybe we should unlink PROMOTE_SIGNAL_FILE before erroring. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Masahiko Sawada wrote: > On Mon, Oct 22, 2018 at 3:01 PM Michael Paquier <michael@paquier.xyz> wrote: > > Regarding the documentation, wouldn't it be more adapted to list the new > > function under the section "Recovery Control Functions"? Not only does > > the new function signal the postmaster, but it also creates the > > promotion trigger file. Hmm, yes, that's probably the first place where people would look. I guess the implementation lead me to categorize it as a "signaling function", and it also wouldn't be wrong there. > > Anyway, I have been looking in depth at the patch, and I finish with the > > attached. Here are some notes: > > [...] > > - WL_TIMEOUT could have been reached, leading to no further retries > > (remove for example the part related to the trigger file creation and > > try to trigger the promotion, the wait time is incorrect). Ok. > > - Documentation has been reworked. > > - The new wait event is documented. Thanks. > > - position including unistd.h was not correct in xlogfuncs.c. > > - Timeouts for the tests are made a tad longer. Some buildfarm machines > > don't like a promotion wait of 10s. > > - a catalog version bump is included, just a personal reminder.. > > - Indentatio has been run. Thanks. > Thank you for workig on this. There is one review comment for the latest patch. > > + if (FreeFile(promote_file)) > + ereport(ERROR, > + (errcode_for_file_access(), > + errmsg("could not write file \"%s\": %m", > + PROMOTE_SIGNAL_FILE))); > > Maybe we should unlink PROMOTE_SIGNAL_FILE before erroring. Yes, that cannot hurt. Yours, Laurenz Albe
On Mon, Oct 22, 2018 at 11:45:30AM +0200, Laurenz Albe wrote: > Masahiko Sawada wrote: >> Thank you for workig on this. There is one review comment for the latest patch. >> >> + if (FreeFile(promote_file)) >> + ereport(ERROR, >> + (errcode_for_file_access(), >> + errmsg("could not write file \"%s\": %m", >> + PROMOTE_SIGNAL_FILE))); >> >> Maybe we should unlink PROMOTE_SIGNAL_FILE before erroring. > > Yes, that cannot hurt. If FreeFile() fails, unlink() would most likely fail for the same reason. Please note that if unlink() happens before issuing the ERROR, saving errno would be necessary. That's not a huge issue anyway, if a failure happens, the operator would retry the operation. If there is a crash, the file gets removed at the end of recovery. If there are no objections, I'll look at this patch again by the end of this week in order to get it committed. -- Michael
Attachment
Michael Paquier wrote: > If there are no > objections, I'll look at this patch again by the end of this week in > order to get it committed. No objections from me; on the contrary, I would like to thank you for your effort here. I appreciate that you probably spent more time tutoring me than it would have taken you to write this yourself. Yours, Laurenz Albe
On Tue, Oct 23, 2018 at 09:42:16AM +0200, Laurenz Albe wrote: > No objections from me; on the contrary, I would like to thank you for > your effort here. I appreciate that you probably spent more time > tutoring me than it would have taken you to write this yourself. You're welcome. Happy that it helped. -- Michael
Attachment
On Wed, Oct 24, 2018 at 08:50:43AM +0900, Michael Paquier wrote: > On Tue, Oct 23, 2018 at 09:42:16AM +0200, Laurenz Albe wrote: > > No objections from me; on the contrary, I would like to thank you for > > your effort here. I appreciate that you probably spent more time > > tutoring me than it would have taken you to write this yourself. > > You're welcome. Happy that it helped. And committed. I double-checked the code, and tweaked a bit the tests so as the test using wait_mode = false is removed as it did not seem worth the extra cycles. I also added a check on the return value of pg_promote when using the wait mode. Another thing was that the function needs to be parallel-restricted. -- Michael
Attachment
Hi On 10/25/2018 09:47 AM, Michael Paquier wrote: > And committed. I double-checked the code, and tweaked a bit the tests > so as the test using wait_mode = false is removed as it did not seem > worth the extra cycles. I also added a check on the return value of > pg_promote when using the wait mode. Another thing was that the > function needs to be parallel-restricted. Documentation for this [*] says "Returns true if promotion is successful and false otherwise", which is not correct if "wait" is false, as it will always return TRUE. [*] https://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL Attached patch contains a suggested rewording to clarify this. Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Hi On 10/25/2018 09:47 AM, Michael Paquier wrote: > And committed. I double-checked the code, and tweaked a bit the tests > so as the test using wait_mode = false is removed as it did not seem > worth the extra cycles. I also added a check on the return value of > pg_promote when using the wait mode. Another thing was that the > function needs to be parallel-restricted. Documentation for this [*] says "Returns true if promotion is successful and false otherwise", which is not correct if "wait" is false, as it will always return TRUE. [*] https://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL Attached patch contains a suggested rewording to clarify this. Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Fri, Oct 26, 2018 at 11:36:00AM +0900, Ian Barwick wrote: > Documentation for this [*] says "Returns true if promotion is > successful and false otherwise", which is not correct if "wait" is > false, as it will always return TRUE. Yes, in the case where the promotion has been initiated. > + <literal>false</literal> otherwise. If <parameter>wait</parameter> > + is set to <literal>false</literal>, the function returns <literal>true</literal> > + immediately after sending the promotion signal to the postmaster. Or we could use "the function returns true immediately after initiating the promotion by sending the promotion signal to the postmaster"? As a native speaker which one feels more natural to you? -- Michael
Attachment
On Fri, Oct 26, 2018 at 01:51:24PM +0900, Michael Paquier wrote: > Or we could use "the function returns true immediately after initiating > the promotion by sending the promotion signal to the postmaster"? As a > native speaker which one feels more natural to you? Sorry for the time it took. After pondering on it, I have committed the improvements from your version. -- Michael
Attachment
On 11/19/2018 01:22 PM, Michael Paquier wrote: > On Fri, Oct 26, 2018 at 01:51:24PM +0900, Michael Paquier wrote: >> Or we could use "the function returns true immediately after initiating >> the promotion by sending the promotion signal to the postmaster"? As a >> native speaker which one feels more natural to you? > > Sorry for the time it took. After pondering on it, I have committed the > improvements from your version. Thanks, looks good (and apologies for the delay in responding from my side). Regards Ian Barwick -- Ian Barwick http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 28, 2018 at 10:06:34AM +0900, Ian Barwick wrote: > Thanks, looks good (and apologies for the delay in responding from my > side). Thanks for double-checking, Ian. I took my time as well ;) (Hope to see you face-to-face in a couple of days around Akihabara.) -- Michael