Thread: "pg_ctl promote" exit status
Hello

The "pg_ctl promote" command returns an exit code of 1 when the server is not in standby mode, and the same exit code of 1 when the server isn't started at all. The only difference at the time being is the string output at the time, which FYI are...

pg_ctl: cannot promote server; server is not in standby mode

...and...

pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
Is server running?

...respectively.

I am in the process of developing a clustering solution around luci and rgmanager (in Red Hat EL 6) and for the time being, am basing it off the string output. Maybe each different exit reason should have a unique exit code, whatever my logic and approach to solving this problem be?

Thanks
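To illustrate the string-matching workaround described in this message, here is a minimal Python sketch. The names `PromoteFailure` and `classify_promote_failure` are hypothetical and not part of pg_ctl or any clustering tool; only the two message fragments are taken from the message above, and their wording may differ across versions and locales.

```python
from enum import Enum


class PromoteFailure(Enum):
    """Hypothetical labels for the failure modes discussed in this thread."""
    NOT_STANDBY = "not_standby"
    NOT_RUNNING = "not_running"
    UNKNOWN = "unknown"


def classify_promote_failure(output_text: str) -> PromoteFailure:
    """Guess why "pg_ctl promote" exited with 1 by inspecting its output.

    Both failures share exit code 1, so the only distinguishing signal is
    the message text. Matching on wording is fragile, which is the concern
    raised in this thread.
    """
    if "server is not in standby mode" in output_text:
        return PromoteFailure.NOT_STANDBY
    if "PID file" in output_text and "does not exist" in output_text:
        return PromoteFailure.NOT_RUNNING
    return PromoteFailure.UNKNOWN
```

A caller would capture the output of the command and pass it to the function; distinct exit codes would make this parsing unnecessary.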
On Tue, Oct 23, 2012 at 6:39 AM, Dhruv Ahuja <dhruvahuja@gmail.com> wrote:
> The "pg_ctl promote" command returns an exit code of 1 when the server
> is not in standby mode, and the same exit code of 1 when the server
> isn't started at all. The only difference at the time being is the
> string output at the time, which FYI are...
>
> pg_ctl: cannot promote server; server is not in standby mode
>
> ...and...
>
> pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
> Is server running?
>
> ...respectively.
>
> I am in the process of developing a clustering solution around luci
> and rgmanager (in Red Hat EL 6) and for the time being, am basing it
> off the string output. Maybe each different exit reason should have a
> unique exit code, whatever my logic and approach to solving this
> problem be?

That doesn't seem like a bad idea. Got a patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 23, 2012 at 12:29:11PM -0400, Robert Haas wrote:
> On Tue, Oct 23, 2012 at 6:39 AM, Dhruv Ahuja <dhruvahuja@gmail.com> wrote:
> > The "pg_ctl promote" command returns an exit code of 1 when the server
> > is not in standby mode, and the same exit code of 1 when the server
> > isn't started at all. The only difference at the time being is the
> > string output at the time, which FYI are...
> >
> > pg_ctl: cannot promote server; server is not in standby mode
> >
> > ...and...
> >
> > pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
> > Is server running?
> >
> > ...respectively.
> >
> > I am in the process of developing a clustering solution around luci
> > and rgmanager (in Red Hat EL 6) and for the time being, am basing it
> > off the string output. Maybe each different exit reason should have a
> > unique exit code, whatever my logic and approach to solving this
> > problem be?
>
> That doesn't seem like a bad idea. Got a patch?
>

The Linux Standard Base Core Specification 3.1 says this should return '3'. [1]

[1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

--
Mr. Aaron W. Swenson
Gentoo Linux Developer
Email : titanofold@gentoo.org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0
Attachment
May I propose the attached patch.
Points to note and possibly discuss:
(a) Only exit codes in do_* functions have been changed.
(b) The link to, and the version of, LSB specifications has been updated.
(c) A significant change is the exit code of do_stop() on stopping a stopped server. Previous return is 1. Proposed return is 0. If this is accepted, I would highly suggest a mention in the Release Notes.
(d) The exit code that raised this issue was the return of promoting a promoted server. If promotion fails because the server is running but not as standby, should that be considered a case of starting a started service, or an application specific failure? I am equally weighted to opt for the former, but have proposed differently in the patch.
On 23 October 2012 17:29, Robert Haas <robertmhaas@gmail.com> wrote:
That doesn't seem like a bad idea. Got a patch?
On Tue, Oct 23, 2012 at 6:39 AM, Dhruv Ahuja <dhruvahuja@gmail.com> wrote:
> The "pg_ctl promote" command returns an exit code of 1 when the server
> is not in standby mode, and the same exit code of 1 when the server
> isn't started at all. The only difference at the time being is the
> string output at the time, which FYI are...
>
> pg_ctl: cannot promote server; server is not in standby mode
>
> ...and...
>
> pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
> Is server running?
>
> ...respectively.
>
> I am in the process of developing a clustering solution around luci
> and rgmanager (in Red Hat EL 6) and for the time being, am basing it
> off the string output. Maybe each different exit reason should have a
> unique exit code, whatever my logic and approach to solving this
> problem be?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Don't think the attachment made it in the last mail. Attaching now.
Attachment
On 1/12/13 3:30 PM, Aaron W. Swenson wrote:
> The Linux Standard Base Core Specification 3.1 says this should return
> '3'. [1]
>
> [1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

The LSB spec doesn't say anything about a "promote" action.

And for the stop and reload actions that you tried to change, 3 is "unimplemented".

There is an ongoing discussion about the exit status of the stop action under <https://commitfest.postgresql.org/action/patch_view?id=1045>, so let's keep this item about the "promote" action.
On Fri, Jan 25, 2013 at 01:54:06PM -0500, Peter Eisentraut wrote:
> On 1/12/13 3:30 PM, Aaron W. Swenson wrote:
> > The Linux Standard Base Core Specification 3.1 says this should return
> > '3'. [1]
> >
> > [1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>
> The LSB spec doesn't say anything about a "promote" action.
>
> And for the stop and reload actions that you tried to change, 3 is
> "unimplemented".
>
> There is an ongoing discussion about the exit status of the stop action
> under <https://commitfest.postgresql.org/action/patch_view?id=1045>, so
> let's keep this item about the "promote" action.

You are right. Had I read a little further down, it seems that the exit status should actually be 7.

--
Mr. Aaron W. Swenson
Gentoo Linux Developer
Email : titanofold@gentoo.org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0
On 26.01.2013 23:44, Aaron W. Swenson wrote:
> On Fri, Jan 25, 2013 at 01:54:06PM -0500, Peter Eisentraut wrote:
>> On 1/12/13 3:30 PM, Aaron W. Swenson wrote:
>>> The Linux Standard Base Core Specification 3.1 says this should return
>>> '3'. [1]
>>>
>>> [1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>>
>> The LSB spec doesn't say anything about a "promote" action.
>>
>> And for the stop and reload actions that you tried to change, 3 is
>> "unimplemented".
>>
>> There is an ongoing discussion about the exit status of the stop action
>> under <https://commitfest.postgresql.org/action/patch_view?id=1045>, so
>> let's keep this item about the "promote" action.
>
> You are right. Had I read a little further down, it seems that the
> exit status should actually be 7.

Not sure if that LSB section is relevant anyway. It specifies the exit codes for init scripts, but pg_ctl is not an init script.

- Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Not sure if that LSB section is relevant anyway. It specifies the
> exit codes for init scripts, but pg_ctl is not an init script.

Except that when I went to the trouble of wrapping pg_ctl with an init script which was thoroughly LSB compliant (according to my reading) and offered it to the community, everyone said that rather than have such a complicated script it would be better to change pg_ctl to include that logic and exit with an LSB compliant exit code.

-Kevin
On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
> You are right. Had I read a little further down, it seems that the
> exit status should actually be 7.

7 is OK for "not running", but what should we use when the server is not in standby mode? Using the idempotent argument that we are discussing for the stop action, promoting a server that is not a standby should be a noop and exit successfully. Not sure if that is what we want, though.
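The idempotent reading raised in this message can be sketched as a small policy function. This is only an illustration of the argument, not pg_ctl's behavior: the function name `idempotent_promote_status` is hypothetical, the message fragments are those quoted at the start of the thread, and the value 7 for "not running" follows the LSB number mentioned above.

```python
def idempotent_promote_status(exit_code: int, output_text: str) -> int:
    """Map a raw "pg_ctl promote" result to the status a wrapper might report.

    Assumed policy (a sketch, not existing behavior):
      - success stays 0
      - "not in standby mode" is treated as a no-op and reported as 0
      - a missing PID file is reported as 7 ("program is not running")
      - anything else keeps its original exit code
    """
    if exit_code == 0:
        return 0
    if "server is not in standby mode" in output_text:
        # Promoting a server that is not a standby changes nothing,
        # so the idempotent view counts it as success.
        return 0
    if "PID file" in output_text and "does not exist" in output_text:
        return 7
    return exit_code
```

Whether the no-op case should really report success is exactly the open question in the message; the sketch only makes the trade-off concrete.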
Kevin Grittner <kgrittn@ymail.com> writes:
> Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> Not sure if that LSB section is relevant anyway. It specifies the
>> exit codes for init scripts, but pg_ctl is not an init script.

> Except that when I went to the trouble of wrapping pg_ctl with an
> init script which was thoroughly LSB compliant (according to my
> reading) and offered it to the community, everyone said that rather
> than have such a complicated script it would be better to change
> pg_ctl to include that logic and exit with an LSB compliant exit
> code.

Right. The start and stop actions are commonly used in initscripts so it'd be handy if the exit codes for those didn't need to be remapped. On the other hand, it's not at all clear to me that anyone would try to put the promote action into an initscript, or that LSB would have anything to say about the exit codes for such a nonstandard action anyway.

regards, tom lane
On Mon, Jan 28, 2013 at 09:46:32AM -0500, Peter Eisentraut wrote:
> On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
> > You are right. Had I read a little further down, it seems that the
> > exit status should actually be 7.
>
> 7 is OK for "not running", but what should we use when the server is not
> in standby mode? Using the idempotent argument that we are discussing
> for the stop action, promoting a server that is not a standby should be
> a noop and exit successfully. Not sure if that is what we want, though.

I looked at all the LSB return codes listed here and mapped them to pg_ctl error situations:

https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

Patch attached. I did not touch the start/stop return codes.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Attachment
On 6/28/13 10:50 PM, Bruce Momjian wrote:
> On Mon, Jan 28, 2013 at 09:46:32AM -0500, Peter Eisentraut wrote:
>> On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
>>> You are right. Had I read a little further down, it seems that the
>>> exit status should actually be 7.
>>
>> 7 is OK for "not running", but what should we use when the server is not
>> in standby mode? Using the idempotent argument that we are discussing
>> for the stop action, promoting a server that is not a standby should be
>> a noop and exit successfully. Not sure if that is what we want, though.
>
> I looked at all the LSB return codes listed here and mapped them to
> pg_ctl error situations:
>
> https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>
> Patch attached. I did not touch the start/stop return codes.

Approximately none of these changes seem correct to me. For example, why is failing to open the PID file 6, or failing to start the server 7?
On Mon, Jul 1, 2013 at 10:11:23AM -0400, Peter Eisentraut wrote:
> On 6/28/13 10:50 PM, Bruce Momjian wrote:
> > On Mon, Jan 28, 2013 at 09:46:32AM -0500, Peter Eisentraut wrote:
> >> On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
> >>> You are right. Had I read a little further down, it seems that the
> >>> exit status should actually be 7.
> >>
> >> 7 is OK for "not running", but what should we use when the server is not
> >> in standby mode? Using the idempotent argument that we are discussing
> >> for the stop action, promoting a server that is not a standby should be
> >> a noop and exit successfully. Not sure if that is what we want, though.
> >
> > I looked at all the LSB return codes listed here and mapped them to
> > pg_ctl error situations:
> >
> > https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
> >
> > Patch attached. I did not touch the start/stop return codes.
>
> Approximately none of these changes seem correct to me. For example,
> why is failing to open the PID file 6, or failing to start the server 7?

Well, according to that URL, we have:

	6 program is not configured
	7 program is not running

I just updated the pg_ctl.c comments to at least point to a valid URL for this. I think we can just call this item closed because I am still unclear if these return codes should be returned by pg_ctl or the start/stop script.

Anyway, while I do think pg_ctl could pass a little more information back about failure via its return code, I am unclear if LSB is the right approach.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
On 7/1/13 12:47 PM, Bruce Momjian wrote:
>> Approximately none of these changes seem correct to me. For example,
>> why is failing to open the PID file 6, or failing to start the server 7?
>
> Well, according to that URL, we have:
>
> 6 program is not configured
> 7 program is not running

There is also

	4 user had insufficient privilege

> I just updated the pg_ctl.c comments to at least point to a valid URL
> for this. I think we can just call this item closed because I am still
> unclear if these return codes should be returned by pg_ctl or the
> start/stop script.
>
> Anyway, while I do think pg_ctl could pass a little more information
> back about failure via its return code, I am unclear if LSB is the right
> approach.

Yeah, a lot of these things are unclear and not used in practice, so it's probably better to stick to exit code 1, unless there is a clear use case. The "status" case is different, because there the exit code can be passed out by the init script directly.
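For reference, the LSB init-script exit codes for non-status actions that come up in this thread can be collected as follows. This is a sketch for a wrapper script, not something pg_ctl defines: the names `LsbExitCode` and `describe_lsb_exit_code` are hypothetical, codes 3, 4, 6 and 7 are the ones quoted in the thread, and codes 1, 2 and 5 are filled in from the same LSB section.

```python
from enum import IntEnum


class LsbExitCode(IntEnum):
    """LSB exit codes for init-script actions other than "status"."""
    SUCCESS = 0
    GENERIC_ERROR = 1            # generic or unspecified error
    INVALID_ARGUMENT = 2         # invalid or excess argument(s)
    UNIMPLEMENTED = 3            # unimplemented feature
    INSUFFICIENT_PRIVILEGE = 4   # user had insufficient privilege
    NOT_INSTALLED = 5            # program is not installed
    NOT_CONFIGURED = 6           # program is not configured
    NOT_RUNNING = 7              # program is not running


def describe_lsb_exit_code(code: int) -> str:
    """Return a readable name for a code, or a fallback for other values."""
    try:
        return LsbExitCode(code).name.lower().replace("_", " ")
    except ValueError:
        return "not an LSB-defined code for this action"
```

The "status" action uses a separate table of codes in the LSB specification, which is why it is treated as a different case in the message above.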