Thread: Server is not getting started with log level as debug5 on master after commit 3147ac
Server is not getting started with log level as debug5 on master after commit 3147ac
From
Amit Kapila
Date:
In master branch, server is not getting started with log level as debug5. Simple steps to reproduce the problem: a. initdb -D ..\..\Data b. change log_min_messages = debug5 in postgresql.conf c. start server (pg_ctl start -D ..\..\Data) -- server doesn't get started Relevant message on server console: DEBUG: mapped win32 error code 2 to 2 FATAL: could not open recovery command file "recovery.conf": No error This problem occurs only in Windows, but the cause of problem is generic. The problem occurs when during startup, code tries to open recovery.conf in below function: readRecoveryCommandFile(void) { .. fd = AllocateFile(RECOVERY_COMMAND_FILE, "r"); if (fd == NULL) { if (errno == ENOENT) return; /* not there, so no archive recovery */ ereport(FATAL, (errcode_for_file_access(), errmsg("could not open recovery command file \"%s\": %m", RECOVERY_COMMAND_FILE))); .. } It is expected that if file doesn't exist, errno should be ENOENT which is correct and was working fine until commit: 3147acd63e0135aff9a6c4b01d861251925d97d9 Use improved vsnprintf calling logic in more places. The reason is that after this commit, function appendStringInfoVA has started calling pvsnprintf() which initializes errno to 0. As function appendStringInfoVA gets called in error path (errmsg_internal()->EVALUATE_MESSAGE->appendStringInfoVA), it can reset previous errno if there is any, as it happens for function _dosmaperr(). _dosmaperr(unsigned long e) { ... errno = doserrors[i].doserr; #ifndef FRONTEND ereport(DEBUG5, (errmsg_internal("mapped win32 error code %lu to %d", e, errno))); .. } Here after setting errno, it calls errmsg_internal() which internally resets errno to zero, so the callers of this function who are expecting proper errno can fail which is what happens in current issue. I could think of below possible ways to fix the problem: a. in function pvsnprintf(), save the errno before setting it to 0 and then before exiting function reset it back to saved errno. I think this is sane because in function pvsnprintf, if any error occurs due to which errno is changed, it will not return control, so errno will not be required by callers. b. we can change the callers like _dosmaperr() who have responsibility to save errno for their callers. Patch with approach a) is attached with mail, we can change code as per approach b) or any other better method as well, but for now I have prepared patch with approach a), as I could not see any problem with it. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Server is not getting started with log level as debug5 on master after commit 3147ac
From
Amit Kapila
Date:
On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > In master branch, server is not getting started with log level as debug5. > > Simple steps to reproduce the problem: > a. initdb -D ..\..\Data > b. change log_min_messages = debug5 in postgresql.conf > c. start server (pg_ctl start -D ..\..\Data) -- server doesn't get started > > Relevant message on server console: > DEBUG: mapped win32 error code 2 to 2 > FATAL: could not open recovery command file "recovery.conf": No error > > This problem occurs only in Windows, but the cause of problem is generic. > > I could think of below possible ways to fix the problem: > a. in function pvsnprintf(), save the errno before setting it to 0 and > then before exiting function reset it back to saved errno. I think > this is sane because in function pvsnprintf, if any error occurs due > to which errno is changed, it will not return control, so errno will > not be required by callers. > b. we can change the callers like _dosmaperr() who have responsibility > to save errno for their callers. > > Patch with approach a) is attached with mail, we can change code as > per approach b) or any other better method as well, but for now I have > prepared patch with approach a), as I could not see any problem with > it. Again looking at it, I think better fix would be to restore 'errno' from 'edata->saved_errno' in errfinish() incase the control returns back to caller (elevel <= WARNING). It seems to me that this fix is required anyway because if we return from errfinish (ereport/elog) to caller, it should restore back 'errno', as caller might need to take some action based on errno. Patch to restore 'errno' in errfinish() is attached with this mail. Any better ideas for fixing this issue? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes: > On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> I could think of below possible ways to fix the problem: >> a. in function pvsnprintf(), save the errno before setting it to 0 and >> then before exiting function reset it back to saved errno. I think >> this is sane because in function pvsnprintf, if any error occurs due >> to which errno is changed, it will not return control, so errno will >> not be required by callers. >> b. we can change the callers like _dosmaperr() who have responsibility >> to save errno for their callers. >> >> Patch with approach a) is attached with mail, we can change code as >> per approach b) or any other better method as well, but for now I have >> prepared patch with approach a), as I could not see any problem with >> it. > Again looking at it, I think better fix would be to restore 'errno' > from 'edata->saved_errno' in errfinish() incase the control returns > back to caller (elevel <= WARNING). It seems to me that this fix is > required anyway because if we return from errfinish (ereport/elog) to > caller, it should restore back 'errno', as caller might need to take > some action based on errno. > Patch to restore 'errno' in errfinish() is attached with this mail. I think this is pretty misguided. In general, there should be no expectation that a function call doesn't stomp on errno unless it's specifically documented not to --- which surely ereport() never has been. So generally it's the responsibility of any code that needs errno to be preserved to do so itself. In particular, in the case at hand: if (doserrors[i].winerr == e) { errno = doserrors[i].doserr; #ifndef FRONTEND ereport(DEBUG5, (errmsg_internal("mapped win32 error code %lu to %d", e, errno))); #elif FRONTEND_DEBUG fprintf(stderr, _("mapped win32 error code %lu to %d"), e, errno); #endif return; } even if we were to do what you suggest to make the ereport() call preserve errno, the FRONTEND_DEBUG code path would still be utterly broken, because fprintf() is certainly capable of changing errno. So basically, _dosmaperr() is broken and always has been, and it's surprising we've not seen evidence of that before. It needs to not try to set the real errno variable till it's done logging about it. Normally I avoid touching Windows-specific code for lack of ability to test it, but in this case the needed fix seems pretty clear, so I'll go make it. regards, tom lane
Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
From
Amit Kapila
Date:
On Sun, Nov 24, 2013 at 4:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> On Fri, Nov 22, 2013 at 9:30 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> Again looking at it, I think better fix would be to restore 'errno' >> from 'edata->saved_errno' in errfinish() incase the control returns >> back to caller (elevel <= WARNING). It seems to me that this fix is >> required anyway because if we return from errfinish (ereport/elog) to >> caller, it should restore back 'errno', as caller might need to take >> some action based on errno. >> Patch to restore 'errno' in errfinish() is attached with this mail. > > I think this is pretty misguided. In general, there should be no > expectation that a function call doesn't stomp on errno unless it's > specifically documented not to --- which surely ereport() never has > been. So generally it's the responsibility of any code that needs > errno to be preserved to do so itself. In particular, in the case > at hand: > > > So basically, _dosmaperr() is broken and always has been, and it's > surprising we've not seen evidence of that before. It needs to not > try to set the real errno variable till it's done logging about it. > > Normally I avoid touching Windows-specific code for lack of ability > to test it, but in this case the needed fix seems pretty clear, so > I'll go make it. Thanks, I have verified that the problem reported above is fixed. I think that still this kind of problems can be there at other places in code. I checked few places and suspecting secure_read() can also have similar problem: secure_read() { .. errno = 0; n = SSL_read(port->ssl, ptr, len); err = SSL_get_error(port->ssl, n); switch (err) { .. .. case SSL_ERROR_SSL: ereport(COMMERROR, (errcode(ERRCODE_PROTOCOL_VIOLATION), errmsg("SSL error: %s", SSLerrmessage()))); /* fall through */ .. } In case SSL_read sets errno, ereport will reset it and caller of secure_read() like pq_getbyte_if_available() who perform actions based on errno, can lead to some problem. pq_getbyte_if_available(unsigned char *c) { .. r = secure_read(MyProcPort, c, 1); if (r < 0) { if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) r = 0; .. } In general it is responsibility of caller to take care of errno handling, but I am not sure it is taken care well at all places in code and the chances of such problems were less earlier because there was less chance that ereport would reset errno, but now it will definitely do so. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes: > I think that still this kind of problems can be there at other > places in code. I checked few places and suspecting secure_read() can > also have similar problem: > case SSL_ERROR_SSL: > ereport(COMMERROR, > (errcode(ERRCODE_PROTOCOL_VIOLATION), > errmsg("SSL error: %s", SSLerrmessage()))); > /* fall through */ Note that what it "falls through" to is "errno = ECONNRESET", so the caller will see a well-defined value of errno after this. Even without the ereport call, I'd think that was necessary because SSL_get_error isn't documented to return a meaningful value of errno except in the SSL_ERROR_SYSCALL case. > In general it is responsibility of caller to take care of errno > handling, but I am not sure it is taken care well at all places in > code and the chances of such problems were less earlier because there > was less chance that ereport would reset errno, but now it will > definitely do so. [ shrug... ] To my mind, this is a *good* thing, because now we will more easily find and fix any callers that are broken. They were broken anyway, it's just that the circumstances might've been difficult to reproduce. As an example consider the possibility that ereport might previously have stomped on errno only with unusual log_line_prefix settings. regards, tom lane
Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
From
Tom Lane
Date:
I wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> In general it is responsibility of caller to take care of errno >> handling, but I am not sure it is taken care well at all places in >> code and the chances of such problems were less earlier because there >> was less chance that ereport would reset errno, but now it will >> definitely do so. > [ shrug... ] To my mind, this is a *good* thing, because now we will > more easily find and fix any callers that are broken. They were broken > anyway, it's just that the circumstances might've been difficult to > reproduce. As an example consider the possibility that ereport might > previously have stomped on errno only with unusual log_line_prefix > settings. ... and having said that, there *are* bugs here, which have got nothing to do with ereport(). client_read_ended() for instance is not being careful to preserve errno :-( regards, tom lane
Re: Re: Server is not getting started with log level as debug5 on master after commit 3147ac
From
Amit Kapila
Date:
On Sun, Nov 24, 2013 at 10:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Amit Kapila <amit.kapila16@gmail.com> writes: >> I think that still this kind of problems can be there at other >> places in code. I checked few places and suspecting secure_read() can >> also have similar problem: > >> case SSL_ERROR_SSL: >> ereport(COMMERROR, >> (errcode(ERRCODE_PROTOCOL_VIOLATION), >> errmsg("SSL error: %s", SSLerrmessage()))); >> /* fall through */ > > Note that what it "falls through" to is "errno = ECONNRESET", so the > caller will see a well-defined value of errno after this. Even without > the ereport call, I'd think that was necessary because SSL_get_error > isn't documented to return a meaningful value of errno except in the > SSL_ERROR_SYSCALL case. Yes, that is right. Idea is that there can be more occurrences which we might need to handle and the same seems to be taken care in your latest commit. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com