Thread: pg_ctl reload breaks our client
Yesterday a client application broke immediately after we issued a pg_ctl reload command. The only change we had made was to pg_hba.conf to enable trusted connections from localhost. My question is, how should the client application be affected by such a reload? My impression was that the client should be totally unaware of a reload, but reality does not bear this out. Any ideas/informed responses will be welcomed. I suspect that this has uncovered a bug in our client but without knowing what the client experience should be, it's hard to narrow down where the bug may lie. Thanks. __ Marc Munro
On Fri, Sep 16, 2005 at 01:28:13PM -0700, Marc Munro wrote:
> Yesterday a client application broke immediately after we issued a
> pg_ctl reload command.

How did the client break? What behavior did it exhibit? Were there any errors in the server's logs? Can you duplicate the problem? What version of PostgreSQL are you using, and on what platform?

-- Michael Fuhr
Michael,

It is Postgres 7.3.6. The client is a multi-threaded C++ client. The breakage was that one group of connections simply stopped. Others continued without problem. It is not clear exactly what was going on. Nothing in our application logs gives us any clue to this.

As for reproducibility, it has happened before in test environments when we have bounced the database. This is not too shocking as I would expect the client to notice this :-) It is a little more shocking when it's a reload. Or maybe I have simply misunderstood what reload does.

I am simply looking for clues here and don't expect definitive answers. That's why I was a little vague. Am I right though, in thinking that a reload should be pretty much invisible to the client, or will certain operations fail and require a re-try?

__ Marc

On Fri, 2005-09-16 at 14:40 -0600, Michael Fuhr wrote:
> On Fri, Sep 16, 2005 at 01:28:13PM -0700, Marc Munro wrote:
> > Yesterday a client application broke immediately after we issued a
> > pg_ctl reload command.
>
> How did the client break? What behavior did it exhibit? Were there
> any errors in the server's logs? Can you duplicate the problem?
> What version of PostgreSQL are you using, and on what platform?
>
On Fri, Sep 16, 2005 at 02:16:29PM -0700, Marc Munro wrote:
> It is Postgres 7.3.6. The client is a multi-threaded C++ client. The
> breakage was that one group of connections simply stopped. Others
> continued without problem. It is not clear exactly what was going on.

How did the connections "stop"? Were the connections broken, causing queries to fail? Or did queries block and never return? Or something else? What was happening that shouldn't happen, or what wasn't happening that should happen?

If the connections were still active but not returning, did you do a process trace on the connection's postmaster or attach a debugger to it to see what it was doing?

Could the timing of the problem have been coincidence? Have you ever seen the problem without a reload? How often do you see the problem after a reload? Do you know for certain that the application was working immediately before the reload and not working immediately after it?

What operating system are you using?

> Nothing in our application logs gives us any clue to this.

What about the postmaster logs?

> As for reproducibility, it has happened before in test environments when
> we have bounced the database. This is not too shocking as I would
> expect the client to notice this :-) It is a little more shocking when
> it's a reload. Or maybe I have simply misunderstood what reload does.

Can you reproduce the problem with a reload? A stop and start will terminate client connections, but a reload shouldn't.

-- Michael Fuhr
Michael,

Many thanks for your response; it is much appreciated. My responses are embedded below:

On Fri, 2005-09-16 at 17:10 -0600, Michael Fuhr wrote:
> On Fri, Sep 16, 2005 at 02:16:29PM -0700, Marc Munro wrote:
> > It is Postgres 7.3.6. The client is a multi-threaded C++ client. The
> > breakage was that one group of connections simply stopped. Others
> > continued without problem. It is not clear exactly what was going on.
>
> How did the connections "stop"? Were the connections broken, causing
> queries to fail? Or did queries block and never return? Or something
> else? What was happening that shouldn't happen, or what wasn't
> happening that should happen?

From the server side, there were simply connections (1 or 2) that appeared idle. From the client side it looked like a query had been initiated but the client thread was stuck in a library call (as near as we can tell). This, vague though it is, is as much as I know right now. We were unable to do much debugging as it is a production system and the priority was to get it back up.

> If the connections were still active but not returning, did you do
> a process trace on the connection's postmaster or attach a debugger
> to it to see what it was doing?

No, time pressure prevented this.

> Could the timing of the problem have been coincidence? Have you
> ever seen the problem without a reload? How often do you see the
> problem after a reload? Do you know for certain that the application
> was working immediately before the reload and not working immediately
> after it?

It *could* be coincidence, but the problem began within 5 seconds of the reload. Coincidence is unlikely.

> What operating system are you using?

Linux 2.4.20 smp i686

>
> > Nothing in our application logs gives us any clue to this.
>
> What about the postmaster logs?

Ah, now there's another story. Unavailable, I'm afraid. Resolving that is also on my priority list.

> > As for reproducibility, it has happened before in test environments when
> > we have bounced the database. This is not too shocking as I would
> > expect the client to notice this :-) It is a little more shocking when
> > it's a reload. Or maybe I have simply misunderstood what reload does.
>
> Can you reproduce the problem with a reload? A stop and start will
> terminate client connections, but a reload shouldn't.

This is not currently seen as a priority (the work-around of "don't do that" is seen as sufficient). I'm simply hoping to get someone to say for sure that the client app should not be able to tell that a reload has happened. At that point I may be able to raise the priority of this issue.

I would certainly like to do more investigation. If postgresql hackers are interested in this strange event (please tell me for sure that it *is* strange) that may also help me to get the necessary resources to run more tests.

Thanks again.

__ Marc Munro
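Since the application logs gave no clue about the stuck threads, one option on the client side is a watchdog that at least records when a query call overruns a deadline. The following is only a rough Python sketch of that idea, not the actual C++ client: `call_with_deadline` and `fake_query` are hypothetical names, and `fake_query` merely stands in for a blocking database library call.

```python
import threading
import time


def call_with_deadline(fn, timeout_s):
    """Run fn() in a worker thread and report whether it finished in time.

    Returns (finished, result). The worker is a daemon thread, so a call
    that never returns will not keep the process alive at exit. This only
    detects a stall; it does not cancel the stuck call.
    """
    box = {}

    def worker():
        try:
            box["result"] = fn()
        except Exception as exc:  # hand the failure back to the caller
            box["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        # Still running after the deadline: treat as a stall.
        return False, None
    if "error" in box:
        raise box["error"]
    return True, box["result"]


def fake_query(delay_s):
    """Hypothetical stand-in for a blocking database library call."""
    time.sleep(delay_s)
    return "row"
```

A caller would log a timestamp and the connection identity whenever `finished` comes back false, which would show whether stalls line up with a reload.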
On Fri, Sep 16, 2005 at 04:34:46PM -0700, Marc Munro wrote:
> On Fri, 2005-09-16 at 17:10 -0600, Michael Fuhr wrote:
> > Can you reproduce the problem with a reload? A stop and start will
> > terminate client connections, but a reload shouldn't.
>
> This is not currently seen as a priority (the work-around of "don't do
> that" is seen as sufficient). I'm simply hoping to get someone to say
> for sure that the client app should not be able to tell that a reload
> has happened. At that point I may be able to raise the priority of this
> issue.

As far as I know clients shouldn't notice a reload (which is effected via a SIGHUP); I just did some tests and didn't see any problems. However, I don't know much about the inner workings of PostgreSQL so I can't say for sure. Maybe one of the developers will comment.

-- Michael Fuhr
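To illustrate the general pattern of a reload effected via SIGHUP, here is a small Python sketch of a long-running process that re-reads its configuration on SIGHUP without touching its open connections. It is a toy model under stated assumptions, not PostgreSQL's implementation: `ToyDaemon`, `CONFIG_SOURCE`, and the connection names are all made up for illustration, and the code needs a POSIX system.

```python
import os
import signal

# Hypothetical in-memory "config file"; a real server re-reads files on disk.
CONFIG_SOURCE = {"log_level": "notice"}


class ToyDaemon:
    """Toy long-running process that reloads its config on SIGHUP."""

    def __init__(self):
        self.config = dict(CONFIG_SOURCE)
        self.open_connections = ["conn-1", "conn-2"]  # illustrative only
        self.reload_requested = False
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        # Keep the handler minimal: just record that a reload was asked for.
        self.reload_requested = True

    def poll(self):
        """One iteration of the main loop: apply a pending reload, if any."""
        if self.reload_requested:
            self.reload_requested = False
            self.config = dict(CONFIG_SOURCE)
            # Note: open_connections is deliberately left untouched.


def send_reload():
    """Send SIGHUP to this process, roughly what a reload command does."""
    os.kill(os.getpid(), signal.SIGHUP)
```

The point of the sketch is that the signal only asks the process to re-read its settings; nothing in that path closes or resets existing connections, in contrast with a stop and start.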
Marc....

> Yesterday a client application broke immediately after we issued a
> pg_ctl reload command. The only change we had made was to pg_hba.conf
> to enable trusted connections from localhost.

Can you change pg_hba.conf back to what it had been prior and do a reload again and check if the clients start working?

I've gotten confused and shot myself in the foot when setting pg_hba.conf a few times, myself.

brew

==========================================================================
Strange Brew (brew@theMode.com)
Check out my Stock Option Covered Call website http://www.callpix.com
and my Musician's Online Database Exchange http://www.TheMode.com
==========================================================================
Michael Fuhr <mike@fuhr.org> writes:
> On Fri, Sep 16, 2005 at 04:34:46PM -0700, Marc Munro wrote:
>> On Fri, 2005-09-16 at 17:10 -0600, Michael Fuhr wrote:
>> This is not currently seen as a priority (the work-around of "don't do
>> that" is seen as sufficient). I'm simply hoping to get someone to say
>> for sure that the client app should not be able to tell that a reload
>> has happened. At that point I may be able to raise the priority of this
>> issue.

> As far as I know clients shouldn't notice a reload (which is effected
> via a SIGHUP); I just did some tests and didn't see any problems.

Existing client connections should not be able to notice a reload that changes pg_hba.conf or pg_ident.conf; however they definitely *should* notice a reload that changes postgresql.conf (at least for parameters that aren't overridden by other cases, such as a SET in the current session). So the blanket statement Marc is making is simply wrong.

Whether there is a bug here is impossible to say given the limited amount of information provided. I'd not expect a reload to cause an existing connection to become totally dysfunctional, which is what Marc seems to be claiming ... but without more evidence or a test case, there's not much to be done.

regards, tom lane
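The distinction drawn above can be written down as a toy model. The Python sketch below is not PostgreSQL code: `ToyServer`, `ToySession`, the host strings, and the use of `work_mem` as the example parameter are all illustrative assumptions. It only encodes the stated rules: host-based access rules are consulted when a connection is made, while server settings are visible to existing sessions after a reload unless the session has overridden them with SET.

```python
class ToyServer:
    """Toy model of which reloaded settings an existing session can see."""

    def __init__(self, hba_allows, settings):
        self.hba_allows = set(hba_allows)  # stand-in for pg_hba.conf
        self.settings = dict(settings)     # stand-in for postgresql.conf

    def connect(self, host):
        # Access rules are checked only here, at connection time.
        if host not in self.hba_allows:
            raise PermissionError("no access rule for host " + host)
        return ToySession(self)

    def reload(self, hba_allows=None, settings=None):
        """Swap in new configuration without touching existing sessions."""
        if hba_allows is not None:
            self.hba_allows = set(hba_allows)
        if settings is not None:
            self.settings.update(settings)


class ToySession:
    def __init__(self, server):
        self.server = server
        self.overrides = {}  # values set with SET in this session
        self.alive = True

    def set(self, name, value):
        self.overrides[name] = value

    def show(self, name):
        # A session-level SET wins; otherwise use the server's current value.
        return self.overrides.get(name, self.server.settings[name])
```

In this model a session opened before an access-rule change keeps working after the reload, a session with no SET sees a changed parameter immediately, and a session that issued SET keeps its own value.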
Tom,

Thanks for this. I am going to push for reproducing the problem in a test environment. If I have any success, I will follow up with results and more details.

__ Marc

On Sat, 2005-09-17 at 01:23 -0400, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > On Fri, Sep 16, 2005 at 04:34:46PM -0700, Marc Munro wrote:
> >> On Fri, 2005-09-16 at 17:10 -0600, Michael Fuhr wrote:
> >> This is not currently seen as a priority (the work-around of "don't do
> >> that" is seen as sufficient). I'm simply hoping to get someone to say
> >> for sure that the client app should not be able to tell that a reload
> >> has happened. At that point I may be able to raise the priority of this
> >> issue.
>
> > As far as I know clients shouldn't notice a reload (which is effected
> > via a SIGHUP); I just did some tests and didn't see any problems.
>
> Existing client connections should not be able to notice a reload that
> changes pg_hba.conf or pg_ident.conf; however they definitely *should*
> notice a reload that changes postgresql.conf (at least for parameters
> that aren't overridden by other cases, such as a SET in the current
> session). So the blanket statement Marc is making is simply wrong.
>
> Whether there is a bug here is impossible to say given the limited
> amount of information provided. I'd not expect a reload to cause
> an existing connection to become totally dysfunctional, which is
> what Marc seems to be claiming ... but without more evidence or
> a test case, there's not much to be done.
>
> regards, tom lane