Thread: Proper cleanup at backend exit
I have been looking into the problem reported by Massimo, and also seen by me, that backends sometimes exit without cleaning up entries they made in the pg_listener table. I have fixed several contributing bugs (having to do with the limited size of the on_shmem_exit table), but I still see the problem occasionally. I just discovered one fairly fundamental reason why there's still a problem: test=> listen test; LISTEN test=> begin; BEGIN test=> \q $ (start new psql) test=> select * from pg_listener; relname|listenerpid|notification -------+-----------+------------ test | 20024| 0 (1 row) Oops. It would seem that async.c is *trying* to clean up, but the changes are thrown away because they are considered part of a never-committed transaction :-(. More generally, I believe that a backend killed in the middle of any individual SQL statement will fail to remove its LISTEN entries, because it will think that any database changes made just before exit ought to be discarded. If we want a backend to clean out its pg_listen entries at shutdown, it looks to me like backend exit will have to work like this: 1. Abort the current transaction, if any, and exit the current transaction block, if any. 2. Perform LISTEN cleanup within a new transaction. Commit this transaction. 3. Continue with closing up shop. I don't know whether there are any other things besides LISTEN cleanup that really need to be done, not discarded, during backend exit. If there are, then we probably ought to restructure the exit sequence to handle this cleanly. But I'm worried about doing this much processing during backend exit --- if you're trying to kill a backend that's gone wrong, the last thing you want is for it to start doing new transactions. (BTW, that last thought also says that we *still* won't be able to get rid of pg_listen entries 100% reliably. In a panic exit scenario, ie after one backend has crashed, none of the backends will clean up their pg_listen entries. We dare not tamper with that. At least I dare not.) I'm half tempted to say that we should rip out the cleanup code and not even try to remove pg_listener entries at backend exit. They'll go away anyway over time, because the next notifier to try to notify that relation name will delete the entries when he discovers he can't signal that backend PID anymore. So cleanup-at-exit is not at all a critical function, and it's hard to justify a bunch of new complexity just for this. Comments? This depends a lot on whether you think LISTEN cleanup is a unique case, or whether there will be other standard cleanup activities that a backend ought to perform. (Is there such a thing as an ON EXIT hook in SQL?) regards, tom lane
> > I have been looking into the problem reported by Massimo, and also seen > by me, that backends sometimes exit without cleaning up entries they > made in the pg_listener table. I have fixed several contributing bugs > (having to do with the limited size of the on_shmem_exit table), but could you explain this to us ? > I still see the problem occasionally. > > I just discovered one fairly fundamental reason why there's still a > problem: > > test=> listen test; > LISTEN > test=> begin; > BEGIN > test=> \q > $ > > (start new psql) > > test=> select * from pg_listener; > relname|listenerpid|notification > -------+-----------+------------ > test | 20024| 0 > (1 row) > > Oops. It would seem that async.c is *trying* to clean up, but the > changes are thrown away because they are considered part of a > never-committed transaction :-(. More generally, I believe that > a backend killed in the middle of any individual SQL statement will > fail to remove its LISTEN entries, because it will think that any > database changes made just before exit ought to be discarded. This is correct in some sense. Any change made but not commited should never be seen in the table, so the tuple is not deleted. > If we want a backend to clean out its pg_listen entries at shutdown, > it looks to me like backend exit will have to work like this: > > 1. Abort the current transaction, if any, and exit the current > transaction block, if any. > > 2. Perform LISTEN cleanup within a new transaction. Commit this > transaction. > > 3. Continue with closing up shop. > > I don't know whether there are any other things besides LISTEN cleanup > that really need to be done, not discarded, during backend exit. If > there are, then we probably ought to restructure the exit sequence to > handle this cleanly. But I'm worried about doing this much processing > during backend exit --- if you're trying to kill a backend that's gone > wrong, the last thing you want is for it to start doing new > transactions. > > (BTW, that last thought also says that we *still* won't be able to get > rid of pg_listen entries 100% reliably. In a panic exit scenario, > ie after one backend has crashed, none of the backends will clean up > their pg_listen entries. We dare not tamper with that. At least > I dare not.) > > I'm half tempted to say that we should rip out the cleanup code and > not even try to remove pg_listener entries at backend exit. They'll > go away anyway over time, because the next notifier to try to notify > that relation name will delete the entries when he discovers he can't > signal that backend PID anymore. So cleanup-at-exit is not at all a > critical function, and it's hard to justify a bunch of new complexity > just for this. > > Comments? This depends a lot on whether you think LISTEN cleanup > is a unique case, or whether there will be other standard cleanup > activities that a backend ought to perform. (Is there such a thing > as an ON EXIT hook in SQL?) The code should be left if it works in the normal case. I believe that some tuples left in pg_listener don't do any harm because they refer to dead backends and the notify code can detect this situation and delete the dead tuples. Anyway for safety I always truncate the file to 0 in the shell loop that keeps the postmaster alive. Also a vacuum every few hours doesn't hurt. -- Massimo Dal Zotto +----------------------------------------------------------------------+ | Massimo Dal Zotto email: dz@cs.unitn.it | | Via Marconi, 141 phone: ++39-461-534251 | | 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ | | Italy pgp: finger dz@tango.cs.unitn.it | +----------------------------------------------------------------------+
Massimo Dal Zotto <dz@cs.unitn.it> writes: >> I have fixed several contributing bugs >> (having to do with the limited size of the on_shmem_exit table), > could you explain this to us ? async.c tries to arrange to clean up its entries in pg_listener by scheduling an exit callback routine via on_shmem_exit. (I think Bruce or someone has been changing that stuff, and that on_shmem_exit used to be called something else, but it's still basically an atexit- like list of procedures to call.) The trouble is that on_shmem_exit has a fixed-size, and not very large, array for the callbacks. Try to register too many exit callbacks, and after a while they stop getting registered. I saw two problems associated with that. First, async.c was registering a separate callback for each pg_listen entry it's ever made. That was easily dealt with, since we now have an "unlisten all" subroutine: I got rid of the per-listen callback and now register only a single callback over the life of the backend. It just calls UnlistenAll. Second, the main loop in postgres.c was miscoded so that every time you had an error, it added a redundant FlushBufferPool callback to the list. Make enough mistakes before you do your first listen, and you're still out of luck. A simple reordering of the code fixes that. (While I was at it I tried to improve the comments to make it clear where the main loop *really* starts, so that future hackers won't misplace code in PostgresMain the same way again...) In the long term it might be a good idea to make on_shmem_exit construct a linked list of registered callbacks (eg using Dllist) rather than having a fixed maximum number of exit callbacks. But just at the moment I don't think it's necessary, given these bug fixes --- and if we did have a linked list in there, we'd still want to make these changes because they would be memory leaks otherwise. regards, tom lane PS: I hope to check these changes in shortly.
> I have been looking into the problem reported by Massimo, and also seen > by me, that backends sometimes exit without cleaning up entries they > made in the pg_listener table. I have fixed several contributing bugs > (having to do with the limited size of the on_shmem_exit table), but > I still see the problem occasionally. > > I just discovered one fairly fundamental reason why there's still a > problem: > > test=> listen test; > LISTEN > test=> begin; > BEGIN > test=> \q > $ > > (start new psql) > > test=> select * from pg_listener; > relname|listenerpid|notification > -------+-----------+------------ > test | 20024| 0 > (1 row) > > Oops. It would seem that async.c is *trying* to clean up, but the > changes are thrown away because they are considered part of a > never-committed transaction :-(. More generally, I believe that > a backend killed in the middle of any individual SQL statement will > fail to remove its LISTEN entries, because it will think that any > database changes made just before exit ought to be discarded. > > If we want a backend to clean out its pg_listen entries at shutdown, > it looks to me like backend exit will have to work like this: > > 1. Abort the current transaction, if any, and exit the current > transaction block, if any. > > 2. Perform LISTEN cleanup within a new transaction. Commit this > transaction. > > 3. Continue with closing up shop. > > I don't know whether there are any other things besides LISTEN cleanup > that really need to be done, not discarded, during backend exit. If > there are, then we probably ought to restructure the exit sequence to > handle this cleanly. But I'm worried about doing this much processing > during backend exit --- if you're trying to kill a backend that's gone > wrong, the last thing you want is for it to start doing new > transactions. > > (BTW, that last thought also says that we *still* won't be able to get > rid of pg_listen entries 100% reliably. In a panic exit scenario, > ie after one backend has crashed, none of the backends will clean up > their pg_listen entries. We dare not tamper with that. At least > I dare not.) > > I'm half tempted to say that we should rip out the cleanup code and > not even try to remove pg_listener entries at backend exit. They'll > go away anyway over time, because the next notifier to try to notify > that relation name will delete the entries when he discovers he can't > signal that backend PID anymore. So cleanup-at-exit is not at all a > critical function, and it's hard to justify a bunch of new complexity > just for this. > > Comments? This depends a lot on whether you think LISTEN cleanup > is a unique case, or whether there will be other standard cleanup > activities that a backend ought to perform. (Is there such a thing > as an ON EXIT hook in SQL?) Seems like we should try and clean them out, especially when the backend is doing a normal exit. In a panic exit, we don't need to clean it out. We can install something into proc_exit() and shmem_exit() to clean out the table. You can have a global variable that installed the pg_listen cleaner the first time a notify() is done so as not to install a new exit handler for each notify() call. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Massimo Dal Zotto <dz@cs.unitn.it> writes: > >> I have fixed several contributing bugs > >> (having to do with the limited size of the on_shmem_exit table), > > > could you explain this to us ? > > async.c tries to arrange to clean up its entries in pg_listener by > scheduling an exit callback routine via on_shmem_exit. (I think > Bruce or someone has been changing that stuff, and that on_shmem_exit > used to be called something else, but it's still basically an atexit- > like list of procedures to call.) Yes. The old code atexit() was called for process exit and shared-memory reset, and that caused problems as more and more cleanup had to be done ONLY at shared memory reset time. Having two separate queues and proc names helped. > > The trouble is that on_shmem_exit has a fixed-size, and not very large, > array for the callbacks. Try to register too many exit callbacks, and > after a while they stop getting registered. > > I saw two problems associated with that. First, async.c was registering > a separate callback for each pg_listen entry it's ever made. That was > easily dealt with, since we now have an "unlisten all" subroutine: > I got rid of the per-listen callback and now register only a single > callback over the life of the backend. It just calls UnlistenAll. That is why you need some type of global variable that sets the cleanup the first time, and only once. You could just call the proc_exit call, and prevent it from scheduling two of the same exit functions. That would be cleaner. > > Second, the main loop in postgres.c was miscoded so that every time you > had an error, it added a redundant FlushBufferPool callback to the list. > Make enough mistakes before you do your first listen, and you're still > out of luck. A simple reordering of the code fixes that. (While I was > at it I tried to improve the comments to make it clear where the main > loop *really* starts, so that future hackers won't misplace code in > PostgresMain the same way again...) > > In the long term it might be a good idea to make on_shmem_exit construct > a linked list of registered callbacks (eg using Dllist) rather than > having a fixed maximum number of exit callbacks. But just at the moment > I don't think it's necessary, given these bug fixes --- and if we did > have a linked list in there, we'd still want to make these changes > because they would be memory leaks otherwise. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: >> [ should we work harder to ensure pg_listener entries get cleaned out? ] > Seems like we should try and clean them out, especially when the backend > is doing a normal exit. In a panic exit, we don't need to clean it out. OK, I will add some more code to try to cope with the exit-during- uncommitted-transaction case. > We can install something into proc_exit() and shmem_exit() to clean out > the table. The old code inserted the cleanup action into shmem_exit(). Is that the right list to put it in? I'm not clear on the difference between shmem_exit and proc_exit lists... > You can have a global variable that installed the pg_listen > cleaner the first time a notify() is done so as not to install a new > exit handler for each notify() call. That part's already done. regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes: > >> [ should we work harder to ensure pg_listener entries get cleaned out? ] > > > Seems like we should try and clean them out, especially when the backend > > is doing a normal exit. In a panic exit, we don't need to clean it out. > > OK, I will add some more code to try to cope with the exit-during- > uncommitted-transaction case. > > > We can install something into proc_exit() and shmem_exit() to clean out > > the table. > > The old code inserted the cleanup action into shmem_exit(). Is that the > right list to put it in? I'm not clear on the difference between > shmem_exit and proc_exit lists... proc_exit is triggered on every backend exit. shmem_exit is for postmaster exit, or postmaster shared memory reset. > > > You can have a global variable that installed the pg_listen > > cleaner the first time a notify() is done so as not to install a new > > exit handler for each notify() call. > > That part's already done. Cool. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026