Thread: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
From
Alvaro Herrera
Date:
Tom Lane wrote:
> Also use this method
> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
> but SIGTERM abort of createdb could leave orphaned files lying around.

I wonder if we could use this mechanism for cleaning up in case of failed CLUSTER, REINDEX or the like. I think these can leave dangling files around.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
From
Heikki Linnakangas
Date:
Alvaro Herrera wrote:
> Tom Lane wrote:
>
>> Also use this method
>> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
>> but SIGTERM abort of createdb could leave orphaned files lying around.
>
> I wonder if we could use this mechanism for cleaning up in case of
> failed CLUSTER, REINDEX or the like. I think these can leave dangling
> files around.

They do clean up on abort or SIGTERM. If you experience a sudden power loss, or kill -9 while CLUSTER or REINDEX is running, they will leave behind dangling files, but that's a different problem. It's not limited to utility commands like that either: if you create a table and copy a few gigabytes of data into it in a transaction, and crash before committing, you're left with a dangling file as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
>> Tom Lane wrote:
>>
>>> Also use this method
>>> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
>>> but SIGTERM abort of createdb could leave orphaned files lying around.
>>
>> I wonder if we could use this mechanism for cleaning up in case of
>> failed CLUSTER, REINDEX or the like. I think these can leave dangling
>> files around.
>
> They do clean up on abort or SIGTERM.

Ah, we're OK then.

> If you experience a sudden power loss, or kill -9 while CLUSTER or
> REINDEX is running, they will leave behind dangling files, but that's
> a different problem.

Sure, no surprises there.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
From
Heikki Linnakangas
Date:
Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>
>>>> Also use this method
>>>> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
>>>> but SIGTERM abort of createdb could leave orphaned files lying around.
>>> I wonder if we could use this mechanism for cleaning up in case of
>>> failed CLUSTER, REINDEX or the like. I think these can leave dangling
>>> files around.
>> They do clean up on abort or SIGTERM.
>
> Ah, we're OK then.

Wait, my memory failed me! No, we don't clean up dangling files on SIGTERM. We should...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
From
Heikki Linnakangas
Date:
Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
>> Heikki Linnakangas wrote:
>>> Alvaro Herrera wrote:
>>>> Tom Lane wrote:
>>>>
>>>>> Also use this method
>>>>> for createdb cleanup --- that wasn't a shared-memory-corruption
>>>>> problem,
>>>>> but SIGTERM abort of createdb could leave orphaned files lying around.
>>>> I wonder if we could use this mechanism for cleaning up in case of
>>>> failed CLUSTER, REINDEX or the like. I think these can leave dangling
>>>> files around.
>>> They do clean up on abort or SIGTERM.
>>
>> Ah, we're OK then.
>
> Wait, my memory failed me! No, we don't clean up dangling files on
> SIGTERM. We should...

No, wait, we do after all. I was fooled by the new 8.3 behavior of leaving the files dangling until the next checkpoint. The files are not cleaned up immediately on SIGTERM, but they are at the next checkpoint.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
From
Martijn van Oosterhout
Date:
On Thu, Apr 17, 2008 at 04:03:18PM +0300, Heikki Linnakangas wrote:
> They do clean up on abort or SIGTERM. If you experience a sudden power
> loss, or kill -9 while CLUSTER or REINDEX is running, they will leave
> behind dangling files, but that's a different problem. It's not limited
> to utility commands like that either: if you create a table and copy a
> few gigabytes of data into it in a transaction, and crash before
> committing, you're left with a dangling file as well.

Is this so? This happened to me the other day (hence the question about having COPY note failure earlier) because the disk filled up. I was confused because du showed nothing. Eventually I did an lsof and found the postgres backend had a large number of open file handles to deleted files (each one gigabyte). So something certainly deletes them (though maybe not on Windows?) before the transaction ends.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
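A minimal sketch of the symptom described above, assuming a POSIX system: once a file is unlinked its name disappears from the directory (so du, which walks directories, no longer counts it), but its data stays allocated for as long as some process keeps a descriptor open (which is what lsof reports as a deleted file). The function name `unlinked_but_open` is hypothetical, and this is plain Python for illustration, not PostgreSQL code.

```python
import os
import tempfile


def unlinked_but_open(size_bytes=1024 * 1024):
    """Write a file, unlink it while it is still open, and report what remains.

    Hypothetical helper for illustration only; assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size_bytes)
        os.unlink(path)          # the name is gone from the directory
        st = os.fstat(fd)        # the open descriptor still reaches the data
        return {
            "path_exists": os.path.exists(path),
            "link_count": st.st_nlink,
            "size": st.st_size,
        }
    finally:
        os.close(fd)             # the space is released only at this point


if __name__ == "__main__":
    print(unlinked_but_open())
```

The size reported through the open descriptor stays at the full written length even though the path no longer exists, which matches a full disk that du cannot account for.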
Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> Is this so? This happened to me the other day (hence the question about
> having COPY note failure earlier) because the disk filled up. I was
> confused because du showed nothing. Eventually I did an lsof and found
> the postgres backend had a large number of open file handles to deleted
> files (each one gigabyte).

The backend, or the bgwriter? Please be specific.

The bgwriter should drop open file references after the next checkpoint, but I don't recall any forcing function for regular backends to close open files.

8.3 and HEAD should ftruncate() the first segment of a relation but I think they just unlink the rest. Is it sane to think of ftruncate then unlink on the non-first segments, to alleviate the disk-space issue when someone else is holding the file open?

regards, tom lane

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory
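The "ftruncate then unlink" idea raised above can be sketched as follows, assuming a POSIX system. Truncating to zero length releases the file's data immediately, even if another process still holds the file open, whereas a bare unlink defers that until the last descriptor is closed. The names `drop_segment` and `size_seen_by_holder` are hypothetical; this is an illustration in Python, not the PostgreSQL storage manager code.

```python
import os
import tempfile


def drop_segment(path, truncate_first):
    """Remove a file, optionally truncating it to zero length first.

    Hypothetical helper: truncation shrinks the file right away, so a
    lingering open descriptor elsewhere no longer pins the data.
    """
    if truncate_first:
        fd = os.open(path, os.O_WRONLY)
        try:
            os.ftruncate(fd, 0)
        finally:
            os.close(fd)
    os.unlink(path)


def size_seen_by_holder(truncate_first, size_bytes=8192):
    """Return the file size visible through a descriptor held across the drop."""
    holder_fd, path = tempfile.mkstemp()   # stands in for "someone else" holding the file
    try:
        os.write(holder_fd, b"x" * size_bytes)
        drop_segment(path, truncate_first)
        return os.fstat(holder_fd).st_size
    finally:
        os.close(holder_fd)


if __name__ == "__main__":
    print("unlink only:        ", size_seen_by_holder(False))
    print("ftruncate + unlink: ", size_seen_by_holder(True))
```

With unlink alone the holder still sees the full size; with truncation first it sees zero, which is the disk-space relief the question is about.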
From
Martijn van Oosterhout
Date:
On Thu, Apr 17, 2008 at 11:48:41AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Is this so? This happened to me the other day (hence the question about
> > having COPY note failure earlier) because the disk filled up. I was
> > confused because du showed nothing. Eventually I did an lsof and found
> > the postgres backend had a large number of open file handles to deleted
> > files (each one gigabyte).
>
> The backend, or the bgwriter? Please be specific.

I believe the backend, because I was using lsof -p <pid> with the pid copied from ps. But I can't be 100% sure.

> 8.3 and HEAD should ftruncate() the first segment of a relation but I
> think they just unlink the rest. Is it sane to think of ftruncate then
> unlink on the non-first segments, to alleviate the disk-space issue when
> someone else is holding the file open?

It's possible. OTOH, if the copy error had been returned in the PQputline(), the driving program (which has several COPYs running at once) would have aborted and the data would have been reclaimed immediately. As it is, it kept going for an hour before noticing and then dying (and cleaning everything up).

The one ftruncate does explain why there was some free space, so that part is appreciated.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.