Thread: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Alvaro Herrera
Date: 17 April 2008, 09:30:25

Tom Lane wrote:

> Also use this method
> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
> but SIGTERM abort of createdb could leave orphaned files lying around.

I wonder if we could use this mechanism for cleaning up in case of
failed CLUSTER, REINDEX or the like.  I think these can leave dangling
files around.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Heikki Linnakangas
Date: 17 April 2008, 10:03:36

Alvaro Herrera wrote:
> Tom Lane wrote:
> 
>> Also use this method
>> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
>> but SIGTERM abort of createdb could leave orphaned files lying around.
> 
> I wonder if we could use this mechanism for cleaning up in case of
> failed CLUSTER, REINDEX or the like.  I think these can leave dangling
> files around.

They do clean up on abort or SIGTERM. If you experience a sudden power 
loss, or kill -9 while CLUSTER or REINDEX is running, they will leave 
behind dangling files, but that's a different problem. It's not limited 
to utility commands like that either: if you create a table and copy a 
few gigabytes of data into it in a transaction, and crash before 
committing, you're left with a dangling file as well.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Alvaro Herrera
Date: 17 April 2008, 10:05:58

Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
>> Tom Lane wrote:
>>
>>> Also use this method
>>> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
>>> but SIGTERM abort of createdb could leave orphaned files lying around.
>>
>> I wonder if we could use this mechanism for cleaning up in case of
>> failed CLUSTER, REINDEX or the like.  I think these can leave dangling
>> files around.
>
> They do clean up on abort or SIGTERM.

Ah, we're OK then.

> If you experience a sudden power loss, or kill -9 while CLUSTER or
> REINDEX is running, they will leave behind dangling files, but that's
> a different problem.

Sure, no surprises there.

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Heikki Linnakangas
Date: 17 April 2008, 10:14:34

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>
>>>> Also use this method
>>>> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
>>>> but SIGTERM abort of createdb could leave orphaned files lying around.
>>> I wonder if we could use this mechanism for cleaning up in case of
>>> failed CLUSTER, REINDEX or the like.  I think these can leave dangling
>>> files around.
>> They do clean up on abort or SIGTERM.
> 
> Ah, we're OK then.

Wait, my memory failed me! No, we don't clean up dangling files on 
SIGTERM. We should...

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Heikki Linnakangas
Date: 17 April 2008, 10:17:44

Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
>> Heikki Linnakangas wrote:
>>> Alvaro Herrera wrote:
>>>> Tom Lane wrote:
>>>>
>>>>> Also use this method
>>>>> for createdb cleanup --- that wasn't a shared-memory-corruption problem,
>>>>> but SIGTERM abort of createdb could leave orphaned files lying around.
>>>> I wonder if we could use this mechanism for cleaning up in case of
>>>> failed CLUSTER, REINDEX or the like.  I think these can leave dangling
>>>> files around.
>>> They do clean up on abort or SIGTERM.
>>
>> Ah, we're OK then.
> 
> Wait, my memory failed me! No, we don't clean up dangling files on 
> SIGTERM. We should...

No, wait, we do after all. I was fooled by the new 8.3 behavior of leaving
the files dangling until the next checkpoint. The files are not cleaned up
immediately on SIGTERM, but they are at the next checkpoint.
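A minimal sketch of the deferred cleanup described here, assuming a simplified model: at abort time the relation's (first) file is truncated to zero length right away so its space is released, but the name is only queued, and a checkpoint routine later performs the actual unlink. The names drop_relation_file(), checkpoint_process_unlinks() and the fixed-size queue are hypothetical illustrations, not PostgreSQL's actual code, and only one file per relation is modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical pending-unlink queue; a real system would size this dynamically. */
#define MAX_PENDING 64

static char pending_unlinks[MAX_PENDING][256];
static int  n_pending = 0;

/* Abort/drop time: give the space back now, but keep the name until checkpoint. */
int drop_relation_file(const char *path)
{
    assert(path != NULL);
    if (n_pending >= MAX_PENDING)
        return -1;                      /* queue full: caller must handle it */
    if (truncate(path, 0) != 0)         /* release the data blocks immediately */
        return -1;
    snprintf(pending_unlinks[n_pending], sizeof pending_unlinks[0], "%s", path);
    n_pending++;
    return 0;
}

/* Checkpoint time: actually remove every queued file name. */
int checkpoint_process_unlinks(void)
{
    int removed = 0;
    for (int i = 0; i < n_pending; i++)
        if (unlink(pending_unlinks[i]) == 0)
            removed++;
    n_pending = 0;                      /* queue is drained either way */
    return removed;
}
```

Between the two calls the file still exists with zero length, which is the "dangling until next checkpoint" state that looked like a leak.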

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Martijn van Oosterhout
Date: 17 April 2008, 12:13:26

On Thu, Apr 17, 2008 at 04:03:18PM +0300, Heikki Linnakangas wrote:
> They do clean up on abort or SIGTERM. If you experience a sudden power
> loss, or kill -9 while CLUSTER or REINDEX is running, they will leave
> behind dangling files, but that's a different problem. It's not limited
> to utility commands like that either: if you create a table and copy a
> few gigabytes of data into it in a transaction, and crash before
> committing, you're left with a dangling file as well.

Is this so? This happened to me the other day (hence the question about
having COPY note failure earlier) because the disk filled up. I was
confused because du showed nothing. Eventually I did an lsof and found
the postgres backend had a large number of open file handles to deleted
files (each one gigabyte).

So something certainly deletes them (though maybe not on Windows?)
before the transaction ends.
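What showed up in du and lsof here is ordinary POSIX behavior: unlink() removes only the directory entry, and the file's data stays allocated until the last open descriptor is closed. The sketch below uses a hypothetical helper, unlink_while_open(), and an assumed scratch path; it writes a file, unlinks it while still open, and reads the link count and size back through fstat() on the surviving descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: write len bytes to path, unlink the file while
 * it is still open, and report what fstat() sees via the open descriptor.
 * Returns 0 on success, -1 on any system-call failure.
 */
int unlink_while_open(const char *path, size_t len,
                      nlink_t *nlink_out, off_t *size_out)
{
    char buf[4096];
    assert(path != NULL);
    memset(buf, 'x', sizeof buf);

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    for (size_t done = 0; done < len; ) {
        size_t chunk = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n <= 0) { close(fd); return -1; }
        done += (size_t) n;
    }

    /* The name is gone now: ls and du no longer see the file... */
    if (unlink(path) != 0) { close(fd); return -1; }

    /* ...but the inode and its data blocks survive while fd is open. */
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    *nlink_out = st.st_nlink;
    *size_out = st.st_size;

    close(fd);   /* only after the last close can the space be reclaimed */
    return 0;
}
```

On Linux the surviving descriptor typically reports a link count of 0 together with the full size, which is the situation lsof lists as "(deleted)" entries.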

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Tom Lane
Date: 17 April 2008, 12:48:56

Martijn van Oosterhout <kleptog@svana.org> writes:
> Is this so? This happened to me the other day (hence the question about
> having COPY note failure earlier) because the disk filled up. I was
> confused because du showed nothing. Eventually I did an lsof and found
> the postgres backend had a large number of open file handles to deleted
> files (each one gigabyte).

The backend, or the bgwriter?  Please be specific.

The bgwriter should drop open file references after the next checkpoint,
but I don't recall any forcing function for regular backends to close
open files.

8.3 and HEAD should ftruncate() the first segment of a relation but I
think they just unlink the rest.  Is it sane to think of ftruncate then
unlink on the non-first segments, to alleviate the disk-space issue when
someone else is holding the file open?
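The suggestion above can be sketched as follows, assuming segments are named base, base.1, base.2, and so on, and that the non-first segments are found by probing upward until one is missing. The helpers truncate_then_unlink() and drop_extra_segments() are hypothetical, not functions from the PostgreSQL source. The ordering matters because ftruncate() frees the data blocks through the inode, so the space comes back even if another process still holds a descriptor; unlink() alone would leave the blocks allocated until that descriptor is closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical: truncate first (frees space for every holder), then unlink. */
int truncate_then_unlink(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, 0);   /* data blocks released via the inode */
    close(fd);
    if (rc != 0)
        return -1;
    return unlink(path);         /* then drop the directory entry */
}

/*
 * Hypothetical: remove non-first segments "base.1", "base.2", ... stopping
 * at the first one that is missing (or cannot be removed).
 * Returns the number of segments removed.
 */
int drop_extra_segments(const char *base)
{
    char path[1024];
    int removed = 0;
    assert(base != NULL);
    for (int segno = 1;; segno++) {
        snprintf(path, sizeof path, "%s.%d", base, segno);
        if (truncate_then_unlink(path) != 0)
            break;
        removed++;
    }
    return removed;
}
```

A process that still has one of these segments open afterwards sees a zero-length file rather than pinning a gigabyte of disk.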
        regards, tom lane

Re: Re: [COMMITTERS] pgsql: Repair two places where SIGTERM exit could leave shared memory

From: Martijn van Oosterhout
Date: 18 April 2008, 04:09:38

On Thu, Apr 17, 2008 at 11:48:41AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > Is this so? This happened to me the other day (hence the question about
> > having COPY note failure earlier) because the disk filled up. I was
> > confused because du showed nothing. Eventually I did an lsof and found
> > the postgres backend had a large number of open file handles to deleted
> > files (each one gigabyte).
>
> The backend, or the bgwriter?  Please be specific.

I believe it was the backend, because I ran lsof -p <pid> with the pid
copied from ps. But I can't be 100% sure.

> 8.3 and HEAD should ftruncate() the first segment of a relation but I
> think they just unlink the rest.  Is it sane to think of ftruncate then
> unlink on the non-first segments, to alleviate the disk-space issue when
> someone else is holding the file open?

It's possible. OTOH, if the COPY error had been returned by
PQputline(), the driving program (which has several COPYs running at
once) would have aborted and the data would have been reclaimed
immediately. As it was, it kept going for an hour before noticing and
then dying (and cleaning everything up).

The single ftruncate() does explain why there was some free space, so
that part is appreciated.

Have a nice day,