Home > mailing lists

Re: DropRelFileNodeBuffers API change (was Re: [BUGS] BUG #5599: Vacuum fails due to index corruption issues) - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: DropRelFileNodeBuffers API change (was Re: [BUGS] BUG #5599: Vacuum fails due to index corruption issues)
Date	February 18, 2011 17:43:38
Msg-id	201102182143.p1ILhJo27167@momjian.us Whole thread Raw
In response to	Re: DropRelFileNodeBuffers API change (was Re: [BUGS] BUG #5599: Vacuum fails due to index corruption issues) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: DropRelFileNodeBuffers API change (was Re: [BUGS] BUG #5599: Vacuum fails due to index corruption issues)
List	pgsql-hackers

Tree view

Did we make any progress on this?  Is it a TODO?

---------------------------------------------------------------------------

Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Sun, Aug 15, 2010 at 9:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> ... and PANIC is absolutely, entirely, 100% unacceptable here. �I don't
> >> think you understand the context. �We've already written the truncate
> >> action to WAL (as we must: WAL before data change). �If we PANIC, that
> >> means we'll PANIC again during WAL replay. �So that solution means a
> >> down, and perhaps unrecoverably broken, database.
> 
> > All right, that would be bad.
> 
> Actually ... after some study of the code, I find that I'm holding this
> proposal to a higher standard than the current code maintains.
> According to our normal rules for applying WAL-loggable data changes,
> there should be a critical section wrapping the application of the
> data changes with making the WAL entry.  RelationTruncate fails to
> do any such thing: it just pushes out the WAL entry and then calls
> smgrtruncate, and so any error inside the latter just results in
> elog(ERROR).  Whereupon the system state is entirely different from what
> WAL says it should be.  So my previous gut feeling that we need to
> rethink this whole thing seems justified.
> 
> I traced through everything that leads to an ftruncate() call in the
> backend as of HEAD, and found that we have these cases to worry about:
> 
> mdunlink calls ftruncate directly, and does nothing worse than
> elog(WARNING) on failure.  This is fine because all it wants to do
> is save some disk space until the file gets deleted "for real" at
> the next checkpoint.  Failure only means we waste disk space
> temporarily, and is surely not cause for panic.
> 
> All other calls proceed (sometimes indirectly) from RelationTruncate
> or replay of the WAL record it emits.  We have not thought hard
> about the order of the various truncations this does and whether or
> not we have a self-consistent state if it fails partway through.
> If we don't want to make the whole thing a critical section, we need
> to figure that out.
> 
> RelationTruncate (and its subsidiary RelationTruncateIndexes) is called
> from heap_truncate_one_rel (which itself does things in an unsafe order),
> and from lazy_truncate_heap in VACUUM.
> 
> heap_truncate_one_rel has (indirectly) two call sources:
> 
> from ExecuteTruncate for a locally created rel, where we don't care,
> and would definitely rather NOT have a panic on error: just rolling back
> the transaction is fine thank you very much.
> 
> from PreCommit_on_commit_actions, to process ON COMMIT DELETE ROWS.
> Here again, so far as heaps are concerned, rollback on error would be
> plenty since any inserted rows would then just be dead.  The tricky
> part is the indexes for such a table.  If we truncate them before
> truncating the heap, then the worst possible case is an internally
> inconsistent index on a temp table, which will be automatically cleaned
> up during the next successful commit in its session.  So it's pretty
> hard to justify a PANIC response here either.
> 
> So it seems like the only case where there is really grounds for PANIC
> on failure is the VACUUM case.  And there we might apply Heikki's idea
> of trying to zero the untruncatable pages first.
> 
> I'm thinking that we need some sort of what-to-do-on-error flag passed
> into RelationTruncate, plus at least order-of-operations fixes in
> several other places, if not a wholesale refactoring of this whole call
> stack.  But I'm running out of steam and don't have a concrete proposal
> to make right now.  In any case, we've got more problems here than just
> the original one of forgetting dirty buffers too soon.
> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

pgsql-hackers by date:

From: Tom Lane
Date: 18 February 2011, 17:41:35
Subject: Re: WIP - Add ability to constrain backend temporary file space

From: Tom Lane
Date: 18 February 2011, 17:46:11
Subject: Re: DropRelFileNodeBuffers API change (was Re: [BUGS] BUG #5599: Vacuum fails due to index corruption issues)

Re: DropRelFileNodeBuffers API change (was Re: [BUGS] BUG #5599: Vacuum fails due to index corruption issues) - Mailing list pgsql-hackers

Previous

Next