Thread: random system table corruption ...

random system table corruption ...

From: Hans-Jürgen Schönig
Date: 11 September 2005, 08:12:42

in the past we have faced a couple of problems with corrupted system
tables. this seems to be a version-independent problem which comes up on
-hackers from time to time.
i have checked a broken file and i have seen that the corrupted page has
actually been zeroed out.

my question is: are there any options to implement something which makes
system tables more robust? the problem is: the described error happens
only once in a while and cannot be reproduced. maybe there is a way to
add some more sanity checks before the page is actually written.

any suggestions?
best regards,
    hans

-- 
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at

Re: random system table corruption ...

From: Martijn van Oosterhout
Date: 11 September 2005, 08:57:54

On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:
> in the past we have faced a couple of problems with corrupted system
> tables. this seems to be a version-independent problem which comes up on
> -hackers from time to time.
> i have checked a broken file and i have seen that the corrupted page has
> actually been zeroed out.

Near as I can tell, the only times pages are zeroed out are when
zero_damaged_pages is set (destroying the evidence) or during WAL
recovery.

> my question is: are there any options to implement something which makes
> system tables more robust? the problem is: the described error happens
> only once in a while and cannot be reproduced. maybe there is a way to
> add some more sanity checks before the page is actually written.

Well, the most common cause is dodgy memory. Other than that, I guess
you could arrange for the bgwriter to check the pages it is writing. I
imagine it already checks the header; checking the data requires
knowledge about the actual table and its attributes. And about the only
thing in the data that says "I'm broken" is a varlena value with an
implausibly long length.
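
A minimal C sketch of the kind of header check described above. It assumes a simplified, hypothetical header holding only `pd_lower`, `pd_upper` and `pd_special`, and an 8192-byte block; the real on-disk header has more fields and its layout differs between releases, so this only illustrates the ordering invariant (header size <= pd_lower <= pd_upper <= pd_special <= block size) that a writer could test before a page goes to disk. An all-zero page fails it because `pd_lower` is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 8192-byte blocks (PostgreSQL's default BLCKSZ). */
#define SKETCH_BLCKSZ 8192

/*
 * Hypothetical, simplified page header holding only the three offsets
 * the check needs. This is NOT the real on-disk PageHeaderData layout.
 */
typedef struct
{
    uint16_t pd_lower;   /* end of the line-pointer array */
    uint16_t pd_upper;   /* start of tuple data */
    uint16_t pd_special; /* start of the special space */
} SketchPageHeader;

/*
 * Returns true when the offsets are ordered the way a sane page needs:
 * header size <= pd_lower <= pd_upper <= pd_special <= block size.
 */
bool
sketch_header_is_sane(const unsigned char *page)
{
    SketchPageHeader hdr;

    assert(page != NULL);
    memcpy(&hdr, page, sizeof(hdr)); /* copy out to avoid unaligned reads */
    return (size_t) hdr.pd_lower >= sizeof(SketchPageHeader) &&
           hdr.pd_lower <= hdr.pd_upper &&
           hdr.pd_upper <= hdr.pd_special &&
           hdr.pd_special <= SKETCH_BLCKSZ;
}
```

As Martijn notes, a check like this says nothing about the tuple data itself; validating that would need the table's attribute definitions.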

As they say, the only sure thing is to have a backup. The only
thing I can imagine being really useful would be a restore mode where
you feed it the schema so it can reconstruct pg_class and
pg_attribute just enough for you to dump the data and rebuild
everything...

You know, VACUUM FREEZE BACKUP on pg_catalog, physically copy the
datafiles and offer the option to blat your catalog with an old one...
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: random system table corruption ...

From: Tom Lane
Date: 11 September 2005, 13:24:49

Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> in the past we have faced a couple of problems with corrupted system 
> tables. this seems to be a version-independent problem which comes up on
> -hackers from time to time.
> i have checked a broken file and i have seen that the corrupted page has 
> actually been zeroed out.

That sounds to me like a hardware problem --- disk or disk controller
momentarily writing zeroes instead of what it should write.   Have you
seen this on more than one physical machine?  Do you have any evidence
for the implication that it only happens to system tables and not user
tables?

Also, you don't have zero_damaged_pages turned on by any chance?
        regards, tom lane

Re: random system table corruption ...

From: Hans-Jürgen Schönig
Date: 11 September 2005, 13:41:45

Tom Lane wrote:
> Hans-Jürgen Schönig <postgres@cybertec.at> writes:
> 
>>in the past we have faced a couple of problems with corrupted system 
>>tables. this seems to be a version-independent problem which comes up on
>>-hackers from time to time.
>>i have checked a broken file and i have seen that the corrupted page has 
>>actually been zeroed out.
> 
> 
> That sounds to me like a hardware problem --- disk or disk controller
> momentarily writing zeroes instead of what it should write.   Have you
> seen this on more than one physical machine?  Do you have any evidence
> for the implication that it only happens to system tables and not user
> tables?
> 
> Also, you don't have zero_damaged_pages turned on by any chance?
> 
>             regards, tom lane

tom,

well, there is some evidence that this is not a hardware-related issue.
we have only seen this problem from time to time, but it happened on
different machines. it cannot be reproduced. it can even happen when
somebody runs a script which has been called a million times before.
in my current scenario the page header consists only of 0x00 bytes, and
therefore the page check fails when the system table is read.
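
To tell a page that was zeroed completely from one whose header alone was wiped, a small helper could classify each block as in the C sketch below. The 20-byte header size and the 8192-byte block are assumptions made for the sketch (the real header size varies by release), and `classify_block` is a hypothetical name, not a PostgreSQL function.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLCKSZ 8192     /* assumed block size */
#define SKETCH_HEADER_BYTES 20 /* assumed header size; varies by release */

typedef enum
{
    BLOCK_ALL_ZERO,      /* every byte of the block is 0x00 */
    BLOCK_ZERO_HEADER,   /* header is all 0x00, but the body is not */
    BLOCK_HEADER_PRESENT /* header contains at least one non-zero byte */
} BlockClass;

/* Returns 1 if page[from .. to-1] contains only zero bytes, else 0. */
static int
range_is_zero(const unsigned char *page, size_t from, size_t to)
{
    for (size_t i = from; i < to; i++)
        if (page[i] != 0)
            return 0;
    return 1;
}

BlockClass
classify_block(const unsigned char *page)
{
    assert(page != NULL);
    if (!range_is_zero(page, 0, SKETCH_HEADER_BYTES))
        return BLOCK_HEADER_PRESENT;
    if (range_is_zero(page, SKETCH_HEADER_BYTES, SKETCH_BLCKSZ))
        return BLOCK_ALL_ZERO;
    return BLOCK_ZERO_HEADER;
}
```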

i have never seen this in data files so far (at least not while the
hardware was still intact).

has anybody faced similar problems? maybe on Sun?
by the way: the broken system is currently running PostgreSQL 7.4, but as
I said, we have also seen this on 8.0 once.
best regards,
    hans

Re: random system table corruption ...

From: Alvaro Herrera
Date: 11 September 2005, 15:00:48

On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:
> in the past we have faced a couple of problems with corrupted system 
> tables. this seems to be a version-independent problem which comes up on
> -hackers from time to time.
> i have checked a broken file and i have seen that the corrupted page has 
> actually been zeroed out.

IIRC the XFS filesystem zeroes out pages that it recovers from the
journal but did not have a fsync on them (AFAIK XFS journals only
metadata, so page creation but not the content itself).  I don't think
this would be applicable to your case, because we do fsync modified
files on checkpoint, and rewrite them completely from WAL images after
that.  But I thought I'd mention it.
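
Alvaro's point rests on the difference between a write that has merely reached the kernel and one that has been forced to stable storage. A minimal POSIX sketch of the latter for a single block follows, assuming 8192-byte blocks; `write_block_durably` is a hypothetical helper, not PostgreSQL code, which (as Alvaro says) fsyncs modified files at checkpoint rather than after every block.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192 /* assumed block size */

/*
 * Writes one block at its offset in the file, retrying short writes,
 * then calls fsync so the data (not just the metadata) reaches stable
 * storage. Returns 0 on success and -1 on failure, with errno set.
 */
int
write_block_durably(int fd, unsigned blockno, const unsigned char *page)
{
    off_t offset = (off_t) blockno * SKETCH_BLCKSZ;
    size_t done = 0;

    assert(page != NULL);
    while (done < SKETCH_BLCKSZ)
    {
        ssize_t n = pwrite(fd, page + done, SKETCH_BLCKSZ - done,
                           offset + (off_t) done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t) n;
    }
    return fsync(fd);
}
```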

-- 
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"Just treat us the way you want to be treated + some extra allowance for ignorance."
(Michael Brusser)

Re: random system table corruption ...

From: Hans-Jürgen Schönig
Date: 11 September 2005, 16:37:44

Alvaro Herrera wrote:
> On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:
> 
>>in the past we have faced a couple of problems with corrupted system 
>>tables. this seems to be a version-independent problem which comes up on
>>-hackers from time to time.
>>i have checked a broken file and i have seen that the corrupted page has 
>>actually been zeroed out.
> 
> 
> IIRC the XFS filesystem zeroes out pages that it recovers from the
> journal but did not have a fsync on them (AFAIK XFS journals only
> metadata, so page creation but not the content itself).  I don't think
> this would be applicable to your case, because we do fsync modified
> files on checkpoint, and rewrite them completely from WAL images after
> that.  But I thought I'd mention it.
> 


alvaro,

thanks a lot.
we have some reports about Sun systems.
meanwhile i have got the impression that the filesystem might be doing
something wrong. i have seen that the page is not completely zeroed out:
at some strange positions there are 2 bytes of crap (i had overlooked
that at first glance). the first couple of hundred bytes are crap,
however. very strange ...
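
Stray bytes at odd offsets in an otherwise zeroed page are easy to overlook in a raw hex dump. A hedged C sketch of a scanner that lists each run of non-zero bytes in a block, assuming 8192-byte blocks; `find_nonzero_runs`, `print_nonzero_runs` and `ByteRun` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_BLCKSZ 8192 /* assumed block size */

typedef struct
{
    size_t start;  /* offset of the first non-zero byte in the run */
    size_t length; /* number of consecutive non-zero bytes */
} ByteRun;

/*
 * Records up to max_runs runs of consecutive non-zero bytes in one block
 * and returns the total number of runs found (which may exceed max_runs).
 */
size_t
find_nonzero_runs(const unsigned char *page, ByteRun *runs, size_t max_runs)
{
    size_t count = 0;
    size_t i = 0;

    assert(page != NULL);
    while (i < SKETCH_BLCKSZ)
    {
        size_t start;

        if (page[i] == 0)
        {
            i++;
            continue;
        }
        start = i;
        while (i < SKETCH_BLCKSZ && page[i] != 0)
            i++;
        if (count < max_runs)
        {
            runs[count].start = start;
            runs[count].length = i - start;
        }
        count++;
    }
    return count;
}

/* Prints each run as "offset N: L non-zero byte(s)". */
void
print_nonzero_runs(const unsigned char *page)
{
    ByteRun runs[64];
    size_t n = find_nonzero_runs(page, runs, 64);

    for (size_t k = 0; k < n && k < 64; k++)
        printf("offset %zu: %zu non-zero byte(s)\n",
               runs[k].start, runs[k].length);
    if (n > 64)
        printf("... and %zu more run(s)\n", n - 64);
}
```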
best regards,
    hans

Re: random system table corruption ...

From: Hans-Juergen Schoenig
Date: 15 September 2005, 04:57:15

alvaro,

what concerns me here: this is a Sun system and the problem happened
during normal operation, so there should not have been any
recovery-related operation involved. something else which is
also interesting: there are two corrupted pages in there (page numbers
22 and 26).
strange thing :(.
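
Page numbers such as 22 and 26 map directly to byte offsets in the relation file, which is how one would locate them with a hex editor or read them programmatically. A minimal C sketch, assuming 8192-byte blocks and that both blocks lie in the first segment file (PostgreSQL splits large relations into 1 GB segment files by default); `block_offset` and `read_block` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ 8192 /* assumed block size */

/* Byte offset of a block within one segment file: blockno * BLCKSZ. */
off_t
block_offset(unsigned blockno)
{
    return (off_t) blockno * SKETCH_BLCKSZ;
}

/*
 * Reads one whole block into page. Returns 0 on success and -1 on an
 * I/O error or if the file ends before the block is complete.
 */
int
read_block(int fd, unsigned blockno, unsigned char *page)
{
    size_t done = 0;

    assert(page != NULL);
    while (done < SKETCH_BLCKSZ)
    {
        ssize_t n = pread(fd, page + done, SKETCH_BLCKSZ - done,
                          block_offset(blockno) + (off_t) done);

        if (n < 0 && errno == EINTR)
            continue; /* interrupted by a signal: retry */
        if (n <= 0)
            return -1; /* I/O error, or end of file */
        done += (size_t) n;
    }
    return 0;
}
```

Under these assumptions, pages 22 and 26 start at byte offsets 180224 and 212992.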
    thanks a lot,
        hans


On 11 Sep 2005, at 20:01, Alvaro Herrera wrote:

> On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote:
>
>> in the past we have faced a couple of problems with corrupted system
>> tables. this seems to be a version-independent problem which comes up on
>> -hackers from time to time.
>> i have checked a broken file and i have seen that the corrupted page has
>> actually been zeroed out.
>>
>
> IIRC the XFS filesystem zeroes out pages that it recovers from the
> journal but did not have a fsync on them (AFAIK XFS journals only
> metadata, so page creation but not the content itself).  I don't think
> this would be applicable to your case, because we do fsync modified
> files on checkpoint, and rewrite them completely from WAL images after
> that.  But I thought I'd mention it.