Thread: random system table corruption ...
in the past we have faced a couple of problems with corrupted system tables. this seems to be a version independent problem which occurs on hackers' from time to time. i have checked a broken file and i have seen that the corrupted page has actually been zeroed out. my question is: are there any options to implement something which makes system tables more robust? the problem is: the described error happens only once in a while and cannot be reproduced. maybe there is a way to add some more sanity checks before the page is actually written. any suggestions? best regards, hans -- Cybertec Geschwinde & Schönig GmbH Schöngrabern 134; A-2020 Hollabrunn Tel: +43/1/205 10 35 / 340 www.postgresql.at, www.cybertec.at
On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote: > in the past we have faced a couple of problems with corrupted system > tables. this seems to be a version independent problem which occurs on > hackers' from time to time. > i have checked a broken file and i have seen that the corrupted page has > actually been zeroed out. Near as I can tell, the only times pages are zeroed out are when zero_damaged_pages is set (destroying the evidence) or during WAL recovery. > my question is: are there any options to implement something which makes > system tables more robust? the problem is: the described error happens > only once in a while and cannot be reproduced. maybe there is a way to > add some more sanity checks before the page is actually written. Well, the most common cause is dodgy memory. Other than that I guess you could arrange for bgwriter to check the pages it is writing. I imagine it already does check the header; checking the data requires knowledge about the actual table and attributes. And about the only thing that says "I'm broken" is a varlena value with a long value. As they say, the only sure thing would be to have a backup. The only thing I can imagine being really useful would be a restore mode where you feed it the schema so it can reconstruct the pg_class and pg_attribute just enough for you to dump it to reconstruct everything... You know, VACUUM FREEZE BACKUP on pg_catalog, physically copy the datafiles and offer the option to blat your catalog with an old one... -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
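The kind of header check mentioned above can be illustrated with a small sketch. This is not PostgreSQL's actual code; it only tests that three offset fields in a page header are ordered plausibly. The 8192-byte block size, the position of the three 16-bit fields at byte 12, and the function name header_looks_sane are all assumptions made for illustration and should be checked against the page layout of the server version in use. The byte order is a parameter because the on-disk fields are in the host's native order.

```python
import struct

BLCKSZ = 8192  # assumed block size (the usual default)


def header_looks_sane(page: bytes, byteorder: str = "<") -> bool:
    """Return True if the offset fields in the page header are ordered plausibly.

    Assumption: three unsigned 16-bit offsets (lower, upper, special)
    start at byte 12 of the page. Pass byteorder=">" for big-endian hosts.
    """
    if len(page) != BLCKSZ:
        return False
    lower, upper, special = struct.unpack_from(byteorder + "HHH", page, 12)
    # A header made only of 0x00 bytes fails here because lower is 0.
    return 0 < lower <= upper <= special <= BLCKSZ
```

A check like this only catches gross damage such as a zeroed or overwritten header. As noted above, validating the tuple data itself would need knowledge of the table and its attributes.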
Hans-Jürgen Schönig <postgres@cybertec.at> writes: > in the past we have faced a couple of problems with corrupted system > tables. this seems to be a version independent problem which occurs on > hackers' from time to time. > i have checked a broken file and i have seen that the corrupted page has > actually been zeroed out. That sounds to me like a hardware problem --- disk or disk controller momentarily writing zeroes instead of what it should write. Have you seen this on more than one physical machine? Do you have any evidence for the implication that it only happens to system tables and not user tables? Also, you don't have zero_damaged_pages turned on by any chance? regards, tom lane
Tom Lane wrote: > Hans-Jürgen Schönig <postgres@cybertec.at> writes: > >>in the past we have faced a couple of problems with corrupted system >>tables. this seems to be a version independent problem which occurs on >>hackers' from time to time. >>i have checked a broken file and i have seen that the corrupted page has >>actually been zeroed out. > > > That sounds to me like a hardware problem --- disk or disk controller > momentarily writing zeroes instead of what it should write. Have you > seen this on more than one physical machine? Do you have any evidence > for the implication that it only happens to system tables and not user > tables? > > Also, you don't have zero_damaged_pages turned on by any chance? > > regards, tom lane tom, well, there is some evidence that this is not a hardware related issue. we have only seen this problem from time to time but it happened on different machines. it cannot be reproduced. it can even happen when somebody runs a script which has been called a million times before. in my current scenario the page header only consists of 0x00 bytes and therefore the page check fails when reading the system table. i have never seen this in data files up to now (at least not when the hardware was still intact). did anybody face similar problems? maybe on sun? by the way: currently the broken system is running PostgreSQL 7.4 but as I said - we have also seen that on 8.0 once. best regards, hans -- Cybertec Geschwinde & Schönig GmbH Schöngrabern 134; A-2020 Hollabrunn Tel: +43/1/205 10 35 / 340 www.postgresql.at, www.cybertec.at
On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote: > in the past we have faced a couple of problems with corrupted system > tables. this seems to be a version independent problem which occurs on > hackers' from time to time. > i have checked a broken file and i have seen that the corrupted page has > actually been zeroed out. IIRC the XFS filesystem zeroes out pages that it recovers from the journal but did not have a fsync on them (AFAIK XFS journals only metadata, so page creation but not the content itself). I don't think this would be applicable to your case, because we do fsync modified files on checkpoint, and rewrite them completely from WAL images after that. But I thought I'd mention it. -- Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com "Just treat us the way you want to be treated + some extra allowance for ignorance." (Michael Brusser)
Alvaro Herrera wrote: > On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote: > >>in the past we have faced a couple of problems with corrupted system >>tables. this seems to be a version independent problem which occurs on >>hackers' from time to time. >>i have checked a broken file and i have seen that the corrupted page has >>actually been zeroed out. > > > IIRC the XFS filesystem zeroes out pages that it recovers from the > journal but did not have a fsync on them (AFAIK XFS journals only > metadata, so page creation but not the content itself). I don't think > this would be applicable to your case, because we do fsync modified > files on checkpoint, and rewrite them completely from WAL images after > that. But I thought I'd mention it. > alvaro, thanks a lot. we have some reports about sun systems. meanwhile i got the impression that the filesystem might be doing something wrong. i have seen that the page is not completely zeroed out. at some strange positions there are 2 bytes of crap (i overlooked that at first glance). the first couple of hundred bytes are crap, however. very strange ... best regards, hans -- Cybertec Geschwinde & Schönig GmbH Schöngrabern 134; A-2020 Hollabrunn Tel: +43/1/205 10 35 / 340 www.postgresql.at, www.cybertec.at
alvaro, what concerns me here: this is a sun system and the problem happened during normal operation. there should not be a recovery related operation. something which is also interesting: there are two corrupted pages in there (page number 22 and 26). strange thing :(. thanks a lot, hans On 11 Sep 2005, at 20:01, Alvaro Herrera wrote: > On Sun, Sep 11, 2005 at 01:12:34PM +0200, Hans-Jürgen Schönig wrote: > >> in the past we have faced a couple of problems with corrupted system >> tables. this seems to be a version independent problem which >> occurs on >> hackers' from time to time. >> i have checked a broken file and i have seen that the corrupted >> page has >> actually been zeroed out. >> > > IIRC the XFS filesystem zeroes out pages that it recovers from the > journal but did not have a fsync on them (AFAIK XFS journals only > metadata, so page creation but not the content itself). I don't think > this would be applicable to your case, because we do fsync modified > files on checkpoint, and rewrite them completely from WAL images after > that. But I thought I'd mention it. > > -- > Alvaro Herrera -- Valdivia, Chile Architect, > www.EnterpriseDB.com > "Just treat us the way you want to be treated + some extra allowance > for ignorance." (Michael Brusser)
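For anyone wanting to repeat the inspection described in this thread (finding which block numbers of a relation file are zeroed), a minimal sketch follows. It is an offline helper, not part of PostgreSQL. The function name find_suspect_pages, the 8192-byte block size, and the 20-byte header length are assumptions chosen for illustration; adjust them to the build and version being examined, and run it only against a copy of the file.

```python
BLCKSZ = 8192  # assumed block size (the usual default)


def find_suspect_pages(path, header_bytes=20, blcksz=BLCKSZ):
    """Yield (block_number, kind) for blocks that look zeroed.

    kind is "all-zero" when every byte of the block is 0x00, and
    "zero-header" when only the first header_bytes bytes are 0x00.
    header_bytes=20 is an assumed header size, not a verified constant.
    """
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(blcksz)
            if not page:
                break
            if not any(page):
                yield blkno, "all-zero"
            elif not any(page[:header_bytes]):
                yield blkno, "zero-header"
            blkno += 1
```

Calling list(find_suspect_pages("/path/to/copy/of/relation/file")) returns the suspect block numbers with their classification. The "zero-header" case is there because the page in this thread turned out to be mostly, but not completely, zeroed.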