Re: Block-level CRC checks - Mailing list pgsql-hackers
From | Decibel! |
---|---|
Subject | Re: Block-level CRC checks |
Date | |
Msg-id | 2220D58D-DD72-4F17-8DF2-E2ACD25CA774@decibel.org |
In response to | Re: Block-level CRC checks (pgsql@mohawksoft.com) |
Responses |
Re: Block-level CRC checks
|
List | pgsql-hackers |
On Sep 30, 2008, at 2:17 PM, pgsql@mohawksoft.com wrote:
>> A customer of ours has been having trouble with corrupted data for some
>> time. Of course, we've almost always blamed hardware (and we've seen
>> RAID controllers have their firmware upgraded, among other actions), but
>> the useful thing to know is when corruption has happened, and where.
>
> That is an important statement, to know when it happens not necessarily to
> be able to recover the block or where in the block it is corrupt. Is that
> correct?

Oh, correcting the corruption would be AWESOME beyond belief! But at
this point I'd settle for just knowing it had happened.

>> So we've been tasked with adding CRCs to data files.
>
> CRC or checksum? If the objective is merely general "detection" there
> should be some latitude in choosing the methodology for performance.

See above. Perhaps the best win would be a case where you could
choose which method you wanted. We generally have extra CPU on the
servers, so we could afford to burn some cycles with more complex
algorithms.

>> The idea is that these CRCs are going to be checked just after reading
>> files from disk, and calculated just before writing it. They are
>> just a protection against the storage layer going mad; they are not
>> intended to protect against faulty RAM, CPU or kernel.
>
> It will actually find faults in all if it. If the CPU can't add and/or a
> RAM location lost a bit, this will blow up just as easily as a bad block.
> It may cause "false identification" of an error, but it will keep a bad
> system from hiding.

Well, very likely not, since the intention is to only compute the CRC
when we write the block out, at least for now. In the future I would
like to be able to detect when a CPU or memory goes bonkers and poops
on something, because that's actually happened to us as well.

>> The implementation I'm envisioning requires the use of a new relation
>> fork to store the per-block CRCs. Initially I'm aiming at a CRC32 sum
>> for each block. FlushBuffer would calculate the checksum and store it
>> in the CRC fork; ReadBuffer_common would read the page, calculate the
>> checksum, and compare it to the one stored in the CRC fork.
>
> Hell, all that is needed is a long or a short checksum value in the block.
> I mean, if you just want a sanity test, it doesn't take much. Using a
> second relation creates confusion. If there is a CRC discrepancy between
> two different blocks, who's wrong? You need a third "control" to know. If
> the block knows its CRC or checksum and that is in error, the block is
> bad.

I believe the idea was to make this as non-invasive as possible. And
it would be really nice if this could be enabled without a dump/reload
(maybe the upgrade stuff would make this possible?)
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828
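
As a rough illustration of the scheme discussed in this message (a checksum calculated just before a block is written and verified just after it is read), here is a minimal Python sketch of the in-block variant suggested in the quoted reply. It is not PostgreSQL code: `seal_block`, `verify_block`, the 4-byte header layout, and the use of `zlib.crc32` are all assumptions made for the example; only the 8192-byte block size mirrors PostgreSQL's default.

```python
import struct
import zlib

BLOCK_SIZE = 8192                    # PostgreSQL's default block size
CRC_FMT = "<I"                       # assumed layout: 4-byte little-endian CRC32
CRC_LEN = struct.calcsize(CRC_FMT)   # 4 bytes reserved at the start of the block


def seal_block(payload: bytes) -> bytes:
    """Hypothetical write-side hook: prepend the CRC32 of the payload."""
    if len(payload) != BLOCK_SIZE - CRC_LEN:
        raise ValueError("payload must fill the block minus the CRC field")
    return struct.pack(CRC_FMT, zlib.crc32(payload)) + payload


def verify_block(block: bytes) -> bool:
    """Hypothetical read-side hook: recompute the CRC32 and compare it."""
    if len(block) != BLOCK_SIZE:
        return False
    (stored,) = struct.unpack_from(CRC_FMT, block)
    return zlib.crc32(block[CRC_LEN:]) == stored


if __name__ == "__main__":
    block = seal_block(bytes(BLOCK_SIZE - CRC_LEN))
    print(verify_block(block))              # intact block

    damaged = bytearray(block)
    damaged[1000] ^= 0x01                   # simulate a single flipped bit on disk
    print(verify_block(bytes(damaged)))     # corruption is detected
```

Because the checksum is only computed at write time, this kind of check catches damage that happens between the write and the next read; it says nothing about a page that was already wrong in memory when it was sealed. The fork-based design from the original proposal would keep the same value outside the block instead, which leaves the page layout untouched.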