Re: Block-level CRC checks - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: Block-level CRC checks |
Date | |
Msg-id | 4B156C4B.9000905@agliodbs.com |
In response to | Re: Block-level CRC checks (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Block-level CRC checks; Re: Block-level CRC checks |
List | pgsql-hackers |
All,

I feel strongly that we should be verifying pages on write, or at least providing the option to do so, because hardware is simply not reliable. And a lot of our biggest users are having issues; it seems pretty much guaranteed that if you have more than 20 postgres servers, at least one of them will have bad memory, bad RAID and/or a bad driver.

(And yes, InnoDB, DB2 and Oracle do provide tools to detect hardware corruption when it happens. Oracle even provides correction tools. We are *way* behind them in this regard.)

There are two primary conditions we are testing for:

(a) bad RAM, which happens as frequently as 8% of the time on commodity servers, and given a sufficient amount of RAM happens 99% of the time due to quantum effects, and

(b) bad I/O, in the form of bad drivers, bad RAID, and/or bad disks.

Our users potentially want to take two degrees of action on this:

1. Detect the corruption immediately when it happens, so that they can effectively troubleshoot the cause of the corruption, and potentially shut down the database before further corruption occurs and while they still have clean backups.

2. Make an attempt to fix the corrupted page before/immediately after it is written.

Further, based on talking to some of these users who are having chronic and not-debuggable issues on their sets of hundreds of PostgreSQL servers, there are some other specs:

-- Many users would be willing to sacrifice significant performance (up to 20%) as a start-time option in order to be "corruption-proof".

-- Even more users would only be interested in turning on the anti-corruption options once they know they have a problem, in order to troubleshoot it, and would then turn the corruption detection back off.

So, based on my conversations with users, what we really want is a solution which does (1) for both (a) and (b) as a start-time option, and having significant performance overhead for this option is OK.

Now, do block-level CRCs qualify?
The problem I have with CRC checks is that they only detect bad I/O, and are completely unable to detect data corruption due to bad memory. This means that really we want a different solution which can detect both bad RAM and bad I/O, and should only fall back on CRC checks if we're unable to devise one.

One of the things Simon and I talked about in Japan is that most of the time, data corruption makes the data page and/or tuple unreadable. So, checking that pages and tuples (and index nodes) are in a readable format both before and after they are written to disk (the latter would presumably be handled by the bgwriter and/or checkpointer) would catch a lot of kinds of corruption before they had a chance to spread.

However, that solution would not detect subtle corruption, like single-bit-flipping issues caused by quantum errors. Also, it would require reading back each page as it's written to disk, which is OK for a bunch of single-row writes, but is a significant problem for bulk data loads.

So, what I'm saying is that I think we really want a better solution, and am throwing this out there to see if anyone is clever enough.

--Josh Berkus
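
To make the CRC limitation described above concrete, here is a minimal sketch in Python. It is not PostgreSQL code: zlib.crc32 merely stands in for whatever block checksum would be used, and the helpers flip_bit, write_page and verify_page are hypothetical names invented for this illustration. It assumes the checksum is computed over the page as it sits in memory at write time. A bit flipped after the checksum is computed (bad I/O) is caught on read; a bit flipped before the checksum is computed (bad RAM) is checksummed along with everything else and passes verification.

```python
import zlib
from typing import Tuple

PAGE_SIZE = 8192  # PostgreSQL's default block size, used here only as a realistic page length


def flip_bit(page: bytes, bit_index: int) -> bytes:
    """Return a copy of the page with a single bit inverted."""
    damaged = bytearray(page)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def write_page(page: bytes) -> Tuple[bytes, int]:
    """Hypothetical write path: checksum whatever is in memory, then store both."""
    return page, zlib.crc32(page)


def verify_page(stored: bytes, stored_crc: int) -> bool:
    """Hypothetical read path: recompute the checksum and compare."""
    return zlib.crc32(stored) == stored_crc


original = bytes(range(256)) * (PAGE_SIZE // 256)

# Case (b), bad I/O: the page is damaged after the checksum was computed.
on_disk, crc = write_page(original)
io_damaged = flip_bit(on_disk, 12345)
io_detected = not verify_page(io_damaged, crc)

# Case (a), bad RAM: the page is damaged in memory before the checksum is computed.
ram_damaged = flip_bit(original, 12345)
on_disk, crc = write_page(ram_damaged)
ram_detected = not verify_page(on_disk, crc)

print("I/O corruption detected:", io_detected)
print("RAM corruption detected:", ram_detected)
```

Under these assumptions the first case is flagged and the second is not, even though the stored page differs from the intended one in both.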