Re: 16-bit page checksums for 9.2 - Mailing list pgsql-hackers
| From | Kevin Grittner |
|---|---|
| Subject | Re: 16-bit page checksums for 9.2 |
| Date | |
| Msg-id | 4EFC4A3B02000025000441E2@gw.wicourts.gov |
| In response to | 16-bit page checksums for 9.2 (Simon Riggs <simon@2ndQuadrant.com>) |
| Responses | Re: 16-bit page checksums for 9.2 |
| List | pgsql-hackers |
> Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> OK, then we are talking at cross purposes. Double write buffers,
>> in the way you explain them allow us to remove full page writes.
>> They clearly don't do anything to check page validity on read.
>> Torn pages are not the only fault we wish to correct against...
>> and the double writes idea is orthogonal to the idea of checksums.
>
> The reason we're talking about double write buffers in this thread
> is that double write buffers can be used to solve the problem with
> hint bits and checksums.

Exactly. Every time the issue of page checksums is raised, there are objections because OS or hardware crashes could cause torn pages for hint-bit-only writes, which would be treated as serious errors (potentially indicating hardware failure) when they are in fact expected and benign. Some time before the thread dies, someone generally points out that double-write technology would be a graceful way to handle that, with the side benefit of smaller WAL files. All available evidence suggests it would also allow a small performance improvement, although I hesitate to emphasize that aspect of it; the other benefits fully justify the effort without it.

I do feel there is value in a page checksum patch even without torn page protection. The discussion on the list has convinced me that a failed checksum should be treated as seriously as other page format errors, rather than as a warning, even though (in the absence of torn page protection) torn hint-bit-only page writes would be benign.

As an example of how this might be useful, consider our central databases, which contain all the detail replicated from the circuit court databases in all the counties. These are mission-critical, so we have redundant servers in separate buildings. At one point, one of them experienced hardware problems and we started seeing invalid pages.
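To make the torn hint-bit-only write concrete, here is a minimal sketch. It assumes a toy Fletcher-16-style checksum stored in the first two bytes of the page; the actual algorithm and header layout the patch would use are not specified here, so treat both as illustrative assumptions, not PostgreSQL's implementation:

```python
def fletcher16(data: bytes) -> int:
    """Toy 16-bit Fletcher checksum; a stand-in for whatever
    algorithm the patch adopts (an assumption, not the real one)."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

# A "page" with its checksum stored in the first two bytes.
page = bytearray(8192)
page[100] = 0x01                     # pretend this byte holds tuple hint bits
csum = fletcher16(bytes(page[2:]))   # checksum covers everything after the field
page[0:2] = csum.to_bytes(2, "little")

# A hint-bit-only write torn by a crash: the new hint bit reached disk,
# but the page half holding the updated checksum did not.
page[100] |= 0x02                    # hint bit set, stored checksum now stale
stored = int.from_bytes(page[0:2], "little")
assert fletcher16(bytes(page[2:])) != stored   # benign write reads as corruption
```

Without torn-page protection for these writes, the verify-on-read path cannot distinguish this benign case from genuine corruption, which is exactly the objection raised every time checksums come up.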
Since we can shift the load between these servers without down time, we moved all applications to other servers and investigated. Now, it's possible that for some time before we got errors on the bad pages, there could have been subtle corruption which didn't generate errors but presented bad data on our web site. A page checksum would help prevent that sort of problem, and a post-crash false positive might waste a little time in investigation, but that cost would be far outweighed by the benefit of better accuracy guarantees.

Of course, it will be a big plus if we can roll this out in 9.2 in conjunction with a double-write feature. Not only will double-write probably be a bit faster than full_page_writes in the WAL log, but it will allow protection against torn pages on hint-bit-only writes without adding those writes to the WAL or doing any major rearrangement of where they sit that would break pg_upgrade. It would be nice not to have to put all sorts of caveats and explanations into the docs about how a checksum error might be benign due to hint bit writes.

-Kevin
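The double-write idea referred to above can be sketched as follows. This is an illustrative outline of the general protocol (durably write the page image to a separate doublewrite area before writing it in place), not PostgreSQL code; the function and file names are hypothetical:

```python
import os
import tempfile

PAGE_SIZE = 8192

def double_write(datafile_fd: int, offset: int, page: bytes, dw_fd: int) -> None:
    """Sketch of a double-write protocol (names are illustrative):
    1. write the page image to a dedicated doublewrite area,
    2. fsync so the copy is durable,
    3. only then write the page in place.
    After a crash, a page torn in the main file can be restored from
    the doublewrite copy, so no full-page image is needed in WAL."""
    os.pwrite(dw_fd, page, 0)             # 1. copy to doublewrite area
    os.fsync(dw_fd)                       # 2. make the copy durable
    os.pwrite(datafile_fd, page, offset)  # 3. now write in place
    os.fsync(datafile_fd)                 # 4. page is safe; slot reusable

# Usage: push one page through the doublewrite area.
data = tempfile.NamedTemporaryFile(delete=False)
dw = tempfile.NamedTemporaryFile(delete=False)
page = bytes([0xAB]) * PAGE_SIZE
double_write(data.fileno(), 0, page, dw.fileno())
assert open(data.name, "rb").read() == page
```

Because every page write (including a hint-bit-only one) is durable in the doublewrite area before the in-place write begins, a torn in-place write never has to be interpreted; recovery simply replays the copy, which is what lets a checksum mismatch be treated as a hard error.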