Re: 16-bit page checksums for 9.2 - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: 16-bit page checksums for 9.2 |
Date | |
Msg-id | CA+U5nMJ-Sc5rrSQqu5ed1sbUbM8Zr-BcZYwhqazSB3q99aFOmw@mail.gmail.com Whole thread Raw |
In response to | Re: 16-bit page checksums for 9.2 (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: 16-bit page checksums for 9.2
Re: 16-bit page checksums for 9.2 Re: 16-bit page checksums for 9.2 |
List | pgsql-hackers |
On Thu, Feb 9, 2012 at 3:16 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, Feb 8, 2012 at 2:05 PM, Noah Misch <noah@leadboat.com> wrote: >> On Wed, Feb 08, 2012 at 11:40:34AM +0000, Simon Riggs wrote: >>> On Wed, Feb 8, 2012 at 3:24 AM, Noah Misch <noah@leadboat.com> wrote: >>> > On Tue, Feb 07, 2012 at 08:58:59PM +0000, Simon Riggs wrote: >>> >> On Thu, Jan 26, 2012 at 8:20 PM, Noah Misch <noah@leadboat.com> wrote: >>> >> > This patch uses FPIs to guard against torn hint writes, even when the >>> >> > checksums are disabled. ?One could not simply condition them on the >>> >> > page_checksums setting, though. ?Suppose page_checksums=off and we're hinting >>> >> > a page previously written with page_checksums=on. ?If that write tears, >>> >> > leaving the checksum intact, that block will now fail verification. ?A couple >>> >> > of ideas come to mind. ?(a) Read the on-disk page and emit an FPI only if the >>> >> > old page had a checksum. ?(b) Add a checksumEnableLSN field to pg_control. >>> >> > Whenever the cluster starts with checksums disabled, set the field to >>> >> > InvalidXLogRecPtr. ?Whenever the cluster starts with checksums enabled and the >>> >> > field is InvalidXLogRecPtr, set the field to the next LSN. ?When a checksum >>> >> > failure occurs in a page with LSN older than the stored one, emit either a >>> >> > softer warning or no message at all. >>> >> >>> >> We can only change page_checksums at restart (now) so the above seems moot. >>> >> >>> >> If we crash with FPWs enabled we repair any torn pages. >>> > >>> > There's no live bug, but that comes at a high cost: the patch has us emit >>> > full-page images for hint bit writes regardless of the page_checksums setting. >>> >>> Sorry, I don't understand what you mean. I don't see any failure cases >>> that require that. >>> >>> page_checksums can only change at a shutdown checkpoint, >>> >>> The statement "If that write tears, >>> >> > leaving the checksum intact, that block will now fail verification." >>> cannot happen, ISTM. >>> >>> If we write out a block we update the checksum if page_checksums is >>> set, or we clear it. If we experience a torn page at crash, the FPI >>> corrects that, so the checksum never does fail verification. We only >>> need to write a FPI when we write with checksums. >>> >>> If that's wrong, please explain a failure case in detail. >> >> In the following description, I will treat a heap page as having two facts. >> The first is either CHKSUM or !CHKSUM, and the second is either HINT or !HINT. >> A torn page write lacking the protection of a full-page image can update one >> or both facts despite the transaction having logically updated both. (This is >> just the normal torn-page scenario.) Given that, consider the following >> sequence of events: >> >> 1. startup with page_checksums = on >> 2. update page P with facts CHKSUM, !HINT >> 3. clean write of P >> 4. clean shutdown >> >> 5. startup with page_checksums = off >> 6. update P with facts !CHKSUM, HINT. no WAL since page_checksums=off >> 7. begin write of P >> 8. crash. torn write of P only updates HINT. disk state now CHKSUM, HINT >> >> 9. startup with page_checksums = off >> 10. crash recovery. does not touch P, because step 6 generated no WAL >> 11. clean shutdown >> >> 12. startup with page_checksums = on >> 13. read P. checksum failure >> >> Again, this cannot happen with checksum16_with_wallogged_hint_bits.v7.patch, >> because step 6 _does_ write WAL. That ought to change for the sake of >> page_checksums=off performance, at which point we must protect the above >> scenario by other means. > > Thanks for explaining. > > Logic fixed. > >>> >> > PageSetLSN() is not atomic, so the shared buffer content lock we'll be holding >>> >> > is insufficient. >>> >> >>> >> Am serialising this by only writing PageLSN while holding buf hdr lock. >>> > >>> > That means also taking the buffer header spinlock in every PageGetLSN() caller >>> > holding only a shared buffer content lock. ?Do you think that will pay off, >>> > versus the settled pattern of trading here your shared buffer content lock for >>> > an exclusive one? >>> >>> Yes, I think it will pay off. This is the only code location where we >>> do that, and we are already taking the buffer header lock, so there is >>> effectively no overhead. >> >> The sites I had in the mind are the calls to PageGetLSN() in FlushBuffer() >> (via BufferGetLSN()) and XLogCheckBuffer(). We don't take the buffer header >> spinlock at either of those locations. If they remain safe under the new >> rules, we'll need comments explaining why. I think they may indeed be safe, >> but it's far from obvious. > > OK, some slight rework required to do that, no problems. > > I checked all other call sites and have updated README to explain. v8 attached -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: