Re: Enabling Checksums - Mailing list pgsql-hackers
From | Greg Smith
---|---
Subject | Re: Enabling Checksums
Date |
Msg-id | 51339560.8030701@2ndQuadrant.com
In response to | Re: Enabling Checksums (Greg Smith <greg@2ndQuadrant.com>)
Responses | Re: Enabling Checksums, Re: Enabling Checksums
List | pgsql-hackers
The 16-bit checksum feature seems functional, with two sources of overhead. There's some CPU time burned to compute checksums when pages enter the system. And there's extra overhead for WAL logging hint bits. I'll quantify both of those better in another message. For completeness' sake I've attached the latest versions of the patches I tested (the same set as in my last message), along with the testing programs and source changes that have been useful for my review. I have a test case now demonstrating a tricky issue my gut told me was possible in page header handling, and that's what I talk about most here.

= Handling bit errors in page headers =

The thing I've been stuck on is trying to find a case where turning checksums on results in data that could otherwise be read becoming completely unavailable after a single bit of corruption. That seemed to me the biggest risk of this feature. If checksumming can result in lost data, where before that data would be available, just with some potential for error in it, that's kind of bad. I've created a program that does just that, with a repeatable shell script test case (check-check.sh). This builds on the example I gave before, where I corrupt a single bit of data in pgbench_accounts (the lowest bit of byte 14 in the page) and a regular server then reads that page without problems:

$ psql -c "select sum(abalance) from pgbench_accounts"
 sum
-----
   0

Corrupting the same bit on a checksums-enabled build catches the problem:

WARNING: page verification failed, calculated checksum 5900 but expected 9227
ERROR: invalid page header in block 0 of relation base/16384/16397

This is good, because it's exactly the sort of quiet corruption that the feature is supposed to find. But clearly it's *possible* to still read all of the data in this page, because the build without checksums does just that. All of these fail now:

$ psql -c "select sum(abalance) from pgbench_accounts"
WARNING: page verification failed, calculated checksum 5900 but expected 9227
ERROR: invalid page header in block 0 of relation base/16384/16397

$ psql -c "select * from pgbench_accounts"
WARNING: page verification failed, calculated checksum 5900 but expected 9227
ERROR: invalid page header in block 0 of relation base/16384/16397

And you get this sort of mess out of pg_dump:

COPY pgbench_accounts (aid, bid, abalance, filler) FROM stdin;
pg_dump: WARNING: page verification failed, calculated checksum 5900 but expected 9227
\.
pg_dump: Dumping the contents of table "pgbench_accounts" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: invalid page header in block 0 of relation base/16384/16397
pg_dump: The command was: COPY public.pgbench_accounts (aid, bid, abalance, filler) TO stdout;

I think an implicit goal of this feature was to soldier on when it's possible to do so. The case where something in the page header is corrupted seems the weakest part of that idea. I would still be happy to enable this feature on a lot of servers, because stopping in the case of subtle header corruption just means going to another known good copy of the data, probably a standby server. I could see some people getting surprised by this change though. I'm not sure whether it's possible to make a checksum failure in a page header something that is WARNed about, rather than always treating it as a hard failure that leaves the data unavailable (without page inspection tools at least). That seems like the main thing that might be improved in this feature right now.
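For reference, here's a minimal sketch of the sort of single-bit flip these tests rely on -- a simplified, hand-run version of what check-check.sh automates, not the script itself. The PGDATA default and the relfilenode path are just examples from this run and would differ on another cluster:

# Flip the lowest bit of byte 14 in block 0 of pgbench_accounts.
PGDATA=${PGDATA:-$HOME/pgdata}
RELFILE="$PGDATA/base/16384/16397"    # base/<database oid>/<relfilenode>

pg_ctl -D "$PGDATA" stop              # so no cached copy masks or repairs the change

BYTE=$(od -A n -t u1 -j 14 -N 1 "$RELFILE" | tr -d ' ')
printf "$(printf '\\%03o' $((BYTE ^ 1)))" | \
    dd of="$RELFILE" bs=1 seek=14 count=1 conv=notrunc 2>/dev/null

pg_ctl -D "$PGDATA" -w start
psql -c "select sum(abalance) from pgbench_accounts"  # only a checksum build complains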
= Testing issues =

It is surprisingly hard to get a repeatable test program that corrupts a bit on a data page. If you already have a copy of the page in memory and you corrupt the copy on disk, the corrupted copy won't be noticed. And if you happen to trigger a write of that page, the corruption will quietly be fixed. This is all good behavior, but it's something to be aware of when writing test code. The other thing to watch out for is that you're not hitting an index-only scan anywhere, because then you're bypassing the database page you corrupted.

What I've done is come up with a repeatable test case that shows the checksum patch finding a single bit of corruption that is missed by a regular server. The program is named check-check.sh, and a full output run is attached as check-check.log.

I also added a developer-only debugging patch as show_block_verifications.patch. This makes every block read spew a message about what relation it's touching, and proves the checksum mechanism is being hit each time. The main reason I needed that was to make sure the pages I expected to be read were actually the ones being read. When I was accidentally hitting index-only scans, for example, I could tell because the reads were touching something from pgbench_accounts_pkey instead of the pgbench_accounts table data I was corrupting. (A quick way to guard against that in a test script is sketched below the signature.)

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
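As a sketch of that index-only scan check -- my own illustration here, not part of the attached scripts; it assumes the same pgbench tables as above and PostgreSQL 9.2 or later for enable_indexonlyscan:

# First confirm what plan the query gets, then force it onto the heap.
psql <<'SQL'
EXPLAIN (COSTS OFF) SELECT sum(abalance) FROM pgbench_accounts;
SET enable_indexonlyscan = off;
SELECT sum(abalance) FROM pgbench_accounts;  -- must now read the corrupted heap page
SQL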