Re: [PATCH] Verify Checksums during Basebackups - Mailing list pgsql-hackers
From: Michael Banck
Subject: Re: [PATCH] Verify Checksums during Basebackups
Date:
Msg-id: 1521797787.15036.23.camel@credativ.de
In response to: Re: [PATCH] Verify Checksums during Basebackups (David Steele <david@pgmasters.net>)
Responses: Re: [PATCH] Verify Checksums during Basebackups
List: pgsql-hackers
Hi David,

thanks for the review!

Am Donnerstag, den 22.03.2018, 12:22 -0400 schrieb David Steele:
> On 3/17/18 5:34 PM, Michael Banck wrote:
> > On Fri, Mar 09, 2018 at 10:35:33PM +0100, Michael Banck wrote:
> > I think most people (including those I had off-list discussions about
> > this with) were of the opinion that such an option should be there, so I
> > added an additional option NOVERIFY_CHECKSUMS to the BASE_BACKUP
> > replication command and also an option -k / --no-verify-checksums to
> > pg_basebackup to trigger this.
> >
> > Updated patch attached.
>
> +        memcpy(page, (buf + BLCKSZ * i), BLCKSZ);
>
> Why make a copy here?  How about:
>
> char *page = buf + BLCKSZ * i

Right, ok.

> I know pg_checksum_page manipulates the checksum field but I have found
> it to be safe.
>
> +        if (phdr->pd_checksum != checksum)
>
> I've attached a patch that adds basic retry functionality.  It's not
> terribly efficient since it rereads the entire buffer for any block
> error.  A better way is to keep a bitmap for each block in the buffer,
> then on retry compare bitmaps.  If the block is still bad, report it.
> If the block was corrected, move on.  If a block was good before but is
> bad on retry it can be ignored.

I have to admit I find it a bit convoluted and non-obvious on first
reading, but I'll try to check it out some more.

I think it would be very useful if we could come up with a testcase
showing this problem, but I guess it will be quite hard to hit
reproducibly, right?

> +            ereport(WARNING,
> +                    (errmsg("checksum verification failed in file "
>
> I'm worried about how verbose this warning could be since there are
> 131,072 blocks per segment.  It's unlikely to have that many block
> errors, but users do sometimes put files in PGDATA which look like they
> should be validated.  Since these warnings all go to the server log it
> could get pretty bad.

We only verify checksums of files in tablespaces, and I don't think
dropping random files in those is supported in any way, as opposed to
maybe the top-level PGDATA directory. So I would say that this is not a
real concern.

> We should either stop warning after the first failure, or aggregate the
> failures for a file into a single message.

I agree that massive corruption could make the output blow up, but I
would prefer to keep this feature simple for now, which implies either
printing out a lot of WARNINGs or stopping after the first one (or the
first few, dunno).


Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
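To make David's buffer-pointer suggestion above concrete, a rough sketch of the per-block
verification loop follows. This is illustration only, not the patch itself; buf, cnt,
blkno and readfilename are assumed to come from the surrounding read loop in basebackup.c.

	int			i;

	for (i = 0; i < cnt / BLCKSZ; i++)
	{
		/* point directly into the read buffer instead of copying the block */
		char	   *page = buf + BLCKSZ * i;
		PageHeader	phdr = (PageHeader) page;
		uint16		checksum;

		/* new (all-zero) pages do not carry a checksum yet */
		if (PageIsNew(page))
			continue;

		checksum = pg_checksum_page(page, blkno + i);
		if (phdr->pd_checksum != checksum)
			ereport(WARNING,
					(errmsg("checksum verification failed in file \"%s\", block %u: calculated %X but expected %X",
							readfilename, blkno + i, checksum, phdr->pd_checksum)));
	}

pg_checksum_page() temporarily manipulates the checksum field of the page it is given,
which is why David notes it is safe to run it on the buffer in place.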
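The bitmap-based retry David describes could look roughly like the sketch below:
remember which blocks failed on the first pass, reread the range once, and only report
blocks that fail both times. Blocks whose state changes between passes were presumably
being written concurrently and are ignored. MAX_BLOCKS_PER_BUF, block_checksum_ok() and
reread_range() are made-up names for this sketch, not part of the actual patch.

	bool		failed_first[MAX_BLOCKS_PER_BUF] = {false};
	bool		any_failed = false;
	int			i;

	/* first pass: remember which blocks in the buffer fail verification */
	for (i = 0; i < nblocks; i++)
	{
		if (!block_checksum_ok(buf + BLCKSZ * i, blkno + i))
		{
			failed_first[i] = true;
			any_failed = true;
		}
	}

	if (any_failed)
	{
		/* reread the same range once; some blocks may have been mid-write */
		reread_range(fd, buf, offset, nblocks * BLCKSZ);

		for (i = 0; i < nblocks; i++)
		{
			/* report only blocks that are bad on both passes */
			if (failed_first[i] &&
				!block_checksum_ok(buf + BLCKSZ * i, blkno + i))
				ereport(WARNING,
						(errmsg("checksum verification failed in file \"%s\", block %u",
								readfilename, blkno + i)));
		}
	}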
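For the verbosity concern, aggregating failures per file instead of warning per block
might look like the following sketch (checksum_failures is an assumed counter that is
reset for every file sent; again, not the actual patch code):

	/* inside the per-block loop */
	if (phdr->pd_checksum != checksum)
		checksum_failures++;

	/* ... after the whole file has been sent ... */
	if (checksum_failures > 0)
		ereport(WARNING,
				(errmsg("file \"%s\" contains %d checksum verification failures",
						readfilename, checksum_failures)));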