Re: Online enabling of checksums - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Online enabling of checksums |
Date | |
Msg-id | e78bb22b-3f22-9f17-b9d8-7d76829cee43@2ndquadrant.com |
In response to | Re: Online enabling of checksums (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: Online enabling of checksums |
List | pgsql-hackers |
On 09/29/2018 06:51 PM, Stephen Frost wrote:
> Greetings,
>
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On 09/29/2018 02:19 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>>>> While looking at the online checksum verification patch (which I guess
>>>> will get committed before this one), it occurred to me that disabling
>>>> checksums may need to be more elaborate, to protect against someone
>>>> using the stale flag value (instead of simply switching to "off"
>>>> assuming that's fine).
>>>>
>>>> The signals etc. seem good enough for our internal stuff, but what if
>>>> someone uses the flag in a different way? E.g. the online checksum
>>>> verification runs as an independent process (i.e. not a backend) and
>>>> reads the control file to find out if the checksums are enabled or not.
>>>> So if we just switch from "on" to "off" that will break.
>>>>
>>>> Of course, we may also say "Don't disable checksums while online
>>>> verification is running!" but that's not ideal.
>>>
>>> I'm not really sure what else we could say here..? I don't particularly
>>> see an issue with telling people that if they disable checksums while
>>> they're running a tool that's checking the checksums that they're going
>>> to get odd results.
>>
>> I don't know, to be honest. I was merely looking at the online
>> verification patch and realized that if someone disables checksums it
>> won't notice it (because it only reads the flag once, at the very
>> beginning) and will likely produce bogus errors.
>>
>> Although, maybe it won't - it now uses a checkpoint LSN, so that might
>> fix it. The checkpoint LSN is read from the same controlfile as the
>> flag, so we know the checksums were enabled during that checkpoint. So
>> if we ignore failures with a newer LSN, that should do the trick, no?
>>
>> So perhaps that's the right "protocol" to handle this?
>
> I certainly don't think we need to do anything more.
>

Not sure I agree.
I'm not suggesting we absolutely have to write a huge amount of code to deal with this issue, but I hope we agree we need to at least understand the issue so that we can put warnings into the docs.

FWIW pg_basebackup (in the default "verify checksums" mode) has this issue too AFAICS, and it seems rather unfriendly to just start reporting checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the checkpoint LSN deals with it automatically.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
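The "protocol" floated in the quoted discussion (read the checksum flag and the checkpoint LSN once, from the same control file, then ignore checksum failures on pages whose LSN is newer than that checkpoint) can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the online verification patch or from pg_basebackup: `ControlSnapshot` and `should_report_failure` are hypothetical names, LSNs are modeled as plain integers, and whether the LSN check really covers every case is exactly the open question in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSnapshot:
    """Hypothetical snapshot of two control-file fields, read once at startup."""
    checksums_enabled: bool  # data checksum flag as seen in the control file
    checkpoint_lsn: int      # checkpoint LSN read from the same control file


def should_report_failure(snap: ControlSnapshot, page_lsn: int, checksum_ok: bool) -> bool:
    """Return True if a page checksum mismatch should be reported as an error.

    Assumption (from the thread, not verified): the flag was valid as of
    snap.checkpoint_lsn, so only pages not modified since then are trusted.
    """
    if not snap.checksums_enabled:
        return False  # checksums were off when we started: nothing to verify
    if checksum_ok:
        return False  # page verified fine
    if page_lsn > snap.checkpoint_lsn:
        # Page was modified after the checkpoint the flag was read with, so
        # checksums may have been disabled since then: skip instead of report.
        return False
    return True  # page predates the checkpoint: treat the mismatch as a failure
```

For example, with `checkpoint_lsn=1000` a failing page stamped with LSN 500 is reported, while a failing page stamped with LSN 1500 is silently skipped. The trade-off is that any genuine corruption on recently modified pages also goes unreported by this check.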