Re: Online checksums patch - once again - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Online checksums patch - once again |
Date | |
Msg-id | d89d50a9-06b8-dff6-d80d-9bd3a841216c@iki.fi Whole thread Raw |
In response to | Re: Online checksums patch - once again (Daniel Gustafsson <daniel@yesql.se>) |
Responses |
Re: Online checksums patch - once again
|
List | pgsql-hackers |
On 17/11/2020 10:56, Daniel Gustafsson wrote: >> On 5 Oct 2020, at 13:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> 2. The signaling between enable_data_checksums() and the launcher process looks funny to me. The general idea seems tobe that enable_data_checksums() just starts the launcher process, and the launcher process figures out what it need todo and makes all the changes to the global state. But then there's this violation of the idea: enable_data_checksums()checks DataChecksumsOnInProgress(), and tells the launcher process whether it should continue a previouslycrashed operation or start from scratch. I think it would be much cleaner if the launcher process figured thatout itself, and enable_data_checksums() would just tell the launcher what the target state is. >> >> enable_data_checksums() and disable_data_checksums() seem prone to race conditions. If you call enable_data_checksums()in two backends concurrently, depending on the timing, there are two possible outcomes: >> >> a) One call returns true, and launches the background process. The other call returns false. >> >> b) Both calls return true, but one of them emits a "NOTICE: data checksums worker is already running". >> >> In disable_data_checksum() imagine what happens if another backend calls enable_data_checksums() in between the ShutdownDatachecksumsWorkerIfRunning()and SetDataChecksumsOff() calls. > > I've reworked this in the attached such that the enable_ and disable_ functions > merely call into the launcher with the desired outcome, and the launcher is > responsible for figuring out the rest. The datachecksumworker is now the sole > place which initiates a state transfer. Well, you still fill the DatachecksumsWorkerShmem->operations array in the backend process that launches the datacheckumworker, not in the worker process. I find that still a bit surprising, but I believe it works. >>> /* >>> * Mark the buffer as dirty and force a full page write. We have to >>> * re-write the page to WAL even if the checksum hasn't changed, >>> * because if there is a replica it might have a slightly different >>> * version of the page with an invalid checksum, caused by unlogged >>> * changes (e.g. hintbits) on the master happening while checksums >>> * were off. This can happen if there was a valid checksum on the page >>> * at one point in the past, so only when checksums are first on, then >>> * off, and then turned on again. Iff wal_level is set to "minimal", >>> * this could be avoided iff the checksum is calculated to be correct. >>> */ >>> START_CRIT_SECTION(); >>> MarkBufferDirty(buf); >>> log_newpage_buffer(buf, false); >>> END_CRIT_SECTION(); >> >> It's really unfortunate that we have to dirty the page even if the checksum already happens to match. Could we only dothe log_newpage_buffer() call and skip MarkBufferDirty() in that case? > > I think we can, but I've intentionally stayed away from such optimizations > until the basic version of the patch was deemed safe and approaching done. > It's complicated enough as it is IMO. Fair enough. > The attached fixes the above plus a few other small things found while hacking > on this version. I haven't read through the whole patch, but a few random comments below, in no particular order: pg_enable/disable_data_checksums() should perform a superuser-check. I don't think we want to expose them to any users. > +/* > + * Local state for Controlfile data_checksum_version. After initialization, > + * this is only updated when absorbing a procsignal barrier during interrupt > + * processing. Thus, it can be read by backends without the need for a lock. > + * Possible values are the checksum versions defined in storage/bufpage.h and > + * zero for when checksums are disabled. > + */ > +static uint32 LocalDataChecksumVersion = 0; The comment is a bit confusing: "Thus, it can be read by backends without the need for a lock". Since it's a variable in backend-private memory, it can only be read by the same backend, not "backends". And that's also why you don't need a lock, not because it's updated during interrupt processing. I understand how this works, but maybe rephrase the comment a bit. > +/* > + * DataChecksumsOnInProgress > + * Returns whether data checksums are being enabled > + * > + * Most callsites shouldn't need to worry about the "inprogress" states, since > + * they should check the requirement for verification or writing. Some low- > + * level callsites dealing with page writes need however to know. It's also > + * used to check for aborted checksum processing which need to be restarted. > */ > bool > -DataChecksumsEnabled(void) > +DataChecksumsOnInProgress(void) > +{ > + return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION); > +} s/need/needs/. The whole paragraph feels a bit awkwardly worded in general. I'd suggest something like: "Most operations don't need to worry about the "inprogress" states, and should use DataChecksumsNeedVerify() or DataChecksumsNeedWrite()". DataChecksumsOffInProgress() is called from StartDatachecksumsWorkerLauncher(), which I wouldn't call a "low-level callsite". > @@ -1079,7 +1091,7 @@ XLogInsertRecord(XLogRecData *rdata, > Assert(RedoRecPtr < Insert->RedoRecPtr); > RedoRecPtr = Insert->RedoRecPtr; > } > - doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites); > + doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsOnInProgress()); > > if (doPageWrites && > (!prevDoPageWrites || The comment above this deserves to be updated. Also, this is a very hot codepath; will this slow down WAL-logging, when full-page writes are disabled? Could we inline DataChecksumsOnInProgress() or set Insert->forcePageWrites when checksums are being computed or something? In StartupXLOG(): > + /* > + * If data checksums were being disabled when the cluster was shutdown, we > + * know that we have a state where all backends have stopped validating > + * checksums and we can move to off instead. > + */ > + if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION) > + { > + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); > + ControlFile->data_checksum_version = 0; > + LWLockRelease(ControlFileLock); > + } > + Should this be WAL-logged, like in SetDataChecksumsOff()? In SetDataChecksumsOff(): > + ControlFile->data_checksum_version = 0; > + UpdateControlFile(); > + LWLockRelease(ControlFileLock); > + > + XlogChecksums(0); > + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_CHECKSUM_OFF)); > +} What happens is if you crash between UpdateControlFile() and XlogChecksum()? - Heikki
pgsql-hackers by date: