Re: Online checksums patch - once again - Mailing list pgsql-hackers
From: Heikki Linnakangas
Subject: Re: Online checksums patch - once again
Msg-id: c29d7e22-c40b-8a5b-72ed-d7ea87741403@iki.fi
In response to: Re: Online checksums patch - once again (Daniel Gustafsson <daniel@yesql.se>)
List: pgsql-hackers
Revisiting an issue we discussed earlier:

On 25/11/2020 15:20, Daniel Gustafsson wrote:
> On 23 Nov 2020, at 18:36, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> On 17/11/2020 10:56, Daniel Gustafsson wrote:
>>> I've reworked this in the attached such that the enable_ and
>>> disable_ functions merely call into the launcher with the desired
>>> outcome, and the launcher is responsible for figuring out the
>>> rest. The datachecksumworker is now the sole place which
>>> initiates a state transfer.
>>
>> Well, you still fill the DatachecksumsWorkerShmem->operations array
>> in the backend process that launches the datachecksumworker, not in
>> the worker process. I find that still a bit surprising, but I
>> believe it works.
>
> I'm open to changing it in case there are strong opinions, it just
> seemed the most natural to me.

This kept bothering me, so I spent a while hacking this to my liking.
The attached patch moves the code to fill in 'operations' from the
backend to the launcher, so that the pg_enable/disable_checksums() call
now truly just stores the desired state, checksums on or off, in shared
memory, and launches the launcher process. The launcher process figures
out how to get to the desired state. This removes the couple of corner
cases that previously emitted a NOTICE about the processing being
concurrently disabled or aborted. What do you think? I haven't done much
testing, so if you adopt this approach, please check that I didn't break
anything in the process.

This changes the way the abort works. If pg_enable/disable_checksums()
is called while the launcher is already busy changing the state,
pg_enable/disable_checksums() will just set the new desired state in
shared memory anyway. The launcher process will notice that the target
state changed some time later, and restart from scratch.
A couple of other issues came up while doing that:

- AbortProcessing() has two callers, one in code that runs in the
  launcher process, and another one in code that runs in the worker
  process. Is it really safe to use from the worker process? Calling
  ProcessAllDatabases() in the worker seems sketchy. (This is moot in
  this new patch version, as I removed AbortProcessing() altogether.)

- Is it possible that the worker keeps running after the launcher has
  already exited, e.g. because of an ERROR or SIGTERM? If you then
  quickly call pg_enable_checksums() again, can you end up with two
  workers running at the same time? Is that bad?

On 26/01/2021 23:00, Daniel Gustafsson wrote:
>> On 22 Jan 2021, at 12:55, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> @@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
>>> 		relkind == RELKIND_MATVIEW)
>>> 		RelationInitTableAccessMethod(rel);
>>> +	/*
>>> +	 * Set the data checksum state. Since the data checksum state can change at
>>> +	 * any time, the fetched value might be out of date by the time the
>>> +	 * relation is built. DataChecksumsNeedWrite returns true when data
>>> +	 * checksums are: enabled; are in the process of being enabled (state:
>>> +	 * "inprogress-on"); are in the process of being disabled (state:
>>> +	 * "inprogress-off"). Since relhaschecksums is only used to track progress
>>> +	 * when data checksums are being enabled, and going from disabled to
>>> +	 * enabled will clear relhaschecksums before starting, it is safe to use
>>> +	 * this value for a concurrent state transition to off.
>>> +	 *
>>> +	 * If DataChecksumsNeedWrite returns false, and is concurrently changed to
>>> +	 * true then that implies that checksums are being enabled. Worst case,
>>> +	 * this will lead to the relation being processed for checksums even though
>>> +	 * each page written will have them already. Performing this last shortens
>>> +	 * the window, but doesn't avoid it.
>>> +	 */
>>> +	HOLD_INTERRUPTS();
>>> +	rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
>>> +	RESUME_INTERRUPTS();
>>> +
>>> 	/*
>>> 	 * Okay to insert into the relcache hash table.
>>> 	 *
>>
>> I grepped for relhaschecksums, and concluded that the value in the
>> relcache isn't actually used for anything. Not so! In
>> heap_create_with_catalog(), the actual pg_class row is constructed
>> from the relcache entry, so the value set in
>> RelationBuildLocalRelation() finds its way to pg_class. Perhaps it
>> would be more clear to pass relhaschecksums directly as an argument
>> to AddNewRelationTuple(). That way, the value in the relcache would
>> be truly never used.
>
> I might be thick (or undercaffeinated) but I'm not sure I follow.
> AddNewRelationTuple calls InsertPgClassTuple which in turn avoids the
> relcache entry.

Ah, you're right, I misread AddNewRelationTuple. That means that the
relhaschecksums field in the relcache is never used? That's a clear
rule. The changes to formrdesc() and RelationBuildLocalRelation() seem
unnecessary then, we can always initialize relhaschecksums to false in
the relcache.

- Heikki