O(n) tasks cause lengthy startups and checkpoints - Mailing list pgsql-hackers
From | Bossart, Nathan |
---|---|
Subject | O(n) tasks cause lengthy startups and checkpoints |
Date | |
Msg-id | C1EE64B0-D4DB-40F3-98C8-0CED324D34CB@amazon.com Whole thread Raw |
Responses |
Re: O(n) tasks cause lengthy startups and checkpoints
Re: O(n) tasks cause lengthy startups and checkpoints Re: O(n) tasks cause lengthy startups and checkpoints |
List | pgsql-hackers |
Hi hackers, Thanks to 61752af, SyncDataDirectory() can make use of syncfs() to avoid individually syncing all database files after a crash. However, as noted earlier this year [0], there are still a number of O(n) tasks that affect startup and checkpointing that I'd like to improve. Below, I've attempted to summarize each task and to offer ideas for improving matters. I'll likely split each of these into its own thread, given there is community interest for such changes. 1) CheckPointSnapBuild(): This function loops through pg_logical/snapshots to remove all snapshots that are no longer needed. If there are many entries in this directory, this can take a long time. The note above this function indicates that this is done during checkpoints simply because it is convenient. IIUC there is no requirement that this function actually completes for a given checkpoint. My current idea is to move this to a new maintenance worker. 2) CheckPointLogicalRewriteHeap(): This function loops through pg_logical/mappings to remove old mappings and flush all remaining ones. IIUC there is no requirement that the "remove old mappings" part must complete for a given checkpoint, but the "flush all remaining" portion allows replay after a checkpoint to only "deal with the parts of a mapping that have been written out after the checkpoint started." Therefore, I think we should move the "remove old mappings" part to a new maintenance worker (probably the same one as for 1), and we should consider using syncfs() for the "flush all remaining" part. (I suspect the main argument against the latter will be that it could cause IO spikes.) 3) RemovePgTempFiles(): This step can delay startup if there are many temporary files to individually remove. This step is already optionally done after a crash via the remove_temp_files_after_crash GUC. I propose that we have startup move the temporary file directories aside and create new ones, and then a separate worker (probably the same one from 1 and 2) could clean up the old files. 4) StartupReorderBuffer(): This step deletes logical slot data that has been spilled to disk. This code appears to be written to avoid deleting different types of files in these directories, but AFAICT there shouldn't be any other files. Therefore, I think we could do something similar to 3 (i.e., move the directories aside during startup and clean them up via a new maintenance worker). I realize adding a new maintenance worker might be a bit heavy-handed, but I think it would be nice to have somewhere to offload tasks that really shouldn't impact startup and checkpointing. I imagine such a process would come in handy down the road, too. WDYT? Nathan [0] https://postgr.es/m/32B59582-AA6C-4609-B08F-2256A271F7A5%40amazon.com
pgsql-hackers by date: