RE: Parallel heap vacuum - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: Parallel heap vacuum |
Date | |
Msg-id | TYAPR01MB5692772248611889010B6823F55A2@TYAPR01MB5692.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Parallel heap vacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Parallel heap vacuum
|
List | pgsql-hackers |
Dear Sawada-san, > TidStoreBeginIterateShared() is designed for multiple parallel workers > to iterate a shared TidStore. During an iteration, parallel workers > share the iteration state and iterate the underlying radix tree while > taking appropriate locks. Therefore, it's available only for a shared > TidStore. This is required to implement the parallel heap vacuum, > where multiple parallel workers do the iteration on the shared > TidStore. > > On the other hand, TidStoreBeginIterate() is designed for a single > process to iterate a TidStore. It accepts even a shared TidStore as > you mentioned, but during an iteration there is no inter-process > coordination such as locking. When it comes to parallel vacuum, > supporting TidStoreBeginIterate() on a shared TidStore is necessary to > cover the case where we use only parallel index vacuum but not > parallel heap scan/vacuum. In this case, we need to store dead tuple > TIDs on the shared TidStore during heap scan so parallel workers can > use it during index vacuum. But it's not necessary to use > TidStoreBeginIterateShared() because only one (leader) process does > heap vacuum. Okay, thanks for the description. I felt it is OK to keep. I read 0001 again and here are comments. 01. vacuumlazy.c ``` +#define LV_PARALLEL_SCAN_SHARED 0xFFFF0001 +#define LV_PARALLEL_SCAN_DESC 0xFFFF0002 +#define LV_PARALLEL_SCAN_DESC_WORKER 0xFFFF0003 ``` I checked other DMS keys used for parallel work, and they seems to have name like PARALEL_KEY_XXX. Can we follow it? 02. LVRelState ``` + BlockNumber next_fsm_block_to_vacuum; ``` Only the attribute does not have comments Can we add like: "Next freespace map page to be checked"? 03. parallel_heap_vacuum_gather_scan_stats ``` + vacrel->scan_stats->vacuumed_pages += ss->vacuumed_pages; ``` Note that `scan_stats->vacuumed_pages` does not exist in 0001, it is defined in 0004. Can you move it? 04. heap_parallel_vacuum_estimate ``` + + heap_parallel_estimate_shared_memory_size(rel, nworkers, &pscan_len, + &shared_len, &pscanwork_len); + + /* space for PHVShared */ + shm_toc_estimate_chunk(&pcxt->estimator, shared_len); + shm_toc_estimate_keys(&pcxt->estimator, 1); + + /* space for ParallelBlockTableScanDesc */ + pscan_len = table_block_parallelscan_estimate(rel); + shm_toc_estimate_chunk(&pcxt->estimator, pscan_len); + shm_toc_estimate_keys(&pcxt->estimator, 1); + + /* space for per-worker scan state, PHVScanWorkerState */ + pscanwork_len = mul_size(sizeof(PHVScanWorkerState), nworkers); + shm_toc_estimate_chunk(&pcxt->estimator, pscanwork_len); + shm_toc_estimate_keys(&pcxt->estimator, 1); ``` I feel pscan_len and pscanwork_len are calclated in heap_parallel_estimate_shared_memory_size(). Can we remove table_block_parallelscan_estimate() and mul_size() from here? 05. Idea Can you update documentations? 06. Idea AFAICS pg_stat_progress_vacuum does not contain information related with the parallel execution. How do you think adding an attribute which shows a list of pids? Not sure it is helpful for users but it can show the parallelism. Best regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: