Re: [HACKERS] Block level parallel vacuum WIP - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: [HACKERS] Block level parallel vacuum WIP
Date:
Msg-id: CAD21AoDcH=1t8zi8TiXpaPvfvbsKnrsf_K1jCdnR19T5HAec0A@mail.gmail.com
In response to: Re: Block level parallel vacuum WIP (Michael Paquier <michael.paquier@gmail.com>)
Responses: Re: [HACKERS] Block level parallel vacuum WIP; Re: [HACKERS] Block level parallel vacuum WIP
List: pgsql-hackers
On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Yeah, I don't have a good solution for this problem so far.
>> We might need to improve group locking mechanism for the updating
>> operation or came up with another approach to resolve this problem.
>> For example, one possible idea is that the launcher process allocates
>> vm and fsm enough in advance in order to avoid extending fork relation
>> by parallel workers, but it's not resolve fundamental problem.
>

I got some advice at PGConf.ASIA 2016 and started to work on this again.

The biggest problem so far is group locking. As I mentioned before, parallel vacuum workers could try to extend the same visibility map page at the same time, so we either need to make group locking conflict in some cases or eliminate the need to acquire the extension lock.

The attached 000 patch uses the former idea: it makes group locking conflict between parallel workers when a worker tries to acquire an extension lock on the same page (a simplified standalone sketch of this rule appears later in this mail). I'm not sure this is the best idea, but it is very simple and sufficient to support parallel vacuum. A smarter approach may be needed when we want to support parallel DML and so on.

The 001 patch adds a PARALLEL option to the VACUUM command. As Robert suggested before, the option takes a parallel degree:

=# VACUUM (PARALLEL 4) table_name;

This launches 4 background processes, each of which executes lazy_scan_heap while the launcher (leader) process waits for all vacuum workers to finish. If N = 1 or the PARALLEL option is omitted, the leader process itself executes lazy_scan_heap.

Internal Design
=============

Garbage collection on the table is done in parallel at the block level. For a table with indexes, each index is assigned to a vacuum worker, and all garbage in an index is processed by its assigned worker.

All of the space for the dead tuple TID arrays used by the vacuum workers is allocated in dynamic shared memory by the launcher process. Each vacuum worker stores dead tuple locations into its own dead tuple array without locking, and the TIDs within one array are ordered by TID. Note that the dead tuple space as a whole, that is, the collection of per-worker arrays, is not ordered.

If the table has indexes, all dead tuple TIDs need to be shared with all vacuum workers before reclaiming of dead tuples starts, and that data should be cleared after all workers have finished using it. So I put two synchronization points: one before dead tuples are reclaimed and one after they have been reclaimed. At each point a parallel vacuum worker waits for all other workers to reach the same point; once all workers have reached it, they resume the next operation.

For example, if a table has five indexes and we execute parallel lazy vacuum on it with three vacuum workers, two of the three workers are assigned two indexes each and the remaining worker is assigned one index. Once the dead tuples collected by all workers reach maintenance_work_mem in size, the workers start to reclaim dead tuples from the table and the indexes. The worker assigned a single index finishes (probably first) and sleeps until the other two workers finish vacuuming. If the table has no indexes, each parallel vacuum worker simply vacuums pages as it goes.
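To make the flow above concrete, here is a minimal standalone model; it is not taken from the attached patches. Plain C threads stand in for the background workers, a POSIX barrier stands in for the two synchronization points, and the round-robin assignment of indexes to workers is only an illustrative choice. All names here are hypothetical.

/*
 * Standalone model of the flow described above (build: cc -pthread sketch.c).
 * Each worker collects dead-tuple TIDs into its own array, all workers wait
 * at a synchronization point before index vacuuming starts, indexes are
 * assigned to workers round-robin, and a second synchronization point is
 * reached before the shared dead-tuple space is cleared.
 */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS   3
#define NINDEXES   5
#define MAX_TIDS   64

typedef struct
{
    int     worker_id;
    int     ntids;
    int     tids[MAX_TIDS];     /* stand-in for the per-worker dead tuple array */
} WorkerSlot;

static WorkerSlot        slots[NWORKERS];   /* stand-in for dynamic shared memory */
static pthread_barrier_t sync_point;

static void *
vacuum_worker(void *arg)
{
    WorkerSlot *slot = (WorkerSlot *) arg;

    /* Phase 1: scan heap blocks and remember dead tuple TIDs, no lock needed. */
    for (int i = 0; i < 10; i++)
        slot->tids[slot->ntids++] = slot->worker_id * 100 + i;

    /* Synchronization point 1: wait until every worker has collected its TIDs. */
    pthread_barrier_wait(&sync_point);

    /* Phase 2: vacuum the indexes assigned to this worker (round-robin here). */
    for (int idx = slot->worker_id; idx < NINDEXES; idx += NWORKERS)
        printf("worker %d vacuums index %d using the TIDs of all %d workers\n",
               slot->worker_id, idx, NWORKERS);

    /* Synchronization point 2: only after this may the TID arrays be reset. */
    pthread_barrier_wait(&sync_point);
    slot->ntids = 0;

    return NULL;
}

int
main(void)
{
    pthread_t   workers[NWORKERS];

    pthread_barrier_init(&sync_point, NULL, NWORKERS);
    for (int i = 0; i < NWORKERS; i++)
    {
        slots[i].worker_id = i;
        pthread_create(&workers[i], NULL, vacuum_worker, &slots[i]);
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(workers[i], NULL);
    pthread_barrier_destroy(&sync_point);
    return 0;
}

With NWORKERS = 3 and NINDEXES = 5, two workers end up with two indexes each and one worker with a single index, matching the example above.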
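As for the group-locking change in the 000 patch, the following is only a rough sketch of the rule it needs, not the patch itself: members of the same parallel group normally do not conflict with each other, but for relation-extension (or page) locks the conflict still has to be enforced, because two parallel vacuum workers may try to extend the same visibility map page at the same time. The types and function below are hypothetical stand-ins, not PostgreSQL's lock.c API.

/*
 * Standalone sketch of the modified group-locking rule described earlier.
 * All names are illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    LOCKTAG_RELATION_SKETCH,        /* ordinary relation lock */
    LOCKTAG_RELATION_EXTEND_SKETCH, /* relation extension lock */
    LOCKTAG_PAGE_SKETCH             /* page lock */
} LockTagTypeSketch;

/* Would these two requests conflict under the modified group locking? */
static bool
lock_conflicts(LockTagTypeSketch tag, bool modes_conflict, bool same_group)
{
    if (!modes_conflict)
        return false;           /* the lock modes are compatible anyway */

    if (same_group &&
        tag != LOCKTAG_RELATION_EXTEND_SKETCH &&
        tag != LOCKTAG_PAGE_SKETCH)
        return false;           /* group locking: no conflict within the group */

    return true;                /* extension/page locks conflict even in a group */
}

int
main(void)
{
    /* Two workers of the same group taking the relation lock: no conflict. */
    printf("relation lock, same group: %d\n",
           lock_conflicts(LOCKTAG_RELATION_SKETCH, true, true));

    /* Two workers of the same group extending the same relation: conflict. */
    printf("extension lock, same group: %d\n",
           lock_conflicts(LOCKTAG_RELATION_EXTEND_SKETCH, true, true));
    return 0;
}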
Performance
===========

I measured the execution time of vacuum on a dirty table at several parallel degrees, in my modest environment.

 table_size | indexes | parallel_degree |   time
------------+---------+-----------------+----------
 6.5GB      |       0 |               1 | 00:00:14
 6.5GB      |       0 |               2 | 00:00:02
 6.5GB      |       0 |               4 | 00:00:02
 6.5GB      |       1 |               1 | 00:00:13
 6.5GB      |       1 |               2 | 00:00:15
 6.5GB      |       1 |               4 | 00:00:18
 6.5GB      |       2 |               1 | 00:02:18
 6.5GB      |       2 |               2 | 00:00:38
 6.5GB      |       2 |               4 | 00:00:46
 13GB       |       0 |               1 | 00:03:52
 13GB       |       0 |               2 | 00:00:49
 13GB       |       0 |               4 | 00:00:50
 13GB       |       1 |               1 | 00:01:41
 13GB       |       1 |               2 | 00:01:59
 13GB       |       1 |               4 | 00:01:24
 13GB       |       2 |               1 | 00:12:42
 13GB       |       2 |               2 | 00:01:17
 13GB       |       2 |               4 | 00:02:12

In these measurements, vacuum execution time improved in some cases, but it did not improve in the cases with one index. I'll investigate the cause.

ToDo
======

* Vacuum progress support.
* Storage parameter support, perhaps a parallel_vacuum_workers parameter, which would allow autovacuum to use parallel vacuum on the specified table.

I have registered this patch for the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center