Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) |
Date | |
Msg-id | CA+fd4k5W+AXFFfAD=UucYzunxDOCY57qVH0qO_5dHhr7SXYecA@mail.gmail.com Whole thread Raw |
In response to | Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>) |
Responses |
Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)
Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) |
List | pgsql-hackers |
On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > I've run vacuum with/without parallel workers on the table having 5 > > > > > indexes. The vacuum reads all blocks of table and indexes. > > > > > > > > > > * VACUUM command with no parallel workers > > > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > > > query ~ 'vacuum'; > > > > > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > > > shared_blks_dirtied | shared_blks_written > > > > > --------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > > > 19857.217207 | 45238 | 226944 | 272182 | > > > > > 225943 | 225894 > > > > > (1 row) > > > > > > > > > > * VACUUM command with 4 parallel workers > > > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > > > query ~ 'vacuum'; > > > > > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > > > shared_blks_dirtied | shared_blks_written > > > > > -------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > > > 6932.117365 | 45205 | 73079 | 118284 | > > > > > 72403 | 72365 > > > > > (1 row) > > > > > > > > > > The total number of blocks of table and indexes are about 182243 > > > > > blocks. As Julien reported, obviously the total number of read blocks > > > > > during parallel vacuum is much less than single process vacuum's > > > > > result. > > > > > > > > > > Parallel create index has the same issue but it doesn't exist in > > > > > parallel queries for SELECTs. > > > > > > > > > > I think we need to change parallel maintenance commands so that they > > > > > report buffer usage like what ParallelQueryMain() does; prepare to > > > > > track buffer usage during query execution by > > > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery() > > > > > after parallel maintenance command. To report buffer usage of parallel > > > > > maintenance command correctly, I'm thinking that we can (1) change > > > > > parallel create index and parallel vacuum so that they prepare > > > > > gathering buffer usage, or (2) have a common entry point for parallel > > > > > maintenance commands that is responsible for gathering buffer usage > > > > > and calling the entry functions for individual maintenance command. > > > > > I'll investigate it more in depth. > > > > > > > > As I just mentioned, (2) seems like a better design as it's quite > > > > likely that the number of parallel-aware utilities will probably > > > > continue to increase. One problem also is that parallel CREATE INDEX > > > > has been introduced in pg11, so (2) probably won't be packpatchable > > > > (and (1) seems problematic too). > > > > > > > > > > I am not sure if we can decide at this stage whether it is > > > back-patchable or not. Let's first see the patch and if it turns out > > > to be complex, then we can try to do some straight-forward fix for > > > back-branches. > > > > Agreed. > > > > > In general, I don't see why the fix here should be > > > complex? > > > > Yeah, particularly the approach (1) will not be complex. I'll write a > > patch tomorrow. > > > > I've attached two patches fixing this issue for parallel index > creation and parallel vacuum. These approaches take the same approach; > we allocate DSM to share buffer usage and the leader gathers them, > described as approach (1) above. I think this is a straightforward > approach for this issue. We can create a common entry point for > parallel maintenance command that is responsible for gathering buffer > usage as well as sharing query text etc. But it will accompany > relatively big change and it might be overkill at this stage. We can > discuss that and it will become an item for PG14. > The patch for vacuum conflicts with recent changes in vacuum. So I've attached rebased one. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: