Home > mailing lists

Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) - Mailing list pgsql-hackers

From	Masahiko Sawada
Subject	Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)
Date	March 30, 2020 07:01:18
Msg-id	CA+fd4k5W+AXFFfAD=UucYzunxDOCY57qVH0qO_5dHhr7SXYecA@mail.gmail.com Whole thread Raw
In response to	Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses	Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch)
List	pgsql-hackers

Tree view

On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > I've run vacuum with/without parallel workers on the table having 5
> > > > > indexes. The vacuum reads all blocks of table and indexes.
> > > > >
> > > > > * VACUUM command with no parallel workers
> > > > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > > > query ~ 'vacuum';
> > > > >
> > > > >   total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> > > > > shared_blks_dirtied | shared_blks_written
> > > > >
--------------+-----------------+------------------+-----------------+---------------------+---------------------
> > > > >  19857.217207 |           45238 |           226944 |          272182 |
> > > > >              225943 |              225894
> > > > > (1 row)
> > > > >
> > > > > * VACUUM command with 4 parallel workers
> > > > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > > > query ~ 'vacuum';
> > > > >
> > > > >  total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> > > > > shared_blks_dirtied | shared_blks_written
> > > > >
-------------+-----------------+------------------+-----------------+---------------------+---------------------
> > > > >  6932.117365 |           45205 |            73079 |          118284 |
> > > > >              72403 |               72365
> > > > > (1 row)
> > > > >
> > > > > The total number of blocks of table and indexes are about 182243
> > > > > blocks. As Julien reported, obviously the total number of read blocks
> > > > > during parallel vacuum is much less than single process vacuum's
> > > > > result.
> > > > >
> > > > > Parallel create index has the same issue but it doesn't exist in
> > > > > parallel queries for SELECTs.
> > > > >
> > > > > I think we need to change parallel maintenance commands so that they
> > > > > report buffer usage like what ParallelQueryMain() does; prepare to
> > > > > track buffer usage during query execution by
> > > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> > > > > after parallel maintenance command. To report buffer usage of parallel
> > > > > maintenance command correctly, I'm thinking that we can (1) change
> > > > > parallel create index and parallel vacuum so that they prepare
> > > > > gathering buffer usage, or (2) have a common entry point for parallel
> > > > > maintenance commands that is responsible for gathering buffer usage
> > > > > and calling the entry functions for individual maintenance command.
> > > > > I'll investigate it more in depth.
> > > >
> > > > As I just mentioned, (2) seems like a better design as it's quite
> > > > likely that the number of parallel-aware utilities will probably
> > > > continue to increase.  One problem also is that parallel CREATE INDEX
> > > > has been introduced in pg11, so (2) probably won't be packpatchable
> > > > (and (1) seems problematic too).
> > > >
> > >
> > > I am not sure if we can decide at this stage whether it is
> > > back-patchable or not.  Let's first see the patch and if it turns out
> > > to be complex, then we can try to do some straight-forward fix for
> > > back-branches.
> >
> > Agreed.
> >
> > > In general, I don't see why the fix here should be
> > > complex?
> >
> > Yeah, particularly the approach (1) will not be complex. I'll write a
> > patch tomorrow.
> >
>
> I've attached two patches fixing this issue for parallel index
> creation and parallel vacuum. These approaches take the same approach;
> we allocate DSM to share buffer usage and the leader gathers them,
> described as approach (1) above. I think this is a straightforward
> approach for this issue. We can create a common entry point for
> parallel maintenance command that is responsible for gathering buffer
> usage as well as sharing query text etc. But it will accompany
> relatively big change and it might be overkill at this stage. We can
> discuss that and it will become an item for PG14.
>

The patch for vacuum conflicts with recent changes in vacuum. So I've
attached rebased one.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

bufferusage_vacuum_v2.patch

pgsql-hackers by date:

From: David Rowley
Date: 30 March 2020, 06:49:35
Subject: Re: Berserk Autovacuum (let's save next Mandrill)

From: Michael Paquier
Date: 30 March 2020, 07:43:00
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?

Re: pg_stat_statements issue with parallel maintenance (Was Re: WALusage calculation patch) - Mailing list pgsql-hackers

Attachment

Previous

Next