Re: Streaming read-ready sequential scan code - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: Streaming read-ready sequential scan code
Date:
Msg-id: CA+hUKGLbn-V3muGoObOu3_MENXATa9=p8+JGMRjjd2VYa4vz7A@mail.gmail.com
In response to: Re: Streaming read-ready sequential scan code (Melanie Plageman <melanieplageman@gmail.com>)
Responses: Re: Streaming read-ready sequential scan code
           Re: Streaming read-ready sequential scan code
List: pgsql-hackers
On Thu, Apr 4, 2024 at 6:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Tue, Apr 2, 2024 at 1:10 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > On 01/04/2024 22:58, Melanie Plageman wrote:
> > > Attached v7 has version 14 of the streaming read API as well as a few
> > > small tweaks to comments and code.
> >
> > I saw benchmarks in this thread to show that there's no regression when
> > the data is in cache, but I didn't see any benchmarks demonstrating the
> > benefit of this. So I ran this quick test:
>
> Good point! It would be good to show why we would actually want this
> patch. Attached v8 is rebased over current master (which now has the
> streaming read API).

Anecdotally, by all reports I've seen so far, all-in-cache seems to be
consistently a touch faster than master, if anything, for streaming seq
scan and streaming ANALYZE. That's great, but it doesn't seem to be due
to intentional changes: no efficiency is coming from batching anything.
Perhaps it has to do with CPU pipelining effects: though it's doing the
same work as ReadBuffer()-when-it-gets-a-hit, the work itself is cut up
into stages in a kind of pipeline: read_stream_next_buffer() chooses
page n + 2, pins page n + 1 and returns page n each time you call it,
so maybe we get more CPU parallelism due to spreading the data
dependencies out? (Makes me wonder what happens if you insert a memory
prefetch for the page header into that production line, and whether
there are more opportunities to spread dependencies out, e.g. hashing
the buffer tag ahead of time.)

> On the topic of BAS_BULKREAD buffer access strategy, I think the least
> we could do is add an assert like this to read_stream_begin_relation()
> after calculating max_pinned_buffers.
>
>     Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);
>
> Perhaps we should do more? I think with a ring size of 16 MB, large
> SELECTs are safe for now. But I know future developers will make
> changes and it would be good not to assume they will understand that
> pinning more buffers than the size of the ring effectively invalidates
> the ring.

Yeah, I think we should clamp max_pinned_buffers if we see a strategy.
What do you think about:

    if (strategy)
    {
        int strategy_buffers = GetAccessStrategyBufferCount(strategy);

        max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers);
    }

I just don't know where to get that '2'. The idea is that we'd
hopefully never actually be constrained by it in practice, except in
tiny/toy setups, so we can't go too wrong with our number '2' there.

Then we should increase the default ring sizes for BAS_BULKREAD and
BAS_VACUUM to make the previous statement true. The sizes of main
memory and L2 cache have increased dramatically since those strategies
were invented. I think we should at least double them, and more likely
quadruple them. I realise you already made them configurable per
command in commit 1cbbee033, but I mean the hard-coded default 256 in
freelist.c. It's not only to get more space for our prefetching plans,
it's also to give the system more chance of flushing WAL
asynchronously/in some other backend before you crash into dirty data;
as you discovered, prefetching accidentally makes that effect worse in
a BAS_VACUUM strategy, by taking away space that is effectively
deferring WAL flushes, so I'd at least like to double the size if we
use "/ 2" above.
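[Editor's note: to make the memory-prefetch idea above concrete, here
is a minimal standalone C sketch. It is not from the patch and does not
use the real read_stream internals; DemoStream, demo_next_page and the
fixed pipeline depth are invented for illustration. It shows the shape
of the idea: while handing back page n, issue a __builtin_prefetch for
the header of the page that the next call will return, so that cache
miss overlaps with the caller's work on the current page.]

    #include <stddef.h>

    #define DEMO_DEPTH 3            /* pages n, n+1, n+2 in flight */

    typedef struct DemoStream
    {
        char   *pages[DEMO_DEPTH];  /* already-pinned pages (illustrative) */
        int     next;               /* index of the page to return next */
    } DemoStream;

    /* Return the current page, prefetching the next page's header. */
    static char *
    demo_next_page(DemoStream *stream)
    {
        char   *page = stream->pages[stream->next];
        char   *upcoming = stream->pages[(stream->next + 1) % DEMO_DEPTH];

        /*
         * Warm the cache line holding the next page's header while the
         * caller is still busy with the page we're about to hand back.
         * Arguments: read access (0), low temporal locality (1).
         */
        __builtin_prefetch(upcoming, 0, 1);

        stream->next = (stream->next + 1) % DEMO_DEPTH;
        return page;
    }

[Whether this wins anything in practice would need measuring; the hope
is that by the time the caller asks for page n + 1, its header line is
already in cache.]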