Re: Streaming read-ready sequential scan code - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: Streaming read-ready sequential scan code
Date:
Msg-id: CA+hUKGLbn-V3muGoObOu3_MENXATa9=p8+JGMRjjd2VYa4vz7A@mail.gmail.com
In response to: Re: Streaming read-ready sequential scan code (Melanie Plageman <melanieplageman@gmail.com>)
Responses: Re: Streaming read-ready sequential scan code
           Re: Streaming read-ready sequential scan code
List: pgsql-hackers
On Thu, Apr 4, 2024 at 6:03 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Tue, Apr 2, 2024 at 1:10 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > On 01/04/2024 22:58, Melanie Plageman wrote:
> > > Attached v7 has version 14 of the streaming read API as well as a few
> > > small tweaks to comments and code.
> >
> > I saw benchmarks in this thread to show that there's no regression when
> > the data is in cache, but I didn't see any benchmarks demonstrating the
> > benefit of this. So I ran this quick test:
>
> Good point! It would be good to show why we would actually want this
> patch. Attached v8 is rebased over current master (which now has the
> streaming read API).

Anecdotally, by all reports I've seen so far, all-in-cache seems to be
consistently a touch faster than master, if anything, for streaming seq
scan and streaming ANALYZE. That's great, but it doesn't seem to be due
to intentional changes: no efficiency is coming from batching anything.
Perhaps it has to do with CPU pipelining effects: though it's doing the
same work as ReadBuffer()-when-it-gets-a-hit, the work itself is cut up
into stages in a kind of pipeline: read_stream_next_buffer() chooses
page n + 2, pins page n + 1 and returns page n each time you call it,
so maybe we get more CPU parallelism due to spreading the data
dependencies out? (Makes me wonder what happens if you insert a memory
prefetch for the page header into that production line, and whether
there are more opportunities to spread dependencies out, e.g. hashing
the buffer tag ahead of time.)

> On the topic of BAS_BULKREAD buffer access strategy, I think the least
> we could do is add an assert like this to read_stream_begin_relation()
> after calculating max_pinned_buffers.
>
>     Assert(GetAccessStrategyBufferCount(strategy) > max_pinned_buffers);
>
> Perhaps we should do more? I think with a ring size of 16 MB, large
> SELECTs are safe for now. But I know future developers will make
> changes and it would be good not to assume they will understand that
> pinning more buffers than the size of the ring effectively invalidates
> the ring.

Yeah, I think we should clamp max_pinned_buffers if we see a strategy.
What do you think about:

    if (strategy)
    {
        int strategy_buffers = GetAccessStrategyBufferCount(strategy);

        max_pinned_buffers = Min(strategy_buffers / 2, max_pinned_buffers);
    }

I just don't know where to get that '2'. The idea is that we'd
hopefully never actually be constrained by it in practice, except in
tiny/toy setups, so we can't go too wrong with our number '2' there.

Then we should increase the default ring sizes for BAS_BULKREAD and
BAS_VACUUM to make the previous statement true. The sizes of main
memory and L2 cache have increased dramatically since those strategies
were invented. I think we should at least double them, and more likely
quadruple them. I realise you already made them configurable per
command in commit 1cbbee033, but I mean the hard-coded default 256 in
freelist.c. It's not only to get more space for our prefetching plans,
it's also to give the system more chance of flushing WAL
asynchronously/in some other backend before you crash into dirty data;
as you discovered, prefetching accidentally makes that effect worse in
a BAS_VACUUM strategy, by taking away space that is effectively
deferring WAL flushes, so I'd at least like to double the size if we
use "/ 2" above.
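[Editor's note: to make the memory-prefetch idea above concrete, here
is a minimal standalone C sketch. It is not from the patch and does not
use the real read_stream internals; DemoStream, demo_next_page and the
fixed pipeline depth are invented for illustration. It shows the shape
of the idea: while handing back page n, issue a __builtin_prefetch for
the header of the page that the next call will return, so that cache
miss overlaps with the caller's work on the current page.]

    #include <stddef.h>

    #define DEMO_DEPTH 3            /* pages n, n+1, n+2 in flight */

    typedef struct DemoStream
    {
        char   *pages[DEMO_DEPTH];  /* already-pinned pages (illustrative) */
        int     next;               /* index of the page to return next */
    } DemoStream;

    /* Return the current page, prefetching the next page's header. */
    static char *
    demo_next_page(DemoStream *stream)
    {
        char   *page = stream->pages[stream->next];
        char   *upcoming = stream->pages[(stream->next + 1) % DEMO_DEPTH];

        /*
         * Warm the cache line holding the next page's header while the
         * caller is still busy with the page we're about to hand back.
         * Arguments: read access (0), low temporal locality (1).
         */
        __builtin_prefetch(upcoming, 0, 1);

        stream->next = (stream->next + 1) % DEMO_DEPTH;
        return page;
    }

[Whether this wins anything in practice would need measuring; the hope
is that by the time the caller asks for page n + 1, its header line is
already in cache.]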