Re: index prefetching - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: index prefetching
Msg-id: 6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev
In response to: Re: index prefetching (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

On 2025-08-14 19:36:49 -0400, Andres Freund wrote:
> On 2025-08-14 17:55:53 -0400, Peter Geoghegan wrote:
> > On Thu, Aug 14, 2025 at 5:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > We can optimize that by deferring the StartBufferIO() if we're
> > > > encountering a buffer that is undergoing IO, at the cost of some
> > > > complexity. I'm not sure real-world queries will often encounter the
> > > > pattern of the same block being read in by a read stream multiple
> > > > times in close proximity sufficiently often to make that worth it.
> > >
> > > We definitely need to be prepared for duplicate prefetch requests in
> > > the context of index scans.
> >
> > Can you (or anybody else) think of a quick and dirty way of working
> > around the problem on the read stream side? I would like to prioritize
> > getting the patch into a state where its overall performance profile
> > "feels right". From there we can iterate on fixing the underlying
> > issues in more principled ways.
>
> I think I can see a way to fix the issue, below read stream. Basically,
> whenever AsyncReadBuffers() finds a buffer that has ongoing IO, instead of
> waiting, as we do today, copy the wref to the ReadBuffersOperation() and
> set a new flag indicating that we are waiting for an IO that was not
> started by the wref. Then, in WaitReadBuffers(), we wait for such foreign
> started IOs. That has to be somewhat different code from today, because we
> have to deal with the fact of the "foreign" IO potentially having failed.
>
> I'll try writing a prototype for that tomorrow. I think to actually get
> that into a committable shape we need a test harness (probably a read
> stream controlled by an SQL function that gets an array of buffers).

Attached is a prototype of this approach. It does seem to fix this issue.
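For readers who don't open the attachment, the rough shape of the mechanism
described above is sketched below. This is only an illustration, not the
attached patch; apart from AsyncReadBuffers(), WaitReadBuffers(),
ReadBuffersOperation and pgaio_wref_wait(), the field and helper names are
assumptions.

/*
 * Sketch only, not the attached patch: handle a buffer whose IO was
 * started by someone else without synchronously waiting inside
 * AsyncReadBuffers().  Names marked "assumed" are placeholders, not
 * existing bufmgr/aio identifiers.
 */

/* In AsyncReadBuffers(), on finding a buffer with IO already in flight: */
if (foreign_io_in_flight)                        /* condition assumed */
{
    /* remember the foreign IO instead of waiting for it here */
    operation->io_wref = foreign_io_wref;        /* the wref copied as described */
    operation->foreign_io = true;                /* the proposed new flag (name assumed) */
    /* and skip submitting IO of our own for this buffer */
}

/* In WaitReadBuffers(): */
if (operation->foreign_io)
{
    /* wait for the IO that was started elsewhere */
    pgaio_wref_wait(&operation->io_wref);

    /*
     * Unlike IO we submitted ourselves, the foreign IO may have failed
     * without the error being reported to us, so recheck the buffer and
     * re-issue the read if it is still not valid.
     */
    if (!foreign_read_succeeded(operation))      /* helper assumed */
        reissue_read(operation);                 /* helper assumed */
}

The awkward part is the failure handling: a completed wait on a "foreign"
wref does not guarantee the buffer is valid, since the error may have been
consumed by whoever submitted the IO, so WaitReadBuffers() has to be
prepared to retry.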
New code disabled:

#### backwards sequential table ####
┌──────────────────────────────────────────────────────────────────────┐
│                              QUERY PLAN                              │
├──────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103))                          │
│ Index Searches: 1                                                    │
│ Buffers: shared hit=10291 read=49933                                 │
│ I/O Timings: shared read=213.277                                     │
│ Planning:                                                            │
│ Buffers: shared hit=91 read=19                                       │
│ I/O Timings: shared read=2.124                                       │
│ Planning Time: 3.269 ms                                              │
│ Execution Time: 1023.279 ms                                          │
└──────────────────────────────────────────────────────────────────────┘
(10 rows)

New code enabled:

#### backwards sequential table ####
┌──────────────────────────────────────────────────────────────────────┐
│                              QUERY PLAN                              │
├──────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_pk on t (actual rows=1048576.00 loops=1) │
│ Index Cond: ((a >= 16336) AND (a <= 49103))                          │
│ Index Searches: 1                                                    │
│ Buffers: shared hit=10291 read=49933                                 │
│ I/O Timings: shared read=217.225                                     │
│ Planning:                                                            │
│ Buffers: shared hit=91 read=19                                       │
│ I/O Timings: shared read=2.009                                       │
│ Planning Time: 2.685 ms                                              │
│ Execution Time: 602.987 ms                                           │
└──────────────────────────────────────────────────────────────────────┘
(10 rows)

With the change enabled, the sequential query is faster than the random query:

#### backwards random table ####
┌──────────────────────────────────────────────────────────────────────────────────────────┐
│                                         QUERY PLAN                                        │
├──────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan Backward using t_randomized_pk on t_randomized (actual rows=1048576.00 loops=1)│
│ Index Cond: ((a >= 16336) AND (a <= 49103))                                               │
│ Index Searches: 1                                                                         │
│ Buffers: shared hit=6085 read=77813                                                       │
│ I/O Timings: shared read=347.285                                                          │
│ Planning:                                                                                 │
│ Buffers: shared hit=127 read=5                                                            │
│ I/O Timings: shared read=1.001                                                            │
│ Planning Time: 1.751 ms                                                                   │
│ Execution Time: 820.544 ms                                                                │
└──────────────────────────────────────────────────────────────────────────────────────────┘
(10 rows)

Greetings,

Andres Freund