Re: index prefetching - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: index prefetching |
Date | |
Msg-id | CAH2-WzkAC_gWQgc3MV3MWBx2EUaE4TfPk8XQPkJgs1R_qNvo=A@mail.gmail.com Whole thread Raw |
In response to | Re: index prefetching (Tomas Vondra <tomas@vondra.me>) |
List | pgsql-hackers |
On Thu, Sep 4, 2025 at 2:55 PM Tomas Vondra <tomas@vondra.me> wrote: > Aren't plenty of real-world data sets correlated, but not perfectly? Attached is the latest revision of the prefetching patch, taken from the shared branch that Tomas and I have been working on for some weeks. This revision is the first "official revision" that uses the complex approach, which we agreed was the best approach right before we started collaborating through this shared branch. While Tomas and I have posted versions of this "complex" approach at various times, those were "unofficial" previews of different approaches. Whereas this is the latest official patch revision of record, that should be tested by CFTester for the prefetch patch's CF entry, etc. We haven't done a good job of maintaining an unambiguous, easy to test "official" CF entry patch before now. That's why I'm being explicit about what this patch revision represents. It's the shared work of Tomas and I; it isn't some short-term experimental fork. Future revisions will be incremental improvements on what I'm posting now. Our focus has been on fixing a variety of regressions that came to light following testing by Tomas. There are a few bigger changes that are intended to fix these regressions, plus lots of small changes. There's too many small changes to list. But the bigger changes are: * We're now carrying Andres' patch [1] that deals with inefficiencies on the read stream side [2]. We need this to get decent performance with certain kinds of index scans where the same heap page buffer needs to be read multiple times in close succession. * We now delay prefetching/creating a new read stream until after we've already read one index batch, with the goal of avoiding regressions on cheap, selective queries (e.g., pgbench SELECT). This optimization has been referred to as the "priorbatch" optimization earlier in this thread. * The third patch is a new one, authored by Tomas. It aims to ameliorate nestloop join regressions by caching memory used to store batches across rescans. This is still experimental. * The regression that we were concerned about most recently [3][4] is fixed by a new mechanism that sometimes disables prefetching/the read stream some time prefetching begins, having already read a small number of batches with prefetching -- the INDEX_SCAN_MIN_TUPLE_DISTANCE optimization. This is also experimental. But it does fully fix the problem at hand, without any read stream changes. (This is part of the main prefetching patch.) This works like the "priorbatch" optimization, but in reverse. We *unset* the scan's read stream when our INDEX_SCAN_MIN_TUPLE_DISTANCE test shows that prefetching hasn't worked out (as opposed to delaying starting it up until it starts to look like prefetching might help). Like the "priorbatch" optimization, this optimization is concerned with fixed prefetching costs that cannot possibly pay for themselves. Note that we originally believed that the regression in question [3][4] necessitated more work on the read stream side, to directly account for the way that we saw prefetch distance collapse to 2.0 for the entire scan. But our current thinking is that the regression in question occurs with scans where wholly avoiding prefetching is the right goal. Which is why, tentatively, we're addressing the problem within indexam.c itself (not in the read stream), by adding this new INDEX_SCAN_MIN_TUPLE_DISTANCE test to the read stream callback. This means that various experimental read stream distance patches [3][5] that initially seemed relevant no longer appear necessary (and so aren't included in this new revision at all). Much cleanup work remains to get the changes I just described in proper shape (to say nothing about open items that we haven't made a start on yet, like moving the read stream out of indexam.c and into heapam). But it has been too long since the last revision. I'd like to establish a regular cadence for posting new revisions of the patch set. [1] https://postgr.es/m/6butbqln6ewi5kuxz3kfv2mwomnlgtate4mb4lpa7gb2l63j4t@stlwbi2dvvev [2] https://postgr.es/m/kvyser45imw3xmisfvpeoshisswazlzw35el3fq5zg73zblpql@f56enfj45nf7 [3] https://postgr.es/m/8f5d66cf-44e9-40e0-8349-d5590ba8efb4@vondra.me [4] https://github.com/tvondra/postgres/blob/index-prefetch-master/microbenchmarks/tomas-weird-issue-readstream.sql [5] https://postgr.es/m/CA+hUKG+9Qp=E5XWE+_1UPCxULLXz6JrAY=83pmnJ5ifupH-NSA@mail.gmail.com -- Peter Geoghegan
Attachment
pgsql-hackers by date: