Re: index prefetching - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: index prefetching
Date	February 15, 2024 16:42:07
Msg-id	CAH2-Wz=gMnsLQph1KM_xxTu-ZFRFqbDbK9tFBPTKcfXB1Z8=og@mail.gmail.com
In response to	Re: index prefetching (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses	Re: index prefetching
List	pgsql-hackers

On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> On 2/15/24 00:06, Peter Geoghegan wrote:
> > I suppose that it might be much more important than I imagine it is
> > right now, but it'd be nice to have something a bit more concrete to
> > go on.
> >
>
> This probably depends on which corner cases are considered important.
>
> The page-at-a-time approach essentially means index items at the
> beginning of the page won't get prefetched (or vice versa, prefetch
> distance drops to 0 when we get to end of index page).

I don't think that's true. At least not for nbtree scans.

As I went into last year, you'd get the benefit of the work I've done
on "boundary cases" (most recently in commit c9c0589f from just a
couple of months back), which helps us get the most out of suffix
truncation. This maximizes the chances of only having to scan a single
index leaf page in many important cases. So I can see no reason why
index items at the beginning of the page are at any particular
disadvantage (compared to those from the middle or the end of the
page).

Where you might have a problem is cases where it's just inherently
necessary to visit more than a single leaf page, despite the best
efforts of the nbtsplitloc.c logic -- cases where the scan just
inherently needs to return tuples that "straddle the boundary between
two neighboring pages". That isn't a particularly natural restriction,
but it's also not obvious that it's all that much of a disadvantage in
practice.
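
To make the "straddling" case a bit more concrete, here is a toy model
(not PostgreSQL code; every name and number in it is an assumption made
up for illustration). It compares the look-ahead available to each
matching index item with and without a page-at-a-time restriction, and
sums up how much look-ahead the restriction costs over a whole scan.

```python
def lookahead(i, last_match, target, page_end=None):
    """Items that could be prefetched ahead of item i (toy model).

    i, last_match and page_end are 0-based item numbers; target is the
    desired prefetch distance. page_end=None means "no page restriction".
    """
    limit = last_match - i              # never look past the last match
    if page_end is not None:
        limit = min(limit, page_end - i)  # nor past the current leaf page
    return min(target, limit)


def lost_lookahead(first_match, last_match, items_per_page, target):
    """Total look-ahead lost to the page-at-a-time restriction."""
    lost = 0
    for i in range(first_match, last_match + 1):
        # Last item number on the (hypothetical) leaf page holding item i.
        page_end = (i // items_per_page + 1) * items_per_page - 1
        unrestricted = lookahead(i, last_match, target)
        restricted = lookahead(i, last_match, target, page_end)
        lost += unrestricted - restricted
    return lost


# All matches on one leaf page: the restriction costs nothing.
single_page = lost_lookahead(10, 40, items_per_page=100, target=16)

# Matches straddle a page boundary: items near the end of the first
# page lose look-ahead they would otherwise have had.
straddling = lost_lookahead(90, 110, items_per_page=100, target=16)
```

In this model the loss is confined to scans whose matches cross a leaf
page boundary, and within those scans to the items near the end of the
earlier page. Items at the beginning of a page lose nothing.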

> It certainly was a great improvement, no doubt about that. I dislike the
> restriction, but that's partially for aesthetic reasons - it just seems
> it'd be nice to not have this.
>
> That being said, I'd be OK with having this restriction if it makes v1
> feasible. For me, the big question is whether it'd mean we're stuck with
> this restriction forever, or whether there's a viable way to improve
> this in v2.

I think there is no question that whatever we do must not completely
disable kill_prior_tuple -- I'd be surprised if a single person
disagreed with me on this point. There is also a more nuanced
way of describing this same restriction, but we don't necessarily need
to agree on what exactly that is right now.

> And I don't have an answer to that :-( I got completely lost in the
> ongoing discussion about the locking implications (which I happily
> ignored while working on the PoC patch), layering tensions, and
> questions about which part should be "in control".

Honestly, I always thought that it made sense to do things on the
index AM side. When you went the other way I was surprised. Perhaps I
should have said more about that, sooner, but I'd already said quite a
bit at that point, so...

Anyway, I think that it's pretty clear that "naive desynchronization"
is just not acceptable, because that'll disable kill_prior_tuple
altogether. So you're going to have to do this in a way that more or
less preserves something like the current kill_prior_tuple behavior.
It's going to have some downsides, but those can be managed. They can
be managed from within the index AM itself, a bit like the
_bt_killitems() no-pin stuff does things already.

Obviously this interpretation suggests that doing things at the index
AM level is indeed the right way to go, layering-wise. Does it make
sense to you, though?
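
For anyone who hasn't looked at the no-pin case, the general idea can
be sketched roughly as follows. This is a simplified model, not the
actual nbtree code: the class names, fields, and functions are all
hypothetical. The scan remembers which items it found dead, and only
applies the hints when it is done with the page. If the pin was
dropped in the meantime, it applies them only when the page's LSN
shows that the page hasn't changed since it was read.

```python
from dataclasses import dataclass, field


@dataclass
class LeafPage:
    """Hypothetical stand-in for an index leaf page."""
    lsn: int
    dead: set = field(default_factory=set)  # offsets hinted as dead


@dataclass
class ScanPos:
    """Hypothetical per-page scan state kept by the index AM."""
    page: LeafPage
    lsn_when_read: int
    pin_held: bool
    killed_items: list = field(default_factory=list)


def remember_killed(pos, offset):
    """Record that the table AM found the tuple at `offset` dead."""
    pos.killed_items.append(offset)


def kill_items(pos):
    """Apply remembered hints when done with the page.

    Returns the number of items newly marked dead. Without a pin, the
    page may have been modified behind our back, so give up if its LSN
    changed since we read it.
    """
    if not pos.pin_held and pos.page.lsn != pos.lsn_when_read:
        pos.killed_items.clear()
        return 0
    marked = 0
    for offset in pos.killed_items:
        if offset not in pos.page.dead:
            pos.page.dead.add(offset)
            marked += 1
    pos.killed_items.clear()
    return marked


# Unchanged page, no pin: the hint is applied.
quiet = LeafPage(lsn=5)
pos = ScanPos(quiet, lsn_when_read=5, pin_held=False)
remember_killed(pos, 3)
applied = kill_items(pos)

# Page modified after the pin was dropped: the hint is discarded.
busy = LeafPage(lsn=5)
pos2 = ScanPos(busy, lsn_when_read=5, pin_held=False)
remember_killed(pos2, 3)
busy.lsn = 6
skipped = kill_items(pos2)
```

The point of the sketch is only that the bookkeeping lives entirely
with the per-page scan state, so the index AM can decide when it is
safe to apply the hints without the caller needing to know anything
about it.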

--
Peter Geoghegan
