Re: Adding support for Row Compares to nbtree startikey optimization - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Adding support for Row Compares to nbtree startikey optimization
Date
Msg-id CAH2-WzmPahbGOQhuTUvULsAcNJgTrgs1=JEr=63uZ2+FHKp6gg@mail.gmail.com
Whole thread Raw
In response to Re: Adding support for Row Compares to nbtree startikey optimization  (Chao Li <li.evan.chao@gmail.com>)
Responses Re: Adding support for Row Compares to nbtree startikey optimization
List pgsql-hackers
On Fri, Aug 15, 2025 at 2:38 AM Chao Li <li.evan.chao@gmail.com> wrote:
> I tested the patch. Basically all of my test cases passed.

Thanks for the review.

> +                               if (subkey->sk_attno > firstchangingattnum) /* >, not >= */
> +                                       break;
>
> Can we enhance the comment to something like ">, not >= because firstchangingattnum starting from 1".

But that's not the reason why the condition is ">, not >=". The
condition really is ">=" later in the same function, when we deal with
= strategy keys. If you think about how equalities and inequalities
need to work with firstchangingattnum, you'll understand why this is
(note that row compares are always inequalities, and have almost the
same rules as plain scalar inequalities). I'll give you a summary, in
case that helps.

With equalities, the index attribute for the key must have exactly one
distinct value across the whole page. They must *also* be preceded by
0 or more higher-order attributes that also have only one distinct
value among tuples from the page. Otherwise, we cannot use firsttup
and lasttup to prove that the equality must be satisfied by every
tuple on the page. That's why we must break on ">=".

With inequalities, the rules are less strict: we *can* break only on
">". It is okay if there are multiple values among the tuples on the
page in respect of the attribute for the key/subkey (provided that the
attribute is firstchangingattnum, not some later attribute) -- as long
as firsttup and lasttup are both satisfied by the inequality.

Notice that with equalities (but not inequalities) it isn't possible
for firsttup and lasttup to both be satisfied when the equality key's
attribute is firstchangingattnum -- since there are multiple distinct
values for the attribute, and an = condition can only be equal to one
specific value (never multiple values). This is even true with =
ANY()/ScalarArrayOp conditions, since there is no way to know that all
the index tuples have matches in the array when "attr ==
firstchangingattnum" (we really do have to read every tuple from the
page to give correct results for a = ANY()/ScalarArrayOp key).

> +                               /*
> +                                * Compare the last tuple's datum for this row compare member
> +                                * (unless we already know that lasttup must satisfy qual)
> +                                */
> +                               if (!lastsatisfied)
> +                               {
>
> When subkey->sk_attno <= firstchangingattnum, if first is not satisfied, then last should not be satisfied either, so
wecan skip check last. 

That's not quite true: when "subkey->sk_attno == firstchangingattnum",
it's possible for the subkey to be satisfied by only one of the two
tuples (either firsttup or lasttup) -- so we really do need to check
both.

I believe that a weaker version of this idea is correct, though: we
could avoid directly testing the subkey against lasttup when
"subkey->sk_attno < firstchangingattnum", provided we make sure that
at least firsttup's corresponding datum (firstdatum) satisfies the
subkey. This must be correct because the exact same datum will be used
against the same subkey a second time -- the answer cannot possibly
differ between firsttup and lasttup. I don't think that that
optimization is particularly worthwhile here, though; it adds
additional complexity, for a saving that is hardly noticeable at all.

The value of this optimization comes from the fact that we can save
hundreds of calls to _bt_check_rowcompare for each tuple on the page
(when it's safe to set pstate.ikey to a value that's past some
RowCompare key's offset in so->keyData[]). A minor, rare saving in
"startup costs" just doesn't seem worth the trouble to me.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6
Next
From: David Rowley
Date:
Subject: Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6