Re: Index Searches higher than expected for skip scan - Mailing list pgsql-performance

From Peter Geoghegan
Subject Re: Index Searches higher than expected for skip scan
Date
Msg-id CAH2-WznKg6Bg5R8Z+TA=DuE8fekLagnuzJCgVYi=y6jEKsv6sQ@mail.gmail.com
Whole thread Raw
In response to Re: Index Searches higher than expected for skip scan  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-performance
On Thu, Nov 6, 2025 at 2:54 PM Peter Geoghegan <pg@bowt.ie> wrote:
> That just leaves SK_ISNULL. We cannot assume that the index doesn't
> have any NULLs (unless the query uses IS NOT NULL directly).

Actually, that won't work here. Because the optimizer recognizes that
the leading boolean column isn't nullable, and "helpfully"
optimizes-away the IS NOT NULL qualification. But if the column *was*
nullable, adding IS NOT NULL would cut the number of index searches by
1.

> We might end up having to read the rightmost leaf page anyway, in which case
> there won't be an extra search for SK_ISNULL, but we'll likely need an
> extra search for this too (just like the leftmost leaf page, with
> SK_BT_MINVAL).

I said the wrong thing here. In fact, it's very likely that we will
have to read the rightmost leaf page with a query that has no qual on
the leading column, no matter the datatype. Here's why:

Suppose we only store tuples with the values 1 - 100 in the leading
column "a", for a query "WHERE b = 5432", with an index on (a, b).
Once skip scan reaches the point where it returns "(a, b) = (100,
5432)" to the scan (or once it sees that there's no such match in the
index), it'll guess that the next "a" value is 101. We'll then try and
fail to find a match "(a, b) = (101, 5432)".

If the column "a" has no NULLs, then the scan will already be at the
rightmost position of its rightmost leaf page -- so we're done then
and there. Notice that there hasn't been an extra search for SK_ISNULL
here. Because we visited the leaf page where NULLs would have been,
had there been any, and found that there was none at all (which is the
common case, contrary to what I said earlier).

If, on the other hand, the column "a" has many NULLs, then this failed
attempt to find "(a, b) = (101, 5432)" will advance the array on "a"
from 101 to SK_ISNULL, and then perform another search, this time for
"(a, b) = (NULL, 5432)". Again, notice that this also isn't wasteful
-- we simply have no better way to reliably determine that there is no
non-NULL value after 100 in this index.

It's perhaps worth noting that your original boolean example shows
that skip scan *is* directly aware that the next value after false is
SK_ISNULL -- even though (as I said in my first email from earlier) it
cannot do the similar trick of knowing that the lowest value really
should be false (this is knowable before the first scan/search even
begins). There are very obscure performance reasons for this
inconsistency, though it might seem a bit arbitrary.

--
Peter Geoghegan



pgsql-performance by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Index Searches higher than expected for skip scan