Home > mailing lists

Re: plan shape work - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: plan shape work
Date	September 9 18:12:51
Msg-id	1172265.1757430771@sss.pgh.pa.us Whole thread Raw
In response to	Re: plan shape work (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: plan shape work
List	pgsql-hackers

Tree view

Robert Haas <robertmhaas@gmail.com> writes:
> First of all, as an administrative note, since both you and Alex seem
> to like 0001 and 0002 and no suggestions for improvement have been
> offered, I plan to commit those soon unless there are objections or
> additional review comments.

FWIW, I don't love the details of 0001: I think it's going in the
right direction, but could use more polish.  In particular, you've
defined Result.relids in a way that seems ambiguous.  There are two
different meanings for NULL, and one of them is being interpreted as
an "Aggregate" without a lot of principle behind that.  I think you
need to store more data in order to make that less of a hack.

So far as I can see from the regression-test changes, the "Aggregate"
case only occurs when we've replaced a aggregate calculation with
a MinMaxAggPath representing an index endpoint probe.  What I would
like to see come out when that's the case is something like

    Replaces: MIN or MAX aggregate over scan on tab1

This means first that the Result.relids needs to include the relid of
the table being scanned by the indexscan, and second that EXPLAIN will
then need some other cue to help it distinguish this case from a
case where it should just say "Replaces: Scan on tab1".  It's possible
that you could intuit that by examining the initplans attached to the
Result node, but I think what would make a ton more sense is to add
an enum field to Result that explicitly identifies why it's there.
We've got at least "apply one-time filter to subplan", "apply per-row
gating filter to subplan", "represent a relation proven empty", and
"place-hold for a MinMaxAgg InitPlan".  Tracing back from all the
calls to make_result() might identify some more cases.  I'm not
arguing that the user-visible EXPLAIN output should distinguish
all of these (but probably overexplain should).  I think though
that it'd be useful to have this recorded in the plan tree.

> On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote:
>> One idea (not fully thought through) is that we record the calculated
>> outerjoin_relids for each outer join in its JoinPaths.

> I'm OK with moving the conversation to a separate thread, but can you
> clarify from where you believe that joinpath->ojrelids would be
> populated? It seems to me that the assertion couldn't pass unless
> every join path ended up with the same value of joinpath->ojrelids.

What I have been intending to suggest is that you should add a field
to join plan nodes that is zero if an inner join, but the relid of
the outer join RTE if it's an outer join.  This is uniquely defined
because any given join node implements a specific outer join, even
though the planner's relids for the completed join are (intentionally)
ambiguous about the order in which multiple joins were done.

The reason I wanted to do this is that I think it'd become possible to
tighten the assertions in setrefs.c about whether Vars' varnullingrels
are correct, so that we can always assert that those relid sets are
exactly thus-and-so and not have to settle for superset/subset tests.
I've not worked through the details to be entirely sure that this is
possible, so I didn't bring it up before.  But maybe labeling join
nodes this way would also address Richard's concern.  In any case
it fits into your overall goal of decorating plan trees with more
information.

            regards, tom lane

pgsql-hackers by date:

From: Jacob Champion
Date: 09 September, 18:12:36
Subject: Re: OAuth client code doesn't work with Google OAuth

From: Alexandra Wang
Date: 09 September, 18:14:04
Subject: Re: SQL:2023 JSON simplified accessor support

Re: plan shape work - Mailing list pgsql-hackers

Previous

Next