Re: plan shape work - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: plan shape work |
Date | |
Msg-id | 1172265.1757430771@sss.pgh.pa.us Whole thread Raw |
In response to | Re: plan shape work (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: plan shape work
|
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > First of all, as an administrative note, since both you and Alex seem > to like 0001 and 0002 and no suggestions for improvement have been > offered, I plan to commit those soon unless there are objections or > additional review comments. FWIW, I don't love the details of 0001: I think it's going in the right direction, but could use more polish. In particular, you've defined Result.relids in a way that seems ambiguous. There are two different meanings for NULL, and one of them is being interpreted as an "Aggregate" without a lot of principle behind that. I think you need to store more data in order to make that less of a hack. So far as I can see from the regression-test changes, the "Aggregate" case only occurs when we've replaced a aggregate calculation with a MinMaxAggPath representing an index endpoint probe. What I would like to see come out when that's the case is something like Replaces: MIN or MAX aggregate over scan on tab1 This means first that the Result.relids needs to include the relid of the table being scanned by the indexscan, and second that EXPLAIN will then need some other cue to help it distinguish this case from a case where it should just say "Replaces: Scan on tab1". It's possible that you could intuit that by examining the initplans attached to the Result node, but I think what would make a ton more sense is to add an enum field to Result that explicitly identifies why it's there. We've got at least "apply one-time filter to subplan", "apply per-row gating filter to subplan", "represent a relation proven empty", and "place-hold for a MinMaxAgg InitPlan". Tracing back from all the calls to make_result() might identify some more cases. I'm not arguing that the user-visible EXPLAIN output should distinguish all of these (but probably overexplain should). I think though that it'd be useful to have this recorded in the plan tree. > On Mon, Sep 8, 2025 at 10:22 PM Richard Guo <guofenglinux@gmail.com> wrote: >> One idea (not fully thought through) is that we record the calculated >> outerjoin_relids for each outer join in its JoinPaths. > I'm OK with moving the conversation to a separate thread, but can you > clarify from where you believe that joinpath->ojrelids would be > populated? It seems to me that the assertion couldn't pass unless > every join path ended up with the same value of joinpath->ojrelids. What I have been intending to suggest is that you should add a field to join plan nodes that is zero if an inner join, but the relid of the outer join RTE if it's an outer join. This is uniquely defined because any given join node implements a specific outer join, even though the planner's relids for the completed join are (intentionally) ambiguous about the order in which multiple joins were done. The reason I wanted to do this is that I think it'd become possible to tighten the assertions in setrefs.c about whether Vars' varnullingrels are correct, so that we can always assert that those relid sets are exactly thus-and-so and not have to settle for superset/subset tests. I've not worked through the details to be entirely sure that this is possible, so I didn't bring it up before. But maybe labeling join nodes this way would also address Richard's concern. In any case it fits into your overall goal of decorating plan trees with more information. regards, tom lane
pgsql-hackers by date: