Re: Expose custom planning data in EXPLAIN - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Expose custom planning data in EXPLAIN
Date
Msg-id CA+TgmoYNX5FLn158c6vBVWbbiHkS+JvgrZyEC7NMdx5DYwoxMg@mail.gmail.com
In response to Expose custom planning data in EXPLAIN  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
On Wed, Aug 13, 2025 at 9:51 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> It appears that the only two changes required to enable the feature are
> a hook and a field in the Plan node. In this patch, I have chosen to add
> the hook to the copy_generic_path_info routine to limit its usage for
> tracking purposes only. Also, I extended its interface with the
> PlannerInfo pointer, which may be helpful in many cases. The new extlist
> field in the Plan structure should contain (by convention) extensible
> nodes only to let modules correctly pick their data. Also, it simplifies
> the logic of the node serialisation.
>
> An additional motivation for choosing Extensible Node is its lack of
> core usage, which makes it seem unpolished and requires copying a
> significant amount of code to use. This patch highlights this imperfection.

This seems quite closely related to what I propose here:

http://postgr.es/m/CA+TgmoYxfg90rw13+JcYwn4dwSC+agw7o8-A+fA3M0fh96pg8w@mail.gmail.com

There are some differences. In my proposal, specifically in v3-0004, I
just add a single member to the PlannedStmt, and assume that the code
can find a way to jam all the state it cares about into that single
field, for example by creating a list of plan_node_id values and a
list of associated nodes that can carry the corresponding data.
Likewise, I just put a single hook in there, in v3-0003, to allow data
to be propagated from the plan-time data structures to that new
PlannedStmt member. In your proposal, by contrast, there's a place to
put extension-relevant information in every single Plan node, and a
hook call for every single plan node as well.
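
To make the single-field layout concrete, here is a minimal sketch of two parallel lists, one of plan_node_id values and one of associated data. It uses simplified stand-in types, not PostgreSQL's real structs or List API: ToyPlannedStmt, ext_put, ext_get, and MAX_ENTRIES are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real definitions. */
#define MAX_ENTRIES 16

typedef struct ToyPlannedStmt
{
    /* Two parallel lists: ids[i] names the plan node that data[i] describes. */
    int         ids[MAX_ENTRIES];
    const void *data[MAX_ENTRIES];
    size_t      n_entries;
} ToyPlannedStmt;

/* Record data for one plan node. Returns 0 on success, -1 if the lists are full. */
static int
ext_put(ToyPlannedStmt *stmt, int plan_node_id, const void *data)
{
    assert(stmt != NULL);
    if (stmt->n_entries >= MAX_ENTRIES)
        return -1;
    stmt->ids[stmt->n_entries] = plan_node_id;
    stmt->data[stmt->n_entries] = data;
    stmt->n_entries++;
    return 0;
}

/* Find the data saved for a plan node, or NULL if nothing was saved for it. */
static const void *
ext_get(const ToyPlannedStmt *stmt, int plan_node_id)
{
    assert(stmt != NULL);
    for (size_t i = 0; i < stmt->n_entries; i++)
    {
        if (stmt->ids[i] == plan_node_id)
            return stmt->data[i];
    }
    return NULL;
}
```

Plan nodes that carry no extension data cost nothing in this layout; the price is that every lookup goes through the plan_node_id.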

I think both approaches have some advantages. One advantage of my
proposal is that it's cheaper. Your proposal makes every Plan node 8
bytes larger even though most of the time that extra pointer will be
NULL. I have been yelled at in the past for proposing to increase the
size of Plan, so I'm a little reluctant to believe that it's OK to do
that here. It might be less relevant now, as I think before we might
have been just on the cusp of needing one more cache line for every
Plan node, and it doesn't look like that's true currently, so maybe it
wouldn't provoke as much objection, but I'm still nervous about the
idea. A related disadvantage of your approach is that it needs to
consider calling the hook function lots of times instead of just one
time, though perhaps that's too insignificant to bother about. Also,
with my approach it's possible to propagate information from
PlannerInfo or PlannerGlobal structs, not just individual Plan nodes.

On the other hand, your proposal has usability advantages. If what
you're trying to do is save some details for every Plan node, my
approach requires you to run around and walk the plan tree and
marshal the data that you want to save, whereas your approach allows
you to do things in a more straightforward way. I think this actually
points to a deeper flaw in my approach: sure, you can run around and
look at the best path and the final plan and save whatever you want,
but how do you connect a path node to the corresponding plan node? The
Plan objects have a plan_node_id value, but the path objects don't
yet, and it's not real obvious how to match things up. Your approach
solves this problem by putting a callback in a place where it gets
passed the Path and the corresponding Plan at the same time. That's
extremely convenient.
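
The shape of that callback can be sketched as follows. This is only an illustration with toy types: ToyPath, ToyPlan, toy_path_to_plan_hook, toy_copy_path_info, and save_row_estimate are hypothetical names, and the actual hook signature in the patch (which also passes a PlannerInfo pointer) may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL's real Path and Plan. */
typedef struct ToyPath
{
    double rows;            /* planner's row estimate for this path */
} ToyPath;

typedef struct ToyPlan
{
    void *extlist;          /* extension-owned data, NULL when unused */
} ToyPlan;

/* The hook sees the Path and the Plan built from it in the same call. */
typedef void (*toy_path_to_plan_hook_type) (ToyPath *path, ToyPlan *plan);
static toy_path_to_plan_hook_type toy_path_to_plan_hook = NULL;

/* Toy analogue of the routine that copies generic Path info into a Plan. */
static void
toy_copy_path_info(ToyPlan *dest, ToyPath *src)
{
    dest->extlist = NULL;
    if (toy_path_to_plan_hook != NULL)
        toy_path_to_plan_hook(src, dest);
}

/* Example extension data: a copy of the path's row estimate. */
typedef struct SavedEstimate
{
    double rows;
} SavedEstimate;

/* Example hook: copy what we need out of the Path and hang it on the Plan. */
static void
save_row_estimate(ToyPath *path, ToyPlan *plan)
{
    SavedEstimate *saved = malloc(sizeof(SavedEstimate));

    if (saved == NULL)
        return;             /* leave extlist NULL on allocation failure */
    saved->rows = path->rows;
    plan->extlist = saved;
}
```

Because both nodes arrive in one call, the extension never has to work out afterwards which Path produced which Plan.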

Another thing that is different is that my patch series is clearer
about how multiple unrelated planner extensions are intended to
coexist. That's not a fundamental advantage of my approach, because
the same idea could be integrated into what you've done; it's only a
difference in how things stand as currently proposed.

My overall feeling is that we should try to come up with a unified
approach here. I'm not sure exactly what it should look like, though.
I think the strongest part of your proposal is the fact that it
connects each Path node to the corresponding Plan node in a very clear
way, and I think that the weakest part of your proposal is that it
makes each Plan node larger. I would be curious to hear what others
think.

--
Robert Haas
EDB: http://www.enterprisedb.com


