Re: Expose custom planning data in EXPLAIN - Mailing list pgsql-hackers

From Andrei Lepikhov
Subject Re: Expose custom planning data in EXPLAIN
Date
Msg-id a1ac6771-ac2c-44f8-852d-56d6d14d0932@gmail.com
Whole thread Raw
In response to Re: Expose custom planning data in EXPLAIN  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 9/9/2025 17:32, Robert Haas wrote:
> On Wed, Aug 13, 2025 at 9:51 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> This seems quite closely related to what I propose here:
> 
> http://postgr.es/m/CA+TgmoYxfg90rw13+JcYwn4dwSC+agw7o8-A+fA3M0fh96pg8w@mail.gmail.com
I'd say it is another viewpoint (mostly opposite) of the feature.>
> There are some differences. In my proposal, specifically in v3-0004, I
> just add a single member to the PlannedStmt, and assume that the code
> can find a way to jam all the state it cares about into that single
> field, for example by creating a list of plan_node_id values and a
> list of associated nodes that can carry the corresponding data.
It may work if it is underpinned with a proper hook at the place where 
we already have the final path tree and can pointer to the plan node.> 
Likewise, I just put a single hook in there, in v3-0003, to allow data
> to be propagated from the plan-time data structures to that new
> PlannedStmt member. In your proposal, by contrast, there's a place to
> put extension-relevant information in every single Plan node, and a
> hook call for every single plan node as well.
Yes, we don't know which path tree is the final one. That's more, a plan 
tree may contain nodes that have never been in the path tree (Hash and 
Sort nodes as an example). My approach was caused by years of struggle 
to match path -> plan and plan -> 'no path' cases. The create_plan hook 
provides a straightforward way and guarantees.>
> I think both approaches have some advantages. One advantage of my
> proposal is that it's cheaper. Your proposal makes every Plan node 8
> bytes larger even though most of the time that extra pointer will be
> NULL. I have been yelled at in the past for proposing to increase the
> size of Plan, so I'm a little reluctant to believe that it's OK to do
> that here.
That's why I have not exposed this approach before. But now I see how 
specific plan nodes grow just to let explain be more detailed (remember 
the recent Memoise node expose of distinct predictions). The same stuff 
may be usual for HashJoin and IncrementalSort. IndexScan still hides a 
lot of the optimiser decisions that are sometimes needed to reveal 
performance issues. MergeJoin doesn't always show hidden 
optimisations... So, I think it would be better to extend the Plan node 
once and let in-core and external modules put their data inside that 
list on demand.> It might be less relevant now, as I think before we might
> have been just on the cusp of needing one more cache line for every
> Plan node, and it doesn't look like that's true currently, so maybe it
> wouldn't provoke as much objection, but I'm still nervous about the
> idea.
I think it needs an alternative glance.

> A related disadvantage of your approach is that it needs to
> consider calling the hook function lots of times instead of just one
> time, though perhaps that's too insignificant to bother about. Also,
The final plan does not contain so many nodes to care about. 
Additionally, plan tracking extensions will require adding data to most 
plan nodes. So, it would allow us to design more effective extensions, 
having data right in place (the Plan node) and no need to pass through a 
hash table or node tree.> with my approach is that it's possible to 
propagate information from
> PlannerInfo or PlannerGlobal structs, not just individual Plan nodes.
PlannerInfo and PlannerGlobal are the right nodes to be extended. I also 
constantly patch the RelOptInfo node because it is a highly stable node 
during the planning phase and represents a kinda of relational operation.
In my extensions/patches, PlannerGlobal data is usually needed for the 
initialisation of the EState, and PlannerInfo data is used to set up 
PlannedStmt and Subplan nodes properly.
> yet, and it's not real obvious how to match things up. Your approach
> solves this problem by putting a callback in a place where it gets
> passed the Path and the corresponding Plan at the same time. That's
> extremely convenient.
Yes, when we need to track plan decisions and observe their impact on 
query (node) performance, stability of this match becomes a critical part.>
> Another thing that is different is that my patch series is clearer
> about how multiple unrelated planner extensions are intended to
> coexist. That's not a fundamental advantage of my approach, because
> the same idea could be integrated into what you've done; it's only a
> difference in how things stand as currently proposed.
We have the same issues with hooks. But up to now, extensions have lived 
together, sharing hooks. Having a convention with DefElem or an 
Extensible node, we may forget about that issue. More importantly, we 
need to survive read/write/copy operations, at least to live inside a 
parallel worker, plan cache, generic plan, and pass READ/WRITE plan tree 
tests, enabled on-demand during compilation.>
> My overall feeling is that we should try to come up with a unified
> approach here. I'm not sure exactly what it should look like, though.
> I think the strongest part of your proposal is the fact that it
> connects each Path node to the corresponding Plan node in a very clear
> way, and I think that the weakest part of your proposal is that it
> makes each Plan node larger. I would be curious to hear what others
> think.
Agree with the node grow issue.
Positive arguments from the top of my mind:
1. It provides a clear and unified approach, allowing to extend any node 
in the same way according to the same convention.
2. Extensions may collaborate through these fields. It is suitable for 
developers who implement business logic close to the Postgres core as an 
extension (remember Yuri Rashkovskii's initiative).
3. It is clearer how to maintain read/write/copy object.

Generally, it seems we represent opposite design approaches. Business 
app developers include a lot of logic in their code and want to reduce 
overhead as much as possible. So, they value flexibility, like hooks 
provide. In their mind, DBMS is another glibc ;). You look from the core 
safety point of view and want to protect everything possible.
Not sure if it is needed here.

Anyway, I will personally reduce the core patch size in my modules 
drastically with your approach, too, but it will come at the cost of 
increasing complexity.

-- regards, Andrei Lepikhov



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Don't synchronously wait for already-in-progress IO in read stream
Next
From: Masahiko Sawada
Date:
Subject: Re: Add memory_limit_hits to pg_stat_replication_slots