Re: Expose custom planning data in EXPLAIN - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Expose custom planning data in EXPLAIN |
Date | |
Msg-id | CA+TgmoYNX5FLn158c6vBVWbbiHkS+JvgrZyEC7NMdx5DYwoxMg@mail.gmail.com |
In response to | Expose custom planning data in EXPLAIN (Andrei Lepikhov <lepihov@gmail.com>) |
List | pgsql-hackers |
On Wed, Aug 13, 2025 at 9:51 AM Andrei Lepikhov <lepihov@gmail.com> wrote:
> It appears that the only two changes required to enable the feature are
> a hook and a field in the Plan node. In this patch, I have chosen to add
> the hook to the copy_generic_path_info routine to limit its usage for
> tracking purposes only. Also, I extended its interface with the
> PlannerInfo pointer, which may be helpful in many cases. The new extlist
> field in the Plan structure should contain (by convention) extensible
> nodes only to let modules correctly pick their data. Also, it simplifies
> the logic of the node serialisation.
>
> An additional motivation for choosing Extensible Node is its lack of
> core usage, which makes it seem unpolished and requires copying a
> significant amount of code to use. This patch highlights this imperfection.

This seems quite closely related to what I propose here:

http://postgr.es/m/CA+TgmoYxfg90rw13+JcYwn4dwSC+agw7o8-A+fA3M0fh96pg8w@mail.gmail.com

There are some differences. In my proposal, specifically in v3-0004, I just add a single member to the PlannedStmt, and assume that the code can find a way to jam all the state it cares about into that single field, for example by creating a list of plan_node_id values and a list of associated nodes that can carry the corresponding data. Likewise, I just put a single hook in there, in v3-0003, to allow data to be propagated from the plan-time data structures to that new PlannedStmt member. In your proposal, by contrast, there's a place to put extension-relevant information in every single Plan node, and a hook call for every single plan node as well.

I think both approaches have some advantages. One advantage of my proposal is that it's cheaper. Your proposal makes every Plan node 8 bytes larger even though most of the time that extra pointer will be NULL. I have been yelled at in the past for proposing to increase the size of Plan, so I'm a little reluctant to believe that it's OK to do that here.
It might be less relevant now, as I think before we might have been just on the cusp of needing one more cache line for every Plan node, and it doesn't look like that's true currently, so maybe it wouldn't provoke as much objection, but I'm still nervous about the idea. A related disadvantage of your approach is that it needs to consider calling the hook function lots of times instead of just one time, though perhaps that's too insignificant to bother about. Also, with my approach, it's possible to propagate information from PlannerInfo or PlannerGlobal structs, not just individual Plan nodes.

On the other hand, your proposal has usability advantages. If what you're trying to do is save some details for every Plan node, my approach requires you to run around and walk the plan tree and marshal the data that you want to save, whereas your approach allows you to do things in a more straightforward way. I think this actually points to a deeper flaw in my approach: sure, you can run around and look at the best path and the final plan and save whatever you want, but how do you connect a path node to the corresponding plan node? The Plan objects have a plan_node_id value, but the path objects don't yet, and it's not real obvious how to match things up. Your approach solves this problem by putting a callback in a place where it gets passed the Path and the corresponding Plan at the same time. That's extremely convenient.

Another thing that is different is that my patch series is clearer about how multiple unrelated planner extensions are intended to coexist. That's not a fundamental advantage of my approach, because the same idea could be integrated into what you've done; it's only a difference in how things stand as currently proposed.

My overall feeling is that we should try to come up with a unified approach here. I'm not sure exactly what it should look like, though.
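
For illustration only, the two layouts being compared can be sketched in standalone C. This is a minimal sketch under stated assumptions: the `Demo*` structs and `demo_lookup` are hypothetical stand-ins invented here, not PostgreSQL's real `Plan` or `PlannedStmt`, and not code from either patch. It shows a per-node pointer (which grows every node) next to a single statement-level side table keyed by `plan_node_id`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified plan node; NOT PostgreSQL's Plan. */
typedef struct DemoPlan
{
    int              plan_node_id;
    struct DemoPlan *lefttree;
    struct DemoPlan *righttree;
} DemoPlan;

/*
 * Per-node layout: every node carries one extra pointer for extension
 * data, even when that pointer is NULL for most nodes.
 */
typedef struct DemoPlanWithExt
{
    int                     plan_node_id;
    struct DemoPlanWithExt *lefttree;
    struct DemoPlanWithExt *righttree;
    void                   *extlist;    /* assumed name, taken from the quoted text */
} DemoPlanWithExt;

/*
 * Side-table layout: one statement-level member holds (plan_node_id, data)
 * pairs, so the nodes themselves stay the same size.
 */
typedef struct DemoExtEntry
{
    int         plan_node_id;
    const char *note;           /* placeholder for extension data */
} DemoExtEntry;

typedef struct DemoStmtExt
{
    size_t              nentries;
    const DemoExtEntry *entries;
} DemoStmtExt;

/* Linear search of the side table; returns NULL if the node has no entry. */
static const char *
demo_lookup(const DemoStmtExt *ext, int plan_node_id)
{
    assert(ext != NULL);
    for (size_t i = 0; i < ext->nentries; i++)
    {
        if (ext->entries[i].plan_node_id == plan_node_id)
            return ext->entries[i].note;
    }
    return NULL;
}
```

In this sketch, `sizeof(DemoPlanWithExt)` exceeds `sizeof(DemoPlan)` by one pointer on common ABIs, while the side table costs nothing per node but requires whoever fills it to know each node's `plan_node_id`, which is exactly the path-to-plan matching problem described above.
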
I think the strongest part of your proposal is the fact that it connects each Path node to the corresponding Plan node in a very clear way, and I think that the weakest part of your proposal is that it makes each Plan node larger. I would be curious to hear what others think.

--
Robert Haas
EDB: http://www.enterprisedb.com