Thread: Violation of principle that plan trees are read-only
While chasing down Valgrind leakage reports, I was disturbed to realize that some of them arise from a case where the executor scribbles on the plan tree it's given, which it is absolutely not supposed to do: /* * Initialize result tuple slot and assign its rowtype using the first * RETURNING list. We assume the rest will look the same. */ mtstate->ps.plan->targetlist = (List *) linitial(returningLists); A bit of git archaeology fingers Andres' commit 4717fdb14, which we can't easily revert since he later got rid of ExecAssignResultType altogether. But I think we need to do something about it --- it's purest luck that this doesn't cause serious problems in some cases. regards, tom lane
On Sun, May 18, 2025 at 7:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > While chasing down Valgrind leakage reports, I was disturbed > to realize that some of them arise from a case where the > executor scribbles on the plan tree it's given, which it is > absolutely not supposed to do: > > /* > * Initialize result tuple slot and assign its rowtype using the first > * RETURNING list. We assume the rest will look the same. > */ > mtstate->ps.plan->targetlist = (List *) linitial(returningLists); > > A bit of git archaeology fingers Andres' commit 4717fdb14, which we > can't easily revert since he later got rid of ExecAssignResultType > altogether. But I think we need to do something about it --- it's > purest luck that this doesn't cause serious problems in some cases. Is there some way that we can detect violations of this rule automatically? I recall that we were recently discussing with Richard Guo a proposed patch that would have had a similar problem, so it's evidently not that hard for a committer to either fail to understand what the rule is or fail to realize that they are violating it. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, May 19, 2025 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I proposed a possible way to test for this at [1]. I was intending to > get around to that sooner or later, but the urgency of the matter just > went up in my eyes... Ah, right, I actually read that thread but had forgotten about it. I don't know if that idea will work out but it certainly seems like a good thing to try. -- Robert Haas EDB: http://www.enterprisedb.com
Isaac Morland <isaac.morland@gmail.com> writes: > I assume this question has an obvious negative answer, but why can't we > attach const declarations to the various structures that make up the plan > tree (at all levels, all the way down)? I know const doesn't actually > prevent a value from changing, but at least the compiler would complain if > code accidentally tried. The big problem is that a "const" attached to a top-level pointer doesn't inherently propagate down to sub-nodes. So if I had, say, "const Query *stmt", the compiler would complain about stmt->jointree = foo; but not about stmt->jointree->quals = foo; I guess we could imagine developing an entirely parallel set of struct declarations with "const" on all pointer fields, like typedef struct ConstQuery { ... const ConstFromExpr *jointree; ... } ConstQuery; but even with automated maintenance of the ConstFoo doppelganger typedefs, it seems like that'd be a notational nightmare. For one thing, I'm not sure how to teach the compiler that casting "Query *" to "ConstQuery *" is okay but vice versa isn't. Does C++ have a better story in this area? I haven't touched it in so long that I don't remember. regards, tom lane
On 19/5/25 16:45, Tom Lane wrote: > [snip] > For one thing, I'm not sure how to teach the compiler that casting > "Query *" to "ConstQuery *" is okay but vice versa isn't. > > Does C++ have a better story in this area? Hi, A C++ compiler *does* enforce "const correctness" (for examples see, for example, Meyer's fantastic "Efficient C++" series). It seems that Postgres is approaching the class of complexity/maturity that (modern) C++ was designed to solve. It should be unmatched in that area. I believe that, with recent platform deprecations, the vast majority of systems where one can compile a modern Postgres should already have a decent C++ compiler available (g++ & clang probably cover >95%) As opposed to say, rewriting in Rust ---where the compilar also does a very substantial checking of semantics at compile time---, introducing some C++ in Postgres could be as simple as throwing some "#ifdef __cplusplus__ extern "C" { #endif" in the headers and use the required subset in certain modules *only*. (this is how, for instance, Bacula did it ~two decades ago) The planner and parts of the bufmgr / arenas (palloc et al.) seem to me as the most obvious candidates to try. I'm not sure whether it'd introduce any unintended regressions, though. > I haven't touched it > in so long that I don't remember. C++ 17 would be the standard to tackle, IMHO. HTH, J.L. -- Parkinson's Law: Work expands to fill the time alloted to it.
Hi, On 2025-05-18 19:31:33 -0400, Tom Lane wrote: > While chasing down Valgrind leakage reports, I was disturbed > to realize that some of them arise from a case where the > executor scribbles on the plan tree it's given, which it is > absolutely not supposed to do: > > /* > * Initialize result tuple slot and assign its rowtype using the first > * RETURNING list. We assume the rest will look the same. > */ > mtstate->ps.plan->targetlist = (List *) linitial(returningLists); > > A bit of git archaeology fingers Andres' commit 4717fdb14, which we > can't easily revert since he later got rid of ExecAssignResultType > altogether. But I think we need to do something about it --- it's > purest luck that this doesn't cause serious problems in some cases. I have no idea what I was smoking at that time, this clearly is wrong. I think the reason it doesn't cause problems is that we're just using the first child's targetlist every time and we're just assigning the same value back every time. What's even more absurd is: Why do we even need to assign the result type at all? Before & after 4717fdb14. The planner ought to have already figured this out, no? It seems that if not, we'd have a problem anyway, who says the "calling" nodes (say a wCTE) can cope with whatever output we come up with? Except of course, there is exactly one case in our tests where the tupledescs aren't equal :( I've not dug fully into this, but I thought I should send this email to confirm that I'm looking into the issue. Greetings, Andres Freund
I wrote: > I think we can just delete this assignment (and fix these comments). As attached. I'm tempted to back-patch this: the plan tree damage seems harmless at present, but maybe it'd become less harmless with future fixes. regards, tom lane diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 46d533b7288..71bc174cfc9 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -4830,12 +4830,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) ExprContext *econtext; /* - * Initialize result tuple slot and assign its rowtype using the first - * RETURNING list. We assume the rest will look the same. + * Initialize result tuple slot and assign its rowtype using the plan + * node's declared targetlist, which the planner set up to be the same + * as the first RETURNING list. We assume the rest will produce + * compatible output. */ - mtstate->ps.plan->targetlist = (List *) linitial(returningLists); - - /* Set up a slot for the output of the RETURNING projection(s) */ ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual); slot = mtstate->ps.ps_ResultTupleSlot; diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 150e9f060ee..8e1eb77dd49 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1097,9 +1097,10 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) /* * Set up the visible plan targetlist as being the same as - * the first RETURNING list. This is for the use of - * EXPLAIN; the executor won't pay any attention to the - * targetlist. We postpone this step until here so that + * the first RETURNING list. This is mostly for the use + * of EXPLAIN; the executor won't execute that targetlist, + * although it does use it to prepare the node's result + * tuple slot. We postpone this step until here so that * we don't have to do set_returning_clause_references() * twice on identical targetlists. */
Hi, On 2025-05-20 13:04:46 -0400, Tom Lane wrote: > I wrote: > > I think we can just delete this assignment (and fix these comments). > > As attached. Largely makes sense, the only thing I see is that the !returningLists branch does: /* * We still must construct a dummy result tuple type, because InitPlan * expects one (maybe should change that?). */ mtstate->ps.plan->targetlist = NIL; which we presumably shouldn't do anymore either. It never changes anything afaict, but still. > I'm tempted to back-patch this: the plan tree damage seems harmless at > present, but maybe it'd become less harmless with future fixes. I am not sure, I can see arguments either way. <ponder> There are *some* cases where this changes the explain output, but but the new output is more correct, I think: --- /tmp/a.out 2025-05-20 14:19:04.947945026 -0400 +++ /tmp/b.out 2025-05-20 14:19:12.878975702 -0400 @@ -18,7 +18,7 @@ QUERY PLAN ----------------------------------------------------------------------------- Update on public.part_abc - Output: part_abc_1.a, part_abc_1.b, part_abc_1.c + Output: a, b, c Update on public.part_abc_2 part_abc_1 -> Append Subplans Removed: 1 I suspect this is an argument for backpatching, not against - seems that deparsing could end up creating bogus output in cases where it could matter? Not sure if such cases are reachable via views (and thus pg_dump) or postgres_fdw, but it seems possible. Greetings, Andres Freund
On Mon, May 19, 2025 at 08:41:06AM -0400, Isaac Morland wrote: > I assume this question has an obvious negative answer, but why can't we > attach const declarations to the various structures that make up the plan > tree (at all levels, all the way down)? I know const doesn't actually > prevent a value from changing, but at least the compiler would complain if > code accidentally tried. What you want is for C to have a type attribute that denotes immutability all the way down. `const` doesn't do that. One thing that could be done is to write a utility that creates const-all-the-way-down clones of given types, but such a tool can't be written as C pre-processor macros -- it would have to be a tool that parses the actual type definitions, or uses DWARF or similar to from the compilation of a file (I've done this latter before, but this weds you to compilers that output DWARF, which MSVC doesn't, for example). Really, the C committee ought to add this at some point, darn it. It would be the opposite of Rust's &mut. Nico --
Andres Freund <andres@anarazel.de> writes: > On 2025-05-20 16:18:57 -0400, Tom Lane wrote: >> I'm curious though: what was the test case you were looking at? > It's a modified query from our regression tests, I had added some debugging to > find cases where the targetlists differed. I attached the extracted, somewhat > modified, sql script. [ confused... ] I get the same output from that script with or without the patch. regards, tom lane
I wrote: > [ confused... ] I get the same output from that script with or > without the patch. Oh, scratch that: I'd gotten confused about which branch I was working in. It does change the output as you say. I still think it's irrelevant to view display, including pg_dump, though. Those operations work from a parse tree not a plan tree. regards, tom lane
On 20/5/25 22:43, Nico Williams wrote: > [snip] > What you want is for C to have a type attribute that denotes > immutability all the way down. `const` doesn't do that. One thing that > could be done is to write a utility that creates const-all-the-way-down > clones of given types, but such a tool can't be written as C > pre-processor macros -- it would have to be a tool that parses the > actual type definitions, or uses DWARF or similar to from the > compilation of a file (I've done this latter before, but this weds you > to compilers that output DWARF, which MSVC doesn't, for example). > > Really, the C committee ought to add this at some point, darn it. It > would be the opposite of Rust's &mut. Like C++'s const specifier, specially const references to objects? This is actually natively compatible with C code, "just" by throwing extern "C" around..... https://en.cppreference.com/w/cpp/language/cv (most of Postgres' code is already "Object-oriented in C" in my view...) The fact that the C++ compiler is usually able to optimize deeper/better than a C one due to aggresive inlining+global optimization and inferred "strict"ness doesn't hurt either :) My €.02. HTH. / J.L. -- Parkinson's Law: Work expands to fill the time alloted to it.
On Mon, May 19, 2025 at 7:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Isaac Morland <isaac.morland@gmail.com> writes:
> I assume this question has an obvious negative answer, but why can't we
> attach const declarations to the various structures that make up the plan
> tree (at all levels, all the way down)? I know const doesn't actually
> prevent a value from changing, but at least the compiler would complain if
> code accidentally tried.
The big problem is that a "const" attached to a top-level pointer
doesn't inherently propagate down to sub-nodes. So if I had, say,
"const Query *stmt", the compiler would complain about
stmt->jointree = foo;
but not about
stmt->jointree->quals = foo;
I guess we could imagine developing an entirely parallel set of
struct declarations with "const" on all pointer fields, like
typedef struct ConstQuery
{
...
const ConstFromExpr *jointree;
...
} ConstQuery;
but even with automated maintenance of the ConstFoo doppelganger
typedefs, it seems like that'd be a notational nightmare. For
one thing, I'm not sure how to teach the compiler that casting
"Query *" to "ConstQuery *" is okay but vice versa isn't.
Does C++ have a better story in this area? I haven't touched it
in so long that I don't remember.
regards, tom lane
One unconventional but potentially effective approach to detect unexpected modifications in the plan tree can be as follows:
Yes, this approach introduces some memory and performance overhead. However, we can limit its impact by enabling it conditionally via a compile-time flag or #define, making it suitable for debugging or assertion-enabled builds.
It might sound a bit unconventional, but it could serve as a useful sanity check especially during development or when investigating plan tree integrity issues.
- Implement a function that can deeply compare two plan trees for structural or semantic differences.
- Before passing the original plan tree to the executor, make a deep copy of it.
- After execution (or at strategic checkpoints), compare the current plan tree against the original copy.
- If any differences are detected, emit a warning or log it for further inspection.
Yes, this approach introduces some memory and performance overhead. However, we can limit its impact by enabling it conditionally via a compile-time flag or #define, making it suitable for debugging or assertion-enabled builds.
It might sound a bit unconventional, but it could serve as a useful sanity check especially during development or when investigating plan tree integrity issues.
Pardon me if this sounds naive!
Yasir
Data Bene
Data Bene
I wrote: > Oh, scratch that: I'd gotten confused about which branch I was > working in. It does change the output as you say. After poking at that for awhile, I decided we need a small tweak in EXPLAIN itself to make the output consistent. See attached. I'm now leaning against back-patching, as there is a user-visible behavior change in EXPLAIN, and I don't think there are really any severe consequences to the mistaken assignments. regards, tom lane From 19382e5a4af617a56e86d197f394bed93862cd66 Mon Sep 17 00:00:00 2001 From: Tom Lane <tgl@sss.pgh.pa.us> Date: Wed, 21 May 2025 12:00:38 -0400 Subject: [PATCH v1] In ExecInitModifyTable, don't scribble on the source plan. The code carelessly modified mtstate->ps.plan->targetlist, which it's not supposed to do. Fortunately, there's not really any need to do that because the planner already set up a perfectly acceptable targetlist for the plan node. We just need to remove the erroneous assignments and update some relevant comments. As it happens, the erroneous assignments caused the targetlist to point to a different part of the source plan tree, so that there isn't really a risk of the pointer becoming dangling after executor termination. The only visible effect we can find is that EXPLAIN will show upper references to the ModifyTable's output expressions using different variables. Formerly it showed Vars from the first target relation that survived executor-startup pruning. Now it always shows such references using the first relation appearing in the planner output, independently of what happens during executor pruning. On the whole that seems like a good thing. (We do need a small tweak in ExplainPreScanNode to ensure that that relation will receive a refname assignment in set_rtable_names, even if it got pruned at startup.) Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Author: Andres Freund <andres@anarazel.de> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/213261.1747611093@sss.pgh.pa.us --- src/backend/commands/explain.c | 4 ++ src/backend/executor/nodeModifyTable.c | 10 ++-- src/backend/optimizer/plan/setrefs.c | 7 +-- src/test/regress/expected/partition_prune.out | 47 ++++++++++--------- src/test/regress/sql/partition_prune.sql | 6 +-- 5 files changed, 41 insertions(+), 33 deletions(-) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 786ee865f14..45942b4dd4f 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -1232,6 +1232,10 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used) if (((ModifyTable *) plan)->exclRelRTI) *rels_used = bms_add_member(*rels_used, ((ModifyTable *) plan)->exclRelRTI); + /* Ensure Vars used in RETURNING will have refnames */ + if (plan->targetlist) + *rels_used = bms_add_member(*rels_used, + linitial_int(((ModifyTable *) plan)->resultRelations)); break; case T_Append: *rels_used = bms_add_members(*rels_used, diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c index 46d533b7288..2bc89bf84dc 100644 --- a/src/backend/executor/nodeModifyTable.c +++ b/src/backend/executor/nodeModifyTable.c @@ -4830,12 +4830,11 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) ExprContext *econtext; /* - * Initialize result tuple slot and assign its rowtype using the first - * RETURNING list. We assume the rest will look the same. + * Initialize result tuple slot and assign its rowtype using the plan + * node's declared targetlist, which the planner set up to be the same + * as the first (before runtime pruning) RETURNING list. We assume + * all the result rels will produce compatible output. */ - mtstate->ps.plan->targetlist = (List *) linitial(returningLists); - - /* Set up a slot for the output of the RETURNING projection(s) */ ExecInitResultTupleSlotTL(&mtstate->ps, &TTSOpsVirtual); slot = mtstate->ps.ps_ResultTupleSlot; @@ -4865,7 +4864,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags) * We still must construct a dummy result tuple type, because InitPlan * expects one (maybe should change that?). */ - mtstate->ps.plan->targetlist = NIL; ExecInitResultTypeTL(&mtstate->ps); mtstate->ps.ps_ExprContext = NULL; diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index 150e9f060ee..8e1eb77dd49 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -1097,9 +1097,10 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) /* * Set up the visible plan targetlist as being the same as - * the first RETURNING list. This is for the use of - * EXPLAIN; the executor won't pay any attention to the - * targetlist. We postpone this step until here so that + * the first RETURNING list. This is mostly for the use + * of EXPLAIN; the executor won't execute that targetlist, + * although it does use it to prepare the node's result + * tuple slot. We postpone this step until here so that * we don't have to do set_returning_clause_references() * twice on identical targetlists. */ diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index 0bf35260b46..d1966cd7d82 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -4553,16 +4553,18 @@ create view part_abc_view as select * from part_abc where b <> 'a' with check op prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *; -- Only the unpruned partition should be shown in the list of relations to be -- updated -explain (costs off) execute update_part_abc_view (1, 'd'); - QUERY PLAN -------------------------------------------------------- - Update on part_abc - Update on part_abc_1 +explain (verbose, costs off) execute update_part_abc_view (1, 'd'); + QUERY PLAN +----------------------------------------------------------------------------- + Update on public.part_abc + Output: part_abc_1.a, part_abc_1.b, part_abc_1.c + Update on public.part_abc_1 -> Append Subplans Removed: 1 - -> Seq Scan on part_abc_1 - Filter: ((b <> 'a'::text) AND (a = $1)) -(6 rows) + -> Seq Scan on public.part_abc_1 + Output: $2, part_abc_1.tableoid, part_abc_1.ctid + Filter: ((part_abc_1.b <> 'a'::text) AND (part_abc_1.a = $1)) +(8 rows) execute update_part_abc_view (1, 'd'); a | b | c @@ -4570,28 +4572,31 @@ execute update_part_abc_view (1, 'd'); 1 | d | t (1 row) -explain (costs off) execute update_part_abc_view (2, 'a'); - QUERY PLAN -------------------------------------------------------- - Update on part_abc - Update on part_abc_2 part_abc_1 +explain (verbose, costs off) execute update_part_abc_view (2, 'a'); + QUERY PLAN +----------------------------------------------------------------------------- + Update on public.part_abc + Output: part_abc_1.a, part_abc_1.b, part_abc_1.c + Update on public.part_abc_2 -> Append Subplans Removed: 1 - -> Seq Scan on part_abc_2 part_abc_1 - Filter: ((b <> 'a'::text) AND (a = $1)) -(6 rows) + -> Seq Scan on public.part_abc_2 + Output: $2, part_abc_2.tableoid, part_abc_2.ctid + Filter: ((part_abc_2.b <> 'a'::text) AND (part_abc_2.a = $1)) +(8 rows) execute update_part_abc_view (2, 'a'); ERROR: new row violates check option for view "part_abc_view" DETAIL: Failing row contains (2, a, t). -- All pruned. -explain (costs off) execute update_part_abc_view (3, 'a'); - QUERY PLAN ------------------------------ - Update on part_abc +explain (verbose, costs off) execute update_part_abc_view (3, 'a'); + QUERY PLAN +---------------------------------------------------- + Update on public.part_abc + Output: part_abc_1.a, part_abc_1.b, part_abc_1.c -> Append Subplans Removed: 2 -(3 rows) +(4 rows) execute update_part_abc_view (3, 'a'); a | b | c diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql index f6db9479f54..d93c0c03bab 100644 --- a/src/test/regress/sql/partition_prune.sql +++ b/src/test/regress/sql/partition_prune.sql @@ -1371,12 +1371,12 @@ create view part_abc_view as select * from part_abc where b <> 'a' with check op prepare update_part_abc_view as update part_abc_view set b = $2 where a = $1 returning *; -- Only the unpruned partition should be shown in the list of relations to be -- updated -explain (costs off) execute update_part_abc_view (1, 'd'); +explain (verbose, costs off) execute update_part_abc_view (1, 'd'); execute update_part_abc_view (1, 'd'); -explain (costs off) execute update_part_abc_view (2, 'a'); +explain (verbose, costs off) execute update_part_abc_view (2, 'a'); execute update_part_abc_view (2, 'a'); -- All pruned. -explain (costs off) execute update_part_abc_view (3, 'a'); +explain (verbose, costs off) execute update_part_abc_view (3, 'a'); execute update_part_abc_view (3, 'a'); deallocate update_part_abc_view; -- 2.43.5