Thread: Wrong results with subquery pullup and grouping sets
While working on the expansion of virtual generated columns, Dean encountered $subject in [1], which can be reproduced using the query below. create table t (a int); insert into t values (1); # select a, b from (select a, a as b from t) ss group by grouping sets(a, b) having b = 1; a | b ---+--- 1 | (1 row) Note that the having clause filters out the wrong row. In 90947674f, we fixed a similar issue by wrapping subquery outputs that are non-var expressions in PlaceHolderVars. This prevents const-simplification from merging them into the surrounding expression after subquery pullup and resulting in something that won't match the grouping set expression in setrefs.c. It seems that we still have loose ends with that fix. This issue illustrates that if the subquery's target list contains two or more identical Var expressions, we can also fail to match the Var expression to the expected grouping set expression. This is not related to const-simplification, but rather to how we match expressions to lower target items in setrefs.c. For sort/group expressions, we use ressortgroupref matching, which works well. For other expressions, we primarily rely on comparing the expressions to determine if they are the same. This is why, in the query above, the Var 'b' in the HAVING clause matches the first target entry of the subquery, rather than the expected second one. You might wonder why this issue wasn't fixed by the commit that introduced a dummy RTE representing the output of the grouping step. I think that commit prevents one expression that matches the grouping item from being const-folded or preprocessed, but it doesn't prevent setrefs.c from matching the expression to some other identical ones. It seems to me that simple Var expressions in a subquery's target list also need to retain their separate identity in order to match grouping set columns after subquery pullup, not just non-var expressions. Alternatively, is there a way to teach setrefs.c to match an expression to the expected one when there are multiple identical expressions? [1] https://postgr.es/m/CAEZATCWsKqCtZ=ud26-gGV3zHt-hjS4OKG43GCBhSaYUWyfKiw@mail.gmail.com Thanks Richard
On Wed, Mar 5, 2025 at 11:02 AM Richard Guo <guofenglinux@gmail.com> wrote: > create table t (a int); > insert into t values (1); > > # select a, b > from (select a, a as b from t) ss > group by grouping sets(a, b) > having b = 1; > a | b > ---+--- > 1 | > (1 row) > > Note that the having clause filters out the wrong row. BTW, this issue is not limited to HAVING clause; it can also happen to targetlist. # select a, b+1 from (select a, a as b from t) ss group by grouping sets(a, b); a | ?column? ---+---------- 1 | 2 | (2 rows) Surely this is wrong. Thanks Richard
On Wed, Mar 5, 2025 at 11:02 AM Richard Guo <guofenglinux@gmail.com> wrote: > It seems to me that simple Var expressions in a subquery's target list > also need to retain their separate identity in order to match grouping > set columns after subquery pullup, not just non-var expressions. I noticed the adjacent code that sets wrap_non_vars to true if we are considering an appendrel child subquery, and I doubt this is necessary. /* * If we are dealing with an appendrel member then anything that's not a * simple Var has to be turned into a PlaceHolderVar. We force this to * ensure that what we pull up doesn't get merged into a surrounding * expression during later processing and then fail to match the * expression actually available from the appendrel. */ if (containing_appendrel != NULL) rvcontext.wrap_non_vars = true; As explained in e42e31243, the only part of the upper query that could reference the appendrel child yet is the translated_vars list of the associated AppendRelInfo that we just made for this child, and we do not want to force use of PHVs in the AppendRelInfo (see the comment in perform_pullup_replace_vars). In fact, perform_pullup_replace_vars sets wrap_non_vars to false before performing pullup_replace_vars on the AppendRelInfo. It seems to me that this makes the code shown above unnecessary, and we can simply remove it. Any thoughts? Thanks Richard
On Wed, 5 Mar 2025 at 09:09, Richard Guo <guofenglinux@gmail.com> wrote: > > I noticed the adjacent code that sets wrap_non_vars to true if we > are considering an appendrel child subquery, and I doubt this is > necessary. > > /* > * If we are dealing with an appendrel member then anything that's not a > * simple Var has to be turned into a PlaceHolderVar. We force this to > * ensure that what we pull up doesn't get merged into a surrounding > * expression during later processing and then fail to match the > * expression actually available from the appendrel. > */ > if (containing_appendrel != NULL) > rvcontext.wrap_non_vars = true; > > As explained in e42e31243, the only part of the upper query that could > reference the appendrel child yet is the translated_vars list of the > associated AppendRelInfo that we just made for this child, and we do > not want to force use of PHVs in the AppendRelInfo (see the comment in > perform_pullup_replace_vars). In fact, perform_pullup_replace_vars > sets wrap_non_vars to false before performing pullup_replace_vars on > the AppendRelInfo. It seems to me that this makes the code shown > above unnecessary, and we can simply remove it. > Yes, that makes sense, and it seems like a fairly straightforward simplification, given that perform_pullup_replace_vars() sets it back to false shortly afterwards. The same thing happens in pull_up_constant_function(). Regards, Dean
On Wed, Mar 5, 2025 at 8:01 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > Yes, that makes sense, and it seems like a fairly straightforward > simplification, given that perform_pullup_replace_vars() sets it back > to false shortly afterwards. > > The same thing happens in pull_up_constant_function(). Thanks for looking. Attached are the patches. 0001 removes the code that sets wrap_non_vars to true for UNION ALL subqueries. And that leaves us only two cases where we need PHVs for identification purposes: 1. If the query uses grouping sets, we need a PHV for each expression of the subquery's targetlist items. 2. In the join quals of a full join, we need PHVs for variable-free expressions. In 0002, I changed the boolean wrap_non_vars to an enum, which indicates whether no expressions, all expressions, or only variable-free expressions need to be wrapped for identification purposes. Another thing is that when deciding whether to wrap the newnode in pullup_replace_vars_callback(), I initially thought that we didn't need to handle Vars/PHVs specifically, and could instead merge them into the branch for handling general expressions. However, doing so causes a plan diff in the regression tests. The reason is that a non-strict construct hidden within a lower-level PlaceHolderVar can lead the code for handling general expressions to mistakenly think that another PHV is needed, even when it isn't. Therefore, the branches for handling Vars/PHVs are retained in 0002. Thanks Richard
Attachment
On Mon, 10 Mar 2025 at 13:05, Richard Guo <guofenglinux@gmail.com> wrote: > > Attached are the patches. > This looks good to me. I did some more testing, and I wasn't able to break it. Some minor nitpicks: These 2 comment changes from 0002 could be made part of 0001: 1). In pull_up_simple_subquery(), removing the word "Again" from the comment following the deleted block, since this is now the only place that sets wrap_non_vars there. 2). In pull_up_constant_function(), moving "(See comments in pull_up_simple_subquery().)" to the next comment. > Another thing is that when deciding whether to wrap the newnode in > pullup_replace_vars_callback(), I initially thought that we didn't > need to handle Vars/PHVs specifically, and could instead merge them > into the branch for handling general expressions. However, doing so > causes a plan diff in the regression tests. The reason is that a > non-strict construct hidden within a lower-level PlaceHolderVar can > lead the code for handling general expressions to mistakenly think > that another PHV is needed, even when it isn't. Therefore, the > branches for handling Vars/PHVs are retained in 0002. Right. The comment addition in 0002, relating to that, confused me at first: * This analysis could be tighter: in particular, a non-strict * construct hidden within a lower-level PlaceHolderVar is not * reason to add another PHV. But for now it doesn't seem - * worth the code to be more exact. + * worth the code to be more exact. This is also why we need + * to handle Vars and PHVs in the above branches, rather than + * merging them into this one. AIUI, it's not really that it *needs* to handle Vars and PHVs separately, it's just better if it does, since that avoids unnecessarily wrapping a PHV again, if it contains non-strict constructs. Also, AFAICS there's no technical reason why simple Vars couldn't be handled here (all the tests pass if that branch is removed), but as a comment higher up says, that would be more expensive. So perhaps this new comment should say "This is why it's preferable to handle simple PHVs in the branch above, rather than this branch." Finally, ReplaceWrapOption should be added to pgindent's typedefs.list. Regards, Dean
On Wed, Mar 12, 2025 at 1:32 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Mon, 10 Mar 2025 at 13:05, Richard Guo <guofenglinux@gmail.com> wrote: > > Attached are the patches. > These 2 comment changes from 0002 could be made part of 0001: > > 1). In pull_up_simple_subquery(), removing the word "Again" from the > comment following the deleted block, since this is now the only place > that sets wrap_non_vars there. > > 2). In pull_up_constant_function(), moving "(See comments in > pull_up_simple_subquery().)" to the next comment. Nice catch. > > Another thing is that when deciding whether to wrap the newnode in > > pullup_replace_vars_callback(), I initially thought that we didn't > > need to handle Vars/PHVs specifically, and could instead merge them > > into the branch for handling general expressions. However, doing so > > causes a plan diff in the regression tests. The reason is that a > > non-strict construct hidden within a lower-level PlaceHolderVar can > > lead the code for handling general expressions to mistakenly think > > that another PHV is needed, even when it isn't. Therefore, the > > branches for handling Vars/PHVs are retained in 0002. > Right. The comment addition in 0002, relating to that, confused me at first: > > * This analysis could be tighter: in particular, a non-strict > * construct hidden within a lower-level PlaceHolderVar is not > * reason to add another PHV. But for now it doesn't seem > - * worth the code to be more exact. > + * worth the code to be more exact. This is also why we need > + * to handle Vars and PHVs in the above branches, rather than > + * merging them into this one. > > AIUI, it's not really that it *needs* to handle Vars and PHVs > separately, it's just better if it does, since that avoids > unnecessarily wrapping a PHV again, if it contains non-strict > constructs. Also, AFAICS there's no technical reason why simple Vars > couldn't be handled here (all the tests pass if that branch is > removed), but as a comment higher up says, that would be more > expensive. So perhaps this new comment should say "This is why it's > preferable to handle simple PHVs in the branch above, rather than this > branch." Exactly. Technically, we could remove the branch for Vars. However, I chose to keep it because: 1) it's more efficient this way, and 2) it's better to keep the handling of Vars and PHVs aligned. I refined the comment in v2 and ended up with: * This analysis could be tighter: in particular, a non-strict * construct hidden within a lower-level PlaceHolderVar is not * reason to add another PHV. But for now it doesn't seem - * worth the code to be more exact. + * worth the code to be more exact. This is also why it's + * preferable to handle bare PHVs in the above branch, rather + * than this branch. We also prefer to handle bare Vars in a + * separate branch, as it's cheaper this way and parallels the + * handling of PHVs. > Finally, ReplaceWrapOption should be added to pgindent's typedefs.list. Nice catch. For backpatching, 0001 seems more like a cosmetic change, and I don't think it needs to be backpatched. 0002 is a bug fix, but I'm not confident enough to commit it to the back branches. Given that there are other wrong result issues with grouping sets fixed in version 18 but not in earlier versions, and that there are no field reports about this issue, perhaps it's OK to refrain from backpatching this one? Thanks Richard
Attachment
On Wed, 12 Mar 2025 at 07:45, Richard Guo <guofenglinux@gmail.com> wrote: > > I refined the comment in v2 and ended up with: > > * This analysis could be tighter: in particular, a non-strict > * construct hidden within a lower-level PlaceHolderVar is not > * reason to add another PHV. But for now it doesn't seem > - * worth the code to be more exact. > + * worth the code to be more exact. This is also why it's > + * preferable to handle bare PHVs in the above branch, rather > + * than this branch. We also prefer to handle bare Vars in a > + * separate branch, as it's cheaper this way and parallels the > + * handling of PHVs. > LGTM. > For backpatching, 0001 seems more like a cosmetic change, and I don't > think it needs to be backpatched. 0002 is a bug fix, but I'm not > confident enough to commit it to the back branches. Given that there > are other wrong result issues with grouping sets fixed in version 18 > but not in earlier versions, and that there are no field reports about > this issue, perhaps it's OK to refrain from backpatching this one? > Yes, I was thinking along the same lines. This particular issue feels a bit manufactured, and unlikely to occur in practice, but I'm happy to go with whatever you decide. Thanks for taking care of this. Regards, Dean
On Wed, Mar 12, 2025 at 6:29 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > On Wed, 12 Mar 2025 at 07:45, Richard Guo <guofenglinux@gmail.com> wrote: > > For backpatching, 0001 seems more like a cosmetic change, and I don't > > think it needs to be backpatched. 0002 is a bug fix, but I'm not > > confident enough to commit it to the back branches. Given that there > > are other wrong result issues with grouping sets fixed in version 18 > > but not in earlier versions, and that there are no field reports about > > this issue, perhaps it's OK to refrain from backpatching this one? > Yes, I was thinking along the same lines. This particular issue feels > a bit manufactured, and unlikely to occur in practice, but I'm happy > to go with whatever you decide. Thank you, Dean. I've decided to refrain from backpatching and have pushed the changes to the master branch. If we receive any complaints from the fields in the future, we can always perform the backpatching at that time. Hopefully, by then, this patch will have gone through a full beta test cycle. Thanks Richard