Thread: MERGE bug report
Hello Hackers, Reporting a bug with the new MERGE statement. Tested against 75edb919613ee835e7680e40137e494c7856bcf9. psql output as follows: ... psql:merge.sql:33: ERROR: variable not found in subplan target lists ROLLBACK [local] joe@joe=# \errverbose ERROR: XX000: variable not found in subplan target lists LOCATION: fix_join_expr_mutator, setrefs.c:2800 Stack trace: fix_join_expr_mutator setrefs.c:2800 expression_tree_mutator nodeFuncs.c:3348 fix_join_expr_mutator setrefs.c:2853 expression_tree_mutator nodeFuncs.c:2992 fix_join_expr_mutator setrefs.c:2853 expression_tree_mutator nodeFuncs.c:3348 fix_join_expr_mutator setrefs.c:2853 fix_join_expr setrefs.c:2753 set_plan_refs setrefs.c:1085 set_plan_references setrefs.c:315 standard_planner planner.c:498 planner planner.c:277 pg_plan_query postgres.c:883 pg_plan_queries postgres.c:975 exec_simple_query postgres.c:1169 PostgresMain postgres.c:4520 BackendRun postmaster.c:4593 BackendStartup postmaster.c:4321 ServerLoop postmaster.c:1801 PostmasterMain postmaster.c:1473 main main.c:202 __libc_start_main 0x00007fc4ccc0b1e2 _start 0x000000000048804e Reproducer script: BEGIN; DROP TABLE IF EXISTS item, incoming, source CASCADE; CREATE TABLE item (order_id INTEGER NOT NULL, item_id INTEGER NOT NULL, quantity INTEGER NOT NULL, price NUMERIC NOT NULL, CONSTRAINT pk_item PRIMARY KEY (order_id, item_id)); INSERT INTO item VALUES (100, 1, 4, 100.00), (100, 2, 9, 199.00); CREATE TABLE incoming (order_id, item_id, quantity, price) AS (VALUES (100, 1, 4, 100.00), (100, 3, 1, 200.00)); CREATE TABLE source (order_id, item_id, quantity, price) AS (SELECT order_id, item_id, incoming.quantity, incoming.price FROM item LEFT JOIN incoming USING (order_id, item_id)); MERGE INTO item a USING source b ON (a.order_id, a.item_id) = (b.order_id, b.item_id) WHEN NOT MATCHED THEN INSERT (order_id, item_id, quantity, price) VALUES (order_id, item_id, quantity, price) WHEN MATCHED AND a.* IS DISTINCT FROM b.* THEN UPDATE SET (quantity, price) = (b.quantity, b.price) WHEN MATCHED AND (b.quantity IS NULL AND b.price IS NULL) THEN DELETE; COMMIT; It seems related to the use of a.* and b.* Sorry I can't be more specific. Error manifests when planning occurs and that is well outside of my code base knowledge. Hope this helps. Cheers, -Joe
On Tue, Apr 5, 2022 at 3:18 PM Joe Wildish <joe@lateraljoin.com> wrote:
Hello Hackers,
Reporting a bug with the new MERGE statement. Tested against 75edb919613ee835e7680e40137e494c7856bcf9.
psql output as follows:
...
psql:merge.sql:33: ERROR: variable not found in subplan target lists
ROLLBACK
[local] joe@joe=# \errverbose
ERROR: XX000: variable not found in subplan target lists
LOCATION: fix_join_expr_mutator, setrefs.c:2800
Stack trace:
fix_join_expr_mutator setrefs.c:2800
expression_tree_mutator nodeFuncs.c:3348
fix_join_expr_mutator setrefs.c:2853
expression_tree_mutator nodeFuncs.c:2992
fix_join_expr_mutator setrefs.c:2853
expression_tree_mutator nodeFuncs.c:3348
fix_join_expr_mutator setrefs.c:2853
fix_join_expr setrefs.c:2753
set_plan_refs setrefs.c:1085
set_plan_references setrefs.c:315
standard_planner planner.c:498
planner planner.c:277
pg_plan_query postgres.c:883
pg_plan_queries postgres.c:975
exec_simple_query postgres.c:1169
PostgresMain postgres.c:4520
BackendRun postmaster.c:4593
BackendStartup postmaster.c:4321
ServerLoop postmaster.c:1801
PostmasterMain postmaster.c:1473
main main.c:202
__libc_start_main 0x00007fc4ccc0b1e2
_start 0x000000000048804e
Reproducer script:
BEGIN;
DROP TABLE IF EXISTS item, incoming, source CASCADE;
CREATE TABLE item
(order_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
price NUMERIC NOT NULL,
CONSTRAINT pk_item PRIMARY KEY (order_id, item_id));
INSERT INTO item VALUES (100, 1, 4, 100.00), (100, 2, 9, 199.00);
CREATE TABLE incoming (order_id, item_id, quantity, price)
AS (VALUES (100, 1, 4, 100.00), (100, 3, 1, 200.00));
CREATE TABLE source (order_id, item_id, quantity, price) AS
(SELECT order_id, item_id, incoming.quantity, incoming.price
FROM item LEFT JOIN incoming USING (order_id, item_id));
MERGE INTO item a
USING source b
ON (a.order_id, a.item_id) =
(b.order_id, b.item_id)
WHEN NOT MATCHED
THEN INSERT (order_id, item_id, quantity, price)
VALUES (order_id, item_id, quantity, price)
WHEN MATCHED
AND a.* IS DISTINCT FROM b.*
THEN UPDATE SET (quantity, price) = (b.quantity, b.price)
WHEN MATCHED
AND (b.quantity IS NULL AND b.price IS NULL)
THEN DELETE;
COMMIT;
It seems related to the use of a.* and b.*
Sorry I can't be more specific. Error manifests when planning occurs and that is well outside of my code base knowledge.
Hope this helps.
Cheers,
-Joe
Hi,
It seems all the calls to fix_join_expr_mutator() are within setrefs.c
I haven't found where in nodeFuncs.c fix_join_expr_mutator is called.
I am on commit 75edb919613ee835e7680e40137e494c7856bcf9 .
On Tue, Apr 5, 2022 at 3:35 PM Zhihong Yu <zyu@yugabyte.com> wrote:
On Tue, Apr 5, 2022 at 3:18 PM Joe Wildish <joe@lateraljoin.com> wrote:Hello Hackers,
Reporting a bug with the new MERGE statement. Tested against 75edb919613ee835e7680e40137e494c7856bcf9.
psql output as follows:
...
psql:merge.sql:33: ERROR: variable not found in subplan target lists
ROLLBACK
[local] joe@joe=# \errverbose
ERROR: XX000: variable not found in subplan target lists
LOCATION: fix_join_expr_mutator, setrefs.c:2800
Stack trace:
fix_join_expr_mutator setrefs.c:2800
expression_tree_mutator nodeFuncs.c:3348
fix_join_expr_mutator setrefs.c:2853
expression_tree_mutator nodeFuncs.c:2992
fix_join_expr_mutator setrefs.c:2853
expression_tree_mutator nodeFuncs.c:3348
fix_join_expr_mutator setrefs.c:2853
fix_join_expr setrefs.c:2753
set_plan_refs setrefs.c:1085
set_plan_references setrefs.c:315
standard_planner planner.c:498
planner planner.c:277
pg_plan_query postgres.c:883
pg_plan_queries postgres.c:975
exec_simple_query postgres.c:1169
PostgresMain postgres.c:4520
BackendRun postmaster.c:4593
BackendStartup postmaster.c:4321
ServerLoop postmaster.c:1801
PostmasterMain postmaster.c:1473
main main.c:202
__libc_start_main 0x00007fc4ccc0b1e2
_start 0x000000000048804e
Reproducer script:
BEGIN;
DROP TABLE IF EXISTS item, incoming, source CASCADE;
CREATE TABLE item
(order_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
price NUMERIC NOT NULL,
CONSTRAINT pk_item PRIMARY KEY (order_id, item_id));
INSERT INTO item VALUES (100, 1, 4, 100.00), (100, 2, 9, 199.00);
CREATE TABLE incoming (order_id, item_id, quantity, price)
AS (VALUES (100, 1, 4, 100.00), (100, 3, 1, 200.00));
CREATE TABLE source (order_id, item_id, quantity, price) AS
(SELECT order_id, item_id, incoming.quantity, incoming.price
FROM item LEFT JOIN incoming USING (order_id, item_id));
MERGE INTO item a
USING source b
ON (a.order_id, a.item_id) =
(b.order_id, b.item_id)
WHEN NOT MATCHED
THEN INSERT (order_id, item_id, quantity, price)
VALUES (order_id, item_id, quantity, price)
WHEN MATCHED
AND a.* IS DISTINCT FROM b.*
THEN UPDATE SET (quantity, price) = (b.quantity, b.price)
WHEN MATCHED
AND (b.quantity IS NULL AND b.price IS NULL)
THEN DELETE;
COMMIT;
It seems related to the use of a.* and b.*
Sorry I can't be more specific. Error manifests when planning occurs and that is well outside of my code base knowledge.
Hope this helps.
Cheers,
-JoeHi,It seems all the calls to fix_join_expr_mutator() are within setrefs.cI haven't found where in nodeFuncs.c fix_join_expr_mutator is called.I am on commit 75edb919613ee835e7680e40137e494c7856bcf9 .
Pardon - I typed too fast:
The call to fix_join_expr_mutator() is on this line (3348):
mutator((Node *) lfirst(temp),
context));
On Wed, Apr 6, 2022 at 6:18 AM Joe Wildish <joe@lateraljoin.com> wrote:
Hello Hackers,
Reporting a bug with the new MERGE statement. Tested against 75edb919613ee835e7680e40137e494c7856bcf9.
psql output as follows:
...
psql:merge.sql:33: ERROR: variable not found in subplan target lists
ROLLBACK
[local] joe@joe=# \errverbose
ERROR: XX000: variable not found in subplan target lists
LOCATION: fix_join_expr_mutator, setrefs.c:2800
Stack trace:
fix_join_expr_mutator setrefs.c:2800
expression_tree_mutator nodeFuncs.c:3348
fix_join_expr_mutator setrefs.c:2853
expression_tree_mutator nodeFuncs.c:2992
fix_join_expr_mutator setrefs.c:2853
expression_tree_mutator nodeFuncs.c:3348
fix_join_expr_mutator setrefs.c:2853
fix_join_expr setrefs.c:2753
set_plan_refs setrefs.c:1085
set_plan_references setrefs.c:315
standard_planner planner.c:498
planner planner.c:277
pg_plan_query postgres.c:883
pg_plan_queries postgres.c:975
exec_simple_query postgres.c:1169
PostgresMain postgres.c:4520
BackendRun postmaster.c:4593
BackendStartup postmaster.c:4321
ServerLoop postmaster.c:1801
PostmasterMain postmaster.c:1473
main main.c:202
__libc_start_main 0x00007fc4ccc0b1e2
_start 0x000000000048804e
Reproducer script:
BEGIN;
DROP TABLE IF EXISTS item, incoming, source CASCADE;
CREATE TABLE item
(order_id INTEGER NOT NULL,
item_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
price NUMERIC NOT NULL,
CONSTRAINT pk_item PRIMARY KEY (order_id, item_id));
INSERT INTO item VALUES (100, 1, 4, 100.00), (100, 2, 9, 199.00);
CREATE TABLE incoming (order_id, item_id, quantity, price)
AS (VALUES (100, 1, 4, 100.00), (100, 3, 1, 200.00));
CREATE TABLE source (order_id, item_id, quantity, price) AS
(SELECT order_id, item_id, incoming.quantity, incoming.price
FROM item LEFT JOIN incoming USING (order_id, item_id));
MERGE INTO item a
USING source b
ON (a.order_id, a.item_id) =
(b.order_id, b.item_id)
WHEN NOT MATCHED
THEN INSERT (order_id, item_id, quantity, price)
VALUES (order_id, item_id, quantity, price)
WHEN MATCHED
AND a.* IS DISTINCT FROM b.*
THEN UPDATE SET (quantity, price) = (b.quantity, b.price)
WHEN MATCHED
AND (b.quantity IS NULL AND b.price IS NULL)
THEN DELETE;
COMMIT;
It seems related to the use of a.* and b.*
case these whole-row Vars are not included in the targetlist.
Attached is an attempt for the fix.
Thanks
Richard
Attachment
On 2022-Apr-06, Richard Guo wrote: > That's right. The varattno is set to zero for whole-row Var. And in this > case these whole-row Vars are not included in the targetlist. > > Attached is an attempt for the fix. Wow, this is very interesting. I was surprised that this patch was necessary at all -- I mean, if wholerow refs don't work, then why do references to any other columns work? The answer is that parse_merge.c is already setting up the subplan's targetlist by expanding all vars of the source relation. I then remembered than in Simon's (or Pavan's) original coding, parse_merge.c had a hack to include a var with the source's wholerow in that targetlist, which I had later removed ... I eventually realized that there's no need for parse_merge.c to expand the source rel at all, and indeed it's wasteful: we can just let preprocess_targetlist include the vars that are referenced by either quals or each action's targetlist instead. That led me to the attached patch, which is not commit-quality yet but it should show what I have in mind. I added a test query to tickle this problematic case. Another point, not completely connected to this bug but appearing in the same function, is that we have some redundant code: we can just let the stanza for UPDATE/DELETE do the identity columns dance. This saves a few lines in the MERGE-specific stanza there, which was doing exactly the same thing. (There's a difference in the "inh" test, but I think that was just outdated.) I also discovered that the comment for fix_join_expr needed an update, since it doesn't mention MERGE, and it does mention all other situations in which it is used. Added that too. This patch is a comment about "aggregates, window functions and placeholder vars". This was relevant and correct when only the qual of each action was being handled (i.e., Richard's patch). Now that we're also handling the action's targetlist, I think I need to put the PVC flags back. But no tests broke, which probably means we also need some additional tests cases. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachment
On Sat, Apr 9, 2022 at 5:26 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Apr-06, Richard Guo wrote:
> That's right. The varattno is set to zero for whole-row Var. And in this
> case these whole-row Vars are not included in the targetlist.
>
> Attached is an attempt for the fix.
Wow, this is very interesting. I was surprised that this patch was
necessary at all -- I mean, if wholerow refs don't work, then why do
references to any other columns work? The answer is that parse_merge.c
is already setting up the subplan's targetlist by expanding all vars of
the source relation. I then remembered than in Simon's (or Pavan's)
original coding, parse_merge.c had a hack to include a var with the
source's wholerow in that targetlist, which I had later removed ...
At first I was wondering whether we need to also include vars used in
each action's targetlist, just as what we did for each action's qual.
Then later I realized parse_merge.c already did that. But now it looks
much better to process them two in preprocess_targetlist.
each action's targetlist, just as what we did for each action's qual.
Then later I realized parse_merge.c already did that. But now it looks
much better to process them two in preprocess_targetlist.
I eventually realized that there's no need for parse_merge.c to expand
the source rel at all, and indeed it's wasteful: we can just let
preprocess_targetlist include the vars that are referenced by either
quals or each action's targetlist instead. That led me to the attached
patch, which is not commit-quality yet but it should show what I have in
mind.
This patch looks in a good shape to me.
A minor comment is that we can use list_concat_copy(list1, list2)
instead of list_concat(list_copy(list1), list2) for better efficiency.
A minor comment is that we can use list_concat_copy(list1, list2)
instead of list_concat(list_copy(list1), list2) for better efficiency.
Thanks
Richard
Richard
On 2022-Apr-11, Richard Guo wrote: > At first I was wondering whether we need to also include vars used in > each action's targetlist, just as what we did for each action's qual. > Then later I realized parse_merge.c already did that. But now it looks > much better to process them two in preprocess_targetlist. Yeah. I pushed that. However, now EXPLAIN VERBOSE doesn't show the columns from the source relation in the Output line --- I think only those that are used as join quals are shown, thanks to distribute_quals_to_rels. I think it would be better to fix this. Maybe expanding the source target list earlier is called for, after all. I looked at transformUpdateStmt and siblings for inspiration, but came out blank. > A minor comment is that we can use list_concat_copy(list1, list2) > instead of list_concat(list_copy(list1), list2) for better efficiency. Thanks for that tip. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La vida es para el que se aventura"