Some progress on INSERT/SELECT/GROUP BY bugs - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Some progress on INSERT/SELECT/GROUP BY bugs |
Date | |
Msg-id | 13811.926640105@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs
Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs Re: [HACKERS] Some progress on INSERT/SELECT/GROUP BY bugs |
List | pgsql-hackers |
I believe I've identified the main cause of the peculiar behavior we are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle parser bug. Here is the test case I'm looking at: CREATE TABLE si_tmpverifyaccountbalances ( type int4 NOT NULL, memberid int4 NOT NULL, categoriesid int4NOT NULL, amount numeric); CREATE TABLE invoicelinedetails ( invoiceid int4, memberid int4, totshippinghandling numeric, invoicelinesidint4); INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3, memberid, 1, totshippinghandling FROM invoicelinedetails GROUP BY invoiceid+3, memberid, totshippinghandling; ERROR: INSERT has more expressions than target columns The reason this is coming out is that the matching of GROUP BY (also ORDER BY) items to targetlist entries is fundamentally broken in this context. The GROUP BY items "memberid" and "totshippinghandling" are simply unvarnished Ident nodes when they arrive at findTargetlistEntry() in parse_clause.c; what findTargetlistEntry() does with them is to try to match them against the resdom names of the existing targetlist items. I think that's correct behavior in the plain SELECT case (but note it means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b --- is that per spec??). But it fails miserably in the INSERT/SELECT case, because by the time control gets here, the targetlist items have been given resdom names *corresponding to the column names of the target table*. So, in the example at hand, "memberid" is matched to the correct column by pure luck (because it has the same name in the destination table), and then "totshippinghandling" is not recognized as one of the existing TLEs because it does not match any destination column name. Now, call me silly, but it seems to me that SELECT ... GROUP BY ought to mean the same thing no matter whether there is an INSERT in front of it or not, and thus that letting target column names affect the meaning of GROUP BY items is dead wrong. (Don't have a spec to check this with, however.) I believe the most reasonable fix for this is to postpone relabeling of the targetlist entries with destination column names until after analysis of the SELECT's subsidiary clauses is complete. In particular, it should *not* be done instantly when each TLE is made, which is what MakeTargetEntryIdent currently does. The TLEs should have the same resnames as in the SELECT case until after subsidiary clause processing is done. (MakeTargetEntryIdent is broken anyway because it tries to associate a destination column with every TLE, even the resjunk ones. The reason we see the quoted error message in this situation is that after findTargetlistEntry fails to detect that totshippinghandling is already a TLE, it calls MakeTargetEntryIdent to make a junk TLE for totshippinghandling, and then MakeTargetEntryIdent tries to find a target column to go with the junk TLE. So the revised code should only assign dest column names to non-junk TLEs.) I'm not really familiar enough with the parser to want to tackle this size of change by myself --- Thomas, do you want to do it? I think it's largely a matter of moving code around, but I'm not sure where is the right place for it... regards, tom lane
pgsql-hackers by date: