Extending error-location reports deeper into the system - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Extending error-location reports deeper into the system |
Date | |
Msg-id | 24172.1219598234@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Extending error-location reports deeper into the system
|
List | pgsql-hackers |
We currently have the ability to generate error location pointers, such as regression=# select nosuchcolumn from int8_tbl; ERROR: column "nosuchcolumn" does not exist LINE 1: select nosuchcolumn from int8_tbl; ^ for grammar-detected syntax errors and for errors during first-level semantic analysis of certain constructs (variables, operators, functions are about all IIRC). Beyond that, it's not possible because we don't include location fields in any node types used in post-parse-analysis query trees. There was some discussion about this back in March in connection with a patch proposed by Peter, and in http://archives.postgresql.org/pgsql-hackers/2008-03/msg00631.php I wrote > the current parser location mechanism stores locations only for > nodes that appear in raw grammar trees (gram.y output), *not* in > analyzed expressions (transformExpr output). This was an intentional > choice based on a couple of factors: > > * Once we no longer have the parser input string available, the location > information would be just so much wasted space. > > * It would add a weird special case to the equalfuncs.c routines: > should location fields be compared? (Probably not, but it seems a bit > unprincipled to ignore them.) And other places might have comparable > uncertainties what to do with 'em. It occurred to me today that one of the foundational assumptions of that old decision has changed, namely that the original query text isn't available after parsing. We have recently fixed things so that it almost always *is* available. So I think a reasonable case can be made for extending most or all querytree node types to include a location field, and then using these fields in the way sketched in the above-referenced message to include location pointers in many more error messages than we do now. The point about equalfuncs behavior isn't bothering me a lot at the moment. It seems clear that we *do* want equal() to ignore location fields, because one of the main purposes it's used for is to note whether, eg, "ORDER BY x" and "GROUP BY x" are referring to the same variable, and of course those two occurrences aren't going to have the same syntactic position. Another interesting point is outfuncs/readfuncs behavior. I think that we'd want outfuncs to print the location fields, because they're possibly useful for debugging; but readfuncs should ignore the data and set the fields to -1 (unknown) when reading nodes back in. The reason for this is that the only application for reading nodes in is sucking in stored rules, default expressions, etc. And these are exactly the cases where we indeed no longer have the original source text handy. If we didn't set the locations to unknown, then errors complaining about problems arising within a rule would try to print pointers to locations in the calling query's text having the same offsets as the problematic item had in the original CREATE RULE or similar command. Not what we want. There is going to be some small distributed overhead from adding an additional integer field to so many common Node types, but I find it hard to believe that it'd be measurable. So this all seems doable and probably not a very large task to get the infrastructure in place, though of course actually extending many error messages to include location pointers will probably happen piecemeal over time. Thoughts, objections? If there are none I'm tempted to work on this in the week remaining before September commitfest. regards, tom lane
pgsql-hackers by date: