Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries
Msg-id: 10619.1484231378@sss.pgh.pa.us
In response to: Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List: pgsql-hackers
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> At Fri, 30 Dec 2016 15:10:42 +0100 (CET), Fabien COELHO
> <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612301453280.32017@lancre>
>> - I cannot use the intermediate node trick suggested by Tom because
>> it does not work for utility statements which do not have plans, so
>> the code still adds location & length, sorry.

I still really, really don't like this patch.  I think it's going to
create continuing maintenance problems, because of shortcuts like this:

+#define isParseNodeTag(tag)	((T_Query <= (tag)) && ((tag) < T_A_Expr))

We've never had any code that depends on ranges of nodetag numbers, and
now is not the time to start.  (The fact that you had to randomly
relocate some existing tags to make this work didn't make me any happier
about it.)

>> - I still use the 'last_semicolon' lexer variable. The alternative is to
>> change rules so as not to skip empty statements, then write a loop to
>> compute the length based on successor location, and remove the empty
>> statements. It can be done, I do not think it is better, it is only
>> different and more verbose. I'll do it if required by a committer.

> I think this is doable in the way shown in this patch. But this
> seems somewhat bogus, too..

Yeah, I doubt that this technique for getting the raw locations in the
grammar+lexer works reliably.  In general, Bison is a bit asynchronous in
terms of scanning, and may or may not have scanned a token ahead of what
the current production covers.  So having production rules that look at
the scanner state is just dangerous.  You should be relying on the Bison
locations mechanism (@N) instead.  That has some gotchas of its own with
respect to locations of nonterminals, but at least they're known --- and
we could fix them if we had to.

Anyway, I decided to see what it would take to do it the way I had in
mind, which was to stick a separate RawStmt node atop each statement
parsetree.  The project turned out a bit larger than I hoped :-(, but I'm
really pleased with the end result, because it makes the APIs around
lists of statements much simpler and more regular than they used to be.
I would probably propose most of the attached patch for adoption even if
we weren't interested in tracking statement boundaries.
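To make the shape of that concrete, here is a minimal sketch of such a
wrapper node --- illustrative only, not the exact definition from the
attached patch; it assumes the usual Node/NodeTag machinery from
nodes/nodes.h, and the -1/0 edge-case conventions are assumed rather than
spelled out anywhere in this message:

/*
 * Sketch only: a thin wrapper carrying one raw statement plus its
 * boundaries within the original source text.
 */
typedef struct RawStmt
{
	NodeTag		type;
	Node	   *stmt;			/* statement-type-dependent raw parse tree */
	int			stmt_location;	/* start offset in source text, or -1 if unknown (assumed) */
	int			stmt_len;		/* length in bytes; 0 taken to mean "rest of string" (assumed) */
} RawStmt;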
In brief, what this does is:

* The output of raw parsing is now always a list of RawStmt nodes; the
statement-type-dependent nodes are one level down from that.  This is
similar to the existing rule that the output of parse analysis is always
a list of Query nodes, even though the Query struct is mostly empty in
the CMD_UTILITY case.  Furthermore, the output of pg_plan_queries() is
now always a list of PlannedStmt nodes, even for utility statements.  In
the case of a utility statement, "planning" just consists of wrapping a
CMD_UTILITY PlannedStmt around the utility node.  Now, every list of
statements has a consistent head-node type depending on how far along it
is in processing.

* This allows us to keep statement location/length in RawStmt, Query,
and PlannedStmt nodes, and not anywhere else.

* This results in touching quite a bit of code that is concerned with
lists of statements in different formats, but in most places it's simple
changes like replacing generic Node* pointers with specific RawStmt* or
PlannedStmt* pointers.  I think that's an unalloyed good.

* To preserve the new rule that the node passed to top-level
parse_analyze() is now always a RawStmt, I made the grammar insert
RawStmts in the sub-statements of EXPLAIN, DECLARE CURSOR, etc.  This
adds a bit of cruft in gram.y but avoids cruft elsewhere.

* In addition, I had to change the API of ProcessUtility() so that what
it's passed is the wrapper PlannedStmt, not just the bare utility
statement.  This is what allows pg_stat_statements to get at the location
info for a utility statement.  This will affect all users of
ProcessUtility_hook, but the changes are pretty trivial for those that
don't care about location info --- see the changes to
contrib/sepgsql/hooks.c for an example.  (A sketch of actually using the
location info follows at the end of this message.)

* Because PlannedStmt also carries a canSetTag field, we're able to get
rid of some ad-hoc rules about how to reconstruct canSetTag for a bare
utility statement (specifically, the assumption that a utility is
canSetTag if and only if it's the only one in its list).  While I see no
near-term need for relaxing that, it's nice to get rid of the ad-hocery.

* I chose to also rearrange the post-parse-analysis representation of
DECLARE CURSOR so that it looks more like EXPLAIN, PREPARE, etc.  That
is, the contained SELECT remains a child of the DeclareCursorStmt rather
than getting flipped around to be the other way.  Possibly this patch
could have been made to work without that, but the existing code had some
weird rules about how the presence of a PlannedStmt where a utility
command was expected meant it was DECLARE CURSOR, and I was out to get
rid of weird rules like that one.  With these changes, it's true for both
Query and PlannedStmt that utilityStmt is non-null if and only if
commandType is CMD_UTILITY.  That allows simplifying a whole lot of
places that were testing both fields --- I think a few of those were just
defensive programming, but in a lot of them it was actually necessary to
avoid confusing DECLARE CURSOR with SELECT.

So this ends up costing one extra palloc per statement, or two extra in
the case of a utility statement, but really that's quite negligible
compared to everything else that happens in processing a statement.  As
against that, I think it makes the related code a lot clearer.  The sheer
size of the patch is a bit more than Fabien's patch, but what it is
touching is not per-statement-type code but code that works generically
with lists of statements.  So I think it's less likely to cause problems
for other patches.
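For the hook users that do care, a minimal sketch --- not code from the
attached patch --- of how a ProcessUtility_hook consumer such as
pg_stat_statements might carve one statement's text out of the source
string.  The helper name is hypothetical, and the -1/0 edge-case
conventions for stmt_location/stmt_len are assumptions, not quotes from
the patch:

/* Sketch only; assumes the wrapper PlannedStmt described above. */
#include "postgres.h"
#include "nodes/plannodes.h"

static char *
single_statement_text(PlannedStmt *pstmt, const char *queryString)
{
	int			loc = pstmt->stmt_location;
	int			len = pstmt->stmt_len;

	if (loc < 0)
		return pstrdup(queryString);	/* boundaries unknown: keep whole string */

	if (len <= 0)
		len = strlen(queryString) - loc;	/* assume 0 means "rest of string" */

	return pnstrdup(queryString + loc, len);
}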
			regards, tom lane