Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation) - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Msg-id: CAEYLb_UhGcWR2iy_emRzw1etBsMrYxQM7A-QrnqABw5UwR4SLA@mail.gmail.com
In response to: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
List: pgsql-hackers
On 27 February 2012 06:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think that what Peter is on about in
> http://archives.postgresql.org/pgsql-hackers/2012-02/msg01152.php
> is the question of what location to use for the *result* of
> 'literal string'::typename, assuming that the type's input function
> doesn't complain. Generally we consider that we should use the
> leftmost token's location for the location of any expression composed
> of more than one input token. This is of course the same place for
> 'literal string'::typename, but not for the alternate syntaxes
> typename 'literal string' and cast('literal string' as typename).
> I'm not terribly impressed by the proposal to put in an arbitrary
> exception to that general rule for the convenience of this patch.

Now, you don't have to be. It was a mistake on my part to bring the
current user-visible behaviour into this. I don't see that there is
necessarily any tension between your position that we should blame the
leftmost token's location and my contention that the Const "location"
field shouldn't misrepresent the location of certain Consts involved
in coercion post-analysis.

Let me put that in concrete terms. In my working copy of the patch, I
have made some more changes to the core system (mostly reverting
things that turned out to be unnecessary), but I have also made the
following change:

*** a/src/backend/parser/parse_coerce.c
--- b/src/backend/parser/parse_coerce.c
*************** coerce_type(ParseState *pstate, Node *no
*** 280,293 ****
  	newcon->constlen = typeLen(targetType);
  	newcon->constbyval = typeByVal(targetType);
  	newcon->constisnull = con->constisnull;
! 	/* Use the leftmost of the constant's and coercion's locations */
! 	if (location < 0)
! 		newcon->location = con->location;
! 	else if (con->location >= 0 && con->location < location)
! 		newcon->location = con->location;
! 	else
! 		newcon->location = location;
!
  	/*
  	 * Set up to point at the constant's text if the input routine throws
  	 * an error.
--- 280,286 ----
  	newcon->constlen = typeLen(targetType);
  	newcon->constbyval = typeByVal(targetType);
  	newcon->constisnull = con->constisnull;
! 	newcon->location = con->location;

  	/*
  	 * Set up to point at the constant's text if the input routine throws
  	 * an error.

This does not appear to have any user-visible effect on caret position
for any variation in coercion syntax, while giving me everything that
I need. I had assumed that we were relying on things being this way,
but apparently that is not the case. The system is correctly blaming
the coercion token when it finds the coercion at fault, and the Const
token when it finds the Const node at fault, just as it did before. So
this looks like a case of removing what amounts to dead code.

> Especially not when the only reason it's needed is that Peter is
> doing the fingerprinting at what is IMO the wrong place anyway.
> If he were working on the raw grammar output it wouldn't matter
> what parse_coerce chooses to do afterwards.

Well, I believe that your reason for preferring to do it at that stage
was that we cannot capture all of the system's "normalisation smarts"
- like the fact that the omission of noise words isn't a
differentiator - because much of that, such as the recognition of the
equivalence of explicit joins and queries with join conditions in the
WHERE clause, occurs within the planner. Since we can't have it all,
we might as well not have any. My solution here is to be sufficiently
vague about the behaviour of normalisation that the user has no
reasonable basis to count on that kind of more advanced reduction
occurring.

I did very seriously consider hashing the raw parse tree, but I have
several practical reasons for not doing so. Whatever way you look at
it, hashing there is going to result in more code that is more ugly.
There is no uniform parent node that I can tag with a query_id.
There would have to be more modifications to the core system so that
the queryId value is carried around in more places and persists for
longer. The fact that I'd actually be hashing different structs at
different times (that tree is accessed through a Node pointer) would
necessitate a lot of redundant code operating on each of the very
similar structs in an analogous way. The fact is that waiting until
after parse analysis has plenty of things to recommend it, and yes,
the fact that we already have working code with extensive regression
tests is one of them.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services