pg_stat_statements normalization: re-review - Mailing list pgsql-hackers
From | Daniel Farina |
---|---|
Subject | pg_stat_statements normalization: re-review |
Date | |
Msg-id | CAAZKuFYZ37ox+v+izHhAPhKq_xLSYD_rPnK=6-ANVW4dCE_4sA@mail.gmail.com Whole thread Raw |
Responses |
Re: pg_stat_statements normalization: re-review
|
List | pgsql-hackers |
Hello again, I'm reviewing the revised version of pg_stat_statements again. In particular, this new version has done away with the mechanical but invasive change to the lexing that preserved *both* the position and length of constant values. (See http://archives.postgresql.org/message-id/CAEYLb_WGeFCT7MfJ8FXf-CR6BSE6Lbn+O1VX3+OGrc4Bscn4=A@mail.gmail.com) I did the review front-matter again (check against a recent version, make sure it does what it says it'll do, et al..) and did trigger an assertion failure that seems to be an oversight, already reported to Peter. I found that oversight by running the SQLAlchemy Python query-generation toolkit's tests, which are voluminous. The functionality is seemingly mostly unchanged. What I'm going to do here is focus on the back-end source changes that are not part of the contrib. The size of the changes are much reduced, but their semantics are also somewhat more complex...so, here goes. Conclusion first: * The small changes to hashing are probably not strictly required, unless collisions are known to get terrible. * The hook to analyze is pretty straight-forward and seem like the other hooks * The addition of a field to scribble upon in the Query and Plan nodes is something I'd like to understand better, as these leave me with a bit of disquiet. What follows is in much more detail, and broken up by functionality and file grouping in the backend. src/include/access/hash.h | 4 ++-src/backend/access/hash/hashfunc.c | 21 ++++++++++--------- This adjusts the hash_any procedure to support returning two possible precisions (64 bit and 32 bit) by tacking on a Boolean flag to make the precision selectable. The hash_any operator is never used directly, and instead via macro, and a second macro to support 64-bit precision has been added. It seems a bit wonky to me to use a flag to select this rather than encapsulating the common logic in a procedure and just break this up into three symbols, though. I think the longer precision is used to try to get fewer collisions with not too much slowdown. Although it may meaningfully improve the quality of the hashing for pg_stat_statements or living without the extra hashing bits might lead to unusable results (?), per the usual semantics of hashing it is probably not *strictly* necessary. A smidgen of avoidable formatting niggles (> 80 columns) are in this section. src/backend/nodes/copyfuncs.c | 5 ++++src/backend/nodes/equalfuncs.c | 4 +++src/backend/nodes/outfuncs.c | 10 +++++++++src/backend/optimizer/plan/planner.c | 1 +src/backend/nodes/readfuncs.c | 12 +++++++++++src/include/nodes/parsenodes.h | 3 ++src/include/nodes/plannodes.h | 2 + These have all been changed in the usual manner to support one new field, the queryId, on the toplevel Plan and Query nodes. The queryId is 64-bit field copied from queries to plans to tie together plans "to be used by plugins" (quoth the source) as they flow through the system. pg_stat_statements fills them with the 64-bit hash of the query text -- a reasonable functionality to possess, I think, but the abstraction seems iffy: plugins cannot compose well if they all need to use the queryId for different reasons. Somehow I get the feeling that this field can be avoided or its use case can be satisfied in a more satisfying way. Mysteriously, in the Query representation the symbol uses an underscored-notation (query_id) and in the planner it uses a camelcase one, queryId. Overall, the other fields in those structs all use camelcase, so I'd recommend normalizing it. src/backend/parser/analyze.c | 37 ++++++++++++++++++++++++++++++---src/include/parser/analyze.h | 12+++++++++++ These changes implement hooks for the once-non-hookable symbols parse_analyze_varparams and parse_analyze, in seemingly the same way they are implemented in other hooks in the system. These are neatly symmetrical with the planner hook. I only ask if there is a way to have one hook and not two, but I suppose that would be a similar question as "why is are there two ways to symbols to enter into parsing, and not one". -- fdr
pgsql-hackers by date: