Thread: explain (verbose off, normalized) vs query planid
Hello, as described in: http://www.postgresql-archive.org/Re-FEATURE-PATCH-pg-stat-statements-with-plans-v02-td6015488.html I'm wondering about the best way to build a query planid. It seems natural (to me) to calculate a hash value based on the normalized plan text generated by explain (costs off, normalized) or maybe explain (costs off, verbose, normalized) this normalized output can be done by adding a modified get_const_expr in ruleutils.c as proposed in attached file. Would this "explain normalized" feature be interesting for core team and or users ? Regards PAscal get_const_expr_normalized.c <http://www.postgresql-archive.org/file/t348768/get_const_expr_normalized.c> -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Sat, May 12, 2018 at 6:07 PM, legrand legrand <legrand_legrand@hotmail.com> wrote: > It seems natural (to me) to calculate a hash value based on the normalized > plan text > generated by > > explain (costs off, normalized) > or maybe > explain (costs off, verbose, normalized) I would think it would be preferable to do it based on the node tree, like pg_stat_statements does for query fingerprinting. But I guess it might depend on what you want to do with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
This is for tracking planid into pg_stat_statements too. A first try is available here http://www.postgresql-archive.org/Poc-pg-stat-statements-with-planid-td6014027.html reusing pg_stat_plans's "Plan tree Jumbling algorithm" from Peter Geoghegan. Hashing the normalized query plan text in one pass (that is also based on plan tree), compared to that Jumbling method seems simple to maintain (if explain works, planid is available and accurate). Today actual planid jumbling comes from pg9.3, doesn't compile anymore, and I'm not able to verify if declarativ partitionning is properly handled ... Would there be some functional or performances reasons to prefer jumbling to hashing normalized plan text? Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On Tue, May 15, 2018 at 3:45 PM, legrand legrand <legrand_legrand@hotmail.com> wrote: > Would there be some functional or performances reasons to prefer jumbling to > hashing normalized plan text? Using the text could produce different query IDs for the same plan if any information is displayed in the text format which can vary for reasons other than a plan change. I don't know if there are any, but what about, for example, the effect of GUCs on how timestamps are displayed? Or, much worse, what if a timer value creeps into the output somehow? Certainly, renaming a table is going to change the output. Even using a different table alias will change the output. Using the text could produce the same query ID for different plans if there's any relevant detail of the plan that's not shown in the text. Basically, I would be nervous about the idea of an EXPLAIN output that's required to reflect all and only the plan details that should be jumbled. The "normalized" option to EXPLAIN which you mentioned upthread doesn't exist today... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, May 15, 2018 at 3:45 PM, legrand legrand > <legrand_legrand@hotmail.com> wrote: >> Would there be some functional or performances reasons to prefer jumbling to >> hashing normalized plan text? > Basically, I would be nervous about the idea of an EXPLAIN output > that's required to reflect all and only the plan details that should > be jumbled. The "normalized" option to EXPLAIN which you mentioned > upthread doesn't exist today... Indeed, and if we did write it, I think it would largely consist of throwing away info that a jumbling mechanism could ignore far more easily. Not to mention that we'd have to expend the cycles to emit a text representation that we didn't actually have use for. It sounds like a complete loser both in terms of coding effort and runtime performance. regards, tom lane