Re: Columns correlation and adaptive query optimization - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: Columns correlation and adaptive query optimization |
Date | |
Msg-id | 318656e2-4d5d-2865-f1ba-a5c705327ac8@postgrespro.ru Whole thread Raw |
In response to | Re: Columns correlation and adaptive query optimization (Yugo NAGATA <nagata@sraoss.co.jp>) |
Responses |
Re: Columns correlation and adaptive query optimization
|
List | pgsql-hackers |
Hello,
Thank you for review.
My answers are inside.
Rebased version is attached.
I agree with you that this are two almost unrelated changes, although without clausesel patch additional statistic can not improve query planning.
But I already have too many patches at commitfest.
May be it will be enough to spit this patch into two?
Done.
Thank you for noticing the problem, fixed.
Changed.
Fixed.
Right now auto_explain create statistic without explicit specification of statistic kind.
According to the documentation all supported statistics kinds should be created in this case:
Thank you for review.
My answers are inside.
On 21.01.2021 15:30, Yugo NAGATA wrote:
Hello, On Thu, 26 Mar 2020 18:49:51 +0300 Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:Attached please find new version of the patch with more comments and descriptions added.Adaptive query optimization is very interesting feature for me, so I looked into this patch. Here are some comments and questions. (1) This patch needs rebase because clauselist_selectivity was modified to improve estimation of OR clauses.
Rebased version is attached.
(2) If I understand correctly, your proposal consists of the following two features. 1. Add a feature to auto_explain that creates an extended statistic automatically if an error on estimated rows number is large. 2. Improve rows number estimation of join results by considering functional dependencies between vars in the join qual if the qual has more than one clauses, and also functional dependencies between a var in the join qual and vars in quals of the inner/outer relation. As you said, these two parts are independent each other, so one feature will work even if we don't assume the other. I wonder it would be better to split the patch again, and register them to commitfest separately.
I agree with you that this are two almost unrelated changes, although without clausesel patch additional statistic can not improve query planning.
But I already have too many patches at commitfest.
May be it will be enough to spit this patch into two?
(3) + DefineCustomBoolVariable("auto_explain.suggest_only", + "Do not create statistic but just record in WAL suggested create statistics statement.", + NULL, + &auto_explain_suggest_on To imply that this parameter is involving to add_statistics_threshold, it seems better for me to use more related name like add_statistics_suggest_only. Also, additional documentations for new parameters are required.
Done.
(4) + /* + * Prevent concurrent access to extended statistic table + */ + stat_rel = table_open(StatisticExtRelationId, AccessExclusiveLock); + slot = table_slot_create(stat_rel, NULL); + scan = table_beginscan_catalog(stat_rel, 2, entry); (snip) + table_close(stat_rel, AccessExclusiveLock); + } When I tested the auto_explain part, I got the following WARNING. WARNING: buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, flags=0x83000000, refcount=1 2) WARNING: buffer refcount leak: [097] (rel=base/12879/3381, blockNum=0, flags=0x83000000, refcount=1 1) WARNING: relcache reference leak: relation "pg_statistic_ext" not closed WARNING: TupleDesc reference leak: TupleDesc 0x7fa439266338 (12029,-1) still referenced WARNING: Snapshot reference leak: Snapshot 0x55c332c10418 still referenced To suppress this, I think we need table_endscan(scan) and ExecDropSingleTupleTableSlot(slot) before finishing this function.
Thank you for noticing the problem, fixed.
(6) + elog(NOTICE, "Auto_explain suggestion: CREATE STATISTICS %s %s FROM %s", stat_name, create_stat_stmt, rel_name); We should use ereport instead of elog for log messages.
Changed.
Makes sense.(7) + double dep = find_var_dependency(root, innerRelid, var, clauses_attnums); + if (dep != 0.0) + { + s1 *= dep + (1 - dep) * s2; + continue; + } I found the following comment of clauselist_apply_dependencies(): * we actually combine selectivities using the formula * * P(a,b) = f * Min(P(a), P(b)) + (1-f) * P(a) * P(b) so, is it not necessary using the same formula in this patch? That is, s1 *= dep + (1-dep) * s2 (if s1 <= s2) s1 *= dep * (s2/s1) + (1-dep) * s2 (otherwise) .
(8) +/* + * Try to find dependency between variables. + * var: varaibles which dependencies are considered + * join_vars: list of variables used in other clauses + * This functions return strongest dependency and some subset of variables from the same relation + * or 0.0 if no dependency was found. + */ +static double +var_depends_on(PlannerInfo *root, Var* var, List* clause_vars) +{ The comment mentions join_vars but the actual argument name is clauses_vars, so it needs unification.
Fixed.
(9) Currently, it only consider functional dependencies statistics. Can we also consider multivariate MCV list, and is it useful?
Right now auto_explain create statistic without explicit specification of statistic kind.
According to the documentation all supported statistics kinds should be created in this case:
statistics_kind
A statistics kind to be computed in this statistics object. Currently supported kinds are
ndistinct
, which enables n-distinct statistics, anddependencies
, which enables functional dependency statistics. If this clause is omitted, all supported statistics kinds are included in the statistics object. For more information, see Section 14.2.2 and Section 68.2.
Sorry, I do not have answer for this question.(10) To achieve adaptive query optimization (AQO) in PostgreSQL, this patch proposes to use auto_explain for getting feedback from actual results. So, could auto_explain be a infrastructure of AQO in future? Or, do you have any plan or idea to make built-in infrastructure for AQO?
I just patched auto_explain extension because it is doing half of the required work (analyze expensive statements).
It can be certainly moved to separate extension. In this case it will party duplicate existed functionality and
settings of auto_explain (like statement execution time threshold). I am not sure that it is good.
But from the other side, this my patch makes auto_explain extension to do some unexpected work...
Actually task of adaptive query optimization is much bigger.
We have separate AQO extension which tries to use machine learning to correctly adjust estimations.
This my patch is much simpler and use existed mechanism (extended statistics) to improve estimations.-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
pgsql-hackers by date: