Re: [HACKERS] multivariate statistics (v25) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [HACKERS] multivariate statistics (v25) |
Date | |
Msg-id | 7dedc58b-f9cf-fe34-ef7f-382f776ce953@2ndquadrant.com Whole thread Raw |
In response to | Re: [HACKERS] multivariate statistics (v24) (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] multivariate statistics (v25)
|
List | pgsql-hackers |
On 03/02/2017 07:42 AM, Kyotaro HORIGUCHI wrote: > Hello, > > At Thu, 2 Mar 2017 04:05:34 +0100, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote in <a78ffb17-70e8-a55a-c10c-66ab575e88ed@2ndquadrant.com> >> OK, >> >> attached is v24 of the patch series, addressing most of the reported >> issues and comments (at least I believe so). The main changes are: > > Unfortunately, 0002 conflicts with the current master > (4461a9b). Could you rebase them or tell us the commit where this > patches stand on? > Attached is a rebased patch series, otherwise it's the same as v24. FWIW it was based on 016c990834 from Feb 28, but apparently some recent patch caused a minor conflict. > I only saw the patch files but have some comments. > >> 1) I've mostly abandoned the "multivariate" name in favor of >> "extended", particularly in places referring to stats stored in the >> pg_statistic_ext in general. "Multivariate" is now used only in places >> talking about particular types (e.g. multivariate histograms). >> >> The "extended" name is more widely used for this type of statistics, >> and the assumption is that we'll also add other (non-multivariate) >> types of statistics - e.g. statistics on custom expressions, or some >> for of join statistics. > > In 0005, and > > @@ -184,14 +208,43 @@ clauselist_selectivity(PlannerInfo *root, > * If there are no such stats or not enough attributes, don't waste time > * simply skip to estimation using the plain per-column stats. > */ > + if (has_stats(stats, STATS_TYPE_MCV) && > ... > + /* compute the multivariate stats */ > + s1 *= clauselist_ext_selectivity(root, mvclauses, stat); > ==== > @@ -1080,10 +1136,71 @@ clauselist_ext_selectivity_deps(PlannerInfo *root, Index relid, > } > > /* > + * estimate selectivity of clauses using multivariate statistic > > These comment is left unchanged? or on purpose? 0007 adds very > similar texts. > Hmm, those comments should be probably changed to "extended". >> 2) Catalog pg_mv_statistic was renamed to pg_statistic_ext (and >> pg_mv_stats view renamed to pg_stats_ext). > > FWIW, "extended statistic" would be abbreviated as > "ext_statistic" or "extended_stats". Why have you exchanged the > words? > Because this way it's clear it's a version of pg_statistic, and it will be sorted right next to it. >> 3) The structure of pg_statistic_ext was changed as proposed by >> Alvaro, i.e. the boolean flags were removed and instead we have just a >> single "char[]" column with list of enabled statistics. >> >> 4) I also got rid of the "mv" part in most variable/function/constant >> names, replacing it by "ext" or something similar. Also mvstats.h got >> renamed to stats.h. >> >> 5) Moved the files from src/backend/utils/mvstats to >> backend/statistics. >> >> 6) Fixed the n_choose_k() overflow issues by using the algorithm >> proposed by Dean. Also, use the simple formula for num_combinations(). >> >> 7) I've tweaked data types for a few struct members (in stats.h). I've >> kept most of the uint32 fields at the top level though, because int16 >> might not be large enough for large statistics and the overhead is >> minimal (compared to the space needed e.g. for histogram buckets). > > Some formulated proof or boundary value test cases might be > needed (to prevent future trouble). Or any defined behavior on > overflow of them might be enough. I belive all (or most) of > overflow-able data has such behavior. > That is probably a good idea and I plan to do that. >> The renames/changes were quite widespread, but I've done my best to >> fix all the comments and various other places. >> >> regards > > regards, > regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: