Re: proposal : cross-column stats - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: proposal : cross-column stats |
Date | |
Msg-id | 4D06727E.7030009@fuzzy.cz Whole thread Raw |
In response to | Re: proposal : cross-column stats (Joshua Tolley <eggyknap@gmail.com>) |
Responses |
Re: proposal : cross-column stats
|
List | pgsql-hackers |
Dne 13.12.2010 18:59, Joshua Tolley napsal(a): > On Sun, Dec 12, 2010 at 07:10:44PM -0800, Nathan Boley wrote: >> Another quick note: I think that storing the full contingency table is >> wasteful since the marginals are already stored in the single column >> statistics. Look at copulas [2] ( FWIW I think that Josh Tolley was >> looking at this a couple years back ). > > Josh Tolley still looks at it occasionally, though time hasn't permitted any > sort of significant work for quite some time. The multicolstat branch on my > git.postgresql.org repository will create an empirical copula each > multi-column index, and stick it in pg_statistic. It doesn't yet do anything > useful with that information, nor am I convinced it's remotely bug-free. In a > brief PGCon discussion with Tom a while back, it was suggested a good place > for the planner to use these stats would be clausesel.c, which is responsible > for handling code such as "...WHERE foo > 4 AND foo > 5". Well, that's good news ;-) I've done a bit of research today, and I've found some quite interesting papers on this topic (probably, I did not have time to read them, in most cases I've read just the title and abstract). [1] Selectivity estimators for multidimensional range queries over real attributes http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.914 This seems like a very good starting point. AFAIK it precisely describes what data need to be collected, how to do themath etc. [2] Selectivity Estimation Without the Attribute Value Independence Assumption http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.8126 This obviously deals with the independence problem. Haven't investigated it further, but it seems worth to read. [3] On Analyzing the Cost of Queries with Multi-Attribute Restrictions and Sort Operations (A Cost Function for UniformlyPartitioned UB-Trees) http://mistral.in.tum.de/results/publications/MB00_ideas.pdf Describes something called UB-Tree, and shows how it may be used to do estimates. Might be interesting as an alternativeto the traditional histograms. There are more details about UB-Trees at http://mistral.in.tum.de/results/publications/ [4] http://www.dbnet.ece.ntua.gr/~nikos/edith/qopt_bibl/ A rather nice collection of papers related to estimation (including some of the papers listed above). Hm, I planned to finally read the "Understanding MySQL Internals" over the Xmas ... that obviously won't happen. regards Tomas
pgsql-hackers by date: