Re: proposal : cross-column stats - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: proposal : cross-column stats
Date	December 13, 2010 15:22:48
Msg-id	4D06727E.7030009@fuzzy.cz
In response to	Re: proposal : cross-column stats (Joshua Tolley <eggyknap@gmail.com>)
Responses	Re: proposal : cross-column stats
List	pgsql-hackers

Dne 13.12.2010 18:59, Joshua Tolley napsal(a):
> On Sun, Dec 12, 2010 at 07:10:44PM -0800, Nathan Boley wrote:
>> Another quick note: I think that storing the full contingency table is
>> wasteful since the marginals are already stored in the single column
>> statistics. Look at copulas [2] ( FWIW I think that Josh Tolley was
>> looking at this a couple years back ).
> 
> Josh Tolley still looks at it occasionally, though time hasn't permitted any
> sort of significant work for quite some time. The multicolstat branch on my
> git.postgresql.org repository will create an empirical copula for each
> multi-column index, and stick it in pg_statistic. It doesn't yet do anything
> useful with that information, nor am I convinced it's remotely bug-free. In a
> brief PGCon discussion with Tom a while back, it was suggested a good place
> for the planner to use these stats would be clausesel.c, which is responsible
> for handling code such as "...WHERE foo > 4 AND foo > 5".

Well, that's good news ;-)
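
For anyone following along who hasn't met copulas before, here is a rough
Python sketch (not taken from the multicolstat branch) of why the marginals
plus an empirical copula can stand in for the full contingency table. All
names (ranks_to_unit, empirical_copula, marginal_cdf) and the toy data are
made up for illustration, and the per-column CDFs merely stand in for what
the single-column statistics already give us.

```python
from bisect import bisect_right


def ranks_to_unit(values):
    """Map each value to its empirical CDF value (rank / n) in (0, 1]."""
    ordered = sorted(values)
    n = len(values)
    return [bisect_right(ordered, v) / n for v in values]


def empirical_copula(xs, ys):
    """Return C(u, v): the fraction of rows whose rank-transformed
    pair (u_i, v_i) satisfies u_i <= u and v_i <= v."""
    us, vs = ranks_to_unit(xs), ranks_to_unit(ys)
    n = len(xs)

    def C(u, v):
        return sum(1 for a, b in zip(us, vs) if a <= u and b <= v) / n

    return C


def marginal_cdf(values):
    """Empirical CDF of one column (stand-in for per-column statistics)."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


# Hypothetical toy data: two perfectly correlated columns, y = 2 * x.
xs = list(range(1, 101))
ys = [2 * x for x in xs]

C = empirical_copula(xs, ys)
Fx, Fy = marginal_cdf(xs), marginal_cdf(ys)

# Selectivity of "x <= 50 AND y <= 100".
copula_estimate = C(Fx(50), Fy(100))        # uses the dependence structure
independence_estimate = Fx(50) * Fy(100)    # multiplies the marginals
true_selectivity = sum(1 for x, y in zip(xs, ys) if x <= 50 and y <= 100) / len(xs)

print(copula_estimate, independence_estimate, true_selectivity)
```

On this toy data the copula-based estimate matches the true selectivity,
while simply multiplying the marginals underestimates it by half. The copula
only has to describe how the ranks of the two columns move together; the
value distributions themselves come from the marginals.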

I've done a bit of research today, and I've found some papers on this
topic that look quite interesting (I haven't had time to read them yet;
in most cases I've read just the title and abstract).

[1] Selectivity estimators for multidimensional range queries over real
   attributes
   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.914

   This seems like a very good starting point. AFAIK it precisely
   describes what data need to be collected, how to do the math etc.

[2] Selectivity Estimation Without the Attribute Value Independence
   Assumption
   http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.105.8126

   This obviously deals with the independence problem. I haven't
   investigated it further, but it seems worth reading.

[3] On Analyzing the Cost of Queries with Multi-Attribute Restrictions
   and Sort Operations (A Cost Function for Uniformly Partitioned
   UB-Trees)
   http://mistral.in.tum.de/results/publications/MB00_ideas.pdf

   Describes something called UB-Tree, and shows how it may be used to
   do estimates. Might be interesting as an alternative to the
   traditional histograms.

   There are more details about UB-Trees at
   http://mistral.in.tum.de/results/publications/

[4] http://www.dbnet.ece.ntua.gr/~nikos/edith/qopt_bibl/
   A rather nice collection of papers related to estimation (including
   some of the papers listed above).

Hm, I had planned to finally read "Understanding MySQL Internals" over
Xmas ... that obviously won't happen.

regards
Tomas
