Re: Improving N-Distinct estimation by ANALYZE - Mailing list pgsql-hackers

From	Josh Berkus
Subject	Re: Improving N-Distinct estimation by ANALYZE
Date	January 4, 2006 19:20:18
Msg-id	200601041525.55084.josh@agliodbs.com
In response to	Re: Improving N-Distinct estimation by ANALYZE (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Improving N-Distinct estimation by ANALYZE
List	pgsql-hackers

Tom,

> In general, estimating n-distinct from a sample is just plain a hard
> problem, and it's probably foolish to suppose we'll ever be able to
> do it robustly.  What we need is to minimize the impact when we get
> it wrong.  

Well, I think it's pretty well proven that to be accurate at all you need
to be able to sample at least 5% of the table, even if some users choose
to sample less.  Also, I don't think anyone on this list disputes that the
current algorithm is very inaccurate for large tables.  Or do they?

While I don't think that we can estimate N-distinct completely accurately, 
I do think that we can get within +/- 5x for 80-90% of all cases, instead 
of 40-50% of cases like now.  We can't be perfectly accurate, but we can 
be *more* accurate.
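
To make the "+/- 5x" target concrete, here is a rough Python sketch, not taken from the PostgreSQL source. It assumes that "within 5x" means the ratio error max(est/actual, actual/est) is at most 5, and it assumes the sample-based estimator is the Haas-Stokes Duj1 formula n*d / (n - f1 + f1*n/N), which is my understanding of what ANALYZE uses. The function names and the test column are made up for illustration.

```python
import random
from collections import Counter


def duj1_estimate(sample, total_rows):
    """Estimate n-distinct from a sample with the Haas-Stokes Duj1 formula.

    n  = sample size, d = distinct values seen in the sample,
    f1 = values seen exactly once in the sample, N = total_rows.
    Assumption: this is the estimator under discussion; the mail does not name it.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    estimate = n * d / (n - f1 + f1 * n / total_rows)
    # Clamp to the only hard bounds we know: at least d, at most N.
    return min(max(estimate, d), total_rows)


def ratio_error(estimate, actual):
    """Symmetric error: 1.0 is exact, 5.0 means off by a factor of 5 either way."""
    return max(estimate / actual, actual / estimate)


def within_factor(estimate, actual, factor=5.0):
    """True if the estimate is within the given factor of the actual value."""
    return ratio_error(estimate, actual) <= factor


def sample_and_estimate(column, fraction, seed=0):
    """Draw a simple random sample of the column and estimate n-distinct from it."""
    rng = random.Random(seed)
    n = max(1, int(len(column) * fraction))
    return duj1_estimate(rng.sample(column, n), len(column))
```

Running `sample_and_estimate` over a set of test columns at, say, 0.5% and 5% sampling, and counting how often `within_factor` holds, would give the kind of "N% of cases within 5x" figure I'm talking about. Note that a simple random sample of rows is itself an assumption here; block-based sampling behaves differently.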

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
