Re: Estimating geometric distributions - Mailing list pgsql-hackers

From Stephen Denne
Subject Re: Estimating geometric distributions
Date
Msg-id F0238EBA67824444BC1CB4700960CB4804DD9399@dmpeints002.isotach.com
Whole thread Raw
In response to Estimating geometric distributions  ("Stephen Denne" <Stephen.Denne@datamail.co.nz>)
List pgsql-hackers
I wrote:
> > I have a field whose distribution of frequencies of values is
> > roughly geometric, rather than flat.


> My problem is frequent
> > over-estimation of rows when restricting by this field with
> > values not known at plan time.


> Is there any facility already in PostgreSQL to help me here?
>
> Hopefully an index type that I don't know about yet?
> (Geometric distributions are similar to those found in word
> count distributions).
>
> If not... is there any merit in this idea:
>
> During the analyze process, the geometric mean of sampled
> rows was calculated, and if determined to be significantly
> different from the arithmetic mean, stored in a new stats
> column. When estimating the number of rows that will be
> returned by queries of the form shown above, if there is a
> geometric mean stored, use it instead of the arithmetic mean.

I came up with another (much easier) means of adjusting the planners estimation of how many rows will be returned:

Increase the number of distinct values in the statistics.
For example:
update pg_statistic set stadistinct=2691 where starelid=29323 and staattnum=2;

I can then pick a number of distinct values such that the effective arithmetic mean is equal to what I calculated the
geometricmean to be. 

Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 

__________________________________________________________________ This email has been scanned by the DMZGlobal
BusinessQuality              Electronic Messaging Suite. 
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________




pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Autovacuum vs statement_timeout
Next
From: Tom Lane
Date:
Subject: Re: Autovacuum vs statement_timeout