Home > mailing lists

Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Odd statistics behaviour in 7.2
Date	February 13, 2002 11:58:03
Msg-id	29027.1013617315@sss.pgh.pa.us Whole thread Raw
In response to	Odd statistics behaviour in 7.2 (Gordon Runkle <gar@integrated-dynamics.com>)
Responses	Re: Odd statistics behaviour in 7.2
List	pgsql-hackers

Tree view

Gordon Runkle <gar@integrated-dynamics.com> writes:
> I have a table with 1,066,673 rows.  The column I'm interested in has
> this distribution of values:

>  tdnr_ct |   ct   
> ---------+--------
>       16 |      1
>        4 |      1
>        3 |     58
>        2 |  68904
>        1 | 928171

> This means that 'ct' records have 'tdnr_ct' duplicate values.

I'm confused.  You mean that there is one value that appears 16 times,
one that appears 4 times, etc etc, and 928171 values that appear only
once?

> Under v7.2, it only sometimes does.  I've looked at the statistics,
> thanks to what I learned from Tom and Marc's discussion, and I see that
> sometimes when I VACUUM ANALYZE the table, 'n_distinct' for this column
> gets a value of '-1' (desireable), and other times a value such as 59483
> or something.

This seems quite bizarre; given those stats it's hard to see how you
could get anything but -1 or close to it, even with a very unlucky
statistical sampling.  Don't suppose you'd want to trace through the
ANALYZE code and find out why it's computing a bad value?

Alternatively, if you could send me a dump of just the ct column,
I could try to reproduce the behavior here.  (CREATE TABLE foo AS
SELECT ct FROM yourtab and then pg_dump -t foo should do it.)
        regards, tom lane

pgsql-hackers by date:

From: Tom Lane
Date: 13 February 2002, 11:22:06
Subject: Re: pg_stats explained ... ?

From: "Ross J. Reedstrom"
Date: 13 February 2002, 12:10:00
Subject: Re: benchmarking postgres

Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

Previous

Next