Re: Odd statistics behaviour in 7.2 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Odd statistics behaviour in 7.2
Date
Msg-id 29027.1013617315@sss.pgh.pa.us
Whole thread Raw
In response to Odd statistics behaviour in 7.2  (Gordon Runkle <gar@integrated-dynamics.com>)
Responses Re: Odd statistics behaviour in 7.2
List pgsql-hackers
Gordon Runkle <gar@integrated-dynamics.com> writes:
> I have a table with 1,066,673 rows.  The column I'm interested in has
> this distribution of values:

>  tdnr_ct |   ct   
> ---------+--------
>       16 |      1
>        4 |      1
>        3 |     58
>        2 |  68904
>        1 | 928171

> This means that 'ct' records have 'tdnr_ct' duplicate values.

I'm confused.  You mean that there is one value that appears 16 times,
one that appears 4 times, etc etc, and 928171 values that appear only
once?

> Under v7.2, it only sometimes does.  I've looked at the statistics,
> thanks to what I learned from Tom and Marc's discussion, and I see that
> sometimes when I VACUUM ANALYZE the table, 'n_distinct' for this column
> gets a value of '-1' (desireable), and other times a value such as 59483
> or something.

This seems quite bizarre; given those stats it's hard to see how you
could get anything but -1 or close to it, even with a very unlucky
statistical sampling.  Don't suppose you'd want to trace through the
ANALYZE code and find out why it's computing a bad value?

Alternatively, if you could send me a dump of just the ct column,
I could try to reproduce the behavior here.  (CREATE TABLE foo AS
SELECT ct FROM yourtab and then pg_dump -t foo should do it.)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_stats explained ... ?
Next
From: "Ross J. Reedstrom"
Date:
Subject: Re: benchmarking postgres