Re: Bad n_distinct estimation; hacks suggested? - Mailing list pgsql-performance

From Josh Berkus
Subject Re: Bad n_distinct estimation; hacks suggested?
Date
Msg-id 200504221336.08325.josh@agliodbs.com
Whole thread Raw
In response to Re: Bad n_distinct estimation; hacks suggested?  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Bad n_distinct estimation; hacks suggested?
List pgsql-performance
> > Solaris is unknown to me. Maybe the used random number generator there
> > isn't good enough?
>
> Hmmm.  Good point.  Will have to test on Linux.

Nope:

Linux 2.4.20:

test=# select tablename, attname, n_distinct from pg_stats where tablename =
'web_site_activity_fa';
      tablename       |       attname       | n_distinct
----------------------+---------------------+------------
 web_site_activity_fa | session_id          |     626127

test=# select count(distinct session_id) from web_site_activity_fa;
  count
---------
 3174813
(1 row)

... I think the problem is in our heuristic sampling code.  I'm not the first
person to have this kind of a problem.  Will be following up with tests ...

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Bad n_distinct estimation; hacks suggested?
Next
From: Mischa Sandberg
Date:
Subject: Re: Joel's Performance Issues WAS : Opteron vs Xeon