n_distinct way off, but following a pattern. - Mailing list pgsql-performance
From | Nick Fankhauser |
---|---|
Subject | n_distinct way off, but following a pattern. |
Date | |
Msg-id | NEBBLAAHGLEEPCGOBHDGEEFLJHAA.nickf@ontko.com Whole thread Raw |
Responses |
Re: n_distinct way off, but following a pattern.
|
List | pgsql-performance |
Hi- I'm seeing estimates for n_distinct that are way off for a large table (8,700,000 rows). They get better by setting the stats target higher, but are still off by a factor of 10 with the stats set to 1000. I've noticed and reported a similar pattern before on another table. Because this follows the same very consistent pattern, I figured it was worth reporting again. This looks more like a bug than randomness. If the poor result was simply due to having a small sample to work from, the estimates should be all over the map, but these are consistently low, and vary in almost exact inverse proportion to the stats target: run 1: run2: run3: n_distinct estimate, statistics = 10: 3168 3187 3212 n_distinct estimate, statistics = 100: 23828 24059 23615 n_distinct estimate, statistics = 1000: 194690 194516 194081 Actual distinct values: 3340724 Or to put it another way, if you were to take the estimate from analyze, divide by the stats target and multiply by 10000, the result would be pretty close to exact. (Within a factor of 2, which ought to be plenty close for planning purposes.) I'm running version 7.3.2 Any thoughts from folks familiar with this part of the source code? Regards, -Nick PS- Here's a log of the session that I got this from. alpha=# select count(distinct actor_id) from actor_case_assignment; -[ RECORD 1 ]-- count | 3340724 alpha=# analyze; ANALYZE alpha=# SELECT * FROM pg_stats alpha-# WHERE tablename='actor_case_assignment' AND attname='actor_id'; -[ RECORD 1 ]-----+------------------------------------------------------------------- ---------------------------------------------------------------------------- ---------------- schemaname | public tablename | actor_case_assignment attname | actor_id null_frac | 0 avg_width | 16 n_distinct | 3168 most_common_vals | {18105XS,18115XS,18106XS,18113JD02,18115JD02,18106J27,18113XS,18113A10656,18 115LST,18108XS} most_common_freqs | {0.0206667,0.0206667,0.0196667,0.019,0.0176667,0.0173333,0.0163333,0.015,0.0 14,0.0136667} histogram_bounds | {18067A000-07P,18067PD397SC1574-1,18105LBPD,18106A2119-49,18106PD399IF845-1, 18108A03068-20,18108LECS207,18108PTW03737278-2,18111A19788-77,18115A50420,18 115XC} correlation | 0.876795 alpha=# alpha=# alter table actor_case_assignment alter column actor_id set statistics 100; ALTER TABLE alpha=# analyze actor_case_assignment; ANALYZE alpha=# SELECT * FROM pg_stats alpha-# WHERE tablename='actor_case_assignment' AND attname='actor_id'; -[ RECORD 1 ] <Header snipped> schemaname | public tablename | actor_case_assignment attname | actor_id null_frac | 0 avg_width | 17 n_distinct | 23828 most_common_vals | {18115XS,18113JD02,18106XS,1.... <Rest of values snipped> alpha=# alter table actor_case_assignment alter column actor_id set statistics 1000; ALTER TABLE alpha=# analyze actor_case_assignment; ANALYZE alpha=# SELECT * FROM pg_stats alpha-# WHERE tablename='actor_case_assignment' AND attname='actor_id'; -[ RECORD 1 ]----- <Header snipped> schemaname | public tablename | actor_case_assignment attname | actor_id null_frac | 0 avg_width | 16 n_distinct | 194690 most_common_vals | {18106XS,18115XS,18115... <Rest of values snipped> alpha=# \x Expanded display is off. alpha=# alter table actor_case_assignment alter column actor_id set statistics 10; ALTER TABLE alpha=# analyze actor_case_assignment; ANALYZE alpha=# select n_distinct from pg_stats where tablename='actor_case_assignment and attname='actor_id'; alpha'# '; ERROR: parser: parse error at or near "actor_id" at character 85 alpha=# select n_distinct from pg_stats where tablename='actor_case_assignment' and attname='actor_id'; n_distinct ------------ 3187 (1 row) alpha=# alter table actor_case_assignment alter column actor_id set statistics 10; ALTER TABLE alpha=# analyze actor_case_assignment; ANALYZE alpha=# select n_distinct from pg_stats where tablename='actor_case_assignment' and attname='actor_id'; n_distinct ------------ 3212 (1 row) alpha=# analyze actor_case_assignment; ANALYZE alpha=# alter table actor_case_assignment alter column actor_id set statistics 100; ALTER TABLE alpha=# analyze actor_case_assignment; ANALYZE alpha=# select n_distinct from pg_stats where tablename='actor_case_assignment' and attname='actor_id'; n_distinct ------------ 24059 (1 row) alpha=# analyze actor_case_assignment; ANALYZE alpha=# alter table actor_case_assignment alter column actor_id set statistics 100; ALTER TABLE alpha=# select n_distinct from pg_stats where tablename='actor_case_assignment' and attname='actor_id'; n_distinct ------------ 23615 (1 row) alpha=# alter table actor_case_assignment alter column actor_id set statistics 1000; ALTER TABLE alpha=# analyze actor_case_assignment; ANALYZE alpha=# select n_distinct from pg_stats where tablename='actor_case_assignment' and attname='actor_id'; n_distinct ------------ 194516 (1 row) alpha=# analyze actor_case_assignment; ANALYZE alpha=# select n_distinct from pg_stats where tablename='actor_case_assignment' and attname='actor_id'; n_distinct ------------ 194081 (1 row) alpha=#
pgsql-performance by date: