Re: Index only scan sometimes switches to sequential scan for small amount of rows - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Index only scan sometimes switches to sequential scan for small amount of rows
Date
Msg-id 5512ADD2.7080402@2ndquadrant.com
Whole thread Raw
In response to Index only scan sometimes switches to sequential scan for small amount of rows  (Feike Steenbergen <feikesteenbergen@gmail.com>)
Responses Re: Index only scan sometimes switches to sequential scan for small amount of rows
Re: Index only scan sometimes switches to sequential scan for small amount of rows
List pgsql-performance
On 25.3.2015 13:04, Feike Steenbergen wrote:
...
> When analyzing pg_stats we have sometimes have the following: (Note:
> 'NOT_YET_PRINTED' has not been found during this analyze, these are
> real values)
>
>  attname                | status
>  inherited              | f
>  null_frac              | 0
>  avg_width              | 4
>  n_distinct             | 3
>  most_common_vals       | {PRINTED}
>  most_common_freqs      | {0.996567}
>  histogram_bounds       | {PREPARED,ERROR}
>  correlation            | 0.980644
>
> A question about this specific entry, which some of you may be able to
> shed some light on:
>
> most_common_vals contains only 1 entry, why is this? I would expect to
> see 3 entries, as it has n_distinct=3

To be included in the MCV list, the value has to actually appear in the
random sample at least twice, IIRC. If the values are very rare (e.g. if
you only have such 10 rows out of 3.5M), that may not happen.

You may try increasing the statistics target for this column, which
should make the sample larger and stats more detailed (max is 10000,
which should use sample ~3M rows, i.e. almost the whole table).

> When looking at
> http://www.postgresql.org/docs/current/static/row-estimation-examples.html
> we can see that an estimate > 5000 is what is to be expected for these
> statistics:
>
> # select ( (1 - 0.996567)/2 * 3500000 )::int;
>  int4
> ------
>  6008
> (1 row)
>
> But why does it not record the frequency of 'PREPARED' and 'ERROR'
> in most_common_*?

Can you post results for this query?

SELECT stats, COUNT(*) FROM print_list group by 1

I'd like to know how frequent the other values are.

>
> Our current strategies in mitigating this problem is decreasing the
> autovacuum_*_scale_factor for this specific table, therefore
> triggering more analyses and vacuums.

I'm not sure this is a good solution. The problem is elsewhere, IMHO.

> This is helping somewhat, as if the problem occurs it often solved
> automatically if autoanalyze analyzes this table, it is analyzed
> many times an hour currently.
>
> We can also increase the 'Stats target' for this table, which will
> cause the statistics to contain information about 'NOT_YET_PRINTED'
> more often, but even then, it may not find any of these records, as
> they sometimes do not exist.

This is a better solution, IMHO.

>
> Could you help us to find a strategy to troubleshoot this issue
> further?

You might also make the index scans cheaper, so that the switch to
sequential scan happens later (when more rows are estimated). Try to
decreasing random_page_cost from 4 (default) to 1.5 or something like that.

It may hurt other queries, though, depending on the dataset size etc.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-performance by date:

Previous
From: Feike Steenbergen
Date:
Subject: Index only scan sometimes switches to sequential scan for small amount of rows
Next
From: Feike Steenbergen
Date:
Subject: Re: Index only scan sometimes switches to sequential scan for small amount of rows