Home > mailing lists

Re: Index only scan sometimes switches to sequential scan for small amount of rows - Mailing list pgsql-performance

From	Tomas Vondra
Subject	Re: Index only scan sometimes switches to sequential scan for small amount of rows
Date	March 25, 2015 12:45:31
Msg-id	5512ADD2.7080402@2ndquadrant.com Whole thread Raw
In response to	Index only scan sometimes switches to sequential scan for small amount of rows (Feike Steenbergen <feikesteenbergen@gmail.com>)
Responses	Re: Index only scan sometimes switches to sequential scan for small amount of rows Re: Index only scan sometimes switches to sequential scan for small amount of rows
List	pgsql-performance

Tree view

On 25.3.2015 13:04, Feike Steenbergen wrote:
...
> When analyzing pg_stats we have sometimes have the following: (Note:
> 'NOT_YET_PRINTED' has not been found during this analyze, these are
> real values)
>
>  attname                | status
>  inherited              | f
>  null_frac              | 0
>  avg_width              | 4
>  n_distinct             | 3
>  most_common_vals       | {PRINTED}
>  most_common_freqs      | {0.996567}
>  histogram_bounds       | {PREPARED,ERROR}
>  correlation            | 0.980644
>
> A question about this specific entry, which some of you may be able to
> shed some light on:
>
> most_common_vals contains only 1 entry, why is this? I would expect to
> see 3 entries, as it has n_distinct=3

To be included in the MCV list, the value has to actually appear in the
random sample at least twice, IIRC. If the values are very rare (e.g. if
you only have such 10 rows out of 3.5M), that may not happen.

You may try increasing the statistics target for this column, which
should make the sample larger and stats more detailed (max is 10000,
which should use sample ~3M rows, i.e. almost the whole table).

> When looking at
> http://www.postgresql.org/docs/current/static/row-estimation-examples.html
> we can see that an estimate > 5000 is what is to be expected for these
> statistics:
>
> # select ( (1 - 0.996567)/2 * 3500000 )::int;
>  int4
> ------
>  6008
> (1 row)
>
> But why does it not record the frequency of 'PREPARED' and 'ERROR'
> in most_common_*?

Can you post results for this query?

SELECT stats, COUNT(*) FROM print_list group by 1

I'd like to know how frequent the other values are.

>
> Our current strategies in mitigating this problem is decreasing the
> autovacuum_*_scale_factor for this specific table, therefore
> triggering more analyses and vacuums.

I'm not sure this is a good solution. The problem is elsewhere, IMHO.

> This is helping somewhat, as if the problem occurs it often solved
> automatically if autoanalyze analyzes this table, it is analyzed
> many times an hour currently.
>
> We can also increase the 'Stats target' for this table, which will
> cause the statistics to contain information about 'NOT_YET_PRINTED'
> more often, but even then, it may not find any of these records, as
> they sometimes do not exist.

This is a better solution, IMHO.

>
> Could you help us to find a strategy to troubleshoot this issue
> further?

You might also make the index scans cheaper, so that the switch to
sequential scan happens later (when more rows are estimated). Try to
decreasing random_page_cost from 4 (default) to 1.5 or something like that.

It may hurt other queries, though, depending on the dataset size etc.

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-performance by date:

From: Feike Steenbergen
Date: 25 March 2015, 12:04:29
Subject: Index only scan sometimes switches to sequential scan for small amount of rows

From: Feike Steenbergen
Date: 25 March 2015, 14:22:44
Subject: Re: Index only scan sometimes switches to sequential scan for small amount of rows

Re: Index only scan sometimes switches to sequential scan for small amount of rows - Mailing list pgsql-performance

Previous

Next