Thread: BUG #15227: Planner often ignores covering indexes (with includeclause)

BUG #15227: Planner often ignores covering indexes (with includeclause)

From

PG Bug reporting form

Date:

03 June 2018, 21:48:03

The following bug has been logged on the website:

Bug reference:      15227
Logged by:          Hans Buschmann
Email address:      buschmann@nidsa.net
PostgreSQL version: 11beta1
Operating system:   Fedora 28 64bit
Description:

I am trying to prepare our indexing schema for the use of covering
indexes.
Currently our Production runs on Windows 64bit PG 10.4

I have a newly set up test environment under FEDORA 28 64bit, PG 11beta self
compiled

 PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.1.1
20180502 (Red Hat 8.1.1-1), 64-bit

The goal is to achieve more index only scans with the most common columns
for our OLTP environment.
A copy of the production Data has been imported and analyzed.
The normal query performance is comparable to PG 10.4

I encountered different problem cases where specially changed indexes with
include clause where not used:


1. Primary key

I changed the primary key on table projects from
Indexes:
    "projects_active_pkey" PRIMARY KEY, btree (id_pr)
    "projects_active_pr_season_pr_cli_code_pr_name_key" UNIQUE CONSTRAINT,
btree (pr_season, pr_cli_code, pr_name)
to
Indexes:
    "projects_pkeyp" PRIMARY KEY, btree (pr_season, id_pr) INCLUDE
(pr_cli_code, pr_fac_code, pr_name, pr_style, pr_photo_default, pr_last)
    "projects_active_pr_season_pr_cli_code_pr_name_key" UNIQUE CONSTRAINT,
btree (pr_season, pr_cli_code, pr_name) INVALID
and disabled the other index with 

update pg_index set indisvalid = false where indexrelid =
'projects_active_pr_season_pr_cli_code_pr_name_key'::regclass;

Then I got a plan change from     

Index Scan using projects_active_pr_season_pr_cli_code_pr_name_key on
projects (cost=0.28..106.74 rows=1,990 width=53) (actual time=0.007..0.596
rows=1,990 loops=1)

    Index Cond: (pr_season = 26)
to  
Seq Scan on public.projects (cost=0.00..114.80 rows=1,990 width=53) (actual
time=0.036..0.451 rows=1,990 loops=1)

    Output: projects.id_pr, projects.pr_last, projects.pr_style,
projects.pr_photo_default, projects.pr_cli_code, projects.pr_season
    Filter: (projects.pr_season = 26)
    Rows Removed by Filter: 2154

the table projects is joined with
    LEFT JOIN ONLY projects ON id_pr=am_id_pr and pr_season=of_season
the of_season/pr_season in this query is literal constant 26

When the other index (projects_active_pr_season_pr_cli_code_pr_name_key) is
active, it is choosen over the changed primary key for index scan.

The changed primary key is never used.


2. Choosing between comparable indexes

On another table (models) I created an extra index:
from (10.4)
Indexes:
    "models_active_pkey" PRIMARY KEY, btree (id_am)
    "models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key" UNIQUE
CONSTRAINT, btree (am_season, am_id_pr, am_style_ref, am_clis_sub_code)
    "models_active_am_cancel_am_conf_date_idx" btree (am_cancel,
am_conf_date)
to (11beta1)
Indexes:
    "models_active_pkey" PRIMARY KEY, btree (id_am)
    "models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key" UNIQUE
CONSTRAINT, btree (am_season, am_id_pr, am_style_ref, am_clis_sub_code)
INVALID
    "ukp_models_season_id" UNIQUE, btree (am_season, id_am) INCLUDE
(am_fac_code, am_id_pr, am_clis_sub_code, am_fac_id_cu, am_our_id_cu,
am_style_ref)
    "models_active_am_cancel_am_conf_date_idx" btree (am_cancel,
am_conf_date)

when the second index
(models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key) is not
invalidated, it is choosen over the new third index ukp_models_season_id
This gives the same plan as in 10.4

Index Scan using
models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key on
public.models (cost=0.28..73.01 rows=990 width=32) (actual time=0.006..0.222
rows=990 loops=1) 

When the second index
models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key is
disabled,
The plan changes to the Index Only scan using the new Index as intended:


Index Only Scan using ukp_models_season_id on public.models
(cost=0.28..74.01 rows=990 width=32) (actual time=0.011..0.270 rows=990
loops=1)

    Output: models.am_season, models.id_am, models.am_fac_code,
models.am_id_pr, models.am_clis_sub_code, models.am_fac_id_cu,
models.am_our_id_cu, models.am_style_ref
    Index Cond: (models.am_season = 26)
    Heap Fetches: 990

It is not clear which indexes get prioritized and if a possible
index_only_scan is preferrable considered with the included extra columns.


3. Correct usage of a covering index when a partial index is changed

After adding the third index to the table clients (iotp_recent_cli_codigo)
Indexes:
    "clients_pkey" PRIMARY KEY, btree (cli_codigo)
    "clients_cli_nombre_key" UNIQUE CONSTRAINT, btree (cli_nombre)
    "iotp_recent_cli_codigo" UNIQUE, btree (cli_codigo) INCLUDE (cli_id_off,
cli_id_ctry, cli_id_usr_responsible, cli_nombre, cli_short_name, cli_group)
WHERE cli_recent_act IS NOT NULL
    "iot_cli_id_off_codigo_nombre_group" btree (cli_id_off, cli_codigo,
cli_group, cli_nombre)
    "iotp_clients_nombre_rlv_codigo_responsible" btree (cli_nombre,
cli_id_off, cli_codigo, cli_id_usr_responsible) WHERE cli_id_rlv <= 2

the plan changes correctly to 
 Index Only Scan using iotp_recent_cli_codigo on public.clients
(cost=0.14..13.57 rows=98 width=16) (actual time=0.006..0.035 rows=98
loops=1)

    Output: clients.cli_id_off, clients.cli_group,
clients.cli_id_usr_responsible, clients.cli_codigo
    Heap Fetches: 98

So partial index is no obstacle

The query is quite complex (refresh materialized view for caching), so the
complete data definitions and Explain analyze plans are not shown here.

I have tried to isolate the two problematic cases I encountered so far.

Please inform me, when I have missed something

Thanks

Hans Buschmann

Re: BUG #15227: Planner often ignores covering indexes (with include clause)

From

Tom Lane

Date:

04 June 2018, 00:33:50

=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> I encountered different problem cases where specially changed indexes with
> include clause where not used:
> ...
> Then I got a plan change from     
> Index Scan using projects_active_pr_season_pr_cli_code_pr_name_key on
> projects (cost=0.28..106.74 rows=1,990 width=53) (actual time=0.007..0.596
> rows=1,990 loops=1)
>     Index Cond: (pr_season = 26)
> to  
> Seq Scan on public.projects (cost=0.00..114.80 rows=1,990 width=53) (actual
> time=0.036..0.451 rows=1,990 loops=1)
>     Output: projects.id_pr, projects.pr_last, projects.pr_style,
> projects.pr_photo_default, projects.pr_cli_code, projects.pr_season
>     Filter: (projects.pr_season = 26)
>     Rows Removed by Filter: 2154

While it's certainly possible that there are costing bugs in the new
covering-index code, this example doesn't seem to prove that.  The row
counts show that this query is selecting close to 50% of the table,
which is a situation in which an indexscan is usually a loser compared
to seqscan-and-filter anyway.  Indeed, the seqscan is *faster* than the
indexscan on the non-PK index according to your results above (0.451 ms
versus 0.596 ms).  The covering index would be substantially bigger than
the non-PK index, hence even slower to scan, so I think the planner made
the right choice.

This conclusion might change if you had an index-only scan, but probably
not by much, especially since the table is so small.  (The mere fact that
the index is covering doesn't guarantee an IOS; you also need a table
that is mostly all-visible.  I speculate that maybe you didn't vacuum
the test table, or modified a lot of it since the last vacuum.)

In general, I'm suspicious of the idea of putting the entire table
into a covering index as you've done here.  The covering index will
almost certainly be significantly larger than the table itself ---
remember the old rule of thumb that b-trees tend to have about 1/3rd
empty space.  So a query that fetches much or all of the table is
still going to be better off with a seqscan, as that will be less I/O
and it'll use more-sequential disk accesses.  Creating an index like
this also disables HOT updates altogether, which may be a significant
penalty depending on what your update patterns are.

> When the other index (projects_active_pr_season_pr_cli_code_pr_name_key) is
> active, it is choosen over the changed primary key for index scan.

Probably because it's smaller.  Again, this suggests that you're not in
a situation where IOS is possible, else perhaps the covering index would
have an advantage from that.

> 2. Choosing between comparable indexes

> This gives the same plan as in 10.4

> Index Scan using
> models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key on
> public.models (cost=0.28..73.01 rows=990 width=32) (actual time=0.006..0.222
> rows=990 loops=1) 

> When the second index
> models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key is
> disabled,
> The plan changes to the Index Only scan using the new Index as intended:

> Index Only Scan using ukp_models_season_id on public.models
> (cost=0.28..74.01 rows=990 width=32) (actual time=0.011..0.270 rows=990
> loops=1)
>     Output: models.am_season, models.id_am, models.am_fac_code,
> models.am_id_pr, models.am_clis_sub_code, models.am_fac_id_cu,
> models.am_our_id_cu, models.am_style_ref
>     Index Cond: (models.am_season = 26)
>     Heap Fetches: 990

This example isn't exactly proving that the planner did the wrong
thing, either.  The estimated costs and actual times are close
enough together that I'd freely concede that maybe it was luck that
the planner preferred the in-fact-faster plan; but it did.

            regards, tom lane