Thread: BUG #15227: Planner often ignores covering indexes (with includeclause)
BUG #15227: Planner often ignores covering indexes (with includeclause)
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 15227 Logged by: Hans Buschmann Email address: buschmann@nidsa.net PostgreSQL version: 11beta1 Operating system: Fedora 28 64bit Description: I am trying to prepare our indexing schema for the use of covering indexes. Currently our Production runs on Windows 64bit PG 10.4 I have a newly set up test environment under FEDORA 28 64bit, PG 11beta self compiled PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.1.1 20180502 (Red Hat 8.1.1-1), 64-bit The goal is to achieve more index only scans with the most common columns for our OLTP environment. A copy of the production Data has been imported and analyzed. The normal query performance is comparable to PG 10.4 I encountered different problem cases where specially changed indexes with include clause where not used: 1. Primary key I changed the primary key on table projects from Indexes: "projects_active_pkey" PRIMARY KEY, btree (id_pr) "projects_active_pr_season_pr_cli_code_pr_name_key" UNIQUE CONSTRAINT, btree (pr_season, pr_cli_code, pr_name) to Indexes: "projects_pkeyp" PRIMARY KEY, btree (pr_season, id_pr) INCLUDE (pr_cli_code, pr_fac_code, pr_name, pr_style, pr_photo_default, pr_last) "projects_active_pr_season_pr_cli_code_pr_name_key" UNIQUE CONSTRAINT, btree (pr_season, pr_cli_code, pr_name) INVALID and disabled the other index with update pg_index set indisvalid = false where indexrelid = 'projects_active_pr_season_pr_cli_code_pr_name_key'::regclass; Then I got a plan change from Index Scan using projects_active_pr_season_pr_cli_code_pr_name_key on projects (cost=0.28..106.74 rows=1,990 width=53) (actual time=0.007..0.596 rows=1,990 loops=1) Index Cond: (pr_season = 26) to Seq Scan on public.projects (cost=0.00..114.80 rows=1,990 width=53) (actual time=0.036..0.451 rows=1,990 loops=1) Output: projects.id_pr, projects.pr_last, projects.pr_style, projects.pr_photo_default, projects.pr_cli_code, projects.pr_season Filter: (projects.pr_season = 26) Rows Removed by Filter: 2154 the table projects is joined with LEFT JOIN ONLY projects ON id_pr=am_id_pr and pr_season=of_season the of_season/pr_season in this query is literal constant 26 When the other index (projects_active_pr_season_pr_cli_code_pr_name_key) is active, it is choosen over the changed primary key for index scan. The changed primary key is never used. 2. Choosing between comparable indexes On another table (models) I created an extra index: from (10.4) Indexes: "models_active_pkey" PRIMARY KEY, btree (id_am) "models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key" UNIQUE CONSTRAINT, btree (am_season, am_id_pr, am_style_ref, am_clis_sub_code) "models_active_am_cancel_am_conf_date_idx" btree (am_cancel, am_conf_date) to (11beta1) Indexes: "models_active_pkey" PRIMARY KEY, btree (id_am) "models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key" UNIQUE CONSTRAINT, btree (am_season, am_id_pr, am_style_ref, am_clis_sub_code) INVALID "ukp_models_season_id" UNIQUE, btree (am_season, id_am) INCLUDE (am_fac_code, am_id_pr, am_clis_sub_code, am_fac_id_cu, am_our_id_cu, am_style_ref) "models_active_am_cancel_am_conf_date_idx" btree (am_cancel, am_conf_date) when the second index (models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key) is not invalidated, it is choosen over the new third index ukp_models_season_id This gives the same plan as in 10.4 Index Scan using models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key on public.models (cost=0.28..73.01 rows=990 width=32) (actual time=0.006..0.222 rows=990 loops=1) When the second index models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key is disabled, The plan changes to the Index Only scan using the new Index as intended: Index Only Scan using ukp_models_season_id on public.models (cost=0.28..74.01 rows=990 width=32) (actual time=0.011..0.270 rows=990 loops=1) Output: models.am_season, models.id_am, models.am_fac_code, models.am_id_pr, models.am_clis_sub_code, models.am_fac_id_cu, models.am_our_id_cu, models.am_style_ref Index Cond: (models.am_season = 26) Heap Fetches: 990 It is not clear which indexes get prioritized and if a possible index_only_scan is preferrable considered with the included extra columns. 3. Correct usage of a covering index when a partial index is changed After adding the third index to the table clients (iotp_recent_cli_codigo) Indexes: "clients_pkey" PRIMARY KEY, btree (cli_codigo) "clients_cli_nombre_key" UNIQUE CONSTRAINT, btree (cli_nombre) "iotp_recent_cli_codigo" UNIQUE, btree (cli_codigo) INCLUDE (cli_id_off, cli_id_ctry, cli_id_usr_responsible, cli_nombre, cli_short_name, cli_group) WHERE cli_recent_act IS NOT NULL "iot_cli_id_off_codigo_nombre_group" btree (cli_id_off, cli_codigo, cli_group, cli_nombre) "iotp_clients_nombre_rlv_codigo_responsible" btree (cli_nombre, cli_id_off, cli_codigo, cli_id_usr_responsible) WHERE cli_id_rlv <= 2 the plan changes correctly to Index Only Scan using iotp_recent_cli_codigo on public.clients (cost=0.14..13.57 rows=98 width=16) (actual time=0.006..0.035 rows=98 loops=1) Output: clients.cli_id_off, clients.cli_group, clients.cli_id_usr_responsible, clients.cli_codigo Heap Fetches: 98 So partial index is no obstacle The query is quite complex (refresh materialized view for caching), so the complete data definitions and Explain analyze plans are not shown here. I have tried to isolate the two problematic cases I encountered so far. Please inform me, when I have missed something Thanks Hans Buschmann
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes: > I encountered different problem cases where specially changed indexes with > include clause where not used: > ... > Then I got a plan change from > Index Scan using projects_active_pr_season_pr_cli_code_pr_name_key on > projects (cost=0.28..106.74 rows=1,990 width=53) (actual time=0.007..0.596 > rows=1,990 loops=1) > Index Cond: (pr_season = 26) > to > Seq Scan on public.projects (cost=0.00..114.80 rows=1,990 width=53) (actual > time=0.036..0.451 rows=1,990 loops=1) > Output: projects.id_pr, projects.pr_last, projects.pr_style, > projects.pr_photo_default, projects.pr_cli_code, projects.pr_season > Filter: (projects.pr_season = 26) > Rows Removed by Filter: 2154 While it's certainly possible that there are costing bugs in the new covering-index code, this example doesn't seem to prove that. The row counts show that this query is selecting close to 50% of the table, which is a situation in which an indexscan is usually a loser compared to seqscan-and-filter anyway. Indeed, the seqscan is *faster* than the indexscan on the non-PK index according to your results above (0.451 ms versus 0.596 ms). The covering index would be substantially bigger than the non-PK index, hence even slower to scan, so I think the planner made the right choice. This conclusion might change if you had an index-only scan, but probably not by much, especially since the table is so small. (The mere fact that the index is covering doesn't guarantee an IOS; you also need a table that is mostly all-visible. I speculate that maybe you didn't vacuum the test table, or modified a lot of it since the last vacuum.) In general, I'm suspicious of the idea of putting the entire table into a covering index as you've done here. The covering index will almost certainly be significantly larger than the table itself --- remember the old rule of thumb that b-trees tend to have about 1/3rd empty space. So a query that fetches much or all of the table is still going to be better off with a seqscan, as that will be less I/O and it'll use more-sequential disk accesses. Creating an index like this also disables HOT updates altogether, which may be a significant penalty depending on what your update patterns are. > When the other index (projects_active_pr_season_pr_cli_code_pr_name_key) is > active, it is choosen over the changed primary key for index scan. Probably because it's smaller. Again, this suggests that you're not in a situation where IOS is possible, else perhaps the covering index would have an advantage from that. > 2. Choosing between comparable indexes > This gives the same plan as in 10.4 > Index Scan using > models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key on > public.models (cost=0.28..73.01 rows=990 width=32) (actual time=0.006..0.222 > rows=990 loops=1) > When the second index > models_active_am_season_am_id_pr_am_style_ref_am_clis_sub_c_key is > disabled, > The plan changes to the Index Only scan using the new Index as intended: > Index Only Scan using ukp_models_season_id on public.models > (cost=0.28..74.01 rows=990 width=32) (actual time=0.011..0.270 rows=990 > loops=1) > Output: models.am_season, models.id_am, models.am_fac_code, > models.am_id_pr, models.am_clis_sub_code, models.am_fac_id_cu, > models.am_our_id_cu, models.am_style_ref > Index Cond: (models.am_season = 26) > Heap Fetches: 990 This example isn't exactly proving that the planner did the wrong thing, either. The estimated costs and actual times are close enough together that I'd freely concede that maybe it was luck that the planner preferred the in-fact-faster plan; but it did. regards, tom lane