Thread: Simple JOIN on heavy table not using expected index
Hi all,
I have performance issue for a pretty simple request in a PostgreSQL server 14.10
* Request
SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
;FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
* Table definition (extract)
Table « private.parcelles »
Colonne | Type | Collationnement | NULL-able | Par défaut
-----------------------+-----------------------------+-----------------+-----------+------------
id | integer | | |
geom | geometry(MultiPolygon,2154) | | |
fid | bigint | | |
id_parcelle | character varying(14) | | not null |
insee_col | character varying(5) | | |
nom_col | character varying | | |
section | character varying(2) | | |
numero | character varying(4) | | |
contenance | bigint | | |
epci_nom | character varying | | |
dep | character varying | | |
dep_nom | character varying | | |
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)
Colonne | Type | Collationnement | NULL-able | Par défaut
-----------------------+-----------------------------+-----------------+-----------+------------
id | integer | | |
geom | geometry(MultiPolygon,2154) | | |
fid | bigint | | |
id_parcelle | character varying(14) | | not null |
insee_col | character varying(5) | | |
nom_col | character varying | | |
section | character varying(2) | | |
numero | character varying(4) | | |
contenance | bigint | | |
epci_nom | character varying | | |
dep | character varying | | |
dep_nom | character varying | | |
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)
-> First comment, the primary Key should be on id (integer) and not on id_parcelle (a text code)
* Statistiques
lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'private' AND relname = 'parcelles';
-[ RECORD 1 ]-------+------------------------------
relid | 2364725
schemaname | private
relname | parcelles
seq_scan | 1891
seq_tup_read | 552509679
idx_scan | 19144304
idx_tup_fetch | 38926631
n_tup_ins | 3
n_tup_upd | 3073182
n_tup_del | 0
n_tup_hot_upd | 2996591
n_live_tup | 92876681
n_dead_tup | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum | 3
last_vacuum |
last_autovacuum |
last_analyze | 2024-02-08 15:33:14.008286+01
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 0
-[ RECORD 1 ]-------+------------------------------
relid | 2364725
schemaname | private
relname | parcelles
seq_scan | 1891
seq_tup_read | 552509679
idx_scan | 19144304
idx_tup_fetch | 38926631
n_tup_ins | 3
n_tup_upd | 3073182
n_tup_del | 0
n_tup_hot_upd | 2996591
n_live_tup | 92876681
n_dead_tup | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum | 3
last_vacuum |
last_autovacuum |
last_analyze | 2024-02-08 15:33:14.008286+01
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 0
* Plan :
It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text field), even if the corresponding number of lines for this WHERE clause is a smal subset of the entire data:
approx 6M against 80M in total
Thanks in advance for any hint regarding this cumbersome query.
Regards
Kimaidou
can you share result for:
explain analyze SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'));
explain analyze SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
On Fri, 9 Feb 2024 at 17:14, kimaidou <kimaidou@gmail.com> wrote:
Hi all,I have performance issue for a pretty simple request in a PostgreSQL server 14.10* RequestSELECT p.id_parcelle;
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))* Table definition (extract)Table « private.parcelles »
Colonne | Type | Collationnement | NULL-able | Par défaut
-----------------------+-----------------------------+-----------------+-----------+------------
id | integer | | |
geom | geometry(MultiPolygon,2154) | | |
fid | bigint | | |
id_parcelle | character varying(14) | | not null |
insee_col | character varying(5) | | |
nom_col | character varying | | |
section | character varying(2) | | |
numero | character varying(4) | | |
contenance | bigint | | |
epci_nom | character varying | | |
dep | character varying | | |
dep_nom | character varying | | |
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)-> First comment, the primary Key should be on id (integer) and not on id_parcelle (a text code)* Statistiqueslizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'private' AND relname = 'parcelles';
-[ RECORD 1 ]-------+------------------------------
relid | 2364725
schemaname | private
relname | parcelles
seq_scan | 1891
seq_tup_read | 552509679
idx_scan | 19144304
idx_tup_fetch | 38926631
n_tup_ins | 3
n_tup_upd | 3073182
n_tup_del | 0
n_tup_hot_upd | 2996591
n_live_tup | 92876681
n_dead_tup | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum | 3
last_vacuum |
last_autovacuum |
last_analyze | 2024-02-08 15:33:14.008286+01
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 0* Plan :It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text field), even if the corresponding number of lines for this WHERE clause is a smal subset of the entire data:approx 6M against 80M in totalThanks in advance for any hint regarding this cumbersome query.RegardsKimaidou
The query plan is visible here :
Regards
Le vendredi 9 février 2024, Burçin Yazıcı <burcinyazici@gmail.com> a écrit :
can you share result for:;
explain analyze SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))On Fri, 9 Feb 2024 at 17:14, kimaidou <kimaidou@gmail.com> wrote:Hi all,I have performance issue for a pretty simple request in a PostgreSQL server 14.10* RequestSELECT p.id_parcelle;
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))* Table definition (extract)Table « private.parcelles »
Colonne | Type | Collationnement | NULL-able | Par défaut
-----------------------+-----------------------------+------ -----------+-----------+------ ------
id | integer | | |
geom | geometry(MultiPolygon,2154) | | |
fid | bigint | | |
id_parcelle | character varying(14) | | not null |
insee_col | character varying(5) | | |
nom_col | character varying | | |
section | character varying(2) | | |
numero | character varying(4) | | |
contenance | bigint | | |
epci_nom | character varying | | |
dep | character varying | | |
dep_nom | character varying | | |
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)-> First comment, the primary Key should be on id (integer) and not on id_parcelle (a text code)* Statistiqueslizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'private' AND relname = 'parcelles';
-[ RECORD 1 ]-------+------------------------------
relid | 2364725
schemaname | private
relname | parcelles
seq_scan | 1891
seq_tup_read | 552509679
idx_scan | 19144304
idx_tup_fetch | 38926631
n_tup_ins | 3
n_tup_upd | 3073182
n_tup_del | 0
n_tup_hot_upd | 2996591
n_live_tup | 92876681
n_dead_tup | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum | 3
last_vacuum |
last_autovacuum |
last_analyze | 2024-02-08 15:33:14.008286+01
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 0* Plan :It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text field), even if the corresponding number of lines for this WHERE clause is a smal subset of the entire data:approx 6M against 80M in totalThanks in advance for any hint regarding this cumbersome query.RegardsKimaidou--
kimaidou <kimaidou@gmail.com> writes: > It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text > field), even if the corresponding number of lines for this WHERE clause is > a smal subset of the entire data: > approx 6M against 80M in total 6M out of 80M rows is not a "small subset". Typically I'd expect the planner to use an index-based scan for up to 1 or 2 percent of the table. Beyond that, you're going to be touching most pages of the table anyway. You can try reducing random_page_cost to favor indexscans, but you might not find that the query gets any faster. regards, tom lane
Tom, thanks a lot for your suggestion.
Indeed, setting random_page_cost to 2 instead of 4 improves this query a lot !
See the new plan :
30 seconds VS 17 minutes before
Cheers
Michaël
Le vendredi 9 février 2024, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
kimaidou <kimaidou@gmail.com> writes:
> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text
> field), even if the corresponding number of lines for this WHERE clause is
> a smal subset of the entire data:
> approx 6M against 80M in total
6M out of 80M rows is not a "small subset". Typically I'd expect
the planner to use an index-based scan for up to 1 or 2 percent of
the table. Beyond that, you're going to be touching most pages
of the table anyway.
You can try reducing random_page_cost to favor indexscans, but
you might not find that the query gets any faster.
regards, tom lane