Thread: Simple JOIN on heavy table not using expected index

Simple JOIN on heavy table not using expected index

From

kimaidou

Date:

09 February 2024, 14:14:31

Hi all,

I have performance issue for a pretty simple request in a PostgreSQL server 14.10

* Request

SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))

;

* Table definition (extract)

                                  Table « private.parcelles »
        Colonne        |            Type             | Collationnement | NULL-able | Par défaut 
-----------------------+-----------------------------+-----------------+-----------+------------
 id                    | integer                     |                 |           | 
 geom                  | geometry(MultiPolygon,2154) |                 |           | 
 fid                   | bigint                      |                 |           | 
 id_parcelle           | character varying(14)       |                 | not null  | 
 insee_col             | character varying(5)        |                 |           | 
 nom_col               | character varying           |                 |           | 
 section               | character varying(2)        |                 |           | 
 numero                | character varying(4)        |                 |           | 
 contenance            | bigint                      |                 |           | 
 epci_nom              | character varying           |                 |           | 
 dep                   | character varying           |                 |           | 
 dep_nom               | character varying           |                 |           | 
Index :
    "foncier_pkey" PRIMARY KEY, btree (id_parcelle)
    "idx_extension_eol_parcelle" btree (extension_eol)
    "idx_lien_hubspot_parcelels" btree (lien_hubspot)
    "idx_reg_parcelle" btree (reg)
    "idx_type_ener_parcelles" btree (type_d_energie)
    "parcelles_dep_idx" btree (dep)
    "parcelles_id_parcelle_idx" btree (id_parcelle)
    "parcelles_inseecol_idx" btree (insee_col)
    "parcelles_object_id_idx" btree (hs_object_id)
    "parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
    "parcelles_synctohubspot_idx" btree (synctohubspot)
    "sidx_foncier_geom" gist (geom)

-> First comment, the primary Key should be on id (integer) and not on id_parcelle (a text code)

* Statistiques

lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'private' AND relname = 'parcelles';
-[ RECORD 1 ]-------+------------------------------
relid               | 2364725
schemaname          | private
relname             | parcelles
seq_scan            | 1891
seq_tup_read        | 552509679
idx_scan            | 19144304
idx_tup_fetch       | 38926631
n_tup_ins           | 3
n_tup_upd           | 3073182
n_tup_del           | 0
n_tup_hot_upd       | 2996591
n_live_tup          | 92876681
n_dead_tup          | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum  | 3
last_vacuum         | 
last_autovacuum     | 
last_analyze        | 2024-02-08 15:33:14.008286+01
last_autoanalyze    | 
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 1
autoanalyze_count   | 0

* Plan :

https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2

It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text field), even if the corresponding number of lines for this WHERE clause is a smal subset of the entire data:

approx 6M against 80M in total

Thanks in advance for any hint regarding this cumbersome query.

Regards

Kimaidou

Re: Simple JOIN on heavy table not using expected index

From

Burçin Yazıcı

Date:

09 February 2024, 14:19:54

can you share result for:

explain analyze SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))

;

On Fri, 9 Feb 2024 at 17:14, kimaidou <kimaidou@gmail.com> wrote:

Hi all,

I have performance issue for a pretty simple request in a PostgreSQL server 14.10

* Request

SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
;

* Table definition (extract)

Table « private.parcelles »
Colonne | Type | Collationnement | NULL-able | Par défaut
-----------------------+-----------------------------+-----------------+-----------+------------
id | integer | | |
geom | geometry(MultiPolygon,2154) | | |
fid | bigint | | |
id_parcelle | character varying(14) | | not null |
insee_col | character varying(5) | | |
nom_col | character varying | | |
section | character varying(2) | | |
numero | character varying(4) | | |
contenance | bigint | | |
epci_nom | character varying | | |
dep | character varying | | |
dep_nom | character varying | | |
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)

-> First comment, the primary Key should be on id (integer) and not on id_parcelle (a text code)

* Statistiques

lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'private' AND relname = 'parcelles';
-[ RECORD 1 ]-------+------------------------------
relid | 2364725
schemaname | private
relname | parcelles
seq_scan | 1891
seq_tup_read | 552509679
idx_scan | 19144304
idx_tup_fetch | 38926631
n_tup_ins | 3
n_tup_upd | 3073182
n_tup_del | 0
n_tup_hot_upd | 2996591
n_live_tup | 92876681
n_dead_tup | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum | 3
last_vacuum |
last_autovacuum |
last_analyze | 2024-02-08 15:33:14.008286+01
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 0

* Plan :
https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2

It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text field), even if the corresponding number of lines for this WHERE clause is a smal subset of the entire data:
approx 6M against 80M in total

Thanks in advance for any hint regarding this cumbersome query.

Regards
Kimaidou

https://www.burcinyazici.com

Re: Simple JOIN on heavy table not using expected index

From

kimaidou

Date:

09 February 2024, 15:07:13

The query plan is visible here :

https://explain.dalibo.com/plan/50a719h92hde6950

Regards

Le vendredi 9 février 2024, Burçin Yazıcı <burcinyazici@gmail.com> a écrit :

can you share result for:

explain analyze SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
;

On Fri, 9 Feb 2024 at 17:14, kimaidou <kimaidou@gmail.com> wrote:
Hi all,

I have performance issue for a pretty simple request in a PostgreSQL server 14.10

* Request

SELECT p.id_parcelle
FROM private.parcelles p
WHERE (p.dep IN ( '08', '10', '54', '57', '67', '68'))
;

* Table definition (extract)

Table « private.parcelles »
Colonne | Type | Collationnement | NULL-able | Par défaut
-----------------------+-----------------------------+-----------------+-----------+------------
id | integer | | |
geom | geometry(MultiPolygon,2154) | | |
fid | bigint | | |
id_parcelle | character varying(14) | | not null |
insee_col | character varying(5) | | |
nom_col | character varying | | |
section | character varying(2) | | |
numero | character varying(4) | | |
contenance | bigint | | |
epci_nom | character varying | | |
dep | character varying | | |
dep_nom | character varying | | |
Index :
"foncier_pkey" PRIMARY KEY, btree (id_parcelle)
"idx_extension_eol_parcelle" btree (extension_eol)
"idx_lien_hubspot_parcelels" btree (lien_hubspot)
"idx_reg_parcelle" btree (reg)
"idx_type_ener_parcelles" btree (type_d_energie)
"parcelles_dep_idx" btree (dep)
"parcelles_id_parcelle_idx" btree (id_parcelle)
"parcelles_inseecol_idx" btree (insee_col)
"parcelles_object_id_idx" btree (hs_object_id)
"parcelles_pipelinestage_idx" btree (hs_pipeline_stage)
"parcelles_synctohubspot_idx" btree (synctohubspot)
"sidx_foncier_geom" gist (geom)

-> First comment, the primary Key should be on id (integer) and not on id_parcelle (a text code)

* Statistiques

lizmap_synerdev_carto=# SELECT * FROM pg_stat_all_tables WHERE schemaname = 'private' AND relname = 'parcelles';
-[ RECORD 1 ]-------+------------------------------
relid | 2364725
schemaname | private
relname | parcelles
seq_scan | 1891
seq_tup_read | 552509679
idx_scan | 19144304
idx_tup_fetch | 38926631
n_tup_ins | 3
n_tup_upd | 3073182
n_tup_del | 0
n_tup_hot_upd | 2996591
n_live_tup | 92876681
n_dead_tup | 1836882
n_mod_since_analyze | 769313
n_ins_since_vacuum | 3
last_vacuum |
last_autovacuum |
last_analyze | 2024-02-08 15:33:14.008286+01
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 1
autoanalyze_count | 0

* Plan :
https://explain.dalibo.com/plan/47391e3g8c2589cf#plan/node/2

It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text field), even if the corresponding number of lines for this WHERE clause is a smal subset of the entire data:
approx 6M against 80M in total

Thanks in advance for any hint regarding this cumbersome query.

Regards
Kimaidou

--
https://www.burcinyazici.com

Re: Simple JOIN on heavy table not using expected index

From

Tom Lane

Date:

09 February 2024, 15:12:55

kimaidou <kimaidou@gmail.com> writes:
> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text
> field), even if the corresponding number of lines for this WHERE clause is
> a smal subset of the entire data:
> approx 6M against 80M in total

6M out of 80M rows is not a "small subset".  Typically I'd expect
the planner to use an index-based scan for up to 1 or 2 percent of
the table.  Beyond that, you're going to be touching most pages
of the table anyway.

You can try reducing random_page_cost to favor indexscans, but
you might not find that the query gets any faster.

            regards, tom lane

Re: Simple JOIN on heavy table not using expected index

From

kimaidou

Date:

09 February 2024, 15:44:00

Tom, thanks a lot for your suggestion.

Indeed, setting random_page_cost to 2 instead of 4 improves this query a lot !

See the new plan :

https://explain.dalibo.com/plan/h924389529e11244

30 seconds VS 17 minutes before

Cheers

Michaël

Le vendredi 9 février 2024, Tom Lane <tgl@sss.pgh.pa.us> a écrit :

kimaidou <kimaidou@gmail.com> writes:
> It seems PostgreSQL does not use the index parcelles_dep_idx on "dep" (text
> field), even if the corresponding number of lines for this WHERE clause is
> a smal subset of the entire data:
> approx 6M against 80M in total

6M out of 80M rows is not a "small subset". Typically I'd expect
the planner to use an index-based scan for up to 1 or 2 percent of
the table. Beyond that, you're going to be touching most pages
of the table anyway.

You can try reducing random_page_cost to favor indexscans, but
you might not find that the query gets any faster.

regards, tom lane