Thread: BUG #15408: Postgresql bad planning with multicolumn gist and searchwith empty results

BUG #15408: Postgresql bad planning with multicolumn gist and searchwith empty results

From

PG Bug reporting form

Date:

27 September 2018, 14:59:23

The following bug has been logged on the website:

Bug reference:      15408
Logged by:          Alex Pires de Camargo
Email address:      acamargo@gmail.com
PostgreSQL version: 10.5
Operating system:   Ubuntu 5.4.0-6ubuntu1~16.04.10
Description:

I'm having an issue with a multicolumn index (integer, geom), where geom are
PostGIS points, and the integer is a point aggregator with business meaning.
I do searches by boxes in the geom column, joining the integer with other
tables. When the box covers an empty region, the planner does an index scan
looking only the secondary column, very inefficiently. I've spent some time
trying to create a scenario with a reproduction of the problem (with less
data), and I was able to do something that sounds the same issue, only with
integers. This setup will need about 2GB of disk.

/*   0<=n1<=10000 */
/*   0<=n2<=1000 || 3000<=n2<=4000 */
/* s1, r1, r2 to increase I/O needs and avoid index only scans */
create table test as select n as n1, (1000*random())::int as n2,
generate_series(1,1000) s1 , random() as r1, random() as r2 from
generate_series(1,10000) as n;
insert into test select n as n1, ((1000*random())::int) + 3000 as n2,
generate_series(1,1000) s1 , random() as r1, random() as r2 from
generate_series(1,10000) as n;
create index ind_test on test using gist (n1,n2);
create table ids as select generate_series(1,5) as n1; -- same problem with
just one row on this table
analyze ids;
alter table test alter column n1 set statistics 10000;  --excluding poor
stats
alter table test alter column n2 set statistics 10000;  --excluding poor
stats
analyze test;
explain analyze select * from test join ids using (n1) where n2 = 1001 and
n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 999;  --Q2
Inside n2 range, Index cond n1 AND n2 < 1ms
explain analyze select * from test join ids using (n1) where n2 = 1000;
--Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
explain analyze select * from test join ids using (n1) where n2 = 1001;
--Q4 Outside n2 range, Index cond n2 > 100 ms
explain analyze select * from test join ids using (n1) where n2 = 1002;
--Q5 Outside n2 range, Index cond n2 > 100 ms

Below the result of the explains above:


➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 = 1001
and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms
                                                      QUERY PLAN
                                         
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..9.51 rows=1 width=28) (actual time=0.067..0.067
rows=0 loops=1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28)
(actual time=0.067..0.067 rows=0 loops=1)
         Index Cond: ((n1 = 1) AND (n2 = 1001))
   ->  Seq Scan on ids  (cost=0.00..1.06 rows=1 width=4) (never executed)
         Filter: (n1 = 1)
 Planning time: 0.404 ms
 Execution time: 0.096 ms
(7 rows)

Time: 0.826 ms

 ➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 = 999;
 --Q2 Inside n2 range, Index cond n1 AND n2 < 1ms
                                                      QUERY PLAN
                                         
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..43.29 rows=5 width=28) (actual time=0.098..0.367
rows=7 loops=1)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (actual
time=0.012..0.014 rows=5 loops=1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28)
(actual time=0.064..0.068 rows=1 loops=5)
         Index Cond: ((n1 = ids.n1) AND (n2 = 999))
 Planning time: 0.994 ms
 Execution time: 0.407 ms
(6 rows)

Time: 1.713 ms

 ➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 =
1000;  --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms
                                                      QUERY PLAN
                                         
----------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..43.29 rows=3 width=28) (actual time=0.157..0.248
rows=3 loops=1)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (actual
time=0.010..0.011 rows=5 loops=1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28)
(actual time=0.044..0.046 rows=1 loops=5)
         Index Cond: ((n1 = ids.n1) AND (n2 = 1000))
 Planning time: 0.877 ms
 Execution time: 0.277 ms
(6 rows)

Time: 1.440 ms

 ➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 =
1001;  --Q4 Outside n2 range, Index cond n2 > 100 ms
                                                       QUERY PLAN
                                           

------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..9.55 rows=1 width=28) (actual time=93.247..93.247
rows=0 loops=1)
   Join Filter: (test.n1 = ids.n1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28)
(actual time=93.246..93.246 rows=0 loops=1)
         Index Cond: (n2 = 1001)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (never executed)
 Planning time: 0.716 ms
 Execution time: 93.280 ms
(7 rows)

Time: 94.242 ms

 ➤ psql://postgres@[local]:5432/postgres 

#     explain analyze select * from test join ids using (n1) where n2 =
1002;  --Q5 Outside n2 range, Index cond n2 > 100 ms
                                                       QUERY PLAN
                                           

------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..9.55 rows=1 width=28) (actual time=86.857..86.857
rows=0 loops=1)
   Join Filter: (test.n1 = ids.n1)
   ->  Index Scan using ind_test on test  (cost=0.42..8.44 rows=1 width=28)
(actual time=86.856..86.856 rows=0 loops=1)
         Index Cond: (n2 = 1002)
   ->  Seq Scan on ids  (cost=0.00..1.05 rows=5 width=4) (never executed)
 Planning time: 0.750 ms
 Execution time: 86.885 ms
(7 rows)

Time: 87.955 ms

# select version();
                                                                   version
                                                                

---------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Time: 2.372 ms
As seen in the explain results, when I search for non-existent values of n2,
the plan changes on Index condition used, giving poor plans. None of this
plan differences appear if the index is btree. It seems something gist
related, and I need to use gist due to PostGIS functions.

A workaround I'm using is creating an additional index for the problematic
queries, but they are causing I/O and storage overhead, due to the huge
amount of data. I've tried two single column indexes, but the performance
penalty was too high compared to the multi-column.

Before opening this bug I've posted this on dba.stackexchange to get some
thoughts, no comments until now:

https://dba.stackexchange.com/questions/218519/postgresql-bad-planning-with-multicolumn-gist-and-search-with-empty-results

Re: BUG #15408: Postgresql bad planning with multicolumn gist and search with empty results

From

Tom Lane

Date:

27 September 2018, 17:03:31

=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes:
> As seen in the explain results, when I search for non-existent values of n2,
> the plan changes on Index condition used, giving poor plans. None of this
> plan differences appear if the index is btree. It seems something gist
> related, and I need to use gist due to PostGIS functions.

I don't have time to dig in the code right now, but my recollection is
that btcostestimate() has fairly detailed modeling of the behavior of
queries that constrain only some columns of an index, eg it understands
that "col1 = constant" is much cheaper to scan than "col2 = constant".
On the other hand, gistcostestimate() has no such modeling and assumes
that a constraint on a lower-order column is worth the same as one on
the first column.

This is partially due to lack of effort put into that function, but I seem
to recall being told that GIST was not as sensitive to column ordering
as btree, too.  Your results indicate otherwise :-(

Depending on what other queries you use, maybe an adequate workaround
would be to switch the two columns of the index.

            regards, tom lane

Re: BUG #15408: Postgresql bad planning with multicolumn gist andsearch with empty results

From

Alex Pires de Camargo

Date:

27 September 2018, 22:54:55

Thanks for the answer.

The order is relevant because I have queries that use only the first column, and would suffer from the same problem if I switch the order...

In order to plan the best workarounds, this has a chance to be fixed in a near future?

Regards, Alex.

On Thu, Sep 27, 2018 at 11:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:
> As seen in the explain results, when I search for non-existent values of n2,
> the plan changes on Index condition used, giving poor plans. None of this
> plan differences appear if the index is btree. It seems something gist
> related, and I need to use gist due to PostGIS functions.

I don't have time to dig in the code right now, but my recollection is
that btcostestimate() has fairly detailed modeling of the behavior of
queries that constrain only some columns of an index, eg it understands
that "col1 = constant" is much cheaper to scan than "col2 = constant".
On the other hand, gistcostestimate() has no such modeling and assumes
that a constraint on a lower-order column is worth the same as one on
the first column.

This is partially due to lack of effort put into that function, but I seem
to recall being told that GIST was not as sensitive to column ordering
as btree, too. Your results indicate otherwise :-(

Depending on what other queries you use, maybe an adequate workaround
would be to switch the two columns of the index.

regards, tom lane

Alex
acamargo@gmail.com
"Por que, no mundo, os maus, tão frequentemente, sobrepujam os bons em influência?
-Pela fraqueza dos bons; Os maus são intrigantes e audaciosos, os bons são tímidos. Quando estes o quiserem, dominarão." -- Livro dos Espíritos, Q932. http://livrodosespiritos.wordpress.com/

Re: BUG #15408: Postgresql bad planning with multicolumn gist and search with empty results

From

Tom Lane

Date:

27 September 2018, 23:15:32

Alex Pires de Camargo <acamargo@gmail.com> writes:
> In order to plan the best workarounds, this has a chance to be fixed in a
> near future?

I wouldn't recommend holding your breath.  Someone would have to get
interested in the problem and design/write/test a better cost model;
and if/when that happens, we'd be unlikely to back-patch such a change
into existing release branches.

            regards, tom lane