Thread: BUG #15408: Postgresql bad planning with multicolumn gist and searchwith empty results
BUG #15408: Postgresql bad planning with multicolumn gist and searchwith empty results
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 15408 Logged by: Alex Pires de Camargo Email address: acamargo@gmail.com PostgreSQL version: 10.5 Operating system: Ubuntu 5.4.0-6ubuntu1~16.04.10 Description: I'm having an issue with a multicolumn index (integer, geom), where geom are PostGIS points, and the integer is a point aggregator with business meaning. I do searches by boxes in the geom column, joining the integer with other tables. When the box covers an empty region, the planner does an index scan looking only the secondary column, very inefficiently. I've spent some time trying to create a scenario with a reproduction of the problem (with less data), and I was able to do something that sounds the same issue, only with integers. This setup will need about 2GB of disk. /* 0<=n1<=10000 */ /* 0<=n2<=1000 || 3000<=n2<=4000 */ /* s1, r1, r2 to increase I/O needs and avoid index only scans */ create table test as select n as n1, (1000*random())::int as n2, generate_series(1,1000) s1 , random() as r1, random() as r2 from generate_series(1,10000) as n; insert into test select n as n1, ((1000*random())::int) + 3000 as n2, generate_series(1,1000) s1 , random() as r1, random() as r2 from generate_series(1,10000) as n; create index ind_test on test using gist (n1,n2); create table ids as select generate_series(1,5) as n1; -- same problem with just one row on this table analyze ids; alter table test alter column n1 set statistics 10000; --excluding poor stats alter table test alter column n2 set statistics 10000; --excluding poor stats analyze test; explain analyze select * from test join ids using (n1) where n2 = 1001 and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms explain analyze select * from test join ids using (n1) where n2 = 999; --Q2 Inside n2 range, Index cond n1 AND n2 < 1ms explain analyze select * from test join ids using (n1) where n2 = 1000; --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms explain analyze select * from test join ids using (n1) where n2 = 1001; --Q4 Outside n2 range, Index cond n2 > 100 ms explain analyze select * from test join ids using (n1) where n2 = 1002; --Q5 Outside n2 range, Index cond n2 > 100 ms Below the result of the explains above: ➤ psql://postgres@[local]:5432/postgres # explain analyze select * from test join ids using (n1) where n2 = 1001 and n1 = 1; --Q1 Outside n2 range, Index cond n1 AND n2 < 1ms QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.42..9.51 rows=1 width=28) (actual time=0.067..0.067 rows=0 loops=1) -> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28) (actual time=0.067..0.067 rows=0 loops=1) Index Cond: ((n1 = 1) AND (n2 = 1001)) -> Seq Scan on ids (cost=0.00..1.06 rows=1 width=4) (never executed) Filter: (n1 = 1) Planning time: 0.404 ms Execution time: 0.096 ms (7 rows) Time: 0.826 ms ➤ psql://postgres@[local]:5432/postgres # explain analyze select * from test join ids using (n1) where n2 = 999; --Q2 Inside n2 range, Index cond n1 AND n2 < 1ms QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.42..43.29 rows=5 width=28) (actual time=0.098..0.367 rows=7 loops=1) -> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (actual time=0.012..0.014 rows=5 loops=1) -> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28) (actual time=0.064..0.068 rows=1 loops=5) Index Cond: ((n1 = ids.n1) AND (n2 = 999)) Planning time: 0.994 ms Execution time: 0.407 ms (6 rows) Time: 1.713 ms ➤ psql://postgres@[local]:5432/postgres # explain analyze select * from test join ids using (n1) where n2 = 1000; --Q3 Inside n2 range, Index cond n1 AND n2 < 1 ms QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.42..43.29 rows=3 width=28) (actual time=0.157..0.248 rows=3 loops=1) -> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (actual time=0.010..0.011 rows=5 loops=1) -> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28) (actual time=0.044..0.046 rows=1 loops=5) Index Cond: ((n1 = ids.n1) AND (n2 = 1000)) Planning time: 0.877 ms Execution time: 0.277 ms (6 rows) Time: 1.440 ms ➤ psql://postgres@[local]:5432/postgres # explain analyze select * from test join ids using (n1) where n2 = 1001; --Q4 Outside n2 range, Index cond n2 > 100 ms QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.42..9.55 rows=1 width=28) (actual time=93.247..93.247 rows=0 loops=1) Join Filter: (test.n1 = ids.n1) -> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28) (actual time=93.246..93.246 rows=0 loops=1) Index Cond: (n2 = 1001) -> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (never executed) Planning time: 0.716 ms Execution time: 93.280 ms (7 rows) Time: 94.242 ms ➤ psql://postgres@[local]:5432/postgres # explain analyze select * from test join ids using (n1) where n2 = 1002; --Q5 Outside n2 range, Index cond n2 > 100 ms QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.42..9.55 rows=1 width=28) (actual time=86.857..86.857 rows=0 loops=1) Join Filter: (test.n1 = ids.n1) -> Index Scan using ind_test on test (cost=0.42..8.44 rows=1 width=28) (actual time=86.856..86.856 rows=0 loops=1) Index Cond: (n2 = 1002) -> Seq Scan on ids (cost=0.00..1.05 rows=5 width=4) (never executed) Planning time: 0.750 ms Execution time: 86.885 ms (7 rows) Time: 87.955 ms # select version(); version --------------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit (1 row) Time: 2.372 ms As seen in the explain results, when I search for non-existent values of n2, the plan changes on Index condition used, giving poor plans. None of this plan differences appear if the index is btree. It seems something gist related, and I need to use gist due to PostGIS functions. A workaround I'm using is creating an additional index for the problematic queries, but they are causing I/O and storage overhead, due to the huge amount of data. I've tried two single column indexes, but the performance penalty was too high compared to the multi-column. Before opening this bug I've posted this on dba.stackexchange to get some thoughts, no comments until now: https://dba.stackexchange.com/questions/218519/postgresql-bad-planning-with-multicolumn-gist-and-search-with-empty-results
Re: BUG #15408: Postgresql bad planning with multicolumn gist and search with empty results
From
Tom Lane
Date:
=?utf-8?q?PG_Bug_reporting_form?= <noreply@postgresql.org> writes: > As seen in the explain results, when I search for non-existent values of n2, > the plan changes on Index condition used, giving poor plans. None of this > plan differences appear if the index is btree. It seems something gist > related, and I need to use gist due to PostGIS functions. I don't have time to dig in the code right now, but my recollection is that btcostestimate() has fairly detailed modeling of the behavior of queries that constrain only some columns of an index, eg it understands that "col1 = constant" is much cheaper to scan than "col2 = constant". On the other hand, gistcostestimate() has no such modeling and assumes that a constraint on a lower-order column is worth the same as one on the first column. This is partially due to lack of effort put into that function, but I seem to recall being told that GIST was not as sensitive to column ordering as btree, too. Your results indicate otherwise :-( Depending on what other queries you use, maybe an adequate workaround would be to switch the two columns of the index. regards, tom lane
Re: BUG #15408: Postgresql bad planning with multicolumn gist andsearch with empty results
From
Alex Pires de Camargo
Date:
Thanks for the answer.
The order is relevant because I have queries that use only the first column, and would suffer from the same problem if I switch the order...
In order to plan the best workarounds, this has a chance to be fixed in a near future?
Regards, Alex.
On Thu, Sep 27, 2018 at 11:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> As seen in the explain results, when I search for non-existent values of n2,
> the plan changes on Index condition used, giving poor plans. None of this
> plan differences appear if the index is btree. It seems something gist
> related, and I need to use gist due to PostGIS functions.
I don't have time to dig in the code right now, but my recollection is
that btcostestimate() has fairly detailed modeling of the behavior of
queries that constrain only some columns of an index, eg it understands
that "col1 = constant" is much cheaper to scan than "col2 = constant".
On the other hand, gistcostestimate() has no such modeling and assumes
that a constraint on a lower-order column is worth the same as one on
the first column.
This is partially due to lack of effort put into that function, but I seem
to recall being told that GIST was not as sensitive to column ordering
as btree, too. Your results indicate otherwise :-(
Depending on what other queries you use, maybe an adequate workaround
would be to switch the two columns of the index.
regards, tom lane
Alex
acamargo@gmail.com
"Por que, no mundo, os maus, tão frequentemente, sobrepujam os bons em influência?
-Pela fraqueza dos bons; Os maus são intrigantes e audaciosos, os bons são tímidos. Quando estes o quiserem, dominarão." -- Livro dos Espíritos, Q932. http://livrodosespiritos.wordpress.com/
acamargo@gmail.com
"Por que, no mundo, os maus, tão frequentemente, sobrepujam os bons em influência?
-Pela fraqueza dos bons; Os maus são intrigantes e audaciosos, os bons são tímidos. Quando estes o quiserem, dominarão." -- Livro dos Espíritos, Q932. http://livrodosespiritos.wordpress.com/
Re: BUG #15408: Postgresql bad planning with multicolumn gist and search with empty results
From
Tom Lane
Date:
Alex Pires de Camargo <acamargo@gmail.com> writes: > In order to plan the best workarounds, this has a chance to be fixed in a > near future? I wouldn't recommend holding your breath. Someone would have to get interested in the problem and design/write/test a better cost model; and if/when that happens, we'd be unlikely to back-patch such a change into existing release branches. regards, tom lane