Thread: query plan using partial index expects a much larger number of rows than is possible
query plan using partial index expects a much larger number of rows than is possible
From
"Olivier Poquet"
Date:
Hello,

I have a large table (about 13 million rows) full of customer order information. Since most of that information is for orders that have already been fulfilled, I have a partial index to help quickly zero in on rows that have not been fulfilled. This works well, but I noticed today when joining with another large table using its primary key that even though the planner was using my partial index, it decided to do a merge join to the second large table instead of the nested loop I would have expected.

Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do a simple query that does an index scan on my partial index:

=> \d orderitems_committed_unfulfilled
Index "public.orderitems_committed_unfulfilled"
 Column |  Type  | Key? | Definition
--------+--------+------+------------
 id     | bigint | yes  | id
btree, for table "public.orderitems", predicate (LEAST(committed, quantity) > fulfilled)

=> explain (analyze, buffers) select oi.id from orderitems oi where LEAST(oi.committed, oi.quantity) > oi.fulfilled;
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using orderitems_committed_unfulfilled on orderitems oi  (cost=0.41..31688.23 rows=2861527 width=8) (actual time=0.039..2.092 rows=2274 loops=1)
   Heap Fetches: 2493
   Buffers: shared hit=1883
 Planning Time: 0.110 ms
 Execution Time: 2.255 ms
(5 rows)

So nice and quick, but the planner thought it would get back 2861527 rows instead of the 2274 I actually have. That explains why it thought it would make sense to do a merge join with my other large table instead of the nested loop over the 2k rows. I would have expected the planner to know that there's no way it'll get back 2 million rows though, given that:

=> select relname, relpages, reltuples from pg_class where relname = 'orderitems_committed_unfulfilled';
             relname              | relpages | reltuples
----------------------------------+----------+-----------
 orderitems_committed_unfulfilled |     3051 |      2112
(1 row)

It knows there's only 2k-ish of them from the index. The 2 mil number is the same as what the planner expects if I disable using indexes and it does a seq scan, so I'm assuming it's just the guess from the column statistics and the planner is not using the size of the partial index.

I'm running:

=> select version();
                                          version
-------------------------------------------------------------------------------------------
 PostgreSQL 12.4 on x86_64-pc-linux-gnu, compiled by gcc, a 97d579287 p 0be2109a97, 64-bit
(1 row)

I'm wondering if the behavior that I'm seeing is expected in 12.4, and if so, whether it changes in a later version or whether I should file an enhancement request. Or, if it's not expected, is there something I'm doing wrong, or should I file a bug?

Thanks for your time.

--
  Olivier Poquet
  opoquet@plumdev.com
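(For reference, the thread never quotes the original CREATE INDEX statement; a reconstruction from the \d output above would look roughly like this:)

    -- Reconstruction of the partial index from the \d output; the original DDL isn't shown in the thread.
    CREATE INDEX orderitems_committed_unfulfilled
        ON public.orderitems (id)
        WHERE LEAST(committed, quantity) > fulfilled;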
Re: query plan using partial index expects a much larger number of rows than is possible
From
Tom Lane
Date:
"Olivier Poquet" <opoquet@plumdev.com> writes: > Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do asimple query that does an index scan on my partial index: We don't look at partial-index predicates when trying to estimate the selectivity of a WHERE clause. It's not clear to me whether that'd be a useful thing to do, or whether it could be shoehorned into the system easily. (One big problem is that while the index size could provide an upper bound, it's not apparent how to combine that knowledge with selectivities of unrelated conditions. Also, it's riskier to extrapolate a current rowcount estimate from stale relpages/reltuples data for an index than it is for a table, because the index is less likely to scale up linearly.) If this particular query is performance-critical, you might consider materializing the condition, that is something like create table orderitems ( ... , committed_unfulfilled bool GENERATED ALWAYS AS (LEAST(committed, quantity) > fulfilled) STORED ); and then your queries and your partial-index predicate must look like "WHERE committed_unfulfilled". Having done this, ANALYZE would gather stats on the values of that column and the WHERE clauses would be estimated accurately. regards, tom lane
Re: query plan using partial index expects a much larger number of rows than is possible
From
"Olivier Poquet"
Date:
Thanks Tom,

That makes perfect sense. I'd already gone the route of materializing the condition, but I didn't even realize that generated columns were an option (I'd done the same with triggers instead). So thanks a lot for that too!

--
  Olivier Poquet
  opoquet@plumdev.com

On Wed, Oct 28, 2020, at 7:30 PM, Tom Lane wrote:
> "Olivier Poquet" <opoquet@plumdev.com> writes:
> > Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do a simple query that does an index scan on my partial index:
>
> We don't look at partial-index predicates when trying to estimate the
> selectivity of a WHERE clause. It's not clear to me whether that'd be
> a useful thing to do, or whether it could be shoehorned into the system
> easily. (One big problem is that while the index size could provide
> an upper bound, it's not apparent how to combine that knowledge with
> selectivities of unrelated conditions. Also, it's riskier to extrapolate
> a current rowcount estimate from stale relpages/reltuples data for an
> index than it is for a table, because the index is less likely to scale
> up linearly.)
>
> If this particular query is performance-critical, you might consider
> materializing the condition, that is something like
>
> create table orderitems (
>     ... ,
>     committed_unfulfilled bool GENERATED ALWAYS AS
>         (LEAST(committed, quantity) > fulfilled) STORED
> );
>
> and then your queries and your partial-index predicate must look
> like "WHERE committed_unfulfilled". Having done this, ANALYZE
> would gather stats on the values of that column and the WHERE
> clauses would be estimated accurately.
>
> 			regards, tom lane
Re: query plan using partial index expects a much larger number of rows than is possible
From
Michael Lewis
Date:
On Wed, Oct 28, 2020 at 5:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Olivier Poquet" <opoquet@plumdev.com> writes:
> Looking at it in more detail, I found that the planner is assuming that I'll get millions of rows back even when I do a simple query that does an index scan on my partial index:
We don't look at partial-index predicates when trying to estimate the
selectivity of a WHERE clause. It's not clear to me whether that'd be
a useful thing to do, or whether it could be shoehorned into the system
easily. (One big problem is that while the index size could provide
an upper bound, it's not apparent how to combine that knowledge with
selectivities of unrelated conditions. Also, it's riskier to extrapolate
a current rowcount estimate from stale relpages/reltuples data for an
index than it is for a table, because the index is less likely to scale
up linearly.)
regards, tom lane
Aren't there custom stats created for functional indexes? Would it be feasible to create those for partial indexes as well, maybe only optionally? I assume there may be giant gaps with that notion.
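(For context on that question: ANALYZE does gather per-column statistics for expression/"functional" index columns, and they show up in pg_stats under the index's name. A minimal sketch with an illustrative index name; note that having these stats doesn't by itself change the estimate for the original query unless the planner can match the query's clause to the indexed expression:)

    -- Illustrative example, not from the thread: expression-index columns get their own statistics.
    CREATE INDEX orderitems_unfulfilled_expr
        ON orderitems ((LEAST(committed, quantity) > fulfilled));

    ANALYZE orderitems;

    -- The gathered stats are visible under the index's name rather than the table's.
    SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'orderitems_unfulfilled_expr';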