PATCH: index-only scans with partial indexes - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | PATCH: index-only scans with partial indexes |
Date | |
Msg-id | 55A00F17.1020608@2ndquadrant.com Whole thread Raw |
Responses |
Re: PATCH: index-only scans with partial indexes
Re: PATCH: index-only scans with partial indexes |
List | pgsql-hackers |
Hi, currently partial indexes end up not using index only scans in most cases, because check_index_only() is overly conservative, as explained in this comment: * XXX this is overly conservative for partial indexes, since we will * consider attributes involved in the index predicate as required even * though the predicate won't need to be checked at runtime. (The same * is true for attributes used only in index quals, if we are certain * that the index is not lossy.) However, it would be quite expensive * to determine that accurately at this point, so for now we take the * easy way out. In other words, unless you include columns from the index predicate to the index, the planner will decide index only scans are not possible. Which is a bit unfortunate, because those columns are not needed at runtime, and will only increase the index size (and the main benefit of partial indexes is size reduction). The attached patch fixes this by only considering clauses that are not implied by the index predicate. The effect is simple: create table t as select i as a, i as b from generate_series(1,10000000) s(i); create index tidx_partial on t(b) where a > 1000 and a < 2000; vacuum freeze t; analyze t; explain analyze select count(b) from t where a > 1000 and a < 2000; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=39.44..39.45 rows=1 width=4) (actual time=8.350..8.354 rows=1 loops=1) -> Index Scan using tidx_partial on t (cost=0.28..37.98 rows=585 width=4) (actual time=0.034..4.368 rows=999 loops=1) Planning time: 0.197 ms Execution time: 8.441 ms (4 rows) explain analyze select count(b) from t where a > 1000 and a < 2000; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=33.44..33.45 rows=1 width=4) (actual time=8.019..8.023 rows=1 loops=1) -> Index Only Scan using tidx_partial on t (cost=0.28..31.98 rows=585 width=4) (actual time=0.036..4.165 rows=999 loops=1) Heap Fetches: 0 Planning time: 0.188 ms Execution time: 8.106 ms (5 rows) I've done a bunch of tests, and I do see small (hardly noticeable) increase in planning time with long list of WHERE clauses, because all those need to be checked against the index predicate. Not sure if this is what's meant by 'quite expensive' in the comment. Moreover, this was more than compensated by the IOS benefits (even with everything in RAM). But maybe it's possible to fix that somehow? For example, we're certainly doing those checks elsewhere when deciding which clauses need to be evaluated at run-time, so maybe we could cache that somehow? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: