Re: index prefetching - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: index prefetching
Msg-id: 91f02f2e-a6b5-4dda-8e45-5dd576ac3416@enterprisedb.com
In response to: Re: index prefetching (Konstantin Knizhnik <knizhnik@garret.ru>)
List: pgsql-hackers
On 1/21/24 20:50, Konstantin Knizhnik wrote:
>
> On 20/01/2024 12:14 am, Tomas Vondra wrote:
>>> Looks like I was not true, even if it is not index-only scan but index
>>> condition involves only index attributes, then heap is not accessed
>>> until we find tuple satisfying search condition.
>>> Inclusive index case described above
>>> (https://commitfest.postgresql.org/46/4352/) is interesting but IMHO
>>> exotic case. If keys are actually used in search, then why not to create
>>> normal compound index instead?
>>>
>> Not sure I follow ...
>>
>> Firstly, I'm not convinced the example addressed by that other patch is
>> that exotic. IMHO it's quite possible it's actually quite common, but
>> the users do not realize the possible gains.
>>
>> Also, there are reasons to not want very wide indexes - they have
>> overhead associated with maintenance, disk space, etc. I think it's
>> perfectly rational to design indexes in a way that eliminates most of
>> the heap fetches necessary to evaluate conditions, but does not
>> guarantee IOS (so the last heap fetch is still needed).
>
> We are comparing compound index (a,b) and covering (inclusive) index
> (a) include (b). These indexes have exactly the same width and size,
> and almost the same maintenance overhead.
>
> The first index has a more expensive comparison function (involving two
> columns), but I do not think that it can significantly affect
> performance and maintenance cost. Also, if the selectivity of "a" is
> good enough, then there is no need to compare "b".
>
> Why would we prefer a covering index over a compound index? I see only
> two good reasons:
> 1. The extra columns' type does not have the comparison function needed
> by the AM.
> 2. The extra columns are never used in query predicates.
> Or maybe you don't want to include the columns in a UNIQUE constraint?
> If you are going to use these columns in query predicates, I do not see
> much sense in creating an inclusive index rather than a compound index.
> Do you?
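[Editor's note: for concreteness, the two index shapes being compared above can be sketched as follows; the index names are illustrative, and the commented UNIQUE variants show the constraint difference mentioned as a possible third reason.]

```sql
-- compound index: both columns are key columns, so both can appear
-- in index scan keys (Index Cond)
create index t_a_b_idx on t (a, b);

-- covering index: only "a" is a key column; "b" is payload carried in
-- the index tuple, available to index-only scans but not usable as a
-- scan key
create index t_a_incl_b_idx on t (a) include (b);

-- the UNIQUE variants differ semantically: the first enforces
-- uniqueness of (a, b) pairs, the second uniqueness of "a" alone
-- create unique index on t (a, b);
-- create unique index on t (a) include (b);
```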
But this is also about conditions that can't be translated into index
scan keys. Consider this:

create table t (a int, b int, c int);

insert into t select 1000 * random(), 1000 * random(), 1000 * random()
  from generate_series(1,1000000) s(i);

create index on t (a,b);

vacuum analyze t;

explain (analyze, buffers)
select * from t where a = 10 and mod(b,10) = 1111111;

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.42..3670.74 rows=5 width=12) (actual time=4.562..4.564 rows=0 loops=1)
   Index Cond: (a = 10)
   Filter: (mod(b, 10) = 1111111)
   Rows Removed by Filter: 974
   Buffers: shared hit=980
   Prefetches: blocks=901
 Planning Time: 0.304 ms
 Execution Time: 5.146 ms
(8 rows)

Notice that this still fetched ~1000 buffers in order to evaluate the
filter on "b", because the condition is complex and can't be transformed
into a nice scan key. Or this:

explain (analyze, buffers)
select a from t where a = 10 and (b+1) < 100 and c < 0;

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.42..3673.22 rows=1 width=4) (actual time=4.446..4.448 rows=0 loops=1)
   Index Cond: (a = 10)
   Filter: ((c < 0) AND ((b + 1) < 100))
   Rows Removed by Filter: 974
   Buffers: shared hit=980
   Prefetches: blocks=901
 Planning Time: 0.313 ms
 Execution Time: 4.878 ms
(8 rows)

where it's "broken" by the extra unindexed column. FWIW these are the
primary cases I had in mind for this patch.

>> What do you mean by "create normal compound index"? The patch
>> addresses a limitation that not every condition can be translated into
>> a proper scan key. Even if we improve this, there will always be such
>> conditions. While an IOS can evaluate them on the index tuple, the
>> regular index scan can't do that (currently).
>>
>> Can you share an example demonstrating the alternative approach?
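[Editor's note: as an aside, not part of the original exchange - a condition like mod(b,10) = <const> can be made index-evaluable by indexing the expression itself, at the cost of tying the index to that one expression. A sketch, reusing the table from the example above:]

```sql
-- illustrative expression index on (a, mod(b, 10)); the planner can
-- match a qual written exactly as mod(b, 10) = <const> against it,
-- turning the Filter into an Index Cond
create index t_a_modb_idx on t (a, mod(b, 10));
```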
> May be I missed something.
>
> This is the example from
> https://www.postgresql.org/message-id/flat/N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU=@yamlcoder.me :
>
> ```
> And here is the plan with index on (a,b).
>
> Limit  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
>   Output: a, b, d
>   Buffers: shared hit=613
>   ->  Index Scan using t_a_b_idx on public.t  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
>         Output: a, b, d
>         Index Cond: ((t.a > 1000000) AND (t.b = 4))
>         Buffers: shared hit=613
> Planning:
>   Buffers: shared hit=41
> Planning Time: 0.314 ms
> Execution Time: 6.910 ms
> ```
>
> Isn't it an optimal plan for this query?
>
> And a cite from the self-reproducible example https://dbfiddle.uk/iehtq44L :
>
> ```
> create unique index t_a_include_b on t(a) include (b);
> -- I'd expect the index above to behave the same as the index below for this query
> --create unique index on t(a,b);
> ```
>
> I agree that it is natural to expect the same result for both indexes,
> so this PR definitely makes sense.
> My point is only that compound index (a,b) in this case is more natural
> and preferable.

Yes, perhaps. But you may also see it from the other direction - if you
already have an index with included columns (for whatever reason), it
would be nice to leverage that if possible. And as I mentioned above,
it's not always possible to move a column from "included" to a proper
key, or stuff like that.

Anyway, it seems entirely unrelated to this prefetching thread.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company