Re: index prefetching - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: index prefetching
Msg-id: 91f02f2e-a6b5-4dda-8e45-5dd576ac3416@enterprisedb.com
In response to: Re: index prefetching (Konstantin Knizhnik <knizhnik@garret.ru>)
List: pgsql-hackers
On 1/21/24 20:50, Konstantin Knizhnik wrote:
>
> On 20/01/2024 12:14 am, Tomas Vondra wrote:
>>> Looks like I was not true, even if it is not index-only scan but index
>>> condition involves only index attributes, then heap is not accessed
>>> until we find tuple satisfying search condition.
>>> Inclusive index case described above
>>> (https://commitfest.postgresql.org/46/4352/) is interesting but IMHO
>>> exotic case. If keys are actually used in search, then why not to create
>>> normal compound index instead?
>>>
>> Not sure I follow ...
>>
>> Firstly, I'm not convinced the example addressed by that other patch is
>> that exotic. IMHO it's quite possible it's actually quite common, but
>> the users do not realize the possible gains.
>>
>> Also, there are reasons to not want very wide indexes - they have
>> overhead associated with maintenance, disk space, etc. I think it's
>> perfectly rational to design indexes in a way that eliminates most of
>> the heap fetches necessary to evaluate conditions, but does not
>> guarantee IOS (so the last heap fetch is still needed).
>
> We are comparing compound index (a,b) and covering (inclusive) index
> (a) include (b). These indexes have exactly the same width and size,
> and almost the same maintenance overhead.
>
> The first index has a more expensive comparison function (involving two
> columns), but I do not think that it can significantly affect
> performance and maintenance cost. Also, if the selectivity of "a" is
> good enough, then there is no need to compare "b".
>
> Why would we prefer a covering index over a compound index? I see only
> two good reasons:
> 1. The extra columns' type does not have the comparison function needed
> by the AM.
> 2. The extra columns are never used in query predicates.
> Or maybe you don't want to include the columns in a UNIQUE constraint?
> If you are going to use these columns in query predicates, I do not see
> much sense in creating an inclusive index rather than a compound index.
> Do you?
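[Editor's note: for concreteness, the two index shapes being compared above can be sketched as follows; the index names are illustrative, and the commented UNIQUE variants show the constraint difference mentioned as a possible third reason.]

```sql
-- compound index: both columns are key columns, so both can appear
-- in index scan keys (Index Cond)
create index t_a_b_idx on t (a, b);

-- covering index: only "a" is a key column; "b" is payload carried in
-- the index tuple, available to index-only scans but not usable as a
-- scan key
create index t_a_incl_b_idx on t (a) include (b);

-- the UNIQUE variants differ semantically: the first enforces
-- uniqueness of (a, b) pairs, the second uniqueness of "a" alone
-- create unique index on t (a, b);
-- create unique index on t (a) include (b);
```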
But this is also about conditions that can't be translated into index
scan keys. Consider this:

create table t (a int, b int, c int);

insert into t select 1000 * random(), 1000 * random(), 1000 * random()
  from generate_series(1,1000000) s(i);

create index on t (a,b);

vacuum analyze t;

explain (analyze, buffers)
select * from t where a = 10 and mod(b,10) = 1111111;

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.42..3670.74 rows=5 width=12) (actual time=4.562..4.564 rows=0 loops=1)
   Index Cond: (a = 10)
   Filter: (mod(b, 10) = 1111111)
   Rows Removed by Filter: 974
   Buffers: shared hit=980
   Prefetches: blocks=901
 Planning Time: 0.304 ms
 Execution Time: 5.146 ms
(8 rows)

Notice that this still fetched ~1000 buffers in order to evaluate the
filter on "b", because the condition is complex and can't be transformed
into a nice scan key. Or this:

explain (analyze, buffers)
select a from t where a = 10 and (b+1) < 100 and c < 0;

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Index Scan using t_a_b_idx on t  (cost=0.42..3673.22 rows=1 width=4) (actual time=4.446..4.448 rows=0 loops=1)
   Index Cond: (a = 10)
   Filter: ((c < 0) AND ((b + 1) < 100))
   Rows Removed by Filter: 974
   Buffers: shared hit=980
   Prefetches: blocks=901
 Planning Time: 0.313 ms
 Execution Time: 4.878 ms
(8 rows)

where it's "broken" by the extra unindexed column. FWIW these are the
primary cases I had in mind for this patch.

>> What do you mean by "create normal compound index"? The patch
>> addresses a limitation that not every condition can be translated into
>> a proper scan key. Even if we improve this, there will always be such
>> conditions. While an IOS can evaluate them on the index tuple, the
>> regular index scan can't do that (currently).
>>
>> Can you share an example demonstrating the alternative approach?
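[Editor's note: as an aside, not part of the original exchange - a condition like mod(b,10) = <const> can be made index-evaluable by indexing the expression itself, at the cost of tying the index to that one expression. A sketch, reusing the table from the example above:]

```sql
-- illustrative expression index on (a, mod(b, 10)); the planner can
-- match a qual written exactly as mod(b, 10) = <const> against it,
-- turning the Filter into an Index Cond
create index t_a_modb_idx on t (a, mod(b, 10));
```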
> May be I missed something.
>
> This is the example from
> https://www.postgresql.org/message-id/flat/N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU=@yamlcoder.me :
>
> ```
> And here is the plan with index on (a,b).
>
> Limit  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
>   Output: a, b, d
>   Buffers: shared hit=613
>   ->  Index Scan using t_a_b_idx on public.t  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0 loops=1)
>         Output: a, b, d
>         Index Cond: ((t.a > 1000000) AND (t.b = 4))
>         Buffers: shared hit=613
> Planning:
>   Buffers: shared hit=41
> Planning Time: 0.314 ms
> Execution Time: 6.910 ms
> ```
>
> Isn't it an optimal plan for this query?
>
> And a cite from the self-reproducible example https://dbfiddle.uk/iehtq44L :
>
> ```
> create unique index t_a_include_b on t(a) include (b);
> -- I'd expect the index above to behave the same as the index below for this query
> --create unique index on t(a,b);
> ```
>
> I agree that it is natural to expect the same result for both indexes,
> so this PR definitely makes sense.
> My point is only that compound index (a,b) in this case is more natural
> and preferable.

Yes, perhaps. But you may also see it from the other direction - if you
already have an index with included columns (for whatever reason), it
would be nice to leverage that if possible. And as I mentioned above,
it's not always possible to move a column from "included" to a proper
key, or stuff like that.

Anyway, it seems entirely unrelated to this prefetching thread.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company