Home > mailing lists

Re: Allow ILIKE forward matching to use btree index - Mailing list pgsql-hackers

From	Yugo NAGATA
Subject	Re: Allow ILIKE forward matching to use btree index
Date	January 16, 2025 08:53:35
Msg-id	20250116145335.0b4e164edee9b506353c7e0c@sraoss.co.jp Whole thread Raw
In response to	Re: Allow ILIKE forward matching to use btree index (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Allow ILIKE forward matching to use btree index
List	pgsql-hackers

Tree view

On Wed, 15 Jan 2025 14:40:19 -0800
Jeff Davis <pgsql@j-davis.com> wrote:

> My apologies, I sent the previous email prematurely. Let me try again:
> 
> On Wed, 2025-01-15 at 14:34 -0800, Jeff Davis wrote:
> > On Wed, 2025-01-15 at 01:40 +0900, Yugo NAGATA wrote:
> > > > > For example, "t ~~ '123foo%'" is converted to "(t >= '123foo'
> > > > > AND
> > > > > t < '123fop')" 
> > > > > and index scan can be used for this condition. On the other
> > > > > hand,
> > > > > "t ~~* '123foo'"
> > > > > cannot be converted and sequential scan is used. 
> > > > > 
> > > > > Even in this case, we can use a bitmap index scan for the
> > > > > condition 
> > > > > "(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')"
> > > > > followed by
> > > > > recheck by the original condition "t ~~* '123foo'", and this
> > > > > could be faster
> > > > > than seqscan. 
> 
> In theory, there could be many OR clauses:
> 
>   (t >= '123foo' AND t < '123fop') OR
>   (t >= '123Foo' AND t < '123Fop') OR
>   (t >= '123fOo' AND t < '123fOp') OR
>   (t >= '123FOo' AND t < '123FOp') OR
>   ...
> 
> How should that be limited?

Instead of generating complete patterns considering every case-varying characters,
two clauses considering only the first case-varying character are generated.

For example, for the condition "t ILIKE '123foo%'", the generated condition is
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')".

Rows meeting "(t >= '123f' AND t < '123g')" includes those whose "t" start with
'123f', that is meeting the following;

>   (t >= '123foo' AND t < '123fop') OR
>   (t >= '123fOo' AND t < '123fOp') OR
>   ...

, and rows meeting "(t >= '123F' AND t < '123G')" includes those whose "t" start
with '123F', that is meeting the following;

>   (t >= '123Foo' AND t < '123Fop') OR
>   (t >= '123FOo' AND t < '123FOp') OR
>   ...

It is required that the second case-varying character and later are checked after
the index scan, and some rows are filtered, but it will be still faster than full
sequential scan.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

pgsql-hackers by date:

From: Tom Lane
Date: 16 January 2025, 08:47:17
Subject: Re: Make pg_stat_io view count IOs as bytes instead of blocks

From: Yugo Nagata
Date: 16 January 2025, 09:04:38
Subject: Re: Inquiry About Determining Parallel Plans for REFRESH MATERIALIZED VIEW

Re: Allow ILIKE forward matching to use btree index - Mailing list pgsql-hackers

Previous

Next