Re: How to boost performance of queries containing pattern matching characters - Mailing list pgsql-performance

From Richard Huxton
Subject Re: How to boost performance of queries containing pattern matching characters
Date
Msg-id 4D58E048.9020506@archonet.com
Whole thread Raw
In response to Re: How to boost performance of queries containing pattern matching characters  ("Gnanakumar" <gnanam@zoniac.com>)
Responses Re: How to boost performance of queries containing pattern matching characters
List pgsql-performance
On 14/02/11 07:46, Gnanakumar wrote:
>> If you really need to match all those options, you can't use an index. A
>> substring-matching index would need to have multiple entries per
>> character per value (since it doesn't know what you will search for).
>> The index-size becomes unmanageable very quickly.
>
>> That's why I asked what you really wanted to match.
> To be more specific, in fact, our current application allows to delete
> email(s) with a minimum of 3 characters.  There is a note/warning also given
> for application Users' before deleting, explaining the implication of this
> delete action (partial&  case-insensitive, and it could be wide-ranging
> too).
>
>> So, I'll ask again: do you really want to match all of those options?
> Yes, as explained above, I want to match all those.

Then you can't use a simple index. If you did use an index it would
probably be much slower for "com" or "yah" or "gma" and so on.

The closest you can do is something like Artur's option (or the pg_trgm
module - handy since you are looking at 3-chars and up) to select likely
matches combined with a separate search on '%domain.com%' to confirm
that fact.

P.S. - I'd be inclined to just match the central domain parts, so for
"user1@europe.megacorp.com" you would index "europe" and "megacorp" and
only allow matching on the start of each string. Of course if your
application spec says you need to match on "p.c" too then that's what
you have to do.

--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: How to boost performance of queries containing pattern matching characters
Next
From: david@lang.hm
Date:
Subject: Re: choosing the right RAID level for PostgresQL database