Re: looking for a faster way to do that - Mailing list pgsql-general

From Alban Hertroys
Subject Re: looking for a faster way to do that
Date
Msg-id 976CEC06-8525-4A87-BBCF-84211ACD91CB@gmail.com
Whole thread Raw
In response to Re: looking for a faster way to do that  (hamann.w@t-online.de)
List pgsql-general
On 25 Sep 2011, at 8:04, hamann.w@t-online.de wrote:
> Hi Eduardo,
>
> it is clear that scanning the table once with a list of matches will outperform
> rescanning the table for every string wanted. Now, my problem is that the patterns are
> dynamic as well. So if I could translate a table with one column  and a few thousand rows
> into something like
> regexp_matches(code,'string1|string2|.....string2781')
> would ideally be a performant query. Unfortunately I have no idea how I could achieve this
> transformation inside the database. Doing it externally fails, because any single query cannot
> be more than so few characters.



To me it sounds a little bit like you're comparing every item in a warehouse to a set of descriptions to see what type
ofitem it is, which is something you would be much better off storing as a property of the item. If an item is a fruit,
storethat it's a fruit! 
But I'm guessing at what you're trying to accomplish, so here's a few other options...

I guess you could create 2781 expression indexes to do what you want (is there a limit that prevents this?). Query
planningwould probably become kind of slow and the indices will take up a considerable fraction of the total table
storagerequired - that's a pretty outlandish approach. 

CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string1'));
CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2'));
...
CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2781'));

Or are you really going to query every record against all 2781 regexes? I can't figure out a realistic scenario why you
(oranyone) would want that. 
In that case those indices aren't going to help you much, as the planner would have to hold every record in tbl to each
index- it won't do that. 


You could also create a giant lookup table (a materialized view, if you like) where the results of every match of str
intbl against the wantcode in the regex table is stored. That's some huge overhead, but it will probably outperform
mostother options. With the numbers you gave that table will hold about 2-3 billion records with two foreign key values
anda truth value each. 


Alban Hertroys

--
The scale of a problem often equals the size of an ego.



pgsql-general by date:

Previous
From: Eduardo Morras
Date:
Subject: Re: Speed of lo_unlink vs. DELETE on BYTEA
Next
From: pasman pasmański
Date:
Subject: New feature: accumulative functions.