Re: Query tuning help - Mailing list pgsql-performance
| | |
|---|---|
| From | Dan Harris |
| Subject | Re: Query tuning help |
| Date | |
| Msg-id | 6f4a543a2fbd348c7e82876b8e477600@drivefaster.net |
| In response to | Re: Query tuning help (Russell Smith <mr-russ@pws.com.au>) |
| Responses | Re: Query tuning help |
| List | pgsql-performance |
On May 8, 2005, at 6:51 PM, Russell Smith wrote:

> On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
> You cannot use an index for %CORVETTE%, or %RED%. There is no way
> for the index to know if a row had that in the middle without scanning
> the whole index. So it's much cheaper to do a sequence scan.

While I believe you, I'm confused by this line in my original EXPLAIN ANALYZE:

>> ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47)
>>     (actual time=2.085..2.309 rows=2 loops=473)
>>       Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
>>       Filter: (((recordtext)::text ~~ '%RED%'::text) OR
>>               ((recordtext)::text ~~ '%CORVETTE%'::text))

Doesn't that mean it was using an index to filter? Along those lines, before I created index 'ea1', the query was much slower, so it seemed like creating this index made a difference.

> One possible way to make the query faster is to limit based on date,
> as you will only get about 700 rows. And then don't use subselects,
> as they are doing full sequence scans. I think this query does what
> you do above, and I think it will be faster, but I don't know.

I REALLY like this idea! If I could just filter by date first and then sequentially scan through those rows, it should be very manageable. Hopefully I can keep this goal while still accommodating the requirement described in my next paragraph.

> select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
> FROM em JOIN ea ON (em.incidentid = ea.incidentid
>     AND em.entrydate >= '2005-1-1 00:00'
>     AND em.entrydate <= '2005-5-9 00:00'
>     AND ea.recordtext like '%RED%' AND ea.recordtext like '%CORVETTE%')
> JOIN eg ON em.incidentid = eg.incidentid
> WHERE (recordtext like '%RED%' or recordtext like '%CORVETTE%');

I have run this, and while it is very fast, I'm concerned it's not doing what I need.
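[Editor's note: Russell's point about leading-wildcard patterns can be illustrated outside SQL. A minimal Python sketch, with hypothetical sample values, of why a btree-style sorted index helps a prefix pattern like 'RED%' but not a substring pattern like '%RED%':]

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample of indexed recordtext values, kept sorted
# the way a btree index keeps its keys.
rows = sorted([
    "blah blah RED blah", "CORVETTE sighting", "nothing here",
    "REDACTED entry", "zebra",
])

# A prefix pattern ('RED%') gives leading bytes to search on, so the
# sort order can be used: binary-search the range, touch only a slice.
lo = bisect_left(rows, "RED")
hi = bisect_right(rows, "RED" + "\xff")
prefix_hits = rows[lo:hi]

# A substring pattern ('%RED%') gives no leading bytes, so every entry
# must be examined -- the equivalent of a full scan of the index or table.
substring_hits = [r for r in rows if "RED" in r]
```

[This also explains the EXPLAIN output above: the Index Cond line shows the index being used only for the incidentid join equality; the LIKE patterns appear under Filter, i.e. they are re-checked on each row the index returns, not looked up through the index.]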
Here's the situation: due to the format of the systems with which I integrate (I have no control over these formats), we receive these 'recordtext' values one line at a time, accumulating over time. The only way I can find to make this work is to insert a new record for each line. The problem is that when someone searches for multiple keywords, they expect those words to be matched across multiple records sharing a given incident number. A very simple example:

IncidentID   Date               Recordtext
-----------  -----------------  ------------------------------
11111        2005-05-01 14:21   blah blah blah RED blah blah
2222         2005-05-01 14:23   not what we are looking for
11111        2005-05-02 02:05   blah CORVETTE blah blah

So a search with an 'and' condition, e.g.

    WHERE recordtext LIKE '%RED%' AND recordtext LIKE '%CORVETTE%'

will not match, because the condition is applied to a single row of recordtext at a time, not to the whole group of rows with the same incident number.

If I were to use tsearch2 for full-text indexing, would I need to create another table that merges all of my recordtext rows into a single 'text' field? If so, this is where I run into problems, as my logic also needs to match multiple words in their original order. I may also receive updates to previously sent data, in which case I need to replace the original record with the latest version. If I have already concatenated these rows into a single field, the logic to replace in place only the old text that changed is very difficult at best.

So, that's the reason I had to use two subqueries in my example. Please tell me if I misunderstood your logic and it really will match given my condition above, but it didn't seem like it would.

Thanks again for the quick responses! This list has been a great resource for me.

-Dan