Re: Incorrect index being used - Mailing list pgsql-general
From | Jesse Long |
---|---|
Subject | Re: Incorrect index being used |
Date | |
Msg-id | 5257FDE5.7020608@unknown.za.net Whole thread Raw |
In response to | Re: Incorrect index being used (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Incorrect index being used
|
List | pgsql-general |
On 09/10/2013 18:06, Tom Lane wrote: > Jesse Long <jpl@unknown.za.net> writes: >> The query runs for much longer than I expect it to run for, and I think >> this is due to it using the incorrect subplan. As you can see, subplans >> 1 and 3 make use of and index, but these subplans are not used. >> Subplans and 4 are seqscan, and they are used. >> How can I get PostgreSQL to use subplan 1 and 3? > You can't, and you would not like the results if you did. > > The construct that's being described (perhaps not very intelligibly) > by this EXPLAIN output is an alternative pair of subplans. Actually > there are two such alternative pairs in this example. The indexscan > variants are subplans that would be fast if executed only once or > twice. The seqscan variants, if used, are used to load a hashtable > that is then probed for each row of the outer plan. If there are a > lot of rows to be considered in the outer plan, then it's better to > pay the price of loading the hashtable, because each hashtable probe > will be a lot cheaper than doing a fresh indexscan with the comparison > value from the current outer row. > > In this example, we can see that the outer scan that the subplans > are attached to eliminated 710851 rows by means of the subplan filters, > meaning that the subplans were probed 710851+2 times. If each of those > probes had been done with a separate indexscan, you'd likely still be > waiting for the result. Using the seqscan+hashtable was definitely the > right choice here. > > BTW, the reason it looks like this rather than just hard-wiring the > seqscan choice is a planner implementation artifact --- at the time > that the subplan plans are created, we don't know how many rows are > expected to pass through the outer plan level. So we plan it both > ways and leave the choice to be made during executor startup. > > What I'd suggest is that you see if you can't get rid of the "EXISTS() OR > EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy > to work out the details but it looks like you could do the OR in the WHERE > clause of a single EXISTS sub-select. That would allow the planner to > convert the EXISTS into a semi-join, which might work better than what > you've got. As is, you're dealing with fairly generic sub-select logic > that isn't going to be terribly well optimized. > Hi Tom, I understand what you say about using the index (archive_id, node_id, value) to do a separate lookup for each row in the archive_document table that would be filtered. I understand that this would be constly. However, the seqscan is killing me. I have another index on archive_document_index which has been there all along - (node_id, value). Would it not be better for PostgreSQL to use this index to perform an index scan instead of a seqscan when populating the hash table? explain select * from archive_document_index where node_id = 29 and value = 'BSH70002152'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using archive_document_index_node_id_value_idx on archive_document_index (cost=0.57..36.13 rows=14 width=33) Index Cond: ((node_id = 29) AND ((value)::text = 'BSH70002152'::text)) (2 rows) Thanks, Jesse
pgsql-general by date: