Home > mailing lists

Re: Incorrect index being used - Mailing list pgsql-general

From	Jesse Long
Subject	Re: Incorrect index being used
Date	October 11, 2013 13:32:33
Msg-id	5257FDE5.7020608@unknown.za.net Whole thread Raw
In response to	Re: Incorrect index being used (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Incorrect index being used
List	pgsql-general

Tree view

On 09/10/2013 18:06, Tom Lane wrote:
> Jesse Long <jpl@unknown.za.net> writes:
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans  and 4 are seqscan, and they are used.
>> How can I get PostgreSQL to use subplan 1 and 3?
> You can't, and you would not like the results if you did.
>
> The construct that's being described (perhaps not very intelligibly)
> by this EXPLAIN output is an alternative pair of subplans.  Actually
> there are two such alternative pairs in this example.  The indexscan
> variants are subplans that would be fast if executed only once or
> twice.  The seqscan variants, if used, are used to load a hashtable
> that is then probed for each row of the outer plan.  If there are a
> lot of rows to be considered in the outer plan, then it's better to
> pay the price of loading the hashtable, because each hashtable probe
> will be a lot cheaper than doing a fresh indexscan with the comparison
> value from the current outer row.
>
> In this example, we can see that the outer scan that the subplans
> are attached to eliminated 710851 rows by means of the subplan filters,
> meaning that the subplans were probed 710851+2 times.  If each of those
> probes had been done with a separate indexscan, you'd likely still be
> waiting for the result.  Using the seqscan+hashtable was definitely the
> right choice here.
>
> BTW, the reason it looks like this rather than just hard-wiring the
> seqscan choice is a planner implementation artifact --- at the time
> that the subplan plans are created, we don't know how many rows are
> expected to pass through the outer plan level.  So we plan it both
> ways and leave the choice to be made during executor startup.
>
> What I'd suggest is that you see if you can't get rid of the "EXISTS() OR
> EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy
> to work out the details but it looks like you could do the OR in the WHERE
> clause of a single EXISTS sub-select.  That would allow the planner to
> convert the EXISTS into a semi-join, which might work better than what
> you've got.  As is, you're dealing with fairly generic sub-select logic
> that isn't going to be terribly well optimized.
>

Hi Tom,

I understand what you say about using the index (archive_id, node_id,
value) to do a separate lookup for each row in the archive_document
table that would be filtered. I understand that this would be constly.

However, the seqscan is killing me. I have another index on
archive_document_index which has been there all along - (node_id,
value). Would it not be better for PostgreSQL to use this index to
perform an index scan instead of a seqscan when populating the hash table?

explain select * from archive_document_index where node_id = 29 and
value = 'BSH70002152';
                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
  Index Scan using archive_document_index_node_id_value_idx on
archive_document_index  (cost=0.57..36.13 rows=14 width=33)
    Index Cond: ((node_id = 29) AND ((value)::text = 'BSH70002152'::text))
(2 rows)

Thanks,
Jesse

pgsql-general by date:

From: John McKown
Date: 11 October 2013, 13:29:29
Subject: Fwd: String reverse funtion?

From: Merlin Moncure
Date: 11 October 2013, 13:34:59
Subject: Re: Forms for entering data into postgresql

Re: Incorrect index being used - Mailing list pgsql-general

Previous

Next