performance of IN (subquery) - Mailing list pgsql-general
From | Kevin Murphy |
---|---|
Subject | performance of IN (subquery) |
Date | |
Msg-id | AE87AE00-F7AE-11D8-91C6-0003930D3626@genome.chop.edu Whole thread Raw |
Responses |
Re: performance of IN (subquery)
Re: performance of IN (subquery) |
List | pgsql-general |
I'm using PG 7.4.3 on Mac OS X. I am disappointed with the performance of queries like 'select foo from bar where baz in (subquery)', or updates like 'update bar set foo = 2 where baz in (subquery)'. PG always seems to want to do a sequential scan of the bar table. I wish there were a way of telling PG, "use the index on baz in your plan, because I know that the subquery will return very few results". Where it really matters, I have been constructing dynamic queries by looping over the values for baz and building a separate query for each one and combining with a UNION (or just directly updating, in the update case). Depending on the size of the bar table, I can get speedups of hundreds or even more than a thousand times, but it is a big pain to have to do this. Any tips? Thanks, Kevin Murphy Illustrated: The query I want to do is very slow: select bundle_id from build.elements where elementid in ( SELECT superlocs_2.element_id FROM superlocs_2 NATURAL JOIN bundle_superlocs_2 WHERE bundle_superlocs_2.protobundle_id = 1); ----------- 7644 7644 (2 rows) Time: 518.242 ms The subquery is fast: SELECT superlocs_2.element_id FROM superlocs_2 NATURAL JOIN bundle_superlocs_2 WHERE bundle_superlocs_2.protobundle_id = 1; ------------ 41209 25047 (2 rows) Time: 3.268 ms And using indexes on the main table is fast: select bundle_id from build.elements where elementid in (41209, 25047); ----------- 7644 7644 (2 rows) Time: 2.468 ms The plan for the slow query: egenome_test=# explain analyze select bundle_id from build.elements where elementid in ( SELECT superlocs_2.element_id FROM superlocs_2 NATURAL JOIN bundle_superlocs_2 WHERE bundle_superlocs_2.protobundle_id = 1); egenome_test-# egenome_test(# egenome_test(# egenome_test(# QUERY PLAN \ ------------------------------------------------------------------------ ------------------------------------------------------------- Hash Join (cost=70.33..72.86 rows=25 width=4) (actual time=583.051..583.059 rows=2 loops=1) Hash Cond: ("outer".element_id = "inner".elementid) -> HashAggregate (cost=47.83..47.83 rows=25 width=4) (actual time=0.656..0.658 rows=2 loops=1) -> Hash Join (cost=22.51..47.76 rows=25 width=4) (actual time=0.615..0.625 rows=2 loops=1) Hash Cond: ("outer".superloc_id = "inner".superloc_id) -> Seq Scan on superlocs_2 (cost=0.00..20.00 rows=1000 width=8) (actual time=0.004..0.012 rows=9 loops=1) -> Hash (cost=22.50..22.50 rows=5 width=4) (actual time=0.076..0.076 rows=0 loops=1) -> Seq Scan on bundle_superlocs_2 (cost=0.00..22.50 rows=5 width=4) (actual time=0.024..0.033 rows=2 loops=1) Filter: (protobundle_id = 1) -> Hash (cost=20.00..20.00 rows=1000 width=8) (actual time=581.802..581.802 rows=0 loops=1) -> Seq Scan on elements (cost=0.00..20.00 rows=1000 width=8) (actual time=0.172..405.243 rows=185535 loops=1) Total runtime: 593.843 ms (12 rows)
pgsql-general by date: