Re: [SQL] bad select performance fixed by forbidding hash joins - Mailing list pgsql-general
From | George Young |
---|---|
Subject | Re: [SQL] bad select performance fixed by forbidding hash joins |
Date | |
Msg-id | 199907211506.LAA16984@ll.mit.edu Whole thread Raw |
Responses |
Re: [SQL] bad select performance fixed by forbidding hash joins
|
List | pgsql-general |
[PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66] table opset_steps (name text, id int2, ver int2) [1400 rows] non-unique index is on (id, ver) table run_opsets (status int2, id int2, ver int2, run_id int2, seq int2) [17000 rows] pkey is (id, seq), second index on(status, id, ver, run_id) select count(*) from run_opsets where status=1; --> 187 select count(*) from run_opsets where status=3; --> 10564 table runs (run_name text, run_id int2, status int2) [900 rows] pkey is run_name, second index(run_id, status) select count(*)from runs where status=1; -->68 I have vacuum analyzed all relevant tables. explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where (ro.status=3 or ro.status=1) andro.opset_id=os.opset_id and ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; Hash Join (cost=1793.58 rows=14560 width=38) -> Hash Join (cost=1266.98 rows=14086 width=24) -> Seq Scan on run_opsets ro (cost=685.51 rows=13903 width=8) -> Hash (cost=70.84 rows=1389 width=16) -> Seq Scan on opset_steps os (cost=70.84 rows=1389 width=16) -> Hash (cost=47.43 rows=374 width=14) -> Seq Scan on runs r (cost=47.43 rows=374 width=14) This query takes 16 seconds. [returns 3126 rows] On Tue, Jul 20, 1999 at 05:42:20PM -0400, Tom Lane wrote: > On Tue, 20 Jul 1999 14:56:46 -0400 George Young wrote: > > ... Is this then > > the best that postgres can do? Is there some rephrasing/restructuring of > > this query that would make it faster? > > Hard to say. The query looks reasonable as it stands --- > ... You have no restriction > clause on opset_steps so all of those entries get loaded for hashing; > can you provide one? No. > The system's plan looks pretty reasonable as well. It might be that > a merge join would be faster than a hash join but I wouldn't assume > so. If you want, you can try forcing the system not to use hashes; > start psql with environment variable > PGOPTIONS="-fh" > and see what sort of plan and execution time you get. If that does > turn out to be a big win it would indicate that the planner is using > poor cost estimates, which is certainly possible... Yes! PGOPTIONS="-fh" made the query time go from 16 seconds to 2 seconds! Is this a safe thing to leave on permanently, or is there some way to set PGOPTIONS for just this query? explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where (ro.status=3 or ro.status=1) andro.opset_id=os.opset_id and ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; Merge Join (cost=9295.54 rows=14560 width=38) -> Seq Scan (cost=8676.01 rows=14371 width=22) -> Sort (cost=8676.01 rows=14371 width=22) -> Merge Join (cost=1657.30 rows=14371 width=22) -> Index Scan using run_opsets_pkey on run_opsets ro (cost=1031.25 rows=13903 width=8) -> Seq Scan (cost=154.91 rows=374 width=14) -> Sort (cost=154.91 rows=374 width=14) -> Seq Scan on runs r (cost=47.43 rows=374 width=14) -> Index Scan using opset_steps_idx_ver_id on opset_steps os (cost=99.45 rows=1389 width=16) With PGOPTIONS=-fh, this query takes ~ 2 seconds! [returns 3126 rows] -- George Young, Rm. L-204 gry@ll.mit.edu MIT Lincoln Laboratory 244 Wood St. Lexington, Massachusetts 02420-9108 (781) 981-2756
pgsql-general by date: