Re: GiST index performance - Mailing list pgsql-performance
From | dforum |
---|---|
Subject | Re: GiST index performance |
Date | |
Msg-id | 49E76896.7080001@vieonet.com Whole thread Raw |
In response to | GiST index performance (Matthew Wakeling <matthew@flymine.org>) |
Responses |
Re: GiST index performance
Re: GiST index performance |
List | pgsql-performance |
hello, there is other performance problem on this request. If you analyse query plan, you see that most of the time are lost during sequencial scan, and you have 2 seq scan. You have to create other indexes to match the request. Postgresq is totally dependant on index to reach is performance. Regarding gist or btree, I personnaly had better performance with btree. Regards david Matthew Wakeling a écrit : > > I have been doing some queries that are best answered with GiST > indexes, however I have found that their performance is a little > lacking. I thought I would do a direct comparison on a level playing > field. Here are two EXPLAIN ANALYSE results for the same query, with > two different indexes. The two indexes are identical except that one > is btree and the other GiST. > > Here is the query: > > SELECT * > FROM > location l1, > location l2, > gene, > primer > WHERE > l1.subjectid <> l2.subjectid > AND l1.objectid = l2.objectid > AND l1.subjectid = gene.id > AND l2.subjectid = primer.id > AND l2.intermine_start <= l1.intermine_start > AND l2.intermine_end >= l1.intermine_start > > Here is the btree index: > > CREATE INDEX location_object_start ON location (objectid, > intermine_start); > > QUERY PLAN > ---------------------------------------------------------------------- > Hash Join > (cost=26213.16..135980894.76 rows=3155740824 width=484) > (actual time=2799.260..14256.588 rows=2758 loops=1) > Hash Cond: (l1.subjectid = gene.id) > -> Nested Loop > (cost=0.00..4364485.01 rows=8891802645 width=324) > (actual time=9.748..10418.807 rows=390695 loops=1) > Join Filter: (l1.subjectid <> l2.subjectid) > -> Nested Loop > (cost=0.00..446862.58 rows=572239 width=259) > (actual time=9.720..4226.117 rows=211880 loops=1) > -> Seq Scan on primer > (cost=0.00..15358.80 rows=211880 width=194) > (actual time=9.678..579.877 rows=211880 loops=1) > -> Index Scan using location__key_all on location l2 > (cost=0.00..2.00 rows=3 width=65) > (actual time=0.004..0.007 rows=1 loops=211880) > Index Cond: (l2.subjectid = primer.id) > -> Index Scan using location_object_start on location l1 > (cost=0.00..3.85 rows=150 width=65) > (actual time=0.005..0.012 rows=3 loops=211880) > Index Cond: ((l1.objectid = l2.objectid) AND > (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= > l1.intermine_start)) > -> Hash > (cost=20496.96..20496.96 rows=457296 width=160) > (actual time=2788.698..2788.698 rows=457296 loops=1) > -> Seq Scan on gene > (cost=0.00..20496.96 rows=457296 width=160) > (actual time=0.038..1420.604 rows=457296 loops=1) > Total runtime: 14263.846 ms > (13 rows) > > > Here is the GiST index: > > CREATE INDEX location_object_start_gist ON location USING gist > (objectid, intermine_start); > > QUERY PLAN > ------------------------------------------------------------------------ > Hash Join > (cost=26213.16..136159960.32 rows=3155740824 width=484) > (actual time=2576.109..2300486.267 rows=2758 loops=1) > Hash Cond: (l1.subjectid = gene.id) > -> Nested Loop > (cost=0.00..4543550.56 rows=8891802645 width=324) > (actual time=366.121..2296668.740 rows=390695 loops=1) > Join Filter: (l1.subjectid <> l2.subjectid) > -> Nested Loop > (cost=0.00..446862.58 rows=572239 width=259) > (actual time=362.774..13423.443 rows=211880 loops=1) > -> Seq Scan on primer > (cost=0.00..15358.80 rows=211880 width=194) > (actual time=319.559..1296.907 rows=211880 loops=1) > -> Index Scan using location__key_all on location l2 > (cost=0.00..2.00 rows=3 width=65) > (actual time=0.041..0.045 rows=1 loops=211880) > Index Cond: (l2.subjectid = primer.id) > -> Index Scan using location_object_start_gist on location l1 > (cost=0.00..4.16 rows=150 width=65) > (actual time=3.354..10.757 rows=3 loops=211880) > Index Cond: ((l1.objectid = l2.objectid) AND > (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >= > l1.intermine_start)) > -> Hash > (cost=20496.96..20496.96 rows=457296 width=160) > (actual time=2157.914..2157.914 rows=457296 loops=1) > -> Seq Scan on gene > (cost=0.00..20496.96 rows=457296 width=160) > (actual time=3.904..1206.907 rows=457296 loops=1) > Total runtime: 2300510.674 ms > (13 rows) > > The query plans are identical except in the type of index used, but > there is a factor of a few hundred in execute time. Is this the kind > of factor that would be expected, or is there something amiss? Is this > seen as something that might be improved in the future? > > Matthew >
pgsql-performance by date: