abnormally long time in performing a two-table join - Mailing list pgsql-admin
From | Chris Mungall |
---|---|
Subject | abnormally long time in performing a two-table join |
Date | |
Msg-id | Pine.LNX.4.33.0208111408010.16003-100000@sos.lbl.gov Whole thread Raw |
Responses |
Re: abnormally long time in performing a two-table join
|
List | pgsql-admin |
I've recently made the switch from mysql to postgres - i am now a zealous postgres convert. however, i'm experiencing some performance issues that i'm sure are just a reflection on my inexperience admining pg. A simple two-table natural join (on integer primary/foreign keys) seems abnormally slow. (I have performed VACUUM ANALYZE on both tables) both forced index scans and the default seq+index nested loop combination are of comparably slow speed. I include explain analyze output for both modes, plus details on the tables and their indexes: omicia29=# set enable_seqscan=off; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id= 15; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1) -> Index Scan using seqfeature_pkey on seqfeature (cost=0.00..61715.62 rows=44674 width=4) (actual time=0.29..14669.06rows=100030 loops=1) -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=100030) Total runtime: 44572.13 msec omicia29=# set enable_seqscan=on; SET VARIABLE omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and seqfeature.seqfeature_key_id= 15; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..186352.20 rows=223 width=8) (actual time=7803.68..14224.36 rows=15 loops=1) -> Seq Scan on seqfeature (cost=0.00..51056.43 rows=44674 width=4) (actual time=0.13..4050.09 rows=100030 loops=1) -> Index Scan using sfqv_idx1 on sfqv (cost=0.00..3.02 rows=1 width=4) (actual time=0.10..0.10 rows=0 loops=100030) Total runtime: 14224.56 msec omicia29=# select relkind,sum(relpages) from pg_class group by relkind; relkind | sum ---------+-------- S | 26 i | 678149 r | 487314 s | 0 t | 162194 v | 0 (6 rows) omicia29=# select count(*) from seqfeature; count --------- 2632194 (1 row) omicia29=# select count(*) from sfqv; count --------- 8284617 (1 row) omicia29=# select count(*) from seqfeature where seqfeature_key_id = 15; count -------- 100030 (1 row) omicia29=# select count(*) from sfqv where qualifier_value = 'BRCA1'; count ------- 110 (1 row) I am obviously doing something wrong, as other joins between the "seqfeature" table and other tables of comparable size to "sfqv" are nice and fast. I can make this go much faster by doing two seperate queries, and plugging the foreign keys from "sfqv" into a query on "seqfeature"; but obviously I'd rather do this in one SQL query than two. the pg planner seems to be attempting to do the query the other way round - doing the query that returns the most rows first?? I have tried forcing the order with a CROSS JOIN and explicitly constraining by the seqfeature_id field, it works out the same these are the indexes: CREATE INDEX sfqv_idx1 ON sfqv USING btree (seqfeature_id); CREATE INDEX sfqv_idx2 ON sfqv USING btree (ontology_term_id); CREATE INDEX sfqv_idx3 ON sfqv USING btree (qualifier_value); CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value); CREATE INDEX sf1 ON seqfeature USING btree (seqfeature_key_id); CREATE INDEX sf2 ON seqfeature USING btree (seqfeature_source_id); CREATE INDEX sf3 ON seqfeature USING btree (bioentry_id); CREATE INDEX sf4 ON seqfeature USING btree (seqfeature_id, seqfeature_key_id); as you can see from the EXPLAIN above, only sfqv_idx is being used. I would have thought sfqv_idx4 would be useful in this particular query? and also sf4? and finally, here are the actual tables: CREATE TABLE "seqfeature" ( "seqfeature_id" integer DEFAULT nextval('seqfeature_pk_seq'::text) NOT N ULL, "bioentry_id" integer NOT NULL, "seqfeature_key_id" integer, "seqfeature_source_id" integer, "seqfeature_rank" integer, Constraint "seqfeature_pkey" Primary Key ("seqfeature_id") ); CREATE TABLE "sfqv" ( "seqfeature_id" integer, "ontology_term_id" integer, "qualifier_rank" integer, "qualifier_value" character(255) );
pgsql-admin by date: