Home > mailing lists

abnormally long time in performing a two-table join - Mailing list pgsql-admin

From	Chris Mungall
Subject	abnormally long time in performing a two-table join
Date	August 11, 2002 17:18:03
Msg-id	Pine.LNX.4.33.0208111408010.16003-100000@sos.lbl.gov Whole thread Raw
Responses	Re: abnormally long time in performing a two-table join
List	pgsql-admin

Tree view

I've recently made the switch from mysql to postgres - i am now a
zealous postgres convert. however, i'm experiencing some performance
issues that i'm sure are just a reflection on my inexperience admining pg.

A simple two-table natural join (on integer primary/foreign keys) seems
abnormally slow.

(I have performed VACUUM ANALYZE on both tables)

both forced index scans and the default seq+index nested loop
combination are of comparably slow speed.

I include explain analyze output for both modes, plus details on the
tables and their indexes:


omicia29=# set enable_seqscan=off;
SET VARIABLE

omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and
seqfeature.seqfeature_key_id= 15; 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..197011.39 rows=223 width=8) (actual time=16744.92..44572.00 rows=15 loops=1)
  ->  Index Scan using seqfeature_pkey on seqfeature  (cost=0.00..61715.62 rows=44674 width=4) (actual
time=0.29..14669.06rows=100030 loops=1) 
  ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.02 rows=1 width=4) (actual time=0.29..0.29 rows=0 loops=100030)

Total runtime: 44572.13 msec

omicia29=# set enable_seqscan=on;
SET VARIABLE

omicia29=# explain analyze select seqfeature_id from seqfeature NATURAL JOIN sfqv where qualifier_value = 'BRCA1' and
seqfeature.seqfeature_key_id= 15; 
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..186352.20 rows=223 width=8) (actual time=7803.68..14224.36 rows=15 loops=1)
  ->  Seq Scan on seqfeature  (cost=0.00..51056.43 rows=44674 width=4) (actual time=0.13..4050.09 rows=100030 loops=1)
  ->  Index Scan using sfqv_idx1 on sfqv  (cost=0.00..3.02 rows=1 width=4) (actual time=0.10..0.10 rows=0 loops=100030)
Total runtime: 14224.56 msec

omicia29=# select relkind,sum(relpages) from pg_class group by relkind;
 relkind |  sum
---------+--------
 S       |     26
 i       | 678149
 r       | 487314
 s       |      0
 t       | 162194
 v       |      0
(6 rows)

omicia29=# select count(*) from seqfeature;
  count
---------
 2632194
(1 row)

omicia29=# select count(*) from sfqv;
  count
---------
 8284617
(1 row)

omicia29=# select count(*) from seqfeature where seqfeature_key_id = 15;
 count
--------
 100030
(1 row)

omicia29=# select count(*) from sfqv where qualifier_value = 'BRCA1';
 count
-------
   110
(1 row)

I am obviously doing something wrong, as other joins between the
"seqfeature" table and other tables of comparable size to "sfqv" are
nice and fast.

I can make this go much faster by doing two seperate queries, and
plugging the foreign keys from "sfqv" into a query on "seqfeature";
but obviously I'd rather do this in one SQL query than two.

the pg planner seems to be attempting to do the query the other way
round - doing the query that returns the most rows first??

I have tried forcing the order with a CROSS JOIN and explicitly
constraining by the seqfeature_id field, it works out the same

these are the indexes:

CREATE INDEX sfqv_idx1 ON sfqv USING btree (seqfeature_id);
CREATE INDEX sfqv_idx2 ON sfqv USING btree (ontology_term_id);
CREATE INDEX sfqv_idx3 ON sfqv USING btree (qualifier_value);
CREATE INDEX sfqv_idx4 ON sfqv USING btree (seqfeature_id, qualifier_value);

CREATE INDEX sf1 ON seqfeature USING btree (seqfeature_key_id);
CREATE INDEX sf2 ON seqfeature USING btree (seqfeature_source_id);
CREATE INDEX sf3 ON seqfeature USING btree (bioentry_id);
CREATE INDEX sf4 ON seqfeature USING btree (seqfeature_id, seqfeature_key_id);

as you can see from the EXPLAIN above, only sfqv_idx is being used.

I would have thought sfqv_idx4 would be useful in this particular query?

and also sf4?

and finally, here are the actual tables:

CREATE TABLE "seqfeature" (
        "seqfeature_id" integer DEFAULT nextval('seqfeature_pk_seq'::text) NOT N
ULL,
        "bioentry_id" integer NOT NULL,
        "seqfeature_key_id" integer,
        "seqfeature_source_id" integer,
        "seqfeature_rank" integer,
        Constraint "seqfeature_pkey" Primary Key ("seqfeature_id")
);

CREATE TABLE "sfqv" (
        "seqfeature_id" integer,
        "ontology_term_id" integer,
        "qualifier_rank" integer,
        "qualifier_value" character(255)
);

pgsql-admin by date:

From: Brian McCane
Date: 11 August 2002, 09:04:55
Subject: Re: Postgres performance slowly gets worse over a month

From: Tom Lane
Date: 11 August 2002, 17:41:46
Subject: Re: abnormally long time in performing a two-table join

abnormally long time in performing a two-table join - Mailing list pgsql-admin

Previous

Next