Re: optimizing query - Mailing list pgsql-general
From | Chantal Ackermann |
---|---|
Subject | Re: optimizing query |
Date | |
Msg-id | 3E2FB2D1.9020904@biomax.de Whole thread Raw |
In response to | Re: optimizing query (Stephan Szabo <sszabo@megazone23.bigpanda.com>) |
Responses |
Re: optimizing query
Re: [PERFORM] optimizing query |
List | pgsql-general |
hi Stephan, thank you for your reply. I ran vacuum analyze before calling explain. As this is a newly built database where no rows have been deleted, yet, I thought vacuum full would have no effect. In fact, BEFORE running vacuum full, the cost of the query is estimates by explain analyze as 33 secs, and AFTER running it, the cost is estimate to be 43 secs??? (Hey, I want at least the 10 secs back ;-) ) I have just installed this database on a "bigger" (see the system info further down) machine, and I expected the queries would run _really_ fast. especially, as there is a lot more data to be inserted in the occurrences tables. This is the row count of the tables and the output of explain analyze before and after running vacuum full (after that, I listed some system and postgresql information): relate=# select count(*) from gene; count -------- 218085 (1 row) relate=# select count(*) from disease; count -------- 164597 (1 row) relate=# select count(*) from disease_occurrences_puid; count -------- 471915 (1 row) relate=# select count(*) from gene_occurrences_puid; count -------- 339347 (1 row) relate=# explain analyze SELECT DISTINCT gene.gene_name, gene_occurrences_puid.puid FROM gene, disease_occurrences_puid, gene_occurrences_puid WHERE disease_occurrences_puid.puid=gene_occurrences_puid.puid AND gene.gene_id=gene_occurrences_puid.gene_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Unique (cost=342175.89..352511.77 rows=137812 width=33) (actual time=32112.66..33139.23 rows=219435 loops=1) -> Sort (cost=342175.89..345621.18 rows=1378118 width=33) (actual time=32112.65..32616.14 rows=695158 loops=1) Sort Key: gene.gene_name, gene_occurrences_puid.puid -> Merge Join (cost=63671.50..98237.36 rows=1378118 width=33) (actual time=10061.83..17940.02 rows=695158 loops=1) Merge Cond: ("outer".puid = "inner".puid) -> Index Scan using disease_occpd_puid_i on disease_occurrences_puid (cost=0.00..14538.05 rows=471915 width=4) (actual time=0.03..3917.99 rows=471915 loops=1) -> Sort (cost=63671.50..64519.87 rows=339347 width=29) (actual time=10061.69..10973.64 rows=815068 loops=1) Sort Key: gene_occurrences_puid.puid -> Merge Join (cost=0.00..22828.18 rows=339347 width=29) (actual time=0.21..3760.59 rows=339347 loops=1) Merge Cond: ("outer".gene_id = "inner".gene_id) -> Index Scan using gene_pkey on gene (cost=0.00..7668.59 rows=218085 width=21) (actual time=0.02..955.19 rows=218073 loops=1) -> Index Scan using gene_id_puid_uni on gene_occurrences_puid (cost=0.00..9525.57 rows=339347 width=8) (actual time=0.02..1523.81 rows=339347 loops=1) Total runtime: 33244.81 msec (13 rows) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ AFTER relate=# vacuum full verbose analyze; relate=# explain analyze SELECT DISTINCT gene.gene_name, gene_occurrences_puid.puid FROM gene, disease_occurrences_puid, gene_occurrences_puid WHERE disease_occurrences_puid.puid=gene_occurrences_puid.puid AND gene.gene_id=gene_occurrences_puid.gene_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=359069.64..369948.41 rows=145050 width=33) (actual time=42195.60..43229.04 rows=219435 loops=1) -> Sort (cost=359069.64..362695.90 rows=1450503 width=33) (actual time=42195.59..42694.70 rows=695158 loops=1) Sort Key: gene.gene_name, gene_occurrences_puid.puid -> Merge Join (cost=63732.51..99264.24 rows=1450503 width=33) (actual time=13172.40..27973.79 rows=695158 loops=1) Merge Cond: ("outer".puid = "inner".puid) -> Index Scan using disease_occpd_puid_i on disease_occurrences_puid (cost=0.00..14543.06 rows=471915 width=4) (actual time=36.50..10916.29 rows=471915 loops=1) -> Sort (cost=63732.51..64580.88 rows=339347 width=29) (actual time=13126.56..14048.38 rows=815068 loops=1) Sort Key: gene_occurrences_puid.puid -> Merge Join (cost=0.00..22889.19 rows=339347 width=29) (actual time=58.00..6775.55 rows=339347 loops=1) Merge Cond: ("outer".gene_id = "inner".gene_id) -> Index Scan using gene_pkey on gene (cost=0.00..7739.91 rows=218085 width=21) (actual time=29.00..3416.01 rows=218073 loops=1) -> Index Scan using gene_id_puid_uni on gene_occurrences_puid (cost=0.00..9525.57 rows=339347 width=8) (actual time=28.69..1936.83 rows=339347 loops=1) Total runtime: 43338.94 msec ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Postgres Version: 7.3.1 CPU: 1666.767 MHz RAM: 2070492 kB shmmax/shmall: 1048576000 postgresql.conf: shared_buffers: 121600 max_connections: 64 max_fsm_relations = 200 max_fsm_pages = 40000 effective_cache_size = 8000 ******************************************************************** Thank you again for your interest and help! Chantal Stephan Szabo wrote: > (Replying to general and performance in a hope to move this > to performance after a couple of replies). > > On Wed, 22 Jan 2003, Chantal Ackermann wrote: > > >>I am getting the following output from EXPLAIN, concerning a query with >>joins. The merge uses index scans but takes too long, in my opinion. The >>query is in fact only a part (subquery) of another one, but it is the >>bottle neck. >> >>As I am quite ignorant in optimizing queries, and I have no idea where >>to find documentation on the net on how to learn optimizing my queries, >>I am posting this here in hope someone will give me either tips how to >>optimize, or where to find some tutorial that could help me get along on >>my own. >> >>dropping the "DISTINCT" has some effect, but I can't really do without. > > > The first thing is, have you done ANALYZE recently to make sure that the > statistics are correct and what does EXPLAIN ANALYZE give you (that will > run the query and give the estimate and actual). Also, if you haven't > vacuumed recently, you may want to vacuum full. > > How many rows are there on gene, disease and both occurrances tables? > I'd wonder if perhaps using explicit sql join syntax (which postgres uses > to constrain order) to join disease and disease_occurrences_puid before > joining it to the other two would be better or worse in practice. > >
pgsql-general by date: