Re: Slow query with joins - Mailing list pgsql-performance
From:           Bendik Rognlien Johansen
Subject:        Re: Slow query with joins
Msg-id:         850AD6F9-08BD-4EF2-A549-407FD87C7974@gmail.com
In response to: Re: Slow query with joins ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses:      Re: Slow query with joins
List:           pgsql-performance
The sort is definitively the culprit. When I removed it, the query was
instant. I tried setting work_mem = 131072 but it did not seem to help.
I really don't understand this :-( Any other ideas?

Thanks!

On Jan 11, 2006, at 9:23 PM, Jim C. Nasby wrote:

> I'd try figuring out if the join is the culprit or the sort is (by
> dropping the ORDER BY). work_mem is probably forcing the sort to spill
> to disk, and if your drives are rather busy...
>
> You might also get a win if you re-order the joins to people, contacts,
> addresses, if you know it will have the same result.
>
> In this case LIMIT won't have any real effect, because you have to go
> all the way through with the ORDER BY anyway.
>
> On Wed, Jan 11, 2006 at 08:55:32PM +0100, Bendik Rognlien Johansen wrote:
>> Yes, the rowcount estimates are real; however, it has been a long
>> time since the last VACUUM FULL (there is never a good time).
>>
>> I have clustered the tables, reindexed, analyzed, vacuumed, and the
>> plan now looks like this:
>>
>> no_people=# explain
>> SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
>>        ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
>>        ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
>>        ad.priority AS ad_priority, ad.position[0] AS ad_lat,
>>        ad.position[1] AS ad_lon, ad.uncertainty AS ad_uncertainty,
>>        ad.extra AS ad_extra, ad.deleted AS ad_deleted,
>>        co.id AS co_id, co.type AS co_type, co.value AS co_value,
>>        co.description AS co_description, co.priority AS co_priority,
>>        co.visible AS co_visible, co.searchable AS co_searchable,
>>        co.deleted AS co_deleted
>> FROM people r
>> LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
>> LEFT OUTER JOIN contacts co ON (r.id = co.record)
>> WHERE NOT r.deleted AND r.original IS NULL
>> ORDER BY r.id;
>>
>>                                 QUERY PLAN
>> ------------------------------------------------------------------------
>>  Sort  (cost=182866.49..182943.12 rows=30655 width=587)
>>    Sort Key: r.id
>>    ->  Nested Loop Left Join  (cost=0.00..170552.10 rows=30655 width=587)
>>          ->  Nested Loop Left Join  (cost=0.00..75054.96 rows=26325 width=160)
>>                ->  Index Scan using people_deleted_original_is_null on people r  (cost=0.00..1045.47 rows=23861 width=27)
>>                      Filter: ((NOT deleted) AND (original IS NULL))
>>                ->  Index Scan using addresses_record_idx on addresses ad  (cost=0.00..3.05 rows=4 width=137)
>>                      Index Cond: ("outer".id = ad.record)
>>          ->  Index Scan using contacts_record_idx on contacts co  (cost=0.00..3.32 rows=24 width=431)
>>                Index Cond: ("outer".id = co.record)
>> (10 rows)
>>
>> Looks faster, but still very slow. I added LIMIT 1000 and it has been
>> running for about 25 minutes now with no output. top shows:
>>
>>   PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
>> 29994 postgres 18   0 95768  78m  68m R 17.0  7.7  0:53.27 postmaster
>>
>> which is unusual; I usually get 99.9 %CPU for just about any query,
>> which leads me to believe this is disk related.
>>
>> postgresql.conf:
>>   shared_buffers = 8192
>>   work_mem = 8192
>>   maintenance_work_mem = 524288
>>
>> Hardware: 2x 2.8GHz CPUs, 1GB RAM.
>>
>> Could this be an issue related to lack of VACUUM FULL? The tables get
>> a lot of updates.
>>
>> Thank you very much so far!
>>
>> On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:
>>
>>> Bendik Rognlien Johansen <bendik.johansen@gmail.com> writes:
>>>> Has anyone got any tips for speeding up this query? It currently
>>>> takes hours to start.
>>>
>>> Are the rowcount estimates close to reality? The plan doesn't look
>>> unreasonable to me if they are. It might help to increase work_mem
>>> to ensure that the hash tables don't spill to disk.
>>>
>>> Indexes:
>>>     "people_original_is_null" btree (original) WHERE original IS NULL
>>>
>>> This index seems poorly designed: the actual index entries are dead
>>> weight since all of them are necessarily NULL. You might as well make
>>> the index carry something that you frequently test in conjunction with
>>> "original IS NULL". For instance, if this particular query is a common
>>> case, you could replace this index with
>>>
>>>     CREATE INDEX people_deleted_original_is_null ON people (deleted)
>>>         WHERE original IS NULL;
>>>
>>> This index is still perfectly usable for queries that only say
>>> "original IS NULL", but it can also filter out rows with the wrong
>>> value of deleted. Now, if there are hardly any rows with deleted =
>>> true, maybe this won't help much for your problem. But in any case
>>> you ought to consider whether you can make the index entries do
>>> something useful.
>>>
>>>             regards, tom lane
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
>
> --
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com     work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf        cell: 512-569-9461
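
A minimal sketch of the diagnostic Jim suggests: raise work_mem for the
session only and run the query with and without the ORDER BY. The SELECT
list is trimmed here for brevity; table and column names are the ones from
the thread. On PostgreSQL 8.3 and later, EXPLAIN ANALYZE also reports
whether the sort stayed in memory or spilled to disk.

    -- Session-local override; work_mem is in KB by default, so 131072
    -- matches the 128MB value tried in the thread.
    SET work_mem = 131072;

    -- With the ORDER BY: on 8.3+, look for "Sort Method: external merge
    -- Disk: ..." in the output, which means the sort spilled to disk.
    EXPLAIN ANALYZE
    SELECT r.id, ad.id, co.id
    FROM people r
    LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
    LEFT OUTER JOIN contacts co ON (r.id = co.record)
    WHERE NOT r.deleted AND r.original IS NULL
    ORDER BY r.id;

    -- Without the ORDER BY: if this is fast and the plan is otherwise
    -- unchanged, the sort (not the joins) is the bottleneck.
    EXPLAIN ANALYZE
    SELECT r.id, ad.id, co.id
    FROM people r
    LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
    LEFT OUTER JOIN contacts co ON (r.id = co.record)
    WHERE NOT r.deleted AND r.original IS NULL;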
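
Jim's join-reorder suggestion, sketched against the same schema. Because
both joins are LEFT OUTER JOINs from people with independent join
conditions, swapping contacts and addresses should yield the same rows,
but (as Jim hedges) that and any plan improvement need to be verified
with EXPLAIN.

    -- Reordered: contacts before addresses. Full column list omitted.
    SELECT r.id AS r_id, co.id AS co_id, ad.id AS ad_id
    FROM people r
    LEFT OUTER JOIN contacts co ON (r.id = co.record)
    LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
    WHERE NOT r.deleted AND r.original IS NULL
    ORDER BY r.id;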
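
Not from the thread, but a natural follow-on to Tom's partial-index point
once the sort is known to be the culprit: a partial index whose leading key
matches the ORDER BY and whose predicate matches the WHERE clause can let
the planner read people in id order and skip the sort entirely. The index
name below is made up for the sketch.

    -- Hypothetical index: returns qualifying rows already ordered by id.
    CREATE INDEX people_id_active
        ON people (id)
        WHERE original IS NULL AND NOT deleted;

    -- Refresh statistics so the planner considers the new index.
    ANALYZE people;

    -- If the planner uses it, EXPLAIN should show an Index Scan on
    -- people_id_active with no top-level Sort node.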
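
The broadcast footer's TIP 5 ties into Bendik's VACUUM FULL question: on
the 8.x releases current when this thread was written, a free space map
that is too small leaves plain VACUUM unable to track reclaimable space in
heavily updated tables, so they bloat. A sketch of the relevant
postgresql.conf settings (removed in 8.4, where the FSM became automatic);
the values are illustrative only, not from the thread.

    # postgresql.conf (PostgreSQL 8.x) -- illustrative values
    max_fsm_pages     = 200000   # total pages with free space to remember
    max_fsm_relations = 1000     # total tables and indexes to track

On 8.x, VACUUM VERBOSE prints a summary at the end showing how much of the
free space map is in use, which indicates whether these limits are large
enough for the workload.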