Re: Slow query with a lot of data - Mailing list pgsql-performance
| From | Moritz Onken |
|---|---|
| Subject | Re: Slow query with a lot of data |
| Date | |
| Msg-id | 380CD097-4A33-4134-8C13-B9BF34EC6C63@houseofdesign.de |
| In response to | Re: Slow query with a lot of data (Matthew Wakeling <matthew@flymine.org>) |
| Responses | Re: Slow query with a lot of data, Re: Slow query with a lot of data |
| List | pgsql-performance |
On 19.08.2008 at 14:17, Matthew Wakeling wrote:

> On Tue, 19 Aug 2008, Moritz Onken wrote:
>> tablename | attname | n_distinct | correlation
>> result    | domain  |       1642 |           1
>
> Well, the important thing is the correlation, which is 1, indicating
> that Postgres knows that the table is clustered. So I have no idea
> why it is sorting the entire table.
>
> What happens when you run EXPLAIN SELECT * FROM result ORDER BY domain?

"Index Scan using result_domain_idx on result  (cost=0.00..748720.72 rows=20306816 width=49)"

... as it should be.

>>> Sounds like an awfully long time to me. Also, I think restricting
>>> it to 280 users is probably not making it any faster.
>>
>> If I hadn't restricted it to 280 users it would have run ~350 days...
>
> What makes you say that? Perhaps you could post EXPLAINs of both of
> the queries.
>
> Matthew

That was just a guess. The query needs to retrieve the data for about
50,000 users. But it should be fast if I don't retrieve the data for
specific users but let it run through all rows.

explain insert into setup1 (select
  a."user", b.category, sum(1.0/b.cat_count)::float
  from result a, domain_categories b
  where a."domain" = b."domain"
  and b.depth < 4
  and a.results > 100
  and a."user" < 30000
  group by a."user", b.category);

"GroupAggregate  (cost=11745105.66..12277396.81 rows=28704 width=12)"
"  ->  Sort  (cost=11745105.66..11878034.93 rows=53171707 width=12)"
"        Sort Key: a."user", b.category"
"        ->  Merge Join  (cost=149241.25..1287278.89 rows=53171707 width=12)"
"              Merge Cond: (b.domain = a.domain)"
"              ->  Index Scan using domain_categories_domain on domain_categories b  (cost=0.00..421716.32 rows=5112568 width=12)"
"                    Filter: (depth < 4)"
"              ->  Materialize  (cost=148954.16..149446.36 rows=39376 width=8)"
"                    ->  Sort  (cost=148954.16..149052.60 rows=39376 width=8)"
"                          Sort Key: a.domain"
"                          ->  Bitmap Heap Scan on result a  (cost=1249.93..145409.79 rows=39376 width=8)"
"                                Recheck Cond: ("user" < 30000)"
"                                Filter: (results > 100)"
"                                ->  Bitmap Index Scan on result_user_idx  (cost=0.00..1240.08 rows=66881 width=0)"
"                                      Index Cond: ("user" < 30000)"

This query limits the number of users to 215 and took about 50 minutes.

I could create two temp tables which hold only those records I need for
this query. Would this be a good idea?

moritz
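[Editor's note: below is a rough, untested sketch of the two-temp-table idea
raised in the last paragraph, written against the tables and columns from the
query above. The temp table and index names (tmp_result, tmp_categories and
the *_domain_idx indexes) are placeholders invented for illustration, not
names from the thread; the user filter is dropped so the aggregate runs over
all users. Whether this actually beats the merge join shown above would need
to be checked with EXPLAIN ANALYZE.]

  -- copy only the rows the aggregate actually needs
  create temp table tmp_result as
    select "user", "domain"
    from result
    where results > 100;

  create temp table tmp_categories as
    select "domain", category, cat_count
    from domain_categories
    where depth < 4;

  -- index the join column and refresh statistics on the temp tables
  create index tmp_result_domain_idx on tmp_result ("domain");
  create index tmp_categories_domain_idx on tmp_categories ("domain");
  analyze tmp_result;
  analyze tmp_categories;

  -- same aggregate as before, but against the much smaller inputs
  insert into setup1
    select a."user", b.category, sum(1.0/b.cat_count)::float
    from tmp_result a
    join tmp_categories b on a."domain" = b."domain"
    group by a."user", b.category;

[The final GroupAggregate may still have to sort its input, so the gain here
would mainly come from shrinking what has to be joined and sorted.]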