Re: Need help optimizing this query - Mailing list pgsql-general
From | Pat Maddox |
---|---|
Subject | Re: Need help optimizing this query |
Date | |
Msg-id | 810a540e0707181534w267ae41au570ed0ca1a4edf84@mail.gmail.com Whole thread Raw |
In response to | Re: Need help optimizing this query (Michael Glaesemann <grzm@seespotcode.net>) |
Responses |
Re: Need help optimizing this query
|
List | pgsql-general |
On 7/18/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Jul 18, 2007, at 16:48 , Pat Maddox wrote: > > > For some reason the functions you wrote are giving me trouble (there's > > a BIGINT involved, I tried changing the functions around but kept > > having issues). > > You might try these, if you're interested. > > CREATE OR REPLACE FUNCTION ternary(BOOLEAN, BIGINT, BIGINT) > RETURNS BIGINT > LANGUAGE SQL AS $_$SELECT CASE WHEN $1 THEN $2 ELSE $3 END$_$; > > CREATE OR REPLACE FUNCTION value_when(BOOLEAN, BIGINT) > RETURNS BIGINT > LANGUAGE SQL AS $_$SELECT ternary($1,$2,0)$_$; > > CREATE OR REPLACE FUNCTION one_when(BOOLEAN) > RETURNS BIGINT > LANGUAGE SQL as $_$SELECT value_when($1,1)$_$; > > > So here's the full query, hopefully formatted better: > > I'm still curious about why the planner is choosing a hash join over > using the indexes on the foreign keys, but that might be because the > tables are relatively small. > > > That one runs reasonably fine, because there are only 20 videos being > > returned and a handful of video views associated with them. In the > > real query there are about 1k videos and a couple million views. That > > took about 80 minutes to run, according to logs. > > The planner will choose different plans based on, among other things, > what it estimates the size of the result to be, so while looking at a > small example query might seem like a way to go about looking at > what's going on, it's most likely not going to give you an accurate > representation of the situation. Are you looking at two different > systems (e.g., a development system versus a production system) or > just choosing a smaller query on the same system? If you can't run > the query on your production system, you may want to take a dump of > the production system and set it up on another box. Even with a > couple million rows in the video_views table, PostgreSQL shouldn't > really blink too much, as long as the server is tuned properly, the > hardware is adequate, and the database statistics are up to date. > > Michael Glaesemann > grzm seespotcode net > > > > Sorry, I mentioned that it took 90 seconds to run the query but I didn't show that EXPLAIN ANALYZE output. Here it is, same query just with a different company_id: Limit (cost=879283.07..879283.32 rows=100 width=64) (actual time=92486.858..92486.891 rows=100 loops=1) -> Sort (cost=879283.07..879297.15 rows=5632 width=64) (actual time=92486.856..92486.867 rows=100 loops=1) Sort Key: sum(CASE WHEN ((video_views.hit IS TRUE) AND (video_views.logged_in IS FALSE)) THEN 1 ELSE 0 END) -> GroupAggregate (cost=803054.95..878932.21 rows=5632 width=64) (actual time=67145.471..92484.408 rows=730 loops=1) -> Sort (cost=803054.95..809363.98 rows=2523610 width=64) (actual time=67076.407..75441.274 rows=5799447 loops=1) Sort Key: videos.id, videos.title, videos.guid -> Hash Join (cost=1220.63..237115.16 rows=2523610 width=64) (actual time=31.230..11507.406 rows=5799447 loops=1) Hash Cond: (video_views.asset_id = assets.id) -> Hash Join (cost=535.62..179627.88 rows=2525294 width=60) (actual time=13.286..7621.950 rows=5799447 loops=1) Hash Cond: (video_views.video_id = videos.id) -> Seq Scan on video_views (cost=0.00..101352.70 rows=5998470 width=12) (actual time=0.023..2840.718 rows=5998341 loops=1) -> Hash (cost=465.23..465.23 rows=5632 width=52) (actual time=13.216..13.216 rows=5712 loops=1) -> Seq Scan on videos (cost=0.00..465.23 rows=5632 width=52) (actual time=0.038..9.060 rows=5712 loops=1) Filter: (company_id = 11) -> Hash (cost=487.78..487.78 rows=15778 width=12) (actual time=17.876..17.876 rows=15778 loops=1) -> Seq Scan on assets (cost=0.00..487.78 rows=15778 width=12) (actual time=0.032..10.880 rows=15778 loops=1) Total runtime: 92548.006 ms (17 rows)
pgsql-general by date: