Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions |
Date | |
Msg-id | 20131025.152550.154326087.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Re: Using indexes for ORDER BY and PARTITION BY clause in windowing functions (Sameer Kumar <sameer.kumar@ashnik.com>) |
Responses |
Re: Re: Using indexes for ORDER BY and PARTITION BY clause
in windowing functions
|
List | pgsql-hackers |
Hello, > Agree that windowing function will return all the rows compared to max and > group by returing only max rows per group. But even while arriving at the > aggregate/sorting windowing function seems to spend more effort than group > by/order by. (I'll apologise in advance for possible misreading..) The most cause of the difference in time comes from sorting. Over 90% of total execution time has elapsed while sorting (49ms->2733ms) for the one using windowing function. If this sort were useless, the execution time would be less than 300 ms - seems comparable enough to group-by query. | Subquery Scan on __unnamed_subquery_0 | (actual time=2606.075..2953.937 rows=558 loops=1) | Filter: (__unnamed_subquery_0.rn = 1) | -> WindowAgg (actual time=2606.063..2928.061 rows=122880 loops=1) | -> Sort (actual time=2606.020..2733.677 rows=122880 loops=1) | Sort Key: student_score.course, student_score.score | -> Seq Scan on student_score | (actual time=0.009..49.026 rows=122880 loops=1) As you see in above plan, sorting key is (course, score). If your point is the overall performance but not reusing a kind of 'hash', there's a big chance to eliminate this sorting if you are able to have an additional index, say, =# create index idx_co_sc on student_score using btree (course, score); With this index, you will get a different plan like this, > uniontest=# explain analyze select student_name from (select student_name, dense_rank() over(partition by course orderby score) rn, score from student_score) rnn where rn=2; > QUERY PLAN > ------------------------------------------------------------------------------- > Subquery Scan on rnn (actual time=0.088..319.403 rows=135 loops=1) > Filter: (rnn.rn = 2) > Rows Removed by Filter: 122746 > -> WindowAgg (actual time=0.037..296.851 rows=122881 loops=1) > -> Index Scan using idx_co_sc on student_score > (actual time=0.027..111.333 rows=122881 loops=1) > Total runtime: 319.483 ms Does this satisfies your needs? ======= > Another thing, (I may be stupid and naive here) does PostgreSQL > re-uses the hash which has been already created for sort. In > this case the inner query must have created a hash for windoing > aggregate. Can't we use that same one while applying the the > filter "rn=1" ? Generally saying, hashes cannot yield ordered output by its nature, I believe. Windowing function (execnode) always receives tuples sequentially in the window-defined order (as you see in the explained plan above) then processes the tuples in semi tuple-by-tuple manner to perform per-frame aggregaion, and finally outputs tuples of the same number to input. And furthermore, dense_rank() doesn't even need per-frame aggregations. So none of the planners so far seems to have chance to use a kind of hash tables to culculate/execute windowing fucntions. On the another point, automatically preserving some internal data within a query beyond the end of the query brings in 'when to discard it?' problem. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: