Why percent_rank is so slower than rank? - Mailing list pgsql-hackers
From | Jie Li |
---|---|
Subject | Why percent_rank is so slower than rank? |
Date | |
Msg-id | AANLkTi=Jxs_mG4kJuxEWLdKvwREHYBeSA40AYnXqYhtq@mail.gmail.com Whole thread Raw |
Responses |
Re: Why percent_rank is so slower than rank?
|
List | pgsql-hackers |
Hi all, <br /><br />I'm new to window functions. Recently I run some simple queries but surprised to find percent_rank isso slower than rank, could anybody tell me why?<br /><br />The table schema:<br />test=# \d inventory1<br /> Table"public.inventory1"<br /> Column | Type | Modifiers <br />----------------------+---------+-----------<br/> inv_date_sk | integer | not null<br /> inv_item_sk | integer | not null<br /> inv_warehouse_sk | integer | not null<br /> inv_quantity_on_hand |integer | <br /><br />test=# \dt+ inventory1<br /> List of relations<br /> Schema | Name | Type | Owner | Size | Description <br /> --------+------------+-------+----------+---------+-------------<br /> public| inventory1 | table | workshop | 8880 kB | <br /><br />The rank query result:<br />test=# explain analyze selectinv_date_sk,inv_item_sk, rank()over(partition by inv_date_sk order by inv_item_sk) from inventory1;<br /> QUERY PLAN <br />-------------------------------------------------------------------------------------------------------------------------------<br /> WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual time=631.947..1361.158 rows=189000 loops=1)<br /> -> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual time=631.924..771.990 rows=189000 loops=1)<br /> Sort Key: inv_date_sk, inv_item_sk<br /> Sort Method: quicksort Memory: 12218kB<br /> -> SeqScan on inventory1 (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.055..198.948 rows=189000 loops=1)<br /> Totalruntime: 1500.193 ms<br />(6 rows)<br /><br />The percent_rank result:<br />test=# explain analyze select inv_date_sk,inv_item_sk,percent_rank()over(partition by inv_date_sk order by inv_item_sk) from inventory1;<br /> QUERY PLAN <br /> -------------------------------------------------------------------------------------------------------------------------------<br /> WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual time=766.432..32924.804 rows=189000 loops=1)<br /> -> Sort (cost=19563.99..20036.49 rows=189000 width=8) (actual time=756.320..905.407 rows=189000 loops=1)<br /> Sort Key: inv_date_sk, inv_item_sk<br /> Sort Method: quicksort Memory: 12218kB<br /> -> Seq Scan on inventory1 (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.102..224.607 rows=189000 loops=1)<br/> Total runtime: 33152.188 ms<br />(6 rows)<br /><br />One special thing is that all the values of the partitionkey(inv_date_sk) are the same, that is, there is only one window partition. I find that percent_rank needs to bufferall the tuples to get the total number of rows. But why is it so expensive?<br /><br />I use 8.4.4. And I only increasethe work_mem to 100M and leave other parameters untouched. <br /><br />Thanks,<br />Li Jie<br />
pgsql-hackers by date: