Home > mailing lists

Why percent_rank is so slower than rank? - Mailing list pgsql-hackers

From	Jie Li
Subject	Why percent_rank is so slower than rank?
Date	December 9, 2010 10:42:51
Msg-id	AANLkTi=Jxs_mG4kJuxEWLdKvwREHYBeSA40AYnXqYhtq@mail.gmail.com Whole thread Raw
Responses	Re: Why percent_rank is so slower than rank?
List	pgsql-hackers

Tree view

Hi all, <br /><br />I'm new to window functions. Recently I run some simple queries but surprised to find percent_rank
isso slower than rank, could anybody tell me why?<br /><br />The table schema:<br />test=# \d inventory1<br />         
Table"public.inventory1"<br />        Column        |  Type   | Modifiers <br
/>----------------------+---------+-----------<br/> inv_date_sk          | integer | not null<br
/> inv_item_sk         | integer | not null<br />  inv_warehouse_sk     | integer | not null<br /> inv_quantity_on_hand
|integer | <br /><br />test=# \dt+ inventory1<br />                       List of relations<br /> Schema |    Name    |
Type |  Owner   |  Size   | Description <br /> --------+------------+-------+----------+---------+-------------<br
/> public| inventory1 | table | workshop | 8880 kB | <br /><br />The rank query result:<br />test=# explain analyze
selectinv_date_sk,inv_item_sk, rank()over(partition by inv_date_sk order by inv_item_sk) from inventory1;<br />
                                                         QUERY
PLAN                                                          <br
/>-------------------------------------------------------------------------------------------------------------------------------<br
/> WindowAgg  (cost=19563.99..23343.99 rows=189000 width=8) (actual time=631.947..1361.158 rows=189000 loops=1)<br />  
-> Sort  (cost=19563.99..20036.49 rows=189000 width=8) (actual time=631.924..771.990 rows=189000 loops=1)<br />
        Sort Key: inv_date_sk, inv_item_sk<br />         Sort Method:  quicksort  Memory: 12218kB<br />         -> 
SeqScan on inventory1  (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.055..198.948 rows=189000 loops=1)<br />
 Totalruntime: 1500.193 ms<br />(6 rows)<br /><br />The percent_rank result:<br />test=# explain analyze select
inv_date_sk,inv_item_sk,percent_rank()over(partition by inv_date_sk order by inv_item_sk) from inventory1;<br
/>                                                         QUERY
PLAN                                                          <br />
-------------------------------------------------------------------------------------------------------------------------------<br
/> WindowAgg (cost=19563.99..23343.99 rows=189000 width=8) (actual time=766.432..32924.804 rows=189000 loops=1)<br />
  ->  Sort  (cost=19563.99..20036.49 rows=189000 width=8) (actual time=756.320..905.407 rows=189000 loops=1)<br
/>        Sort Key: inv_date_sk, inv_item_sk<br />         Sort Method:  quicksort  Memory: 12218kB<br />        
-> Seq Scan on inventory1  (cost=0.00..3000.00 rows=189000 width=8) (actual time=0.102..224.607 rows=189000
loops=1)<br/>  Total runtime: 33152.188 ms<br />(6 rows)<br /><br />One special thing is that all the values of the
partitionkey(inv_date_sk) are the same, that is, there is only one window partition. I find that percent_rank needs to
bufferall the tuples to get the total number of rows. But why is it so expensive?<br /><br />I use 8.4.4. And I only
increasethe work_mem to 100M and leave other parameters untouched. <br /><br />Thanks,<br />Li Jie<br />

pgsql-hackers by date:

From: Fujii Masao
Date: 09 December 2010, 09:13:11
Subject: PS display and standby query conflict

From: Tom Lane
Date: 09 December 2010, 10:50:43
Subject: Re: [PERFORM] Slow BLOBs restoring

Why percent_rank is so slower than rank? - Mailing list pgsql-hackers

Previous

Next