slow count in window query - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | slow count in window query |
Date | |
Msg-id | 162867790907150318y3dccb77ep1ba6eb94742e72c8@mail.gmail.com Whole thread Raw |
Responses |
Re: slow count in window query
|
List | pgsql-hackers |
Hello, I did some test - median via window function - I found probably some bad optimised code. I found two methods - Celko and Itzik Ben-Gan. Ben-Gan methoud should to be faster - there is one sort less, but in practice - it is 2 times slower. create table x(a integer); insert into x select (random()*10000)::int from generate_series(1,10000); Celko method: postgres=# explain select avg(a) from (select a, row_number() over (order by a asc) as hi, row_number() over (order by a desc) as lo, from x) s where hi in (lo-1,lo+1); QUERY PLAN -------------------------------------------------------------------------------------------------Aggregate (cost=2144.02..2144.03rows=1 width=4) -> Subquery Scan s (cost=1643.77..2143.77 rows=100 width=4) Filter: ((s.hi= (s.lo - 1)) OR (s.hi = (s.lo + 1))) -> WindowAgg (cost=1643.77..1943.77 rows=10000 width=4) -> WindowAgg (cost=1643.77..1818.77 rows=10000 width=4) -> Sort (cost=1643.77..1668.77 rows=10000width=4) Sort Key: x.a -> WindowAgg (cost=804.39..979.39 rows=10000 width=4) -> Sort (cost=804.39..829.39 rows=10000 width=4) Sort Key: x.a -> Seq Scanon x (cost=0.00..140.00 rows=10000 width=4) (11 rows) Ben-Gan: postgres=# explain select avg(a) from (select a, row_number() over (order by a) as r, count(*) over () as rc from x ) p where r in ((rc+1)/2,(rc+2)/2) ; QUERY PLAN -------------------------------------------------------------------------------------Aggregate (cost=1354.64..1354.65 rows=1width=4) -> Subquery Scan p (cost=804.39..1354.39 rows=100 width=4) Filter: ((p.r = ((p.rc + 1) / 2)) OR(p.r = ((p.rc + 2) / 2))) -> WindowAgg (cost=804.39..1104.39 rows=10000 width=4) -> WindowAgg (cost=804.39..979.39rows=10000 width=4) -> Sort (cost=804.39..829.39 rows=10000 width=4) Sort Key: x.a -> Seq Scan on x (cost=0.00..140.00 rows=10000 width=4) (8 rows) but postgres=# select avg(a) from (select a, row_number() over (order by a) as r, count(*) over () as rc from x ) p where r in ((rc+1)/2,(rc+2)/2) ; avg -----------------------5027.0000000000000000 (1 row) Time: 179,310 ms postgres=# select avg(a) from (select a, row_number() over (order by a asc) as hi, row_number() over (order by a desc) as lo from x) s where hi in (lo-1,lo+1); avg -----------------------5027.0000000000000000 (1 row) Time: 78,791 ms When I checked diff, I found, so the problem is count() function. count(*) over () is very slow. - maybe so this is standard aggregate? Regards Pavel Stehule
pgsql-hackers by date: