Re: INDEX suggestion needed - Mailing list pgsql-general
From | Thomas Beutin |
---|---|
Subject | Re: INDEX suggestion needed |
Date | |
Msg-id | 20021217144058.C13038@laokoon.bug.net Whole thread Raw |
In response to | Re: INDEX suggestion needed (Manfred Koizar <mkoi-pg@aon.at>) |
Responses |
Re: INDEX suggestion needed
|
List | pgsql-general |
Hi, now i'm back on track; had some trouble with the tax at the and of the year, and this consumes lot of time, most waiting rigth bevor a boring office :-( On Fri, Dec 13, 2002 at 06:37:59PM +0100, Manfred Koizar wrote: > On Fri, 13 Dec 2002 13:00:14 -0300, Alvaro Herrera > <alvherre@dcc.uchile.cl> wrote: > >Now this catched my attention (in the questions' side, sorry, not the > >answers'). Why the aggregate takes 10 times the time needed for the > >indexscan? > > Good point! > > > One would think that a function like count() should be > >pretty cheap, > > COUNT is cheap. But COUNT(DISTINCT something) is not trivial, it has > to keep a list of all values it has already counted. I didn't look at > the implementation. Do we have O(n^2) cost here? > > Thomas, could you EXPLAIN ANALYZE some test cases with > > SELECT COUNT(*) FROM ( > SELECT DISTINCT a_id > FROM stat_pages > WHERE ... > ) AS x; > > and compare them to the results of SELECT COUNT(DISTINCT ...)? Here we are: EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-07-01' AND visit<= '2002-07-31'); NOTICE: QUERY PLAN: Aggregate (cost=3.21..3.21 rows=1 width=34) (actual time=8.67..8.67 rows=1 loops=1) -> Index Scan using tb5 on stat_pages (cost=0.00..3.21 rows=1 width=34) (actual time=0.08..2.00 rows=164 loops=1) Total runtime: 8.86 msec EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-07-01'AND visit <= '2002-07-31' ) AS foo; NOTICE: QUERY PLAN: Aggregate (cost=3.22..3.22 rows=1 width=34) (actual time=9.42..9.42 rows=1 loops=1) -> Subquery Scan foo (cost=3.22..3.22 rows=1 width=34) (actual time=7.25..9.39 rows=8 loops=1) -> Unique (cost=3.22..3.22 rows=1 width=34) (actual time=7.24..9.34 rows=8 loops=1) -> Sort (cost=3.22..3.22 rows=1 width=34) (actual time=7.24..7.66 rows=164 loops=1) -> Index Scan using tb5 on stat_pages (cost=0.00..3.21 rows=1 width=34) (actual time=0.08..1.92 rows=164loops=1) Total runtime: 9.67 msec EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-08-01' AND visit<= '2002-08-31'); NOTICE: QUERY PLAN: Aggregate (cost=14103.16..14103.16 rows=1 width=34) (actual time=22864.95..22864.95 rows=1 loops=1) -> Seq Scan on stat_pages (cost=0.00..13821.19 rows=112784 width=34) (actual time=3664.51..5145.32 rows=113612 loops=1) Total runtime: 22889.60 msec EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-08-01'AND visit <= '2002-08-31' ) AS foo; NOTICE: QUERY PLAN: Aggregate (cost=28034.99..28034.99 rows=1 width=34) (actual time=25055.61..25055.61 rows=1 loops=1) -> Subquery Scan foo (cost=27724.83..28006.79 rows=11278 width=34) (actual time=23467.83..25053.67 rows=555 loops=1) -> Unique (cost=27724.83..28006.79 rows=11278 width=34) (actual time=23467.81..25050.34 rows=555 loops=1) -> Sort (cost=27724.83..27724.83 rows=112784 width=34) (actual time=23467.80..23798.54 rows=113612 loops=1) -> Seq Scan on stat_pages (cost=0.00..13821.19 rows=112784 width=34) (actual time=3936.11..5426.79rows=113612 loops=1) Total runtime: 25726.64 msec EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-09-01' AND visit<= '2002-09-30'); NOTICE: QUERY PLAN: Aggregate (cost=14293.93..14293.93 rows=1 width=34) (actual time=37803.08..37803.08 rows=1 loops=1) -> Seq Scan on stat_pages (cost=0.00..13821.19 rows=189093 width=34) (actual time=461.37..5648.66 rows=195265 loops=1) Total runtime: 37803.25 msec EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-09-01'AND visit <= '2002-09-30' ) AS foo; NOTICE: QUERY PLAN: Aggregate (cost=40763.98..40763.98 rows=1 width=34) (actual time=42325.52..42325.52 rows=1 loops=1) -> Subquery Scan foo (cost=40243.98..40716.71 rows=18909 width=34) (actual time=39487.50..42323.18 rows=702 loops=1) -> Unique (cost=40243.98..40716.71 rows=18909 width=34) (actual time=39487.48..42319.02 rows=702 loops=1) -> Sort (cost=40243.98..40243.98 rows=189093 width=34) (actual time=39487.47..40079.07 rows=195265 loops=1) -> Seq Scan on stat_pages (cost=0.00..13821.19 rows=189093 width=34) (actual time=624.73..5814.48 rows=195265loops=1) Total runtime: 44169.09 msec EXPLAIN ANALYZE SELECT COUNT(DISTINCT a_id) FROM stat_pages WHERE m_id = 35::smallint AND (visit >= '2002-10-01' AND visit<= '2002-10-31'); NOTICE: QUERY PLAN: Aggregate (cost=8792.18..8792.18 rows=1 width=34) (actual time=5419.84..5419.84 rows=1 loops=1) -> Index Scan using tb5 on stat_pages (cost=0.00..8708.74 rows=33376 width=34) (actual time=62.75..1520.06 rows=29937loops=1) Total runtime: 5420.06 msec EXPLAIN ANALYZE SELECT COUNT(a_id) FROM ( SELECT DISTINCT a_id FROM stat_pages WHERE m_id = 35::smallint AND visit >= '2002-10-01'AND visit <= '2002-10-31' ) AS foo; NOTICE: QUERY PLAN: Aggregate (cost=11812.87..11812.87 rows=1 width=34) (actual time=4755.42..4755.42 rows=1 loops=1) -> Subquery Scan foo (cost=11721.09..11804.53 rows=3338 width=34) (actual time=4325.68..4753.70 rows=662 loops=1) -> Unique (cost=11721.09..11804.53 rows=3338 width=34) (actual time=4325.66..4750.65 rows=662 loops=1) -> Sort (cost=11721.09..11721.09 rows=33376 width=34) (actual time=4325.64..4401.14 rows=29937 loops=1) -> Index Scan using tb5 on stat_pages (cost=0.00..8708.74 rows=33376 width=34) (actual time=0.21..473.04rows=29937 loops=1) Total runtime: 5242.39 msec > So now you are back where you started. At least you have an index on > "visit" now ;-) Yes, but this is on the production system not used (pgversion 7.2.1), so i'm looking for a low usage time slot to upgrade the system to at least 7.2.3. Then i can start playing with the indicees. I think, i got an idea about the usage of indicees. Thanks for Your help! Greetings, -tb -- Thomas Beutin tb@laokoon.IN-Berlin.DE Beam me up, Scotty. There is no intelligent live down in Redmond.
pgsql-general by date: