speed up query with max() and odd estimates - Mailing list pgsql-performance
| From | Matthew Nuzum |
|---|---|
| Subject | speed up query with max() and odd estimates |
| Date | |
| Msg-id | f3c0b40805042613167c47c7e9@mail.gmail.com |
| Responses | Re: speed up query with max() and odd estimates; Re: speed up query with max() and odd estimates |
| List | pgsql-performance |
I have this query that takes a little over 8 minutes to run:

```sql
select client, max(atime) as atime
from usage_access
where atime >= (select atime - '1 hour'::interval
                from usage_access
                order by atime desc limit 1)
group by client;
```

I think it can go a lot faster. Any suggestions on improving this? The DB is 7.3.4, I think. (There is no index on client because the table is very big and this data is used infrequently.)

```
explain analyze
select client, max(atime) as atime
from usage_access
where atime >= (select atime - '1 hour'::interval
                from usage_access
                order by atime desc limit 1)
group by client;

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=3525096.28..3620450.16 rows=1271385 width=20) (actual time=482676.95..482693.69 rows=126 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.59 rows=1 width=8) (actual time=0.40..0.41 rows=1 loops=1)
           ->  Index Scan Backward using usage_access_atime on usage_access  (cost=0.00..22657796.18 rows=38141552 width=8) (actual time=0.39..0.40 rows=2 loops=1)
   ->  Group  (cost=3525096.28..3588665.53 rows=12713851 width=20) (actual time=482676.81..482689.29 rows=3343 loops=1)
         ->  Sort  (cost=3525096.28..3556880.90 rows=12713851 width=20) (actual time=482676.79..482679.16 rows=3343 loops=1)
               Sort Key: client
               ->  Seq Scan on usage_access  (cost=0.00..1183396.40 rows=12713851 width=20) (actual time=482641.57..482659.18 rows=3343 loops=1)
                     Filter: (atime >= $0)
 Total runtime: 482694.65 msec
```

I'm starting to understand this, which is quite frightening to me. I thought that maybe if I shrank the number of rows down I could improve things a bit, but my first attempt didn't work. I thought I'd replace the "from usage_access" with this query instead:

```sql
select * from usage_access
where atime >= (select atime - '1 hour'::interval
                from usage_access
                order by atime desc limit 1);
```

```
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on usage_access  (cost=0.00..1183396.40 rows=12713851 width=116) (actual time=481796.22..481839.43 rows=3343 loops=1)
   Filter: (atime >= $0)
   InitPlan
     ->  Limit  (cost=0.00..0.59 rows=1 width=8) (actual time=0.41..0.42 rows=1 loops=1)
           ->  Index Scan Backward using usage_access_atime on usage_access  (cost=0.00..22657796.18 rows=38141552 width=8) (actual time=0.40..0.41 rows=2 loops=1)
 Total runtime: 481842.47 msec
```

It doesn't look like this will help at all. This table is primarily append-only; however, I just recently deleted a few million rows from it, if that helps anyone.

--
Matthew Nuzum
www.bearfruit.org
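[Editor's note, not part of the original message.] The plans above point at the deleted rows: the planner expects 12,713,851 rows to pass the `atime >= $0` filter while only 3,343 actually do, and the seq scan takes roughly 482 seconds to produce its first row, which is consistent with wading through millions of dead tuples left by the recent bulk delete. A minimal sketch of the usual remedy, assuming a maintenance window is acceptable:

```sql
-- Reclaim dead tuples from the bulk delete and refresh planner statistics.
-- With a realistic row estimate for the atime filter, the planner can
-- prefer the existing usage_access_atime index over a full sequential scan.
VACUUM ANALYZE usage_access;
```

VACUUM FULL would additionally return the freed space to the OS, at the cost of an exclusive lock on the table for the duration.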
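If an index involving client were ever reconsidered, here is a hedged sketch of an index-assisted rewrite. The composite index name is hypothetical (the post says no index on client exists today), and `DISTINCT ON` is a PostgreSQL extension that is available in 7.3:

```sql
-- Hypothetical composite index; not present in the original setup.
CREATE INDEX usage_access_client_atime ON usage_access (client, atime);

-- DISTINCT ON keeps the first row per client under the ORDER BY,
-- i.e. the row with the greatest atime for each client.
select distinct on (client) client, atime
from usage_access
where atime >= (select atime - '1 hour'::interval
                from usage_access
                order by atime desc limit 1)
order by client, atime desc;
```

This avoids the sort-then-aggregate over the whole filtered set; whether the extra index is worth its size on a mostly-append table of this scale is the trade-off the original post already flags.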