Re: Optimization with dates - Mailing list pgsql-sql
From | Jean-Christophe Boggio
---|---
Subject | Re: Optimization with dates
Date |
Msg-id | 868508534.20011114130251@thefreecat.org
In response to | Re: Optimization with dates (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: Optimization with dates
List | pgsql-sql
Hello,

Thanks for all the answers. A little feedback:

>> I have tried :
>> where dategain>=(select now()-30);
>> and many other, syntactically absurd :-)

TL> dategain >= ago(30)
TL> where "ago" is a function that computes "date(now()) - n" and is
TL> marked "iscachable".

create function ago(interval) returns timestamp as '
    select now() - $1
' language 'sql' with (iscachable);

explain select count(*) from gains where dategain>=ago('30 0:00');

Aggregate  (cost=180640.90..180640.90 rows=1 width=0)
  ->  Seq Scan on gains  (cost=0.00..179761.71 rows=351676 width=0)

=====

explain select count(*) from gains where dategain>=ago('5 days');

Aggregate  (cost=172340.65..172340.65 rows=1 width=0)
  ->  Index Scan using ix_gains_dategain on gains  (cost=0.00..172202.94 rows=55084 width=0)

=====

explain select count(*) from gains where dategain>=ago('6 days');

Aggregate  (cost=179929.06..179929.06 rows=1 width=0)
  ->  Seq Scan on gains  (cost=0.00..179761.71 rows=66940 width=0)

(An integer-days variant of ago(), closer to Tom's exact suggestion, is sketched in a P.S. below.)

TL> Just out of curiosity, do the indexed timestamps correlate closely to
TL> the physical order of the table?  I'd expect that to happen if you
TL> are timestamping records by insertion time and there are few or no
TL> updates.

That's right, there are very few updates.

===========================================

Now, for Jason's idea:

set enable_seqscan to off;
SET VARIABLE

explain select count(*) from gains where dategain>=now()-30;

Aggregate  (cost=100256770.86..100256770.86 rows=1 width=0)
  ->  Seq Scan on gains  (cost=100000000.00..100250847.08 rows=2369512 width=0)

Strange, isn't it? Is it possible to do the equivalent of "set enable_seqscan
to off" outside of psql (in PHP or Perl code)? (I sketch what I will try in a
P.S. below.)

===========================================

To answer Stephan and Josh:

SS> Is 2367640 a reasonable estimate for the number of
SS> rows that match the condition?

JB> Hmmm... if the number of rows is actually accurate (2M out of 10M in the
JB> last 30 days) then a Seq Scan seems like a good plan to me. If the
JB> numbers aren't accurate, it's time to run a VACUUM ANALYZE.

select avg(cnt) from (select count(*) as cnt from gains group by date(dategain)) as foo;

       avg
------------------
 12009.6131756757

If I did it right, this should be the average number of rows per day.
The db has existed since April 1st, 2000:

select date('now')-date('2000-04-01');
 592

select 592*12009;
 7109328

select count(*) from gains;
  count
---------
 7109753

As you see, dategain is *quite* linear! So to answer your question, a
reasonable estimate for the number of rows that match the condition is:

select 30*12009;
 360270

The real answer is:

select count(*) from gains where dategain>=now()-30;
 231781

SS> Have you run vacuum analyze?

Every night (and it's a VEERRYYYY long process, even dropping the indexes
before and recreating them afterwards; maybe that's the real problem?).
Keeping the indexes makes the VACUUM take several hours. We'll try 7.2, which
should solve part of this problem, but since these are production systems we
are waiting for a little feedback from 7.2 users.

SS> If the estimate is right, you'll probably find that
SS> the sequence scan is actually faster than an index
SS> scan since about 1/4 of the table is being selected.

It should select about 30/592 of the table (roughly 5%), not 1/4!

Any further advice VERY appreciated. Thanks again everyone for your help.

--
Jean-Christophe Boggio
cat@thefreecat.org                      -o)
Independent Consultant and Developer    /\\
Delphi, Linux, Perl, PostgreSQL, Debian _\_V
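
PS: About my own question above (enable_seqscan outside of psql): if I
understand correctly, SET is an ordinary SQL statement, so it should be
possible to send it over the same connection from Perl or PHP just like any
other query, and the setting only lasts for that session. A minimal, untested
sketch of what I will try:

    -- untested sketch: send these through the same connection, in this order
    SET enable_seqscan TO off;   -- discourage sequential scans for this session
    SELECT count(*) FROM gains WHERE dategain >= ago('30 days');
    SET enable_seqscan TO on;    -- restore the default afterwards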
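
PS: For completeness, here is a variant of ago() closer to Tom's exact
suggestion (an integer number of days, returning a date), also marked
iscachable. This is only a sketch; I have only tested the interval version
shown above:

    -- untested sketch of an integer-days variant of ago()
    create function ago(integer) returns date as '
        select date(now()) - $1
    ' language 'sql' with (iscachable);

    -- it would then be called as:
    select count(*) from gains where dategain >= ago(30);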