Re: Optimization with dates - Mailing list pgsql-sql
From | Jean-Christophe Boggio
---|---
Subject | Re: Optimization with dates
Date |
Msg-id | 868508534.20011114130251@thefreecat.org
In response to | Re: Optimization with dates (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: Optimization with dates
List | pgsql-sql
Hello,

Thanks for all the answers. A little feedback:

>> I have tried :
>> where dategain>=(select now()-30);
>> and many other, syntactically absurd :-)

TL> dategain >= ago(30)
TL> where "ago" is a function that computes "date(now()) - n" and is
TL> marked "iscachable".

create function ago(interval) returns timestamp as '
    select now() - $1
' language 'sql' with (iscachable);

explain select count(*) from gains where dategain>=ago('30 0:00');

Aggregate  (cost=180640.90..180640.90 rows=1 width=0)
  ->  Seq Scan on gains  (cost=0.00..179761.71 rows=351676 width=0)

=====

explain select count(*) from gains where dategain>=ago('5 days');

Aggregate  (cost=172340.65..172340.65 rows=1 width=0)
  ->  Index Scan using ix_gains_dategain on gains  (cost=0.00..172202.94 rows=55084 width=0)

=====

explain select count(*) from gains where dategain>=ago('6 days');

Aggregate  (cost=179929.06..179929.06 rows=1 width=0)
  ->  Seq Scan on gains  (cost=0.00..179761.71 rows=66940 width=0)

(An integer-days variant of ago(), closer to Tom's exact suggestion, is sketched in a P.S. below.)

TL> Just out of curiosity, do the indexed timestamps correlate closely to
TL> the physical order of the table?  I'd expect that to happen if you
TL> are timestamping records by insertion time and there are few or no
TL> updates.

That's right, there are very few updates.

===========================================

Now, for Jason's idea:

set enable_seqscan to off;
SET VARIABLE

explain select count(*) from gains where dategain>=now()-30;

Aggregate  (cost=100256770.86..100256770.86 rows=1 width=0)
  ->  Seq Scan on gains  (cost=100000000.00..100250847.08 rows=2369512 width=0)

Strange, isn't it? Is it possible to do the equivalent of "set enable_seqscan
to off" outside of psql (in PHP or Perl code)? (I sketch what I will try in a
P.S. below.)

===========================================

To answer Stephan and Josh:

SS> Is 2367640 a reasonable estimate for the number of
SS> rows that match the condition?

JB> Hmmm... if the number of rows is actually accurate (2M out of 10M in the
JB> last 30 days) then a Seq Scan seems like a good plan to me. If the
JB> numbers aren't accurate, it's time to run a VACUUM ANALYZE.

select avg(cnt) from (select count(*) as cnt from gains group by date(dategain)) as foo;

       avg
------------------
 12009.6131756757

If I did it right, this should be the average number of rows per day.
The db has existed since April 1st, 2000:

select date('now')-date('2000-04-01');
 592

select 592*12009;
 7109328

select count(*) from gains;
  count
---------
 7109753

As you see, dategain is *quite* linear! So to answer your question, a
reasonable estimate for the number of rows that match the condition is:

select 30*12009;
 360270

The real answer is:

select count(*) from gains where dategain>=now()-30;
 231781

SS> Have you run vacuum analyze?

Every night (and it's a VEERRYYYY long process, even dropping the indexes
before and recreating them afterwards; maybe that's the real problem?).
Keeping the indexes makes the VACUUM take several hours. We'll try 7.2, which
should solve part of this problem, but since these are production systems we
are waiting for a little feedback from 7.2 users.

SS> If the estimate is right, you'll probably find that
SS> the sequence scan is actually faster than an index
SS> scan since about 1/4 of the table is being selected.

It should select about 30/592 of the table (roughly 5%), not 1/4!

Any further advice VERY appreciated. Thanks again everyone for your help.

--
Jean-Christophe Boggio
cat@thefreecat.org                      -o)
Independent Consultant and Developer    /\\
Delphi, Linux, Perl, PostgreSQL, Debian _\_V
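
PS: About my own question above (enable_seqscan outside of psql): if I
understand correctly, SET is an ordinary SQL statement, so it should be
possible to send it over the same connection from Perl or PHP just like any
other query, and the setting only lasts for that session. A minimal, untested
sketch of what I will try:

    -- untested sketch: send these through the same connection, in this order
    SET enable_seqscan TO off;   -- discourage sequential scans for this session
    SELECT count(*) FROM gains WHERE dategain >= ago('30 days');
    SET enable_seqscan TO on;    -- restore the default afterwards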
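
PS: For completeness, here is a variant of ago() closer to Tom's exact
suggestion (an integer number of days, returning a date), also marked
iscachable. This is only a sketch; I have only tested the interval version
shown above:

    -- untested sketch of an integer-days variant of ago()
    create function ago(integer) returns date as '
        select date(now()) - $1
    ' language 'sql' with (iscachable);

    -- it would then be called as:
    select count(*) from gains where dategain >= ago(30);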