Thread: Another "why am I not using the indicies?"
Here is all the information about the system and the steps I took. If you can tell me or point me in the right direction I would appreciate it. There are 624306 records in this table, so it only takes a few seconds, but this is going to grow to 5M rows/month when this works properly. Clinton James #select version(); version ---------------------------------------------------------------------- PostgreSQL 7.1rc4 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66 (1 row) #CREATE TABLE ld ( billmonth CHAR(4), cust_num CHAR(8), location CHAR(4), date TIMESTAMP, orignum CHAR(10), destnum CHAR(13), destcity CHAR(10), deststate CHAR(2), duration NUMERIC(8,1), callcode CHAR(9), cost NUMERIC(10,4) ); #copy ld from '/tmp/ld0104.txt'; #create index ld_custnum_idx ON ld(cust_num); #create index ld_orignum_idx ON ld(orignum); #vacuum analyze; # explain select * from ld where cust_num = '10102899'; NOTICE: QUERY PLAN: Seq Scan on ld (cost=0.00..20810.83 rows=8989 width=128)
"Clinton James" <cjames@callone.net> writes: > # explain select * from ld where cust_num = '10102899'; > NOTICE: QUERY PLAN: > Seq Scan on ld (cost=0.00..20810.83 rows=8989 width=128) The reason it's not using an indexscan is that it thinks it's going to find 8989 matches (and if that were true, a seqscan probably *is* the right plan). You must have some extremely-often-repeated entries in that column; what are they, and can you get rid of them? regards, tom lane
"Clinton James" <cjames@callone.net> writes: > True, using a different cust_num, it is possible there may be a couple of > cust_num where that would be true. The majority will not have even near > that many. In this case there are only 13 matching records. Is VACUUMDB > ANALYZE using the worst case (the cust_num with the greatest number of rows) > senario for this? Yes, because that's as far as the statistics go at the moment :-(. The frequency of the most common value is basically all the info the planner has about the data distribution, and so a highly skewed distribution leads to bad plan choices. I am presently working on better statistics ... to appear in 7.2 ... regards, tom lane