Thread: Not using index
Hi,

I have a table with about 100K rows, on which I have created a btree index of the form table_name(int, int, int, timestamp).

At first Postgres was using it for my AND query on all four columns, but after dropping it, creating different ones, and testing, it suddenly stopped using it. Vacuuming, reindexing, recreating the table, and even recreating the database all didn't help.

Then I discovered "set enable_seqscan to off", and it started using the index again. This is certainly not a case of the index not being beneficial: using the index drops the query time from 260ms to 36ms!

Most references I have seen say that setting enable_seqscan to off is something to be done only for testing. But obviously, if Postgres won't behave properly on its own, I can't go to production without it. Is that really the case, or do many people use this setting in production?

Can you give Postgres index hints like you can in Oracle and Sybase?

Any clues?

Thanks,
Bas.
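[A minimal sketch of the kind of setup described above. The table, column, and index names are invented for illustration; the actual schema is not shown in the thread.]

    -- Hypothetical reconstruction of the setup: a four-column btree index
    -- and an AND query on all four columns.
    CREATE TABLE events (
        a   int,
        b   int,
        c   int,
        ts  timestamp
    );

    CREATE INDEX events_abc_ts_idx ON events (a, b, c, ts);

    EXPLAIN ANALYZE
    SELECT * FROM events
    WHERE a = 1 AND b = 2 AND c = 3 AND ts >= '2004-01-01';

    -- Forcing the planner away from a sequential scan, for this session only:
    SET enable_seqscan TO off;
    EXPLAIN ANALYZE
    SELECT * FROM events
    WHERE a = 1 AND b = 2 AND c = 3 AND ts >= '2004-01-01';
    SET enable_seqscan TO on;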
Bas Scheffers wrote:
> Then I discovered "set enable_seqscan to off". It started using the index
> again. This is certainly not a case of it not being beneficial to use the
> index; using the index drops the query time from 260ms to 36ms!
>
> In most references I have seen, setting enable_seqscan to off is something
> only to be done for testing. But obviously if Postgres doesn't start
> behaving properly, I can't go to production without it. Is this the case
> or do many use this feature in production?
>
> Can you give Postgres index hints like you can do in Oracle and Sybase?

What are the exact query and table schema? Are you sure it isn't a problem of mismatched data types between the WHERE clause and the index?

PostgreSQL uses indexes when it sees fit. It is usually better to let it decide what it wants.

Shridhar
On Thu, 12 Feb 2004, Bas Scheffers wrote:
> Hi,
>
> I have a table with about 100K rows, on which I have created a btree index
> of the form table_name(int, int, int, timestamp).
>
> At first Postgres was using it for my AND query on all four columns, but
> after dropping it, creating different ones, and testing, it suddenly
> stopped using it. Vacuuming, reindexing, recreating the table, and even
> recreating the database all didn't help.
>
> Then I discovered "set enable_seqscan to off", and it started using the
> index again. This is certainly not a case of the index not being
> beneficial: using the index drops the query time from 260ms to 36ms!
>
> Most references I have seen say that setting enable_seqscan to off is
> something to be done only for testing. But obviously, if Postgres won't
> behave properly on its own, I can't go to production without it. Is that
> really the case, or do many people use this setting in production?
>
> Can you give Postgres index hints like you can in Oracle and Sybase?

OK, there are two cases where PostgreSQL doesn't use an index but should. One is where, due to a type mismatch, it can't; the other is where the planner thinks the index would be slower when in fact it would be faster. The "set enable_seqscan = off" trick lets you see which case it is. Since PostgreSQL then used the index, it is clearly capable of using it; it just doesn't realize it should.

There are a few settings that tell PostgreSQL how to decide which plan to use. (All of the following are expressed relative to the cost of a sequential page access.)

random_page_cost: tells the server how much a random page access costs versus a sequential access. The default is 4. On machines with lots of cache, fast drives, many drives in a RAID array, or a RAID array with cache (i.e. things that speed up random access), you may want to set this lower. I run mine at 1.4. Anything below 1.0 is unrealistic, but may sometimes be necessary to force the right plan. As a global setting, I'd recommend something between 1 and 2 for most servers.

effective_cache_size: tells the planner how big the kernel cache available to PostgreSQL is. If PostgreSQL is the only thing on the machine, it is likely that all of the kernel cache is being used for PostgreSQL. Higher values favor index scans, since the data is then more likely to be in memory, where random and sequential access cost the same (a ratio of 1.0).

cpu_tuple_cost: cost per tuple retrieved.

cpu_index_tuple_cost: each tuple's (additional?) cost for an index scan. Lowering this favors index scans.

cpu_operator_cost: not sure; I think it's stuff like nested-loop costs and such.

So, to start with, try changing random_page_cost. You can change it for the current session only for testing, and try to find the "breakover point" where it forces the planner to make the right decision.

Also, keep a large table around that you can run "select * from bigtable" against to clear the caches, then run the original query and compare the performance of seq versus index scans. You'll often find that a query that screams when the caches are full of your data is quite slow when the cache is empty.
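[A rough sketch of the per-session experiment described above. The table name "events", the big table "bigtable", and the WHERE clause are placeholders, not the actual schema from this thread.]

    -- Find the random_page_cost "breakover point" for the current session only.
    SHOW random_page_cost;            -- default is 4

    SET random_page_cost = 2;         -- affects only this session
    EXPLAIN ANALYZE
    SELECT * FROM events
    WHERE a = 1 AND b = 2 AND c = 3 AND ts >= '2004-01-01';

    SET random_page_cost = 1.5;       -- step down until the plan flips to the index
    EXPLAIN ANALYZE
    SELECT * FROM events
    WHERE a = 1 AND b = 2 AND c = 3 AND ts >= '2004-01-01';

    -- To compare cold-cache behaviour, read through a large unrelated table first:
    SELECT count(*) FROM bigtable;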
Hi Scott,

scott.marlowe said:
> So, to start with, try changing random_page_cost. You can change it for

As "unrealistic" as it should be, I need it below 1 before Postgres takes the bait. Initially 0.7, to be exact, though later it also worked at the slightly higher setting of 1. I have given PG 96Mb of memory to play with, so likely all my data will be in cache. So no very fast disk (6MB/sec reads), but loads of RAM.

Should I try tweaking any of the other parameters?

> performance of seq versus index. You'll often find that a query that
> screams when the caches are full of your data is quite slow when the cache
> is empty.

True, but as this single query is going to be the workhorse of the web service I am developing, it is likely all data will always be in memory, even if I have to stick several gigs of RAM in.

Thanks,
Bas.
On Thu, 12 Feb 2004, Bas Scheffers wrote:
> Hi Scott,
>
> As "unrealistic" as it should be, I need it below 1 before Postgres takes
> the bait. Initially 0.7, to be exact, though later it also worked at the
> slightly higher setting of 1. I have given PG 96Mb of memory to play with,
> so likely all my data will be in cache. So no very fast disk (6MB/sec
> reads), but loads of RAM.
>
> Should I try tweaking any of the other parameters?

Yes. Drop cpu_index_tuple_cost by a factor of 100 or so, e.g. from

  cpu_index_tuple_cost = 0.001

to

  cpu_index_tuple_cost = 0.0001

or

  cpu_index_tuple_cost = 0.00001

Also raise effective_cache_size. It's measured in 8k blocks, so for a machine with 1 gig of RAM, 700 meg of which is kernel cache, you'd want approximately 90000. Note that this is not an exact measure, and it's OK to make it even larger if you like, to ensure the database thinks we have gobs of RAM.

> > performance of seq versus index. You'll often find that a query that
> > screams when the caches are full of your data is quite slow when the
> > cache is empty.
> True, but as this single query is going to be the workhorse of the web
> service I am developing, it is likely all data will always be in memory,
> even if I have to stick several gigs of RAM in.

Note that rather than "set enable_seqscan = off" for the whole database, you can always set it for just this session / query.

When you run explain analyze <query>, are any of the row estimates way off versus the real number of rows? If so, you may need to analyze more often, or raise the column's statistics target to get a good estimate. Some query plans just don't have any way of knowing, so they guess, and there's no way to change what they're guessing; in that case setting random_page_cost below 1 may be the only answer.
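[A sketch of the session-level override and the statistics-target change suggested above. Again, "events", its columns, and the statistics target of 100 are placeholder choices, not values from this thread.]

    -- effective_cache_size is in 8k blocks: 700 MB of kernel cache / 8 kB is
    -- roughly 90000 (set in postgresql.conf, or per session for testing):
    SET effective_cache_size = 90000;

    -- Per-session override instead of turning off seqscans database-wide:
    SET enable_seqscan TO off;
    EXPLAIN ANALYZE
    SELECT * FROM events
    WHERE a = 1 AND b = 2 AND c = 3 AND ts >= '2004-01-01';
    RESET enable_seqscan;

    -- Raise the statistics target for a column whose row estimates are way off,
    -- then re-analyze so the planner picks up the new statistics:
    ALTER TABLE events ALTER COLUMN a SET STATISTICS 100;
    ANALYZE events;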
scott.marlowe said:
> Yes. Drop cpu_index_tuple_cost by a factor of 100 or so

No effect.

> Also raise effective_cache_size. It's measured in 8k blocks, so for a

That's better. With it set to 9000, which seems reasonable for my current setup, it will start using the index when random_page_cost <= 1.5.

> Note that rather than "set enable_seqscan = off" for the whole database,
> you can always set it for just this session / query.

Considering how rare a case it is that a table scan is more efficient than using proper indexes, that might not be a bad idea.

> When you run explain analyze <query>, are any of the row estimates way off
> versus the real number of rows? If so, you may need to analyze more

They are. Depending on what stage the plan is in, it is expecting a factor of 20 to 100 more rows than are actually returned, which sounds way off to me.

Here's what's happening: first there is the index scan, which returns about 5000 rows (the planner expects 3700). But that isn't what comes out, because another filter (circle ~ point) reduces the actual number of rows to 242. That number is then further reduced to 32 by a tsearch2 query, but the planner is still expecting 3700 rows at that stage.

I tried upping the statistics for the columns I am searching on and running analyze on the table, but without results. So I guess I am stuck with setting effective_cache_size to a sane value and lowering random_page_cost to something not much higher than 1. Hey, as long as it works!

Thanks,
Bas.
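[For reference, a sketch of what the settings arrived at in this thread might look like in postgresql.conf; the values are the ones quoted above, and whether they suit any other machine is of course an assumption.]

    # postgresql.conf (or SET per session while testing)
    effective_cache_size = 9000     # in 8k blocks: ~72 MB, matching the 96 MB given to PG
    random_page_cost = 1.5          # breakover point where the index scan wins here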