Re: index usage (and foreign keys/triggers) - Mailing list pgsql-general
From: scott.marlowe
Subject: Re: index usage (and foreign keys/triggers)
Msg-id: Pine.LNX.4.33.0302261619090.18487-100000@css120.ihs.com
In response to: index usage (and foreign keys/triggers) (Patrik Kudo <kudo@pingpong.net>)
Responses: Re: index usage (and foreign keys/triggers)
List: pgsql-general
On Wed, 26 Feb 2003, Patrik Kudo wrote:

> Hi gurus et al ;)
>
> I have a database with a couple of hundred tables. In this database one
> table, "person", represents a user. person.userid is a primary key. To
> this key there are about a hundred foreign keys spread out over
> approximately equally many tables. When deleting a user I noticed a
> quite big difference in time depending on how much data there is in the
> foreign-key tables. After some investigation I concluded that for some
> users and some tables the indices weren't used when deleting, resulting
> in longer run-times.
>
> Here's an example:
>
> select count(*) from login;
>  count
> -------
>  96824
>
> select count(*) from login where userid = 'patrik';
>  count
> -------
>    608
>
> select count(*) from login where userid = 'jennie';
>  count
> -------
>   4211
>
> explain delete from login where userid = 'patrik';
>                                    QUERY PLAN
> ---------------------------------------------------------------------------------
>  Index Scan using login_userid_idx on login  (cost=0.00..237.06 rows=61 width=6)
>    Index Cond: (userid = 'patrik'::text)
>
> explain delete from login where userid = 'jennie';
>                          QUERY PLAN
> -----------------------------------------------------------
>  Seq Scan on login  (cost=0.00..2045.30 rows=3421 width=6)
>    Filter: (userid = 'jennie'::text)
>
> What makes the planner choose a seq scan for 'jennie', but not for
> 'patrik'? I also tested the following:
>
> delete from login where userid = 'jennie';
> DELETE 4211
> Time: 508.94 ms
>
> set enable_seqscan = false;
>
> delete from login where userid = 'jennie';
> DELETE 4211
> Time: 116.92 ms
>
> As you can see the index scan is almost 5 times faster, but postgres
> still chooses to seq scan... Am I doing something wrong or is postgres
> being stupid on me?

PostgreSQL is being smart, just not smart enough. Imagine that one of
your queries was to delete 99.9% of all the tuples. Would an index scan
help then? Of course not, since you'd be visiting nearly every row in
the table anyway.

The planner uses several settings to estimate the cost of sequentially
scanning a table versus accessing it through an index, and it doesn't
always get things right.

Take a look at random_page_cost. It defaults to 4, which means PostgreSQL
makes its index-versus-seq-scan decisions assuming that fetching a random
individual page costs four times as much as fetching that same page as
part of a sequential scan. On most modern machines the real difference is
much smaller, what with disk caching and all. This is especially true for
smaller tables that fit in memory. Once a table is in buffer memory, along
with its index, the effective random page cost is about 1, i.e. a seq
scan in memory and an index lookup are both quite fast. In fact, at that
point a random page access can in practice cost LESS than 1 for certain
percentages of your table, because linear access in memory yields little
if any gain over random access; the only overhead is reading the index
blocks.

So, try tuning your random_page_cost down to somewhere between 1.0 and
2.0 for best performance on these kinds of queries.

Our database at work runs on a machine with 1.5 gigs of RAM, of which 800
megs is cache, and PostgreSQL has a 256 meg shared buffer. Only about 5%
of its reads hit the drives, so random_page_cost for us is set to 1.2 and
works well.

Welcome to the wonderful world of performance tuning...
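As a minimal sketch of how you might experiment with this, reusing the
login table and 'jennie' value from the post above (the 1.5 here is just
one plausible starting point, not a recommendation for every setup):

-- Lower random_page_cost for this session only, then check whether the
-- planner now picks the index scan for the "big" userid:
SET random_page_cost = 1.5;
EXPLAIN DELETE FROM login WHERE userid = 'jennie';

-- Undo the experiment for this session:
RESET random_page_cost;

-- If real timings confirm the index plan is faster, make the setting
-- permanent in postgresql.conf and reload the server:
--   random_page_cost = 1.5

Testing with a session-level SET first means a bad guess only affects
your current connection; the postgresql.conf change then applies the
value you settled on to the whole cluster.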