index choosing problem - Mailing list pgsql-performance
From | Rural Hunter |
---|---|
Subject | index choosing problem |
Date | |
Msg-id | 4F61A909.20109@gmail.com Whole thread Raw |
Responses |
Re: index choosing problem
|
List | pgsql-performance |
I have a table with serveral million records. they are divided into about one hundred catagory(column cid). I created index includes the cid as the first column. I had a problem with some cids they only have few records comparing with other cids. Some of them only have serveral thousand rows. Some queries are not using index on the cids. I got the explain for the queries. Note: article_others_cid_time_style_idx is the index contains cid as the first column article_others_pkey is the primary key on an auto incremented column aid. # select count(*) from article_others; count --------- 6888459 (1 row) # select count(*) from article_others where cid=74; count ------- 4199 (1 row) 1. # explain select count(*) from article_others where cid=74; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Aggregate (cost=32941.95..32941.96 rows=1 width=0) -> Index Scan using article_others_cid_time_style_idx on article_others (cost=0.00..32909.34 rows=13047 width=0) Index Cond: (cid = 74) (3 rows) 2. # explain select aid from article_others where cid=74 limit 10; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..25.22 rows=10 width=8) -> Index Scan using article_others_cid_time_style_idx on article_others (cost=0.00..32909.34 rows=13047 width=8) Index Cond: (cid = 74) (3 rows) 3. # explain select aid from article_others where cid=74 order by aid desc limit 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..1034.00 rows=10 width=8) -> Index Scan Backward using article_others_pkey on article_others (cost=0.00..1349056.65 rows=13047 width=8) Filter: (cid = 74) (3 rows) 4. # explain select aid from article_others where cid=74 order by aid desc limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..103.40 rows=1 width=8) -> Index Scan Backward using article_others_pkey on article_others (cost=0.00..1349060.65 rows=13047 width=8) Filter: (cid = 74) (3 rows) 5. # explain select max(aid) from article_others where cid=74; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Result (cost=104.70..104.71 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.00..104.70 rows=1 width=8) -> Index Scan Backward using article_others_pkey on article_others (cost=0.00..1365988.55 rows=13047 width=8) Index Cond: (aid IS NOT NULL) Filter: (cid = 74) (6 rows) Now the query 3-5 using article_others_pkey are quite slow. The rows for cid 74 are very old and seldom get updated. I think pg needs to scan quite a lot on article_others_pkey before it gets the rows for cid 74. The same query for other cids with new and majority of rows runs very fast. for example: # explain select max(aid) from article_others where cid=258; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Result (cost=1.54..1.55 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.00..1.54 rows=1 width=8) -> Index Scan Backward using article_others_pkey on article_others (cost=0.00..1366260.55 rows=889520 width=8) Index Cond: (aid IS NOT NULL) Filter: (cid = 258) So I think if pg chooses to use index article_others_cid_time_style_idx the performance would be much better. or any other solution I can take to improve the query performance for those cids like 74? Another question, why the plan shows rows=13047 for cid=74 while actually it only has 4199 rows? There is almost no data changes for cid 74 and I just vacuum/analyzed the table this morning.
pgsql-performance by date: