Re: Strange choice of general index over partial index - Mailing list pgsql-performance
From | Mark Kirkwood |
---|---|
Subject | Re: Strange choice of general index over partial index |
Date | |
Msg-id | 54B87820.3090101@catalyst.net.nz Whole thread Raw |
In response to | Re: Strange choice of general index over partial index (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>) |
Responses |
Re: Strange choice of general index over partial index
Re: Strange choice of general index over partial index |
List | pgsql-performance |
On 16/01/15 13:37, Mark Kirkwood wrote: > On 16/01/15 11:30, Josh Berkus wrote: >> This is an obfuscation and mock up, but: >> >> table files ( >> id serial pk, >> filename text not null, >> state varchar(20) not null >> ... 18 more columns >> ) >> >> index file_state on (state) >> (35GB in size) >> index file_in_flight_state (state) where state in ( >> 'waiting','assigning', 'processing' ) >> (600MB in size) >> ... 10 more indexes >> >> More important facts: >> * state = 'done' 95% of the time. thereform the partial index >> represents only 5% of the table >> * all indexes and the table are very bloated >> * server has 128GB RAM >> * Version 9.2. >> >> Given this setup, I would expect the planner to *always* choose >> file_in_flight_state over file_state for this query: >> >> SELECT id, filename FROM files WHERE state = 'waiting'; >> >> ... and yet it keeps selecting file_state based on extremely small >> changes to the stats. This is important because the same query, using >> file_state, is 20X to 50X slower, because that index frequently gets >> pushed out of memory. >> >> What am I missing? Or is this potentially a planner bug for costing? >> > > Are you seeing a bitmapscan access plan? If so see if disabling it gets > you a plan on the files_in_flight index. I'm seeing this scenario with a > fake/generated dataset a bit like yours in 9.2 (9.5 uses the > files_in_flight w/o any coercing). > FWIW: For me 9.2 and 9.3 (default config) generate plans like: state=# EXPLAIN ANALYZE SELECT id, filename FROM files WHERE state = 'processing'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on files (cost=3102.02..89228.68 rows=164333 width=15) (actual time=26.629..803.507 rows=166696 loops=1) Recheck Cond: ((state)::text = 'processing'::text) Rows Removed by Index Recheck: 7714304 -> Bitmap Index Scan on file_state (cost=0.00..3060.93 rows=164333 width=0) (actual time=25.682..25.682 rows=166696 loops=1) Index Cond: ((state)::text = 'processing'::text) Total runtime: 808.662 ms (6 rows) whereas 9.4 and 9.5 get: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Index Scan using file_in_flight on files (cost=0.42..62857.39 rows=158330 width=15) (actual time=0.055..202.732 rows=166696 loops=1) Index Cond: ((state)::text = 'processing'::text) Planning time: 24.203 ms Execution time: 208.926 ms (4 rows) This is with each version loading exactly the same dataset (generated by the attached scripty). Obviously this is a vast simplification of what Josh is looking at - but it is (hopefully) interesting that these later versions are doing so much better... Cheers Mark
Attachment
pgsql-performance by date: