Re: Strange choice of general index over partial index - Mailing list pgsql-performance
From | Mark Kirkwood |
---|---|
Subject | Re: Strange choice of general index over partial index |
Date | |
Msg-id | 54B88032.7020206@catalyst.net.nz Whole thread Raw |
In response to | Re: Strange choice of general index over partial index (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>) |
Responses |
Re: Strange choice of general index over partial index
|
List | pgsql-performance |
On 16/01/15 15:32, Mark Kirkwood wrote: > On 16/01/15 13:37, Mark Kirkwood wrote: >> On 16/01/15 11:30, Josh Berkus wrote: >>> This is an obfuscation and mock up, but: >>> >>> table files ( >>> id serial pk, >>> filename text not null, >>> state varchar(20) not null >>> ... 18 more columns >>> ) >>> >>> index file_state on (state) >>> (35GB in size) >>> index file_in_flight_state (state) where state in ( >>> 'waiting','assigning', 'processing' ) >>> (600MB in size) >>> ... 10 more indexes >>> >>> More important facts: >>> * state = 'done' 95% of the time. thereform the partial index >>> represents only 5% of the table >>> * all indexes and the table are very bloated >>> * server has 128GB RAM >>> * Version 9.2. >>> >>> Given this setup, I would expect the planner to *always* choose >>> file_in_flight_state over file_state for this query: >>> >>> SELECT id, filename FROM files WHERE state = 'waiting'; >>> >>> ... and yet it keeps selecting file_state based on extremely small >>> changes to the stats. This is important because the same query, using >>> file_state, is 20X to 50X slower, because that index frequently gets >>> pushed out of memory. >>> >>> What am I missing? Or is this potentially a planner bug for costing? >>> >> >> Are you seeing a bitmapscan access plan? If so see if disabling it gets >> you a plan on the files_in_flight index. I'm seeing this scenario with a >> fake/generated dataset a bit like yours in 9.2 (9.5 uses the >> files_in_flight w/o any coercing). >> > > FWIW: For me 9.2 and 9.3 (default config) generate plans like: > state=# EXPLAIN ANALYZE > SELECT id, filename > FROM files > WHERE state = 'processing'; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on files (cost=3102.02..89228.68 rows=164333 > width=15) (actual time=26.629..803.507 rows=166696 loops=1) > Recheck Cond: ((state)::text = 'processing'::text) > Rows Removed by Index Recheck: 7714304 > -> Bitmap Index Scan on file_state (cost=0.00..3060.93 rows=164333 > width=0) (actual time=25.682..25.682 rows=166696 loops=1) > Index Cond: ((state)::text = 'processing'::text) > Total runtime: 808.662 ms > (6 rows) > > > whereas 9.4 and 9.5 get: > > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------------------------- > > Index Scan using file_in_flight on files (cost=0.42..62857.39 > rows=158330 width=15) (actual time=0.055..202.732 rows=166696 loops=1) > Index Cond: ((state)::text = 'processing'::text) > Planning time: 24.203 ms > Execution time: 208.926 ms > (4 rows) > > > This is with each version loading exactly the same dataset (generated by > the attached scripty). Obviously this is a vast simplification of what > Josh is looking at - but it is (hopefully) interesting that these later > versions are doing so much better... > A bit more poking about shows that the major factor (which this fake dataset anyway) is the default for effective_cache_size (changes from 128MB to 4GB in 9.4). Increasing this makes 9.2 start using the files_in_flight index in a plain index scan too. Josh - might be worth experimenting with this parameter. regards Mark
pgsql-performance by date: