Re: bitmask index - Mailing list pgsql-performance
From | Marcus Engene |
---|---|
Subject | Re: bitmask index |
Date | |
Msg-id | 4E12E442.9020202@engene.se Whole thread Raw |
In response to | Re: bitmask index (Greg Smith <greg@2ndQuadrant.com>) |
Responses |
Re: bitmask index
|
List | pgsql-performance |
On 6/22/11 11:42 , Greg Smith wrote: > On 06/22/2011 05:27 PM, Marcus Engene wrote: >> I have some tables with bitmask integers. Set bits are the >> interesting ones. Usually they are sparse. > > If it's sparse, create a partial index that just includes rows where > the bit is set: > http://www.postgresql.org/docs/current/static/indexes-partial.html > > You need to be careful the query uses the exact syntax as the one that > created the index for it to be used. But if you do that, it should be > able to pull the rows that match out quickly. > I ended up having a separate table with an index on. Though partial index solved another problem. Usually I'm a little bit annoyed with the optimizer and the developers religious "fix the planner instead of index hints". I must say that I'm willing to reconsider my usual stance to that. We have a large table of products where status=20 is a rare intermediate status. I added a... CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status) WHERE status = 20; ...and a slow 5s select with users who had existing status=20 items became very fast. Planner, I guess, saw the 10000 status 20 clips (out of millions of items) instead of like 5 different values of status and thus ignoring the index. Super! To my great amazement, the planner also managed to use the index when counting how many status=20 items there are in total: pond90=> explain analyze select pond90-> coalesce(sum(tt.antal),0) as nbr_in_queue pond90-> from pond90-> ( pond90(> select pond90(> pu.username pond90(> ,t.antal pond90(> from pond90(> ( pond90(> select pond90(> sum(1) as antal pond90(> ,pond_user pond90(> from pond90(> pond_item_common pond90(> where pond90(> status = 20 pond90(> group by pond_user pond90(> ) as t pond90(> ,pond_user pu pond90(> where pond90(> pu.objectid = t.pond_user pond90(> order by t.antal desc pond90(> ) as tt; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=38079.45..38079.46 rows=1 width=8) (actual time=166.439..166.440 rows=1 loops=1) -> Sort (cost=38079.13..38079.18 rows=21 width=18) (actual time=166.009..166.085 rows=648 loops=1) Sort Key: (sum(1)) Sort Method: quicksort Memory: 67kB -> Nested Loop (cost=37903.66..38078.67 rows=21 width=18) (actual time=157.545..165.561 rows=648 loops=1) -> HashAggregate (cost=37903.66..37903.92 rows=21 width=4) (actual time=157.493..157.720 rows=648 loops=1) -> Bitmap Heap Scan on pond_item_common (cost=451.43..37853.37 rows=10057 width=4) (actual time=9.061..151.511 rows=12352 loops=1) Recheck Cond: (status = 20) -> Bitmap Index Scan on pond_item_common_x8 (cost=0.00..448.91 rows=10057 width=0) (actual time=5.654..5.654 rows=20051 loops=1) Index Cond: (status = 20) -> Index Scan using pond_user_pkey on pond_user pu (cost=0.00..8.30 rows=1 width=14) (actual time=0.011..0.012 rows=1 loops=648) Index Cond: (pu.objectid = pond_item_common.pond_user) Total runtime: 166.709 ms (13 rows) My hat's off to the dev gang. Impressive! Best, Marcus
pgsql-performance by date: