Home > mailing lists

Re: bitmask index - Mailing list pgsql-performance

From	Marcus Engene
Subject	Re: bitmask index
Date	July 5, 2011 07:15:53
Msg-id	4E12E442.9020202@engene.se Whole thread Raw
In response to	Re: bitmask index (Greg Smith <greg@2ndQuadrant.com>)
Responses	Re: bitmask index
List	pgsql-performance

Tree view

On 6/22/11 11:42 , Greg Smith wrote:
> On 06/22/2011 05:27 PM, Marcus Engene wrote:
>> I have some tables with bitmask integers. Set bits are the
>> interesting ones. Usually they are sparse.
>
> If it's sparse, create a partial index that just includes rows where
> the bit is set:
> http://www.postgresql.org/docs/current/static/indexes-partial.html
>
> You need to be careful the query uses the exact syntax as the one that
> created the index for it to be used.  But if you do that, it should be
> able to pull the rows that match out quickly.
>
I ended up having a separate table with an index on.

Though partial index solved another problem. Usually I'm a little bit
annoyed with the optimizer and the developers religious "fix the planner
instead of index hints". I must say that I'm willing to reconsider my
usual stance to that.

We have a large table of products where status=20 is a rare intermediate
status. I added a...

CREATE INDEX pond_item_common_x8 ON pond_item_common(pond_user, status)
WHERE status = 20;

...and a slow 5s select with users who had existing status=20 items
became very fast. Planner, I guess, saw the 10000 status 20 clips (out
of millions of items) instead of like 5 different values of status and
thus ignoring the index. Super!

To my great amazement, the planner also managed to use the index when
counting how many status=20 items there are in total:

pond90=> explain analyze             select
pond90->                 coalesce(sum(tt.antal),0) as nbr_in_queue
pond90->             from
pond90->                 (
pond90(>                     select
pond90(>                         pu.username
pond90(>                        ,t.antal
pond90(>                     from
pond90(>                         (
pond90(>                             select
pond90(>                                 sum(1) as antal
pond90(>                                ,pond_user
pond90(>                             from
pond90(>                                 pond_item_common
pond90(>                             where
pond90(>                                 status = 20
pond90(>                             group by pond_user
pond90(>                         ) as t
pond90(>                        ,pond_user pu
pond90(>                     where
pond90(>                         pu.objectid = t.pond_user
pond90(>                     order by t.antal desc
pond90(>                 ) as tt;
                                                                           QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=38079.45..38079.46 rows=1 width=8) (actual
time=166.439..166.440 rows=1 loops=1)
    ->  Sort  (cost=38079.13..38079.18 rows=21 width=18) (actual
time=166.009..166.085 rows=648 loops=1)
          Sort Key: (sum(1))
          Sort Method:  quicksort  Memory: 67kB
          ->  Nested Loop  (cost=37903.66..38078.67 rows=21 width=18)
(actual time=157.545..165.561 rows=648 loops=1)
                ->  HashAggregate  (cost=37903.66..37903.92 rows=21
width=4) (actual time=157.493..157.720 rows=648 loops=1)
                      ->  Bitmap Heap Scan on pond_item_common
(cost=451.43..37853.37 rows=10057 width=4) (actual time=9.061..151.511
rows=12352 loops=1)
                            Recheck Cond: (status = 20)
                            ->  Bitmap Index Scan on
pond_item_common_x8  (cost=0.00..448.91 rows=10057 width=0) (actual
time=5.654..5.654 rows=20051 loops=1)
                                  Index Cond: (status = 20)
                ->  Index Scan using pond_user_pkey on pond_user pu
(cost=0.00..8.30 rows=1 width=14) (actual time=0.011..0.012 rows=1
loops=648)
                      Index Cond: (pu.objectid = pond_item_common.pond_user)
  Total runtime: 166.709 ms
(13 rows)

My hat's off to the dev gang. Impressive!

Best,
Marcus

pgsql-performance by date:

From: Greg Smith
Date: 03 July 2011, 15:18:26
Subject: Re: Infinite Cache

From: Shaun Thomas
Date: 05 July 2011, 10:35:48
Subject: Re: Infinite Cache

Re: bitmask index - Mailing list pgsql-performance

Previous

Next