GSoC 2015 proposal. Bitmap Index-only Count - Mailing list pgsql-hackers
From | Anastasia Lubennikova |
---|---|
Subject | GSoC 2015 proposal. Bitmap Index-only Count |
Date | |
Msg-id | CAP4vRV7xdUE7Eq7ZFgoWSC6aXmH9AM3qeSQ139dzZhpCcQGr3A@mail.gmail.com Whole thread Raw |
Responses |
Re: GSoC 2015 proposal. Bitmap Index-only Count
|
List | pgsql-hackers |
Project name
Bitmap Index-only Count
Brief review
There is a problem of slow counting in PostgreSQL [1]. The reason why this is slow is related to the MVCC implementation in PostgreSQL. Index-only scans (implemented since PostgreSQL-9.2) providing some performance improvements where the visibility map of the table allows it. That’s good. But it works only for access methods which provide amgettuple method. Unfortunately GIN supports only BitmapIndexScan and has no implementation of index_getnext() interface [2]. Idea of the proposal is to implement Bitmap Index-Only Count method that allows to count elements, without reference to the heap.
Benefits to the PostgreSQL Community
Faster count(*) would be actual for GIN. Probably it would accelerate count(*) for other access methods too, but I do not sure if it would be significant difference.
Quantifiable results
Acceleration of count(*) queries with WHERE clause which use GIN.
Project details
Let’s look at count(*) query:
EXPLAIN SELECT count(*) from tbl_mytbl where val @> '{"b":2}';
Now the query plan looks like this:
Aggregate
-> Bitmap Heap Scan on tbl_mytbl
Recheck Cond: (val @> '{"b": 2}'::jsonb)
-> Bitmap Index Scan on idx_mytbl
Index Cond: (val @> '{"b": 2}'::jsonb)
Idea of the proposal is to implement Bitmap Index-Only Count method that allows to count elements, without reference to the heap.
Conditions:
- all tuples are visible (the same problem as for Index-only count(*));
- result TIDBitmap is not lossy. nchunks = 0;
int nchunks; /* number of lossy entries in pagetable */
- pages in TIDBitmap don’t require recheck
* recheck is used only on exact pages --- it indicates that although
* only the stated tuples need be checked, the full index qual condition
* must be checked for each (ie, these are candidate matches).
If all conditions are true, it’s possible to call aggregate COUNT function for each tuple from TIDBitmap returned by Bitmap Index Scan (or BitmapAnd/BitmapOr nodes). And there’s no necessity to call Bitmap Heap Scan.
As a GSoC student I will create new Node “Bitmap Index-Only Scan”, which would catch tuples from Bitmap Index Scan node and pass them to Aggregate node. Thus, new query plan will be as follow:
Aggregate
-> Bitmap Index-only Count
-> Bitmap Index Scan on idx_mytbl
Index Cond: (val @> '{"b": 2}'::jsonb)
Project Schedule
until May 25
Read documentation and source code, clarify details of implementation.
until June 30
Implement Bitmap Index-only Count node.
1 July – 31 July
Add Bitmap Index-only Count node to Planner.
1 August -15 August
Final refactoring, testing and submitting a patch.
Links
- https://wiki.postgresql.org/wiki/Slow_Counting
- http://postgresql.nabble.com/Todo-item-Support-amgettuple-in-GIN-td5780810.html
pgsql-hackers by date: