Home > mailing lists

Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension) - Mailing list pgsql-hackers

From	Haribabu Kommi
Subject	Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
Date	February 14, 2017 04:59:43
Msg-id	CAJrrPGdjQ0MOS_LqjSu5=nw-Od1p2voMHhOrWYu8=SL5vAL6aA@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] [WIP]Vertical Clustered Index (columnar storeextension) (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses	Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension)
List	pgsql-hackers

Tree view

On Tue, Feb 14, 2017 at 2:57 AM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Hi,

I wonder if it is possible to somehow benchmark your clustered index implementation.
I tried to create VCI index for lineitem table from TPC and run Q6 query.
After index creation Postgres is not using parallel execution plan any more but speed of sequential plan is not changed
and nothing in query execution plan indicates that VCI index is used:

postgres=# explain select
    sum(l_extendedprice*l_discount) as revenue
from
    lineitem_projection
where
    l_shipdate between '1996-01-01' and '1997-01-01'
    and l_discount between 0.08 and 0.1
    and l_quantity < 24;
                                                                                                                 QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
Finalize Aggregate (cost=608333.85..608333.86 rows=1 width=4)
   -> Gather (cost=608333.23..608333.84 rows=6 width=4)
         Workers Planned: 6
         -> Partial Aggregate (cost=607333.23..607333.24 rows=1 width=4)
               -> Parallel Seq Scan on lineitem_projection (cost=0.00..607024.83 rows=61680 width=8)
                     Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AN
D (l_discount <= '0.1'::double precision) AND (l_quantity < '24'::double precision))
(6 rows)

postgres=# select
    sum(l_extendedprice*l_discount) as revenue
from
    lineitem_projection
where
    l_shipdate between '1996-01-01' and '1997-01-01'
    and l_discount between 0.08 and 0.1
    and l_quantity < 24;
   revenue
-------------
6.21111e+08
(1 row)

Time: 1171.324 ms (00:01.171)

postgres=# create index vci_idx on lineitem_projection using vci(l_shipdate,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus);
CREATE INDEX
Time: 4.705 ms

postgres=# explain select
    * from
    lineitem_projection
where
    l_shipdate between '1996-01-01' and '1997-01-01'
    and l_discount between 0.08 and 0.1
    and l_quantity < 24;
                                                                                                        QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------
Seq Scan on lineitem_projection (cost=0.00..382077.00 rows=1 width=22)
   Filter: ((l_shipdate >= '1996-01-01'::date) AND (l_shipdate <= '1997-01-01'::date) AND (l_discount >= '0.08'::double precision) AND (l_discount <= '
0.1'::double precision) AND (l_quantity < '24'::double precision))
(2 rows)

postgres=# select
    sum(l_extendedprice*l_discount) as revenue
from
    lineitem_projection
where
    l_shipdate between '1996-01-01' and '1997-01-01'
    and l_discount between 0.08 and 0.1
    and l_quantity < 24;
revenue
------------
6.2112e+08
(1 row)

Time: 4304.355 ms (00:04.304)

I wonder if there is any query which can demonstrate advantages of using VCI index?

The current patch that I shared doesn't contains the plan and executor changes to show

the performance benefit of the clustered index. we used custom plan to generate the plan

for the clustered index. Currently I am working on it to rebase it to current master and

other necessary changes.

In the current state of the patch, I cannot take any performance tests, as it needs some

major changes according to the latest PostgreSQL version. I have an old performance

report that is took on 9.5 attached for your reference.

The current patch that is shared is to find out the best approach in developing a columnar

storage in PostgreSQL, by adopting Index access methods + additional hooks or pluggable

storage access methods?

The only problem I can think of pluggable storage methods is, to use the proper benefits of

columnar storage, the planner and executor needs to be changed to support vector processing,

But whereas in the current model, we implemented the same with custom plan and additional

hooks. The same may be possible with pluggable storage methods also.

Regards,

Hari Babu

Fujitsu Australia

Attachment

VCI_DBT3_Query_Performance.xlsx

pgsql-hackers by date:

From: Haribabu Kommi
Date: 14 February 2017, 04:21:46
Subject: Re: [HACKERS] Parallel bitmap heap scan

From: Amit Kapila
Date: 14 February 2017, 05:04:41
Subject: Re: [HACKERS] Parallel Index Scans

Re: [HACKERS] [WIP]Vertical Clustered Index (columnar store extension) - Mailing list pgsql-hackers

Attachment

Previous

Next