PostgreSQL Columnar Store for Analytic Workloads - Mailing list pgsql-hackers
From | Hadi Moshayedi |
---|---|
Subject | PostgreSQL Columnar Store for Analytic Workloads |
Date | |
Msg-id | CAK=1=WrL42oFuF3Mm3i9CaUyzZznO68UXEQEWA-PJQhqpfwJFw@mail.gmail.com Whole thread Raw |
Responses |
Re: PostgreSQL Columnar Store for Analytic Workloads
|
List | pgsql-hackers |
<div dir="ltr"><div style="font-family:arial,sans-serif;font-size:13px">Dear Hackers,<br /></div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div><div style="font-family:arial,sans-serif;font-size:13px">Weat Citus Data have been developing a columnar store extension for PostgreSQL.Today we are excited to open source it under the Apache v2.0 license.</div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div><div style="font-family:arial,sans-serif;font-size:13px">Thiscolumnar store extension uses the Optimized Row Columnar (ORC) formatfor its data layout, which improves upon the RCFile format developed at Facebook, and brings the following benefits:</div><divstyle="font-family:arial,sans-serif;font-size:13px"><br /></div><div style="font-family:arial,sans-serif;font-size:13px">*Compression: Reduces in-memory and on-disk data size by 2-4x. Can beextended to support different codecs. We used the functions in pg_lzcompress.h for compression and decompression.</div><divstyle="font-family:arial,sans-serif;font-size:13px">* Column projections: Only reads column datarelevant to the query. Improves performance for I/O bound queries.</div><div style="font-family:arial,sans-serif;font-size:13px">* Skip indexes: Stores min/max statistics for row groups, and uses themto skip over unrelated rows.</div><div style="font-family:arial,sans-serif;font-size:13px"><br /></div><div style="font-family:arial,sans-serif;font-size:13px">We used the PostgreSQL FDW APIs to make this work. The extension doesn'timplement the writable FDW API, but it uses the process utility hook to enable COPY command for the columnar tables.</div><divstyle="font-family:arial,sans-serif;font-size:13px"><br /></div><div style="font-family:arial,sans-serif;font-size:13px">Thisextension uses PostgreSQL's internal data type representation tostore data in the table, so this columnar store should support all data types that PostgreSQL supports.</div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div><div style="font-family:arial,sans-serif;font-size:13px">Wetried the extension on TPC-H benchmark with 4GB scale factor on a m1.xlargeAmazon EC2 instance, and the query performance improved by 2x-3x compared to regular PostgreSQL table. Note thatwe flushed the page cache before each test to see the impact on disk I/O.</div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div><div style="font-family:arial,sans-serif;font-size:13px">Whendata is cached in memory, the performance of cstore_fdw tables wereclose to the performance of regular PostgreSQL tables.</div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div><div style="font-family:arial,sans-serif;font-size:13px">Formore information, please visit:</div><div style="font-family:arial,sans-serif;font-size:13px"> * our blog post: <a href="http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics" target="_blank">http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics</a></div><div style="font-family:arial,sans-serif;font-size:13px"> * our github page: <a href="https://github.com/citusdata/cstore_fdw"target="_blank">https://github.com/citusdata/cstore_fdw</a></div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div><div style="font-family:arial,sans-serif;font-size:13px">Feedback from you is really appreciated.</div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div><div style="font-family:arial,sans-serif;font-size:13px">Thanks,</div><div style="font-family:arial,sans-serif;font-size:13px"> -- Hadi</div><div style="font-family:arial,sans-serif;font-size:13px"><br/></div></div>
pgsql-hackers by date: