Re: [HACKERS] [WIP] Vertical Clustered Index (columnar store extension) - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: [HACKERS] [WIP] Vertical Clustered Index (columnar store extension) |
Date | |
Msg-id | ca0e06ba-d1f1-7256-b5b6-55a0cb5a5749@postgrespro.ru |
In response to | [HACKERS] [WIP] Vertical Clustered Index (columnar store extension) (Haribabu Kommi <kommi.haribabu@gmail.com>) |
Responses | Re: [HACKERS] [WIP] Vertical Clustered Index (columnar store extension) |
List | pgsql-hackers |
On 30.12.2016 06:55, Haribabu Kommi wrote:
> Hi All,
>
> Fujitsu was interested in developing a columnar storage extension with minimal changes to the server backend.
We in PostgresPRO are also very interested in developing vertical storage (VS) for Postgres.
After considering many alternatives, we came to the conclusion that the approach of representing a columnar store as an index access method is the most promising one.
It allows us to:
1. Implement VS as an extension without affecting the Postgres core (see the handler sketch right after this list).
2. Have both ROS and WOS.
3. Create multiple projections (as in Vertica).
4. Optimize insert speed by supporting batch inserts, and use a flexible recovery model for VS.
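To make point 1 concrete: since 9.6, Postgres lets an extension register a new index access method with CREATE ACCESS METHOD plus a handler function that fills in an IndexAmRoutine. Below is a minimal sketch of what such a handler looks like; the vci_* callback names are hypothetical placeholders (bodies omitted), not code from the actual patch:

#include "postgres.h"
#include "access/amapi.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(vci_handler);

/* Registered with: CREATE ACCESS METHOD vci TYPE INDEX HANDLER vci_handler; */
Datum
vci_handler(PG_FUNCTION_ARGS)
{
	IndexAmRoutine *am = makeNode(IndexAmRoutine);

	am->amstrategies = 0;                   /* no ordering operators */
	am->amsupport = 0;
	am->amcanreturn = vci_canreturn;        /* returning columns enables index-only scans */
	am->amcostestimate = vci_costestimate;  /* our costs, for the optimizer */
	am->ambuild = vci_build;                /* build WOS/ROS from the existing heap */
	am->ambuildempty = vci_buildempty;
	am->aminsert = vci_insert;              /* route freshly inserted tuples into WOS */
	am->ambulkdelete = vci_bulkdelete;
	am->amvacuumcleanup = vci_vacuumcleanup;
	am->amvalidate = vci_validate;
	am->amoptions = vci_options;
	am->ambeginscan = vci_beginscan;
	am->amrescan = vci_rescan;
	am->amgettuple = vci_gettuple;
	am->amendscan = vci_endscan;

	PG_RETURN_POINTER(am);
}

After that, CREATE INDEX ... USING vci (...) works with no changes to the core at all.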
So it is very similar to your approach, but there are a few differences:
1. Our intention is to completely eliminate changes to the Postgres core.
You wrote:
> Yes, it is a mix of both index and table access methods. The current design of Vertical Clustered Index needs both access methods, because of this reason we used both access methods.
But I still do not completely understand why it is not possible to use VS in index-only scans with the standard Postgres executor and without any changes. Why is it not possible to rely on the standard rules for applying indexes in the Postgres optimizer, based on the costs provided by our AM implementation?
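On the cost question, the AM API already has the needed hook: the amcostestimate callback is what the planner consults when deciding whether an index path wins, and a columnar AM can make itself cheap exactly when the query touches few columns of a wide table. A hypothetical sketch, using the 9.6-era callback signature (the estimation formula here is invented for illustration, not taken from any patch):

#include "postgres.h"
#include "access/amapi.h"
#include "nodes/relation.h"
#include "optimizer/cost.h"

static void
vci_costestimate(PlannerInfo *root, IndexPath *path, double loop_count,
                 Cost *indexStartupCost, Cost *indexTotalCost,
                 Selectivity *indexSelectivity, double *indexCorrelation)
{
	IndexOptInfo *index = path->indexinfo;
	double		ncols = Max(index->rel->max_attr, 1);	/* columns in the table */
	double		used = index->ncolumns;					/* columns stored in this VCI */

	/* Charge a sequential read of only the touched columns' share of pages. */
	*indexStartupCost = 0.0;
	*indexTotalCost = seq_page_cost * index->rel->pages * (used / ncols);
	*indexSelectivity = 1.0;	/* a full scan of the projection */
	*indexCorrelation = 1.0;	/* data is stored in clustered order */
}

If the numbers are honest, the standard planner then picks the VCI scan whenever it is actually cheaper, with no optimizer changes.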
2. You are accessing VS pages through the Postgres buffer manager. That certainly has a lot of advantages: first of all, it significantly simplifies the implementation of VS and allows reuse of the Postgres cache and lock managers.
But it also leads to some limitations:
- For VS it is preferable to have larger pages (in Vertica a page can be several megabytes).
- VS is optimized for sequential access, so caching its pages in the buffer manager is not needed and can only evict other useful pages from the cache.
- It makes it impossible to implement an in-memory version of VS.
- Access to the buffer manager adds extra synchronization overhead, which becomes noticeable on MPP systems.
So I wonder whether you have considered an approach with a VS-specific implementation of the storage layer? The sketch below contrasts the two access paths.
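For concreteness, here is the shape of each alternative: going through the shared buffer pool versus reading blocks at the storage-manager (smgr) level and caching them privately. This is a simplified sketch assuming the 9.6-era smgr API, with the decode steps elided; it is not the patch's actual code:

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/smgr.h"

/* (a) Through the buffer manager: shared cache, locking and recovery
 * come for free, but every access pins and locks shared state. */
static void
ros_read_block_shared(Relation rel, BlockNumber blkno)
{
	Buffer		buf = ReadBuffer(rel, blkno);

	LockBuffer(buf, BUFFER_LOCK_SHARE);
	/* decode column values from BufferGetPage(buf) ... */
	UnlockReleaseBuffer(buf);
}

/* (b) At the smgr level: bypass shared_buffers and let VS keep its own
 * cache of large chunks; the extension then owns caching, concurrency
 * and crash recovery for these files. */
static void
ros_read_block_direct(SMgrRelation reln, BlockNumber blkno, char *chunk)
{
	smgrread(reln, MAIN_FORKNUM, blkno, chunk);		/* raw block read */
	/* decode column values from chunk, cache it in a VS-private pool ... */
}

Note that smgr still works in BLCKSZ units, so truly multi-megabyte pages would need VS's own file layer underneath (or beside) smgr.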
3. To take full advantage of the vertical model, we should provide vector execution.
Without it, a columnar store can only reduce the amount of fetched data through selective fetch of the accessed columns and better compression of them.
But this is what the existing cstore_fdw extension for Postgres already does.
We are going to use executor hooks or custom nodes to implement vector operations for some nodes (filter, grand aggregate, aggregation with group by, ...),
something similar to https://github.com/citusdata/postgres_vectorization_test (a minimal hook skeleton is sketched below).
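A minimal skeleton of the hook route, assuming the 9.6-era ExecutorRun_hook signature (later releases add an execute_once argument); vci_vectorize_plan is a hypothetical placeholder for the pass that swaps eligible plan nodes for vectorized ones:

#include "postgres.h"
#include "executor/executor.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

static ExecutorRun_hook_type prev_ExecutorRun = NULL;

static void vci_vectorize_plan(PlanState *node);	/* placeholder, body omitted */

static void
vci_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, uint64 count)
{
	/* Rewrite filter/aggregate nodes over VCI scans into batch form. */
	vci_vectorize_plan(queryDesc->planstate);

	if (prev_ExecutorRun)
		prev_ExecutorRun(queryDesc, direction, count);
	else
		standard_ExecutorRun(queryDesc, direction, count);
}

void
_PG_init(void)
{
	prev_ExecutorRun = ExecutorRun_hook;
	ExecutorRun_hook = vci_ExecutorRun;
}

The custom-node route achieves the same without hooking the whole executor, at the price of teaching the planner about the new nodes.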
What is your vision of optimizing the executor to work with VS?
4. How do you plan to add parallelism support to VS? Should it be handled inside the VS implementation, or should we use standard Postgres parallel execution (a parallel index-only scan)?
Thanks in advance,
Konstantin
> The columnar store is implemented as an extension using index access methods. This can be easily enhanced with pluggable storage methods once they are available. A new index method (VCI) is added to create a columnar index on the table.
>
> The following is the basic design idea of the columnar extension. It has an on-disk columnar representation, so even after a crash the columnar format is recovered to the state it was in when it crashed. To provide a performance benefit for both read and write operations, the data is stored in two formats:
> 1) write optimized storage (WOS)
> 2) read optimized storage (ROS)
> This is useful for users where there is a great chance that the data being modified was newly added, instead of old data.
>
> WOS
> ====
> Write optimized storage holds the data of all columns that are part of the VCI, stored in a row-wise format. All newly added data is stored in the WOS relation, with xmin/xmax information as well. If the user wants to update/delete the newly added data, it doesn't affect performance much, compared to deleting the data from columnar storage.
>
> The tuples which don't have multiple copies, or are frozen, will be moved from WOS to ROS periodically by the background worker process or the autovacuum process. Every column's data is stored separately in its own relation file. There is no transaction information present in ROS. The data in ROS can be referred to by tuple ID.
>
> In this approach, the column data is present in both heap and columnar storage.
>
> ROS
> ====
> This is the place where all the column data is stored in columnar format. The data is converted from WOS to ROS by background workers continuously, based on the tuple visibility check: whenever a tuple is frozen, it gets moved from WOS to ROS.
>
> The data in ROS is stored in extents. One extent consists of 262,144 rows. Because of the fixed number of records in an extent, it is easy to map a heap record to the columnar record with a TID-to-CRID map.
>
> Insert
> =====
> The insert operation is just like inserting data into an index.
>
> Select
> =====
> Because of the two storage formats, during a select operation the data in WOS is converted into a local ROS for the statement to be executed. The conversion cost depends upon the number of tuples present in the WOS file. This may add some performance overhead for select statements. The life of the local ROS is till the end of the query context.
>
> Delete
> =====
> During a delete operation, whenever the data is deleted in the heap, the data in the WOS file is marked as deleted at the same time, similar to the heap. But if the data has already migrated from WOS to ROS, then we maintain a delete vector to store the tuple id, transaction information, etc. During a data read from ROS, the record is checked against the delete vector to confirm whether it is visible or not. All the delete vector data is applied to ROS periodically.
>
> More details of the internal relations and their usage are available in the README. It still needs more updates to explain the full details of the columnar index design.
>
> The concept of the Vertical Clustered Index columnar extension is from Fujitsu Labs, Japan.
>
> Following is the brief schedule of patches that are required for a better performing columnar store:
> 1. Minimal server changes (new relkind "CSTORE" option)
> 2. Base storage patch
> 3. Support for moving data from WOS to ROS
> 4. Local ROS support
> 5. Custom scan support to read the data from ROS and Local ROS
> 6. Background worker support for data movement
> 7. Expression state support in VCI
> 8. Aggregation support in VCI
> 9. pg_dump support for the new type of relations
> 10. psql \d command support for CSTORE relations
> 11. Parallelism support
> 12. Compression support
> 13. In-memory support with dynamic shared memory
>
> Currently I attached only the patches for 1 and 2. These provide the basic changes that are required in the PostgreSQL core for the extension to work. I have to rebase/rewrite the rest of the patches against the latest master and share them with the community.
>
> Any comments on the approach?
>
> Regards,
> Hari Babu
> Fujitsu Australia
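As a side note on the fixed-size extents described above: with 262,144 = 2^18 rows per extent, mapping a columnar row id (CRID) to its location in ROS is pure shift/mask arithmetic, which is presumably why a fixed row count was chosen. A hypothetical illustration (the type and field names are invented, not from the patch):

#include <stdint.h>

#define VCI_EXTENT_ROWS 262144u		/* 2^18 rows per extent, per the design note */

typedef uint64_t CRID;				/* dense, monotonically assigned columnar row id */

typedef struct RosLocator
{
	uint32_t	extent;				/* which extent of the column file */
	uint32_t	offset;				/* row slot inside that extent */
} RosLocator;

static inline RosLocator
ros_locate(CRID crid)
{
	RosLocator	loc;

	loc.extent = (uint32_t) (crid >> 18);						/* crid / 262144 */
	loc.offset = (uint32_t) (crid & (VCI_EXTENT_ROWS - 1));		/* crid % 262144 */
	return loc;
}

The TID-to-CRID map then needs to store only one CRID per heap tuple; everything past that point is arithmetic.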
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company