[PROPOSAL] Table Partition - Mailing list pgsql-hackers
From | My Life |
---|---|
Subject | [PROPOSAL] Table Partition |
Date | |
Msg-id | tencent_2CD4D0A83798F46014D7DD58@qq.com Whole thread Raw |
Responses |
Re: [PROPOSAL] Table Partition
|
List | pgsql-hackers |
Hi, everyone! I'd like to propose a postgres partition implementation. First, I would show the design to everyone, and talkabout it. If we think the design is not very bad, and can be commit to the PostgreSQL baseline, then I will post thecode to the community.<br />(note: my english is not very good.)<br /><br />Table Partition Design<br />=====================<br/>In this design, partitions are normal tables in inheritance hierarchies, with the same tablestructure with the partitioned table.<br /><br />In pg_class we have an additional relpartition field which has followingvalues:<br />'s' /* single regular table */<br />'r' /* partitioned table by range */<br />'l' /* partitioned table by list */<br />'h' /* partitioned table by hash */<br />'c' /* child partitiontable */<br /><br />Add a new system schema named 'pg_partition', just like 'pg_toast', we can create the partitioncatalog table to store the partition entries. let's assume the partition catalog's name is pg_partition_2586 (2586is the partitioned table's OID in pg_class).<br />a range or interval partition catalog's structure is as follows:<br/>column data type comment<br />partname name a partition's name,this is the primary key<br />partid oid a partition's OID in pg_class<br />interval text a interval partition's interval(maybe a expression)<br />partkey1 depends onpartitioned table<br />...<br />partkeyN depends on partitioned table<br />partkey1, ..., partkeyN is a partition'supper bound.<br />Finally, make a unique constraint on partkey1, ..., partkeyN.<br />Every time we create a newpartition, we insert a new tuple into this partition catalog.<br />Every time we drop an old partition, we delete therelated tuple in this partition catalog.<br /><br />For a partitioned table's CREATE action, we should transform the actioninto the CREATE action of partitioned table and partitions, and the INSERT action into the partition catalog.<br /><br/>For INSERT action, we implement a RelationGetTuplePartid method, which can find the partition the tuple belongs to.It will do an index scan on the partition catalog table(assume it is pg_partition_2586) to find the partition.<br />anda ExecGetPartitionResultRel method, which can return the partition's ResultRelInfo to execute INSERT action.<br /><br/>For partitioned table's scan action, and JOIN action, we implemented a plan node named 'PartitionExpand'. the plannode can expand the partitioned table scan node into a list of partitions according to the filter and conditions. andit can expand partitioned table JOIN node into a list of partitions JOIN node wisely.<br />We implemented a DynamicPrunePartitionmethod, which can expand the partitioned table's scan node into a list of partition's scan node.<br/>We implemented a DynamicPrunePartitionJoin method, which can expand the partitioned table's JOIN node into a listof partition's JOIN node.<br />These expand action happend in ExecInitPartitionExpand function, when initialize the executor.and all these action implemented based on the partition catalog.<br /><br />For UPDATE and DELETE action, we justset real partition as the ResultRelInfo, when ExecPartitionExpand is running.<br /><br />For pg_dump backup action, weshould dump the partition catalog, and relpartition field in pg_class.<br /><br />so these are the main points of the design,and I can show any detail you wondered later.<br />
pgsql-hackers by date: