Thread: [PROPOSAL] Table Partition
Hi, everyone! I'd like to propose a postgres partition implementation. First, I would show the design to everyone, and talkabout it. If we think the design is not very bad, and can be commit to the PostgreSQL baseline, then I will post thecode to the community.<br />(note: my english is not very good.)<br /><br />Table Partition Design<br />=====================<br/>In this design, partitions are normal tables in inheritance hierarchies, with the same tablestructure with the partitioned table.<br /><br />In pg_class we have an additional relpartition field which has followingvalues:<br />'s' /* single regular table */<br />'r' /* partitioned table by range */<br />'l' /* partitioned table by list */<br />'h' /* partitioned table by hash */<br />'c' /* child partitiontable */<br /><br />Add a new system schema named 'pg_partition', just like 'pg_toast', we can create the partitioncatalog table to store the partition entries. let's assume the partition catalog's name is pg_partition_2586 (2586is the partitioned table's OID in pg_class).<br />a range or interval partition catalog's structure is as follows:<br/>column data type comment<br />partname name a partition's name,this is the primary key<br />partid oid a partition's OID in pg_class<br />interval text a interval partition's interval(maybe a expression)<br />partkey1 depends onpartitioned table<br />...<br />partkeyN depends on partitioned table<br />partkey1, ..., partkeyN is a partition'supper bound.<br />Finally, make a unique constraint on partkey1, ..., partkeyN.<br />Every time we create a newpartition, we insert a new tuple into this partition catalog.<br />Every time we drop an old partition, we delete therelated tuple in this partition catalog.<br /><br />For a partitioned table's CREATE action, we should transform the actioninto the CREATE action of partitioned table and partitions, and the INSERT action into the partition catalog.<br /><br/>For INSERT action, we implement a RelationGetTuplePartid method, which can find the partition the tuple belongs to.It will do an index scan on the partition catalog table(assume it is pg_partition_2586) to find the partition.<br />anda ExecGetPartitionResultRel method, which can return the partition's ResultRelInfo to execute INSERT action.<br /><br/>For partitioned table's scan action, and JOIN action, we implemented a plan node named 'PartitionExpand'. the plannode can expand the partitioned table scan node into a list of partitions according to the filter and conditions. andit can expand partitioned table JOIN node into a list of partitions JOIN node wisely.<br />We implemented a DynamicPrunePartitionmethod, which can expand the partitioned table's scan node into a list of partition's scan node.<br/>We implemented a DynamicPrunePartitionJoin method, which can expand the partitioned table's JOIN node into a listof partition's JOIN node.<br />These expand action happend in ExecInitPartitionExpand function, when initialize the executor.and all these action implemented based on the partition catalog.<br /><br />For UPDATE and DELETE action, we justset real partition as the ResultRelInfo, when ExecPartitionExpand is running.<br /><br />For pg_dump backup action, weshould dump the partition catalog, and relpartition field in pg_class.<br /><br />so these are the main points of the design,and I can show any detail you wondered later.<br />
Hello, On 2015-08-30 PM 10:42, My Life wrote: > > For partitioned table's scan action, and JOIN action, we implemented > a plan node named 'PartitionExpand'. the plan node can expand the > partitioned table scan node into a list of partitions according to > the filter and conditions. and it can expand partitioned table JOIN > node into a list of partitions JOIN node wisely. > We implemented a DynamicPrunePartition method, which can expand the > partitioned table's scan node into a list of partition's scan node. > We implemented a DynamicPrunePartitionJoin method, which can expand > the partitioned table's JOIN node into a list of partition's JOIN node. > These expand action happend in ExecInitPartitionExpand function, when > initialize the executor. and all these action implemented based on the > partition catalog. > In your design, can index scan be used for individual partition? If yes, can you share how it is handled? Thanks, Amit
> There is already a recent proposal on hackers about partition support in PostgreSQL by Amit Langote.
> You will find the thread at http://www.postgresql.org/message-id/55D3093C.5010800@lab.ntt.co.jp.
Actually, I have seen this design before, and it was not just a design, it has been implemented. I agree with it although I still have some reservations, because I think it is a little more complicated.
1. you store all partition info into 2 system catalogs: pg_partitioned_rel, pg_partition. it will be less efficient to access and maintain the partition info, include scan, add, delete, modify action, maybe concurrency. And the data volumn will get larger and larger.
2. the column 'anyarray partrangebounds' in pg_partition is not very accurate, and we cannot use the index access operators associated with some data type.
> In your design, can index scan be used for individual partition? If yes,
> can you share how it is handled?
of course, index scan can be used for individual partition.
because we make a unique constraint on partkey1, ..., partkeyN on pg_partition.pg_partition_2586.
the unique constraint is really a unique index.
if a index scan's index condition involved the partkey, we can use this unique index to choose the matched partitions in a specified order. and expand the partitioned table's index scan node into a list of partition's index scan node. In this way, the 'PartitionExpand' plan node likes the 'Append' plan node, but has some difference.
------------------ Original ------------------
> You will find the thread at http://www.postgresql.org/message-id/55D3093C.5010800@lab.ntt.co.jp.
Actually, I have seen this design before, and it was not just a design, it has been implemented. I agree with it although I still have some reservations, because I think it is a little more complicated.
1. you store all partition info into 2 system catalogs: pg_partitioned_rel, pg_partition. it will be less efficient to access and maintain the partition info, include scan, add, delete, modify action, maybe concurrency. And the data volumn will get larger and larger.
2. the column 'anyarray partrangebounds' in pg_partition is not very accurate, and we cannot use the index access operators associated with some data type.
> In your design, can index scan be used for individual partition? If yes,
> can you share how it is handled?
of course, index scan can be used for individual partition.
because we make a unique constraint on partkey1, ..., partkeyN on pg_partition.pg_partition_2586.
the unique constraint is really a unique index.
if a index scan's index condition involved the partkey, we can use this unique index to choose the matched partitions in a specified order. and expand the partitioned table's index scan node into a list of partition's index scan node. In this way, the 'PartitionExpand' plan node likes the 'Append' plan node, but has some difference.
------------------ Original ------------------
From: "Ashutosh Bapat";<ashutosh.bapat@enterprisedb.com>;
Date: Mon, Aug 31, 2015 02:43 PM
To: "My Life"<life.show@qq.com>;
Copy: "pgsql-hackers"<pgsql-hackers@postgresql.org>; "tgl"<tgl@sss.pgh.pa.us>; "bruce"<bruce@momjian.us>; "robertmhaas"<robertmhaas@gmail.com>;
Subject: Re: [HACKERS] Proposal of Table Partition
There is already a recent proposal on hackers about partition support in PostgreSQL by Amit Langote. You will find the thread at http://www.postgresql.org/message-id/55D3093C.5010800@lab.ntt.co.jp. May be you can collaborate with the ongoing work.
------------------ Original ------------------
From: "Amit Langote";<Langote_Amit_f8@lab.ntt.co.jp>;
Date: Mon, Aug 31, 2015 03:14 PM
To: "My Life"<life.show@qq.com>; "pgsql-hackers"<pgsql-hackers@postgresql.org>;
Subject: Re: [HACKERS] [PROPOSAL] Table Partition
Hello,
On 2015-08-30 PM 10:42, My Life wrote:
>
> For partitioned table's scan action, and JOIN action, we implemented
> a plan node named 'PartitionExpand'. the plan node can expand the
> partitioned table scan node into a list of partitions according to
> the filter and conditions. and it can expand partitioned table JOIN
> node into a list of partitions JOIN node wisely.
> We implemented a DynamicPrunePartition method, which can expand the
> partitioned table's scan node into a list of partition's scan node.
> We implemented a DynamicPrunePartitionJoin method, which can expand
> the partitioned table's JOIN node into a list of partition's JOIN node.
> These expand action happend in ExecInitPartitionExpand function, when
> initialize the executor. and all these action implemented based on the
> partition catalog.
>
In your design, can index scan be used for individual partition? If yes,
can you share how it is handled?
Thanks,
Amit