Thread: [HACKERS] hash partitioning based on v10Beta2
Hi all,
Now we have had the range / list partition, but hash partitioning is not implemented yet.
Attached is a POC patch based on the v10Beta2 to add the hash partitioning feature.
Although we will need more discussions about the syntax and other specifications before going ahead the project,
but I think this runnable code might help to discuss what and how we implement this.
Description
The hash partition's implement is on the basis of the original range / list partition,and using similar syntax.
To create a partitioned table ,use:
CREATE TABLE h (id int) PARTITION BY HASH(id);
The partitioning key supports only one value, and I think the partition key can support multiple values,
Now we have had the range / list partition, but hash partitioning is not implemented yet.
Attached is a POC patch based on the v10Beta2 to add the hash partitioning feature.
Although we will need more discussions about the syntax and other specifications before going ahead the project,
but I think this runnable code might help to discuss what and how we implement this.
Description
The hash partition's implement is on the basis of the original range / list partition,and using similar syntax.
To create a partitioned table ,use:
CREATE TABLE h (id int) PARTITION BY HASH(id);
The partitioning key supports only one value, and I think the partition key can support multiple values,
which may be difficult to implement when querying, but it is not impossible.
A partition table can be create as bellow:
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;
FOR VALUES clause cannot be used, and the partition bound is calclulated automatically as partition index of single integer value.
An inserted record is stored in a partition whose index equals
DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_proc, values[0])) % nparts /* Number of partitions */
A partition table can be create as bellow:
CREATE TABLE h1 PARTITION OF h;
CREATE TABLE h2 PARTITION OF h;
CREATE TABLE h3 PARTITION OF h;
FOR VALUES clause cannot be used, and the partition bound is calclulated automatically as partition index of single integer value.
An inserted record is stored in a partition whose index equals
DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_proc, values[0])) % nparts /* Number of partitions */
;
In the above example, this is DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_proc, id)) % 3;
postgres=# insert into h select generate_series(1,20);
INSERT 0 20
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 3
h1 | 5
h1 | 17
h1 | 19
h2 | 2
h2 | 6
h2 | 7
h2 | 11
h2 | 12
h2 | 14
h2 | 15
h2 | 18
h2 | 20
h3 | 1
h3 | 4
h3 | 8
h3 | 9
h3 | 10
h3 | 13
h3 | 16
(20 rows)
The number of partitions here can be dynamically added, and if a new partition is created, the number of partitions changes, the calculated target partitions will change, and the same data is not reasonable in different partitions,So you need to re-calculate the existing data and insert the target partition when you create a new partition.
postgres=# create table h4 partition of h;
CREATE TABLE
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 5
h1 | 17
h1 | 19
h1 | 6
h1 | 12
h1 | 8
h1 | 13
h2 | 11
h2 | 14
h3 | 1
h3 | 9
h3 | 2
h3 | 15
h4 | 3
h4 | 7
h4 | 18
h4 | 20
h4 | 4
h4 | 10
h4 | 16
(20 rows)
When querying the data, the hash partition uses the same algorithm as the insertion, and filters out the table that does not need to be scanned.
postgres=# explain analyze select * from h where id = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Append (cost=0.00..41.88 rows=13 width=4) (actual time=0.020..0.023 rows=1 loops=1)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4) (actual time=0.013..0.016 rows=1 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 3
Planning time: 0.346 ms
Execution time: 0.061 ms
(6 rows)
postgres=# explain analyze select * from h where id in (1,5);;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Append (cost=0.00..83.75 rows=52 width=4) (actual time=0.016..0.028 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..41.88 rows=26 width=4) (actual time=0.015..0.018 rows=1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed by Filter: 6
-> Seq Scan on h3 (cost=0.00..41.88 rows=26 width=4) (actual time=0.005..0.007 rows=1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed by Filter: 3
Planning time: 0.720 ms
Execution time: 0.074 ms
(9 rows)
postgres=# explain analyze select * from h where id = 1 or id = 5;;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Append (cost=0.00..96.50 rows=50 width=4) (actual time=0.017..0.078 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..48.25 rows=25 width=4) (actual time=0.015..0.019 rows=1 loops=1)
Filter: ((id = 1) OR (id = 5))
Rows Removed by Filter: 6
-> Seq Scan on h3 (cost=0.00..48.25 rows=25 width=4) (actual time=0.005..0.010 rows=1 loops=1)
Filter: ((id = 1) OR (id = 5))
Rows Removed by Filter: 3
Planning time: 0.396 ms
Execution time: 0.139 ms
(9 rows)
Can not detach / attach / drop partition table.
In the above example, this is DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_proc, id)) % 3;
postgres=# insert into h select generate_series(1,20);
INSERT 0 20
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 3
h1 | 5
h1 | 17
h1 | 19
h2 | 2
h2 | 6
h2 | 7
h2 | 11
h2 | 12
h2 | 14
h2 | 15
h2 | 18
h2 | 20
h3 | 1
h3 | 4
h3 | 8
h3 | 9
h3 | 10
h3 | 13
h3 | 16
(20 rows)
The number of partitions here can be dynamically added, and if a new partition is created, the number of partitions changes, the calculated target partitions will change, and the same data is not reasonable in different partitions,So you need to re-calculate the existing data and insert the target partition when you create a new partition.
postgres=# create table h4 partition of h;
CREATE TABLE
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 5
h1 | 17
h1 | 19
h1 | 6
h1 | 12
h1 | 8
h1 | 13
h2 | 11
h2 | 14
h3 | 1
h3 | 9
h3 | 2
h3 | 15
h4 | 3
h4 | 7
h4 | 18
h4 | 20
h4 | 4
h4 | 10
h4 | 16
(20 rows)
When querying the data, the hash partition uses the same algorithm as the insertion, and filters out the table that does not need to be scanned.
postgres=# explain analyze select * from h where id = 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Append (cost=0.00..41.88 rows=13 width=4) (actual time=0.020..0.023 rows=1 loops=1)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4) (actual time=0.013..0.016 rows=1 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 3
Planning time: 0.346 ms
Execution time: 0.061 ms
(6 rows)
postgres=# explain analyze select * from h where id in (1,5);;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Append (cost=0.00..83.75 rows=52 width=4) (actual time=0.016..0.028 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..41.88 rows=26 width=4) (actual time=0.015..0.018 rows=1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed by Filter: 6
-> Seq Scan on h3 (cost=0.00..41.88 rows=26 width=4) (actual time=0.005..0.007 rows=1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed by Filter: 3
Planning time: 0.720 ms
Execution time: 0.074 ms
(9 rows)
postgres=# explain analyze select * from h where id = 1 or id = 5;;
QUERY PLAN
----------------------------------------------------------------------------------------------------
Append (cost=0.00..96.50 rows=50 width=4) (actual time=0.017..0.078 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..48.25 rows=25 width=4) (actual time=0.015..0.019 rows=1 loops=1)
Filter: ((id = 1) OR (id = 5))
Rows Removed by Filter: 6
-> Seq Scan on h3 (cost=0.00..48.25 rows=25 width=4) (actual time=0.005..0.010 rows=1 loops=1)
Filter: ((id = 1) OR (id = 5))
Rows Removed by Filter: 3
Planning time: 0.396 ms
Execution time: 0.139 ms
(9 rows)
Can not detach / attach / drop partition table.
Best regards, young
https://yonj1e.github.io/
Attachment
On Sat, Aug 26, 2017 at 10:10 AM, yangjie@highgo.com <yangjie@highgo.com> wrote:
Hi all,
Now we have had the range /list partition, but hash partitioning is not implemented yet.
Attached is a POC patch basedon the v10Beta2 to add the hash partitioning feature.
Although we will need morediscussions about the syntax and other specifications before going ahead the project,
but I think this runnablecode might help to discuss what and how we implement this.
FYI, there is already an existing commitfest entry for this project.
Description
The hash partition'simplement is on the basis of the original range / list partition,and using similar syntax.
To create a partitioned table,use:
CREATE TABLE h (id int)PARTITION BY HASH(id);
The partitioning key supportsonly one value, and I think the partition key can support multiple values, which may be difficult toimplement when querying, but it is not impossible.
A partition table can becreate as bellow:
CREATE TABLE h1 PARTITION OFh;
CREATE TABLE h2 PARTITION OFh;
CREATE TABLE h3 PARTITION OFh;
FOR VALUES clause cannot beused, and the partition bound is calclulated automatically as partition index of single integer value.
An inserted record is storedin a partition whose index equals
DatumGetUInt32(OidFunctionCall1(lookup_type_ cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_ proc, values[0])) % nparts /* Number of partitions */ ;
In the above example, this isDatumGetUInt32( OidFunctionCall1(lookup_type_ cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_ proc, id)) % 3;
postgres=# insert into hselect generate_series(1,20);
INSERT 0 20
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 3
h1 | 5
h1 | 17
h1 | 19
h2 | 2
h2 | 6
h2 | 7
h2 | 11
h2 | 12
h2 | 14
h2 | 15
h2 | 18
h2 | 20
h3 | 1
h3 | 4
h3 | 8
h3 | 9
h3 | 10
h3 | 13
h3 | 16
(20 rows)
The number of partitions herecan be dynamically added, and if a new partition is created, the number of partitions changes, the calculated target partitions will change, and the same data is not reasonable in different partitions,So you need to re- calculate the existing data and insert the target partition when you create a new partition.
postgres=# create table h4partition of h;
CREATE TABLE
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 5
h1 | 17
h1 | 19
h1 | 6
h1 | 12
h1 | 8
h1 | 13
h2 | 11
h2 | 14
h3 | 1
h3 | 9
h3 | 2
h3 | 15
h4 | 3
h4 | 7
h4 | 18
h4 | 20
h4 | 4
h4 | 10
h4 | 16
(20 rows)
When querying the data, thehash partition uses the same algorithm as the insertion, and filters out the table that does not need to be scanned.
postgres=# explain analyzeselect * from h where id = 1;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ----------
Append (cost=0.00..41.88rows=13 width=4) (actual time= 0.020..0.023 rows=1 loops=1)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4) ( actual time=0.013..0.016 rows= 1 loops=1)
Filter: (id = 1)
Rows Removed byFilter: 3
Planning time: 0.346 ms
Execution time: 0.061 ms
(6 rows)
postgres=# explain analyzeselect * from h where id in ( 1,5);;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ----------
Append (cost=0.00..83.75rows=52 width=4) (actual time= 0.016..0.028 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..41.88 rows=26 width=4) ( actual time=0.015..0.018 rows= 1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed byFilter: 6
-> Seq Scan on h3 (cost=0.00..41.88 rows=26 width=4) ( actual time=0.005..0.007 rows= 1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed byFilter: 3
Planning time: 0.720 ms
Execution time: 0.074 ms
(9 rows)
postgres=# explain analyzeselect * from h where id = 1 or id = 5;;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ----------
Append (cost=0.00..96.50rows=50 width=4) (actual time= 0.017..0.078 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..48.25 rows=25 width=4) ( actual time=0.015..0.019 rows= 1 loops=1)
Filter: ((id = 1) OR(id = 5))
Rows Removed byFilter: 6
-> Seq Scan on h3 (cost=0.00..48.25 rows=25 width=4) ( actual time=0.005..0.010 rows= 1 loops=1)
Filter: ((id = 1) OR(id = 5))
Rows Removed byFilter: 3
Planning time: 0.396 ms
Execution time: 0.139 ms
(9 rows)
Can not detach / attach /drop partition table. Best regards, young
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Rushabh Lathia
There's something I don't understand.
If there's a hash partition, it won't be adding new ones, right?
Is that what I need to get under him?
If there's a hash partition, it won't be adding new ones, right?
Is that what I need to get under him?
祝工作顺利!
----------------------------------
杨 杰 产品开发部
瀚高基础软件股份有限公司
地址:济南市高新区新泺大街2117号铭盛大厦20层
手机:159-6633-5315 邮箱:yangjie@highgo.com
From: Rushabh LathiaDate: 2017-08-28 14:27CC: pgsql-hackersSubject: Re: [HACKERS] hash partitioning based on v10Beta2div.FoxDIV_20170828143947501 { font-size: 10.5pt } On Sat, Aug 26, 2017 at 10:10 AM, yangjie@highgo.com <yangjie@highgo.com> wrote:Hi all,
Now we have had the range /list partition, but hash partitioning is not implemented yet.
Attached is a POC patch basedon the v10Beta2 to add the hash partitioning feature.
Although we will need morediscussions about the syntax and other specifications before going ahead the project,
but I think this runnablecode might help to discuss what and how we implement this. FYI, there is already an existing commitfest entry for this project.Description
The hash partition'simplement is on the basis of the original range / list partition,and using similar syntax.
To create a partitioned table,use:
CREATE TABLE h (id int)PARTITION BY HASH(id);
The partitioning key supportsonly one value, and I think the partition key can support multiple values, which may be difficult toimplement when querying, but it is not impossible.
A partition table can becreate as bellow:
CREATE TABLE h1 PARTITION OFh;
CREATE TABLE h2 PARTITION OFh;
CREATE TABLE h3 PARTITION OFh;
FOR VALUES clause cannot beused, and the partition bound is calclulated automatically as partition index of single integer value.
An inserted record is storedin a partition whose index equals
DatumGetUInt32(OidFunctionCall1(lookup_type_ cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_ proc, values[0])) % nparts /* Number of partitions */ ;
In the above example, this isDatumGetUInt32( OidFunctionCall1(lookup_type_ cache(key->parttypid[0], TYPECACHE_HASH_PROC)->hash_ proc, id)) % 3;
postgres=# insert into hselect generate_series(1,20);
INSERT 0 20
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 3
h1 | 5
h1 | 17
h1 | 19
h2 | 2
h2 | 6
h2 | 7
h2 | 11
h2 | 12
h2 | 14
h2 | 15
h2 | 18
h2 | 20
h3 | 1
h3 | 4
h3 | 8
h3 | 9
h3 | 10
h3 | 13
h3 | 16
(20 rows)
The number of partitions herecan be dynamically added, and if a new partition is created, the number of partitions changes, the calculated target partitions will change, and the same data is not reasonable in different partitions,So you need to re- calculate the existing data and insert the target partition when you create a new partition.
postgres=# create table h4partition of h;
CREATE TABLE
postgres=# select tableoid::regclass,* from h;
tableoid | id
----------+----
h1 | 5
h1 | 17
h1 | 19
h1 | 6
h1 | 12
h1 | 8
h1 | 13
h2 | 11
h2 | 14
h3 | 1
h3 | 9
h3 | 2
h3 | 15
h4 | 3
h4 | 7
h4 | 18
h4 | 20
h4 | 4
h4 | 10
h4 | 16
(20 rows)
When querying the data, thehash partition uses the same algorithm as the insertion, and filters out the table that does not need to be scanned.
postgres=# explain analyzeselect * from h where id = 1;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ----------
Append (cost=0.00..41.88rows=13 width=4) (actual time= 0.020..0.023 rows=1 loops=1)
-> Seq Scan on h3 (cost=0.00..41.88 rows=13 width=4) ( actual time=0.013..0.016 rows= 1 loops=1)
Filter: (id = 1)
Rows Removed byFilter: 3
Planning time: 0.346 ms
Execution time: 0.061 ms
(6 rows)
postgres=# explain analyzeselect * from h where id in ( 1,5);;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ----------
Append (cost=0.00..83.75rows=52 width=4) (actual time= 0.016..0.028 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..41.88 rows=26 width=4) ( actual time=0.015..0.018 rows= 1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed byFilter: 6
-> Seq Scan on h3 (cost=0.00..41.88 rows=26 width=4) ( actual time=0.005..0.007 rows= 1 loops=1)
Filter: (id = ANY ('{1,5}'::integer[]))
Rows Removed byFilter: 3
Planning time: 0.720 ms
Execution time: 0.074 ms
(9 rows)
postgres=# explain analyzeselect * from h where id = 1 or id = 5;;
QUERY PLAN
------------------------------------------------------------ ------------------------------ ----------
Append (cost=0.00..96.50rows=50 width=4) (actual time= 0.017..0.078 rows=2 loops=1)
-> Seq Scan on h1 (cost=0.00..48.25 rows=25 width=4) ( actual time=0.015..0.019 rows= 1 loops=1)
Filter: ((id = 1) OR(id = 5))
Rows Removed byFilter: 6
-> Seq Scan on h3 (cost=0.00..48.25 rows=25 width=4) ( actual time=0.005..0.010 rows= 1 loops=1)
Filter: ((id = 1) OR(id = 5))
Rows Removed byFilter: 3
Planning time: 0.396 ms
Execution time: 0.139 ms
(9 rows)
Can not detach / attach /drop partition table. Best regards, young
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers --Rushabh Lathia
On Sat, Aug 26, 2017 at 12:40 AM, yangjie@highgo.com <yangjie@highgo.com> wrote: > A partition table can be create as bellow: > > CREATE TABLE h1 PARTITION OF h; > CREATE TABLE h2 PARTITION OF h; > CREATE TABLE h3 PARTITION OF h; This syntax is very problematic for reasons that have been discussed on the existing hash partitioning thread. Fortunately, a solution has already been implemented... you're the third person to try to write a patch for this feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,
When the number of partitions and the data are more, adding new partitions, there will be some efficiency problems.
I don't know how the solution you're talking about is how to implement a hash partition?
On 8/28/2017 22:25,Robert Haas<robertmhaas@gmail.com> wrote:
On Sat, Aug 26, 2017 at 12:40 AM, yangjie@highgo.com <yangjie@highgo.com> wrote:
> A partition table can be create as bellow:
>
> CREATE TABLE h1 PARTITION OF h;
> CREATE TABLE h2 PARTITION OF h;
> CREATE TABLE h3 PARTITION OF h;
This syntax is very problematic for reasons that have been discussed
on the existing hash partitioning thread. Fortunately, a solution has
already been implemented... you're the third person to try to write a
patch for this feature.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Aug 28, 2017 at 10:44 PM, yangjie <yangjie@highgo.com> wrote: > When the number of partitions and the data are more, adding new partitions, > there will be some efficiency problems. > I don't know how the solution you're talking about is how to implement a > hash partition? I am having difficulty understanding this. There was discussion on the other thread of how splitting partitions could be done reasonably efficiently with the proposed design; of course, it's never going to be super-cheap. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company