Re: Declarative partitioning - Mailing list pgsql-hackers
From | Ildar Musin
Subject | Re: Declarative partitioning
Date |
Msg-id | 573C8BFB.7050707@postgrespro.ru
In response to | Re: Declarative partitioning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses | Re: Declarative partitioning
List | pgsql-hackers
Hi Amit and all,
Here is an experimental patch that optimizes planning time for range-partitioned tables (it could be considered a proof of concept). The patch should be applied on top of Amit's declarative partitioning patch. It handles only a very special (though common) case where the partitioning key consists of a single attribute and doesn't contain expressions.
The main idea is the following:
* we look for clauses of the form 'VAR OP CONST' (where VAR is the partitioning key attribute and OP is a comparison operator);
* using binary search, we find the partition (X) whose range contains the CONST value;
* based on the OP operator, we determine which other partitions are also covered by the clause. The possible cases are:
1. If OP is '<' or '<=', we need the partitions to the left of X, including X itself.
2. If OP is '>' or '>=', we need the partitions to the right of X, including X itself.
3. If OP is '=', we need only partition X.
(For the '<' and '>' operators we also check whether the CONST value is equal to the lower or upper boundary, respectively, and if so exclude X.)
For boolean expressions we evaluate the left and right sides according to the algorithm above and then, based on the boolean operator, take the intersection (for AND) or the union (for OR) of the resulting partition sets.
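The pruning logic described above can be sketched in Python. This is a simplified model, not the patch's actual C code: the representation of partitions as a sorted list of half-open range bounds and all function names are invented for illustration.

```python
from bisect import bisect_right

def select_partitions(bounds, op, const):
    """Partitions covered by 'VAR OP CONST'.

    Partition i covers the half-open range [bounds[i], bounds[i+1]),
    matching FOR VALUES START (lo) END (hi)."""
    n = len(bounds) - 1  # number of partitions
    # Binary search for partition X with bounds[x] <= const < bounds[x+1].
    x = bisect_right(bounds, const) - 1
    if x < 0:   # const falls below the first partition
        return set(range(n)) if op in ('>', '>=') else set()
    if x >= n:  # const falls at or above the last upper bound
        return set(range(n)) if op in ('<', '<=') else set()
    if op == '=':
        return {x}
    if op in ('<', '<='):
        parts = set(range(x + 1))  # partitions left of X, including X
        # Boundary check: VAR < const excludes X entirely when const
        # is exactly X's lower boundary.
        if op == '<' and const == bounds[x]:
            parts.discard(x)
        return parts
    if op in ('>', '>='):
        # Partitions right of X, including X. With half-open ranges,
        # const equal to an upper boundary already lands in the next
        # partition, so no extra boundary check is needed here.
        return set(range(x, n))
    raise ValueError("unsupported operator: %s" % op)

def combine(left, right, boolop):
    """Boolean expressions: intersection for AND, union for OR."""
    return left & right if boolop == 'AND' else left | right
```

For example, with three partitions bounded by [0, 1000, 2000, 3000], the clause `a < 1000` selects only partition 0 (1000 is partition 1's lower boundary, so X is excluded), while `a <= 1000` selects partitions 0 and 1.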
I ran some benchmarks comparing:
1. original constraint exclusion mechanism,
2. optimized version (this patch) and
3. optimized version using the relation->rd_partdesc pointer directly instead of the RelationGetPartitionDesc() function (see previous discussion).
Initial conditions:
CREATE TABLE abc (id SERIAL NOT NULL, a INT, dt TIMESTAMP) PARTITION BY RANGE (a);
CREATE TABLE abc_1 PARTITION OF abc FOR VALUES START (0) END (1000);
CREATE TABLE abc_2 PARTITION OF abc FOR VALUES START (1000) END (2000);
...
etc
INSERT INTO %s (a) SELECT generate_series(0, <partitions_count> * 1000);
pgbench scripts: https://gist.github.com/zilder/872e634a8eeb405bd045465fc9527e53 (where :partitions is the number of partitions).
The first script tests fetching a single row from the partitioned table. Results (tps):
# of partitions | constraint excl. | optimized | optimized (using pointer)
----------------+------------------+---------------+----------------------------
100 | 658 | 2906 | 3079
1000 | 45 | 2174 | 3021
2000 | 22 | 1667 | 2919
The second script tests fetching all data from a single partition. Results (tps):
# of partitions | constraint excl. | optimized | optimized (using pointer)
----------------+------------------+---------------+----------------------------
100 | 317 | 1001 | 1051
1000 | 34 | 941 | 1023
2000 | 15 | 813 | 1016
The optimized version works much faster with a large number of partitions and degrades more slowly than constraint exclusion. But there is still noticeable overhead from copying the PartitionDesc structure: with 2000 partitions, the RelationGetPartitionDesc() function spent more than 40% of total execution time on copying in the first benchmark (measured with `perf`). Using reference counting, as Amit suggests, would significantly reduce this degradation.
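The copy-vs-refcount trade-off can be illustrated with a minimal sketch. This is a hypothetical model, not PostgreSQL's actual relcache code; the class and function names are invented:

```python
class PartitionDesc:
    """Stand-in for a cached partition descriptor with n bounds."""
    def __init__(self, bounds):
        self.bounds = list(bounds)  # O(n) storage for n partitions
        self.refcount = 0

def get_partition_desc_copy(cached):
    # Current behaviour: every lookup pays an O(n) deep copy of the
    # descriptor, which dominates the profile at thousands of partitions.
    return PartitionDesc(cached.bounds)

def get_partition_desc_ref(cached):
    # Refcounted alternative: hand out the cached structure and pin it;
    # the cache must not free or move it while refcount > 0.
    cached.refcount += 1
    return cached

def release_partition_desc(desc):
    # Caller unpins the descriptor when done with it.
    desc.refcount -= 1
```

The refcounted variant replaces the per-planning-call O(n) copy with an O(1) pin/unpin, at the cost of the cache having to honor the pin before invalidating the structure.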
Any comments and suggestions are very welcome. Thanks!
On 18.05.2016 04:26, Amit Langote wrote:
> On 2016/05/18 2:22, Tom Lane wrote:
>> The two ways that we've dealt with this type of hazard are to copy data out of the relcache before using it; or to give the relcache the responsibility of not moving a particular portion of data if it did not change. From memory, the latter applies to the tuple descriptor and trigger data, but we've done most other things the first way.
>
> It seems that the tuple descriptor is reference-counted; however, trigger data is copied. The former seems to have been done on performance grounds (I found 06e10abc). So for a performance-sensitive relcache data structure, refcounting is the way to go (although done quite rarely)? In this particular case, it is a "partition descriptor" that could get big for a partitioned table with partitions in hundreds or thousands.
>
> Thanks,
> Amit
--
Ildar Musin
i.musin@postgrespro.ru