Re: Declarative partitioning - Mailing list pgsql-hackers
From | Ildar Musin
Subject | Re: Declarative partitioning
Date |
Msg-id | 573C8BFB.7050707@postgrespro.ru
In response to | Re: Declarative partitioning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses | Re: Declarative partitioning
List | pgsql-hackers
Hi Amit and all,
Here is an experimental patch that optimizes planning time for range-partitioned tables (it could be considered a proof of concept). The patch should be applied on top of Amit's declarative partitioning patch. It handles only a very special (though common) case where the partitioning key consists of a single attribute and doesn't contain expressions.
The main idea is the following:
* we look for clauses of the form 'VAR OP CONST' (where VAR is the partitioning key attribute and OP is a comparison operator);
* using binary search, we find the partition (X) whose range contains the CONST value;
* based on the OP operator, we determine which other partitions are also covered by the clause. The possible cases are:
1. If OP is '<' or '<=', we need the partitions to the left of X, including X itself.
2. If OP is '>' or '>=', we need the partitions to the right of X, including X itself.
3. If OP is '=', we need only partition X.
(For the '<' and '>' operators we also check whether the CONST value is equal to the lower or upper boundary, respectively, and if so exclude X.)
For boolean expressions we evaluate the left and right sides according to the algorithm above and then, based on the boolean operator, take the intersection (for AND) or the union (for OR) of the resulting partition sets.
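The pruning logic described above can be sketched in Python. This is a simplified model, not the patch's actual C code: the representation of partitions as a sorted list of half-open range bounds and all function names are invented for illustration.

```python
from bisect import bisect_right

def select_partitions(bounds, op, const):
    """Partitions covered by 'VAR OP CONST'.

    Partition i covers the half-open range [bounds[i], bounds[i+1]),
    matching FOR VALUES START (lo) END (hi)."""
    n = len(bounds) - 1  # number of partitions
    # Binary search for partition X with bounds[x] <= const < bounds[x+1].
    x = bisect_right(bounds, const) - 1
    if x < 0:   # const falls below the first partition
        return set(range(n)) if op in ('>', '>=') else set()
    if x >= n:  # const falls at or above the last upper bound
        return set(range(n)) if op in ('<', '<=') else set()
    if op == '=':
        return {x}
    if op in ('<', '<='):
        parts = set(range(x + 1))  # partitions left of X, including X
        # Boundary check: VAR < const excludes X entirely when const
        # is exactly X's lower boundary.
        if op == '<' and const == bounds[x]:
            parts.discard(x)
        return parts
    if op in ('>', '>='):
        # Partitions right of X, including X. With half-open ranges,
        # const equal to an upper boundary already lands in the next
        # partition, so no extra boundary check is needed here.
        return set(range(x, n))
    raise ValueError("unsupported operator: %s" % op)

def combine(left, right, boolop):
    """Boolean expressions: intersection for AND, union for OR."""
    return left & right if boolop == 'AND' else left | right
```

For example, with three partitions bounded by [0, 1000, 2000, 3000], the clause `a < 1000` selects only partition 0 (1000 is partition 1's lower boundary, so X is excluded), while `a <= 1000` selects partitions 0 and 1.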
I ran some benchmarks comparing:
1. original constraint exclusion mechanism,
2. optimized version (this patch) and
3. optimized version using the relation->rd_partdesc pointer directly instead of the RelationGetPartitionDesc() function (see previous discussion).
Initial conditions:
CREATE TABLE abc (id SERIAL NOT NULL, a INT, dt TIMESTAMP) PARTITION BY RANGE (a);
CREATE TABLE abc_1 PARTITION OF abc FOR VALUES START (0) END (1000);
CREATE TABLE abc_2 PARTITION OF abc FOR VALUES START (1000) END (2000);
...
etc
INSERT INTO %s (a) SELECT generate_series(0, <partitions_count> * 1000);
pgbench scripts: https://gist.github.com/zilder/872e634a8eeb405bd045465fc9527e53 (where :partitions is the number of partitions).
The first script tests fetching a single row from the partitioned table. Results (tps):
# of partitions | constraint excl. | optimized | optimized (using pointer)
----------------+------------------+---------------+----------------------------
100 | 658 | 2906 | 3079
1000 | 45 | 2174 | 3021
2000 | 22 | 1667 | 2919
The second script tests fetching all data from a single partition. Results (tps):
# of partitions | constraint excl. | optimized | optimized (using pointer)
----------------+------------------+---------------+----------------------------
100 | 317 | 1001 | 1051
1000 | 34 | 941 | 1023
2000 | 15 | 813 | 1016
The optimized version works much faster with a large number of partitions and degrades more slowly than constraint exclusion. But there is still noticeable overhead from copying the PartitionDesc structure: with 2000 partitions, the RelationGetPartitionDesc() function spent more than 40% of total execution time on copying in the first benchmark (measured with `perf`). Using reference counting, as Amit suggests, would significantly reduce this degradation.
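The copy-vs-refcount trade-off can be illustrated with a minimal sketch. This is a hypothetical model, not PostgreSQL's actual relcache code; the class and function names are invented:

```python
class PartitionDesc:
    """Stand-in for a cached partition descriptor with n bounds."""
    def __init__(self, bounds):
        self.bounds = list(bounds)  # O(n) storage for n partitions
        self.refcount = 0

def get_partition_desc_copy(cached):
    # Current behaviour: every lookup pays an O(n) deep copy of the
    # descriptor, which dominates the profile at thousands of partitions.
    return PartitionDesc(cached.bounds)

def get_partition_desc_ref(cached):
    # Refcounted alternative: hand out the cached structure and pin it;
    # the cache must not free or move it while refcount > 0.
    cached.refcount += 1
    return cached

def release_partition_desc(desc):
    # Caller unpins the descriptor when done with it.
    desc.refcount -= 1
```

The refcounted variant replaces the per-planning-call O(n) copy with an O(1) pin/unpin, at the cost of the cache having to honor the pin before invalidating the structure.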
Any comments and suggestions are very welcome. Thanks!
On 18.05.2016 04:26, Amit Langote wrote:
> On 2016/05/18 2:22, Tom Lane wrote:
>> The two ways that we've dealt with this type of hazard are to copy data out of the relcache before using it; or to give the relcache the responsibility of not moving a particular portion of data if it did not change. From memory, the latter applies to the tuple descriptor and trigger data, but we've done most other things the first way.
>
> It seems that the tuple descriptor is reference-counted; however, trigger data is copied. The former seems to have been done on performance grounds (I found 06e10abc). So for a performance-sensitive relcache data structure, refcounting is the way to go (although done quite rarely)? In this particular case, it is a "partition descriptor" that could get big for a partitioned table with partitions in hundreds or thousands.
>
> Thanks,
> Amit
--
Ildar Musin
i.musin@postgrespro.ru