Home > mailing lists

Re: Declarative partitioning - Mailing list pgsql-hackers

From	Amit Langote
Subject	Re: Declarative partitioning
Date	March 8, 2016 01:39:44
Msg-id	56DE2D4B.1040001@lab.ntt.co.jp Whole thread Raw
In response to	Re: Declarative partitioning (Corey Huinker <corey.huinker@gmail.com>)
Responses	Re: Declarative partitioning Re: Declarative partitioning
List	pgsql-hackers

Tree view

Hi Corey,

Sorry for replying so late.

On 2016/02/25 3:31, Corey Huinker wrote:
> 

[ ... ]

> So I would assume that we'd use a syntax that presumed the columns were in
> a composite range type.
> 
> Which means your creates would look like (following Robert Haas's implied
> suggestion that we leave off the string literal quotes):
> 
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES (      , (b,2) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,2), (b,3) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,3), (b,4) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (b,4), (c,2) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,2), (c,3) );
> CREATE TABLE foo_ax1x PARTITION OF foo FOR VALUES [ (c,3), (c,4) );
> 
> That's not terrible looking.

So I tried a grammar that looked like the following:

range_spec: lb_inc bound ',' bound ub_inc    {<create-range-struct>}

lb_inc: '[' { $$ = true; } | '('  { $$ = false; }
ub_inc: ']' { $$ = true; } | ')' { $$ = false; }

bound: a_expr      {         if (IsA($1, RowExpr))    /* (key1, key2, ...) */             $$ = ((RowExpr) $1)->args;
    else /* key */             $$ = list_make1($1);      }      | /* EMPTY */ { $$ = NIL; }

Everything seemed to go dandy until I tried FOR VALUES (blah , blah],
where psql wouldn't send the command string without accepting the closing
parenthesis, :(.  So maybe I should try to put the whole thing in '', that
is, accept the full range_spec in a string, but then we are back to
requiring full-blown range parse function which I was trying to avoid by
using the aforementioned grammar.  So, I decided to move ahead with the
following grammar for time being:

START (lower-bound) [ EXCLUSIVE ]
| END (upper-bound) [ INCLUSIVE ]
| START (lower-bound) [ EXCLUSIVE ] END (upper-bound) [ INCLUSIVE ]

Where,

*-bound: a_expr            | *-bound ',' a_expr

Note that in the absence of explicit specification, lower-bound is
inclusive and upper-bound is exclusive.

So the verbosity couldn't be helped unless accepting range literal in
string form and exporting rangetypes.c:range_parse() with range
partitioning-specific hacks (consider composite bounds) to parse it are
acceptable things.

>> IOW, one shouldn't create an unbounded partition if more partitions in the
>> unbounded direction are expected to be created.  It would be OK for
>> unbounded partitions to be on the lower end most of the times.
>>
> 
> On this I'll have to disagree. My own use case where I use my
> range_partitioning extension starts off with a single partition () and all
> new partitions are splits of that. The ranges evolve over time as
> partitions grow and slow down. It's nice because we're not trying to
> predict where growth will be, we split where growth is.

Okay, perhaps I should not presume a certain usage.  However, as you know,
the usage like yours requires some mechanism of data redistribution (also
not without some syntax), which I am not targeting with the initial patch.If that was the only way of creating
partitions,matters would be a

little easier - you only specify a split point and have some rule about
inclusivity around the split point.  But we have to start with the usage
where each new partition is separately created with explicit partition
bound specification that is complete in itself and that's where the logic
to check partition invariants may get a bit complicated.

> Ok, I'll wait a bit. In the mean time I can tell you a bit about the
> existing production system I'm hoping to replicate in true partitioning
> looks like this:
> 
> Big Master Table:
>      Range partition by C collated text
>            Date Range
>            Date Range
>            ...
>      Range partition by C collated text
>            Date Range
>            Date Range
>            ...
>     ...
> 
> Currently this is accomplished through my range_partitioning module, and
> then using pg_partman on those partitions. It works, but it's a lot of
> moving parts.
> 
> The machine will be a 32 core AWS box. As per usual with AWS, it will be
> have ample memory and CPU, and be somewhat starved for I/O.
> 
> Question: I haven't dove into the code, but I was curious about your tuple
> routing algorithm. Is there any way for the algorithm to begin it's scan of
> candidate partitions based on the destination of the last row inserted this
> statement? I ask because most use cases (that I am aware of) have data that
> would naturally cluster in the same partition.

No.  Actually the tuple-routing function starts afresh for each row.  For
range partitions, it's binary search over an array of upper bounds.  There
is no row-to-row state caching in the partition module itself.

Thanks,
Amit

pgsql-hackers by date:

From: Peter Geoghegan
Date: 08 March 2016, 01:09:34
Subject: Re: Freeze avoidance of very large table.

From: Vitaly Burovoy
Date: 08 March 2016, 01:50:21
Subject: Re: [PATH] Jsonb, insert a new value into an array at arbitrary position

Re: Declarative partitioning - Mailing list pgsql-hackers

Previous

Next