Home > mailing lists
Re: Declarative partitioning - another take - Mailing list pgsql-hackers

From	Amit Langote
Subject	Re: Declarative partitioning - another take
Date	November 10, 2016 12:41:12
Msg-id	c7972ce6-50b8-2a13-c1e5-658a32b549ca@lab.ntt.co.jp Whole thread Raw
In response to	Re: Declarative partitioning - another take (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Declarative partitioning - another take
List	pgsql-hackers
Tree view
Thanks again for the prompt review!

On 2016/11/10 2:00, Robert Haas wrote:
> On Wed, Nov 9, 2016 at 6:14 AM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> But none of that is really a problem.  Sure, the 'P' case will never
>>> arise in 9.6, but so what?  I'd really like to not keep duplicating
>>> these increasingly-complex hunks of code if we can find some way to
>>> avoid that.
>>
>> We cannot reference pg_catalog.pg_get_partkeydef() in the SQL query that
>> getTables() sends to pre-10 servers, right?  But I suppose we need not
>> call it in that particular SQL query in the first place.
>
> Oh, yeah, that's a problem; the query will error out against older
> servers.  You could do something like:
>
> char *partkeydef;
> if (version <= 90600)
>     partkeydef = "NULL";
> else
>     partkeydef = "CASE WHEN c.relkind = 'P' THEN
> pg_catalog.pg_get_partkeydef(c.oid) ELSE NULL END";
>
> ...and the use %s to interpolate that into the query string.

Yeah, that's a way.

>>> I think that in most places were are referring to the "partitioning
>>> method" (with ing) but the "partition key" (without ing). Let's try to
>>> be consistent.
>>
>> I'm inclined to switch to "partitioning method" and "partitioning key",
>> but do you mean just the documentation or throughout?  Beside
>> documentation, I mean source code comments, error messages, etc.  I have
>> assumed throughout.
>
> I think "partitioning key" is a bit awkward and actually prefer
> "partiton key".  But "partition method" sounds funny so I would go
> with "partitioning method".

OK, "partition key" and "partitioning method" it is then.  Source code
comments, error messages, variables call the latter (partitioning)
"strategy" though which hopefully is fine.

>>> I don't think it's OK to allow the partition to be added if it
>>> contains rows that might not be valid.  We are generally vary wary
>>> about adding options that can cause data integrity failures and I
>>> think we shouldn't start here, either.  On the other hand, it's also
>>> not desirable for adding a partition to take O(n) time in all cases.
>>> So what would be nice is if the new partition could self-certify that
>>> contains no problematic rows by having a constraint that matches the
>>> new partitioning constraint.  Then we can skip the scan if that
>>> constraint is already present.
>>
>> I agree that NO VALIDATE is undesirable and it would be better if we could
>> get rid of the same.  I want to clarify one thing though: what does it
>> mean for the new partition to have a constraint that *matches* the new
>> partitioning constraint?  Does it mean the new partition's constraint
>> *implies* the partitioning constraint?  Or as certified by equal()?
>
> Implies would be better, but equal() might be tolerable.

I think a equal() -based method would fail to help in most cases. Consider
the following example:

create table foo1 (a int);
alter table foo add constraint check_a check (a < 10 and a >= 1);

create table foo (a int) partition by range (a);
alter table foo attach partition foo1 for values from (1) to (10);

The last command will internally generate the partition constraints that
is basically a list of implicitly AND'd expressions viz. a is not null, a
>= 1 and a < 10 which we wrap into a BoolExpr.  It would not be equal()
with foo1's constraint even though they are basically the same constraint.
 So, simple structural equality may prove to be less productive, IMHO.

It seems a *implies* -based solution would work much better, although a
bit slower for obvious reasons.  I reckon slownsess is not a big issue in
this case.  So I prototyped the same using predicate_implied_by() which
seems to work reasonably.  Looks something like this:

    skip_validate = false;
    if (predicate_implied_by(partConstraint, existConstraint))
        skip_validate = true;

Where partConstraint is 1-member list with the new partition constraint
and existConstraint is a list of the existing constraints of the table
being attached, derived from its TupleConstr.

>>> +                                                       /*
>>> +                                                        * Never put a
>>> null into the values array, flag
>>> +                                                        * instead for
>>> the code further down below where
>>> +                                                        * we
>>> construct the actual relcache struct.
>>> +                                                        */
>>> +
>>> found_null_partition = true;
>>> +
>>> null_partition_index = i;
>>>
>>> How about we elog(ERROR, ...) if found_null_partition is already set?
>>
>> Makes sense.  However, let me mention here that duplicates either within
>> one partition's list or across the partitions are not possible.  That's
>> because in case of the former, we de-duplicate before storing the list
>> into the catalog and the latter would simply be an overlap error.  Could
>> this be made an Assert() then?
>
> Well, I think that wouldn't be as good, because for example some
> misguided person might do a manual update of the catalog.  It seems to
> me that if the system catalog contents are questionable, it's better
> to error out than to have completely undefined behavior.  In other
> words, we probably want it checked in non-Assert builds.  See similar
> cases where we have things like:
>
> elog(ERROR, "conpfeqop is not a 1-D Oid array");

Good point, agreed.

>>> +               /*
>>> +                * Is lower_val = upper_val?
>>> +                */
>>> +               if (lower_val && upper_val)
>>>
>>> So, that comment does not actually match that code.  That if-test is
>>> just checking whether both bounds are finite.  What I think you should
>>> be explaining here is that if lower_val and upper_val are both
>>> non-infinite, and if the happen to be equal, we want to emit an
>>> equality test rather than a >= test plus a <= test because $REASON.
>>
>> OK, I have added some explanatory comments around this code.
>
> So, I was kind of assuming that the answer to "why are we doing this"
> was "for efficiency".  It's true that val >= const AND const <= val
> can be simplified to val = const, but it wouldn't be necessary to do
> so for correctness.  It just looks nicer and runs faster.

I guess I ended up with this code following along a bit different way of
thinking about it, but in the end what you're saying is true.

> By the way, if you want to keep the inclusive stuff, I think you
> probably need to consider this logic in light of the possibility that
> one bound might be exclusive.  Maybe you're thinking that can't happen
> anyway because it would have been rejected earlier, but an elog(ERROR,
> ...) might still be appropriate just to guard against messed-up
> catalogs.

Yes, that makes sense.  I guess you mean a case like the one shown below:

create table foo5 partition of foo for values start (3, 10) end (3, 10);
ERROR:  cannot create range partition with empty range

I think the following would suffice as a guard (checked only if it turns
out that lower_val and upper_val are indeed equal):

    if (i == key->partnatts - 1 && spec->lowerinc != spec->upperinc)
        elog(ERROR, "invalid range bound specification");


>>> What happens if you try to attach a table as a partition of itself or
>>> one of its ancestors?
>>
>> The latter fails with "already a partition" error.
>>
>> The former case was not being handled at all which has now been fixed.
>> ATExecAddInherit() prevents that case as an instance of preventing the
>> circularity of inheritance.  It says: ERROR: circular inheritance not
>> allowed.  And then: DETAIL: "rel" is already a child of "rel".
>>
>> Should ATExecAttachPartition() use the same trick and keep using the same
>> message(s)?  The above detail message doesn't quite sound appropriate when
>> one tries to attach a table as partition of itself.
>
> Whichever way is easier to code seems fine.  It's a pretty obscure
> case, so I don't think we need to add code just to get a very slightly
> better error message.

OK, so let's keep it same as with regular inheritance.

>>> +                       | FOR VALUES START '(' range_datum_list ')' lb_inc
>>> +                                                END_P '('
>>> range_datum_list ')' ub_inc
>>>
>>> Just a random idea.  Would for VALUES FROM ( ... ) TO ( ... ) be more
>>> idiomatic than START and END?
>>
>> It would actually.  Should I go ahead and change to FROM (...) TO (...)?
>
> +1!

Done!

>> Related to range partitioning, should we finalize on inclusive START/FROM
>> and exclusive END/TO preventing explicit specification of the inclusivity?
>
> I would be in favor of committing the initial patch set without that,
> and then considering the possibility of adding it later.  If we
> include it in the initial patch set we are stuck with it.

OK, I have removed the syntactic ability to specify INCLUSIVE/EXCLUSIVE
with each of the range bounds.

I haven't changed any code (such as comparison functions) that manipulates
instances of PartitionRangeBound which has a flag called inclusive.  I
didn't remove the flag, but is instead just set to (is_lower ? true :
false) when initializing from the parse node. Perhaps, there is some scope
for further simplifying that code, which you probably alluded to when you
proposed that we do this.

>> Attached updated patches take care of the above comments and few other
>> fixes.  There are still a few items I have not addressed right away:
>>
>> - Remove NO VALIDATE clause in ATTACH PARTITION and instead rely on the
>>   new partition's constraints to skip the validation scan
>> - Remove the syntax to specify inclusivity of each of START and END bounds
>>   of range partitions and instead assume inclusive START and exclusive END
>
> OK - what is the time frame for these changes?

I have implemented both of these in the attached patch.  As mentioned
above, the logic to skip the validation scan using the new partition's
constraints is still kind of a prototype solution, but it seems to work so
far.  Comments on the same would be very helpful.

Thanks,
Amit
Attachment

pgsql-hackers by date:
From: Dean Rasheed
Date: 10 November 2016, 12:17:54
Subject: Re: Improving RLS planning
From: Emre Hasegeli
Date: 10 November 2016, 12:41:24
Subject: Re: Floating point comparison inconsistencies of the geometric types
Re: Declarative partitioning - another take - Mailing list pgsql-hackers

Attachment

Previous

Next