Re: Declarative partitioning - another take - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Declarative partitioning - another take |
Date | |
Msg-id | c7972ce6-50b8-2a13-c1e5-658a32b549ca@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Declarative partitioning - another take (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Declarative partitioning - another take
|
List | pgsql-hackers |
Thanks again for the prompt review! On 2016/11/10 2:00, Robert Haas wrote: > On Wed, Nov 9, 2016 at 6:14 AM, Amit Langote > <Langote_Amit_f8@lab.ntt.co.jp> wrote: >>> But none of that is really a problem. Sure, the 'P' case will never >>> arise in 9.6, but so what? I'd really like to not keep duplicating >>> these increasingly-complex hunks of code if we can find some way to >>> avoid that. >> >> We cannot reference pg_catalog.pg_get_partkeydef() in the SQL query that >> getTables() sends to pre-10 servers, right? But I suppose we need not >> call it in that particular SQL query in the first place. > > Oh, yeah, that's a problem; the query will error out against older > servers. You could do something like: > > char *partkeydef; > if (version <= 90600) > partkeydef = "NULL"; > else > partkeydef = "CASE WHEN c.relkind = 'P' THEN > pg_catalog.pg_get_partkeydef(c.oid) ELSE NULL END"; > > ...and the use %s to interpolate that into the query string. Yeah, that's a way. >>> I think that in most places were are referring to the "partitioning >>> method" (with ing) but the "partition key" (without ing). Let's try to >>> be consistent. >> >> I'm inclined to switch to "partitioning method" and "partitioning key", >> but do you mean just the documentation or throughout? Beside >> documentation, I mean source code comments, error messages, etc. I have >> assumed throughout. > > I think "partitioning key" is a bit awkward and actually prefer > "partiton key". But "partition method" sounds funny so I would go > with "partitioning method". OK, "partition key" and "partitioning method" it is then. Source code comments, error messages, variables call the latter (partitioning) "strategy" though which hopefully is fine. >>> I don't think it's OK to allow the partition to be added if it >>> contains rows that might not be valid. We are generally vary wary >>> about adding options that can cause data integrity failures and I >>> think we shouldn't start here, either. On the other hand, it's also >>> not desirable for adding a partition to take O(n) time in all cases. >>> So what would be nice is if the new partition could self-certify that >>> contains no problematic rows by having a constraint that matches the >>> new partitioning constraint. Then we can skip the scan if that >>> constraint is already present. >> >> I agree that NO VALIDATE is undesirable and it would be better if we could >> get rid of the same. I want to clarify one thing though: what does it >> mean for the new partition to have a constraint that *matches* the new >> partitioning constraint? Does it mean the new partition's constraint >> *implies* the partitioning constraint? Or as certified by equal()? > > Implies would be better, but equal() might be tolerable. I think a equal() -based method would fail to help in most cases. Consider the following example: create table foo1 (a int); alter table foo add constraint check_a check (a < 10 and a >= 1); create table foo (a int) partition by range (a); alter table foo attach partition foo1 for values from (1) to (10); The last command will internally generate the partition constraints that is basically a list of implicitly AND'd expressions viz. a is not null, a >= 1 and a < 10 which we wrap into a BoolExpr. It would not be equal() with foo1's constraint even though they are basically the same constraint. So, simple structural equality may prove to be less productive, IMHO. It seems a *implies* -based solution would work much better, although a bit slower for obvious reasons. I reckon slownsess is not a big issue in this case. So I prototyped the same using predicate_implied_by() which seems to work reasonably. Looks something like this: skip_validate = false; if (predicate_implied_by(partConstraint, existConstraint)) skip_validate = true; Where partConstraint is 1-member list with the new partition constraint and existConstraint is a list of the existing constraints of the table being attached, derived from its TupleConstr. >>> + /* >>> + * Never put a >>> null into the values array, flag >>> + * instead for >>> the code further down below where >>> + * we >>> construct the actual relcache struct. >>> + */ >>> + >>> found_null_partition = true; >>> + >>> null_partition_index = i; >>> >>> How about we elog(ERROR, ...) if found_null_partition is already set? >> >> Makes sense. However, let me mention here that duplicates either within >> one partition's list or across the partitions are not possible. That's >> because in case of the former, we de-duplicate before storing the list >> into the catalog and the latter would simply be an overlap error. Could >> this be made an Assert() then? > > Well, I think that wouldn't be as good, because for example some > misguided person might do a manual update of the catalog. It seems to > me that if the system catalog contents are questionable, it's better > to error out than to have completely undefined behavior. In other > words, we probably want it checked in non-Assert builds. See similar > cases where we have things like: > > elog(ERROR, "conpfeqop is not a 1-D Oid array"); Good point, agreed. >>> + /* >>> + * Is lower_val = upper_val? >>> + */ >>> + if (lower_val && upper_val) >>> >>> So, that comment does not actually match that code. That if-test is >>> just checking whether both bounds are finite. What I think you should >>> be explaining here is that if lower_val and upper_val are both >>> non-infinite, and if the happen to be equal, we want to emit an >>> equality test rather than a >= test plus a <= test because $REASON. >> >> OK, I have added some explanatory comments around this code. > > So, I was kind of assuming that the answer to "why are we doing this" > was "for efficiency". It's true that val >= const AND const <= val > can be simplified to val = const, but it wouldn't be necessary to do > so for correctness. It just looks nicer and runs faster. I guess I ended up with this code following along a bit different way of thinking about it, but in the end what you're saying is true. > By the way, if you want to keep the inclusive stuff, I think you > probably need to consider this logic in light of the possibility that > one bound might be exclusive. Maybe you're thinking that can't happen > anyway because it would have been rejected earlier, but an elog(ERROR, > ...) might still be appropriate just to guard against messed-up > catalogs. Yes, that makes sense. I guess you mean a case like the one shown below: create table foo5 partition of foo for values start (3, 10) end (3, 10); ERROR: cannot create range partition with empty range I think the following would suffice as a guard (checked only if it turns out that lower_val and upper_val are indeed equal): if (i == key->partnatts - 1 && spec->lowerinc != spec->upperinc) elog(ERROR, "invalid range bound specification"); >>> What happens if you try to attach a table as a partition of itself or >>> one of its ancestors? >> >> The latter fails with "already a partition" error. >> >> The former case was not being handled at all which has now been fixed. >> ATExecAddInherit() prevents that case as an instance of preventing the >> circularity of inheritance. It says: ERROR: circular inheritance not >> allowed. And then: DETAIL: "rel" is already a child of "rel". >> >> Should ATExecAttachPartition() use the same trick and keep using the same >> message(s)? The above detail message doesn't quite sound appropriate when >> one tries to attach a table as partition of itself. > > Whichever way is easier to code seems fine. It's a pretty obscure > case, so I don't think we need to add code just to get a very slightly > better error message. OK, so let's keep it same as with regular inheritance. >>> + | FOR VALUES START '(' range_datum_list ')' lb_inc >>> + END_P '(' >>> range_datum_list ')' ub_inc >>> >>> Just a random idea. Would for VALUES FROM ( ... ) TO ( ... ) be more >>> idiomatic than START and END? >> >> It would actually. Should I go ahead and change to FROM (...) TO (...)? > > +1! Done! >> Related to range partitioning, should we finalize on inclusive START/FROM >> and exclusive END/TO preventing explicit specification of the inclusivity? > > I would be in favor of committing the initial patch set without that, > and then considering the possibility of adding it later. If we > include it in the initial patch set we are stuck with it. OK, I have removed the syntactic ability to specify INCLUSIVE/EXCLUSIVE with each of the range bounds. I haven't changed any code (such as comparison functions) that manipulates instances of PartitionRangeBound which has a flag called inclusive. I didn't remove the flag, but is instead just set to (is_lower ? true : false) when initializing from the parse node. Perhaps, there is some scope for further simplifying that code, which you probably alluded to when you proposed that we do this. >> Attached updated patches take care of the above comments and few other >> fixes. There are still a few items I have not addressed right away: >> >> - Remove NO VALIDATE clause in ATTACH PARTITION and instead rely on the >> new partition's constraints to skip the validation scan >> - Remove the syntax to specify inclusivity of each of START and END bounds >> of range partitions and instead assume inclusive START and exclusive END > > OK - what is the time frame for these changes? I have implemented both of these in the attached patch. As mentioned above, the logic to skip the validation scan using the new partition's constraints is still kind of a prototype solution, but it seems to work so far. Comments on the same would be very helpful. Thanks, Amit
Attachment
- 0001-Catalog-and-DDL-for-partitioned-tables-12.patch
- 0002-psql-and-pg_dump-support-for-partitioned-tables-12.patch
- 0003-Catalog-and-DDL-for-partitions-12.patch
- 0004-psql-and-pg_dump-support-for-partitions-12.patch
- 0005-Teach-a-few-places-to-use-partition-check-quals-12.patch
- 0006-Introduce-a-PartitionTreeNode-data-structure-12.patch
- 0007-Tuple-routing-for-partitioned-tables-12.patch
- 0008-Update-DDL-Partitioning-chapter-to-reflect-new-devel-12.patch
pgsql-hackers by date: