Re: Declarative partitioning - another take - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Declarative partitioning - another take |
Date | |
Msg-id | CA+Tgmobs7C+4t5xm9LGopUb=2HF+SBE2EQ2NRH5EMd+drs6OUQ@mail.gmail.com Whole thread Raw |
In response to | Re: Declarative partitioning - another take (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: Declarative partitioning - another take
|
List | pgsql-hackers |
On Thu, Sep 15, 2016 at 4:53 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > [ new patches ] Re-reviewing 0001. + <entry><structfield>partexprs</structfield></entry> + <entry><type>pg_node_tree</type></entry> This documentation doesn't match pg_partition_table.h, which has partexprsrc and partexprbin. I don't understand why it's a good idea to have both, and there seem to be no comments or documentation supporting that choice anywhere. + The optional <literal>PARTITION BY</> clause specifies a method of + partitioning the table and the corresponding partition key. Table + thus created is called <firstterm>partitioned</firstterm> table. Key + consists of an ordered list of column names and/or expressions when + using the <literal>RANGE</> method, whereas only a single column or + expression can be specified when using the <literal>LIST</> method. + The type of a key column or an expression must have an associated + btree operator class or one must be specified along with the column + or the expression. Both of the sentences in this paragraph that do not begin with "the" need to begin with "the". (In my experience, it's generally a feature of English as spoken in India that connecting words like "the" and "a" are sometimes left out where non-Indian speakers of English would include them, so it would be good to watch out for this issue in general.) Also, I think this should be rephrased a bit to be more clear about how the partitioning key works, like this: The optional <literal>PARTITION BY</literal> clause specifies a method of partitioning the table. The table thus created is called a <firstterm>partitioned</firstterm> table. The parenthesized list of expressions forms the <firsttem>partitioning key</firstterm> for the table. When using range partitioning, the partioning key can include multiple columns or expressions, but for list partitioning, the partitioning key must consist of a single column or expression. If no btree operator class is specified when creating a partitioned table, the default btree operator class for the datatype will be used. If there is none, an error will be reported. + case RELKIND_PARTITIONED_TABLE: options = heap_reloptions(classForm->relkind, datum, false); Why? None of the reloptions that pertain to heap seem relevant to a relkind without storage. But, ah, do partitioned tables have storage? I mean, do we end up with an empty file, or no relfilenode at all? Can I CLUSTER, VACUUM, etc. a partitioned table? It would seem cleaner for the parent to have no relfilenode at all, but I'm guessing we might need some more changes for that to work out. + pg_collation.h pg_range.h pg_transform.h pg_partitioned_table.h\ Whitespace. Also, here and elsewhere, how about using alphabetical order, or anyway preserving it insofar as the existing list is alphabetized? + /* Remove NO INHERIT flag if rel is a partitioned table */ + if (is_no_inherit && + rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_TABLE_DEFINITION), + errmsg("cannot add NO INHERIT constraint to partitioned table \"%s\"", + RelationGetRelationName(rel)))); The code and the comment disagree. I think the code is right and the comment should be adjusted to say something like /* Partitioned tables do not have storage, so a NO INHERIT constraint makes no sense. */ + * IDENTIFICATION + * src/backend/utils/misc/partition.c Wrong. +} KeyTypeCollInfo; I don't like this type name very much. Can we get "Part" in there someplace? It doesn't seem to be very well-designed, either. The number of entries in each array is determined by the partnatts flag in PartitionKeyData, which has also got various other arrays whose lengths are determined by partnatts. Why do we have some arrays in one structure and some arrays in another structure? Would it hurt anything to merge everything into one structure? Or could PartitionKeyData include a field of type KeyTypeCollInfo rather than KeyTypeCollInfo *, saving one pointer reference every place we access this data? + /* Allocate in the supposedly short-lived working context */ Why supposedly? + datum = fastgetattr(tuple, Anum_pg_partitioned_table_partattrs, + RelationGetDescr(catalog), + &isnull); Isn't the point of putting the fixed-length fields first that we can use GETSTRUCT() here? And even for partattrs as the first variable-length thing? + /* + * Run the expressions through eval_const_expressions. This is + * not just an optimization, but is necessary, because eventually + * the planner will be comparing them to similarly-processed qual + * clauses, and may fail to detect valid matches without this. + * We don't bother with canonicalize_qual, however. + */ I'm a bit confused by this, because I would think this processing ought to have been done before storing anything in the system catalogs. I don't see why it should be necessary to do it again after pulling data back out of the system catalogs. + Value *str = lfirst(partexprsrc_item); + key->partcolnames[i] = pstrdup(str->val.str); Should have a blank line in between. +/* + * Partition key information inquiry functions + */ +int +get_partition_key_strategy(PartitionKey key) +{ + return key->strategy; +} + +int +get_partition_key_natts(PartitionKey key) +{ + return key->partnatts; +} + +List * +get_partition_key_exprs(PartitionKey key) +{ + return key->partexprs; +} + +/* + * Partition key information inquiry functions - one column + */ +int16 +get_partition_col_attnum(PartitionKey key, int col) +{ + return key->partattrs[col]; +} + +Oid +get_partition_col_typid(PartitionKey key, int col) +{ + return key->tcinfo->typid[col]; +} + +int32 +get_partition_col_typmod(PartitionKey key, int col) +{ + return key->tcinfo->typmod[col]; +} If we're going to add notation for this, I think we should use macros (or static inline functions defined in the header file). Doing it this way adds more cycles for no benefit. + newkey->partattrs = (AttrNumber *) + palloc0(newkey->partnatts * sizeof(AttrNumber)); + memcpy(newkey->partattrs, fromkey->partattrs, + newkey->partnatts * sizeof(AttrNumber)); It's wasteful to use palloc0 if you're immediately going to overwrite every byte in the array. Use regular palloc instead. + * Copy the partition key, opclass info into arrays (should we + * make the caller pass them like this to start with?) Only if it happens to be convenient for the caller, which doesn't seem to be the case here. + /* Only this can ever be NULL */ + if (!partexprbinDatum) + { + nulls[Anum_pg_partitioned_table_partexprbin - 1] = true; + nulls[Anum_pg_partitioned_table_partexprsrc - 1] = true; + } How can it be valid to have no partitioning expressions? + /* Tell world about the key */ + CacheInvalidateRelcache(rel); Is this really needed? Isn't the caller going to do something similar pretty soon? + heap_freetuple(tuple); Probably useless - might as well let the context reset clean it up. + simple_heap_delete(rel, &tuple->t_self); + + /* Update the indexes on pg_partitioned_table */ + CatalogUpdateIndexes(rel, tuple); You don't need CatalogUpdateIndexes() after a delete, only after an insert or update. + if (classform->relkind != relkind && + (relkind == RELKIND_RELATION && + classform->relkind != RELKIND_PARTITIONED_TABLE)) That's broken. Note that all of the conditions are joined using &&, so if any one of them fails then we won't throw an error. In particular, it's no longer possible to throw an error when relkind is not RELKIND_RELATION. +/* Checks if a Var node is for a given attnum */ +static bool +find_attr_reference_walker(Node *node, find_attr_reference_context *context) +{ + if (node == NULL) + return false; + + if (IsA(node, Var)) + { + Var *var = (Var *) node; + char *varattname = get_attname(context->relid, var->varattno); + + if (!strcmp(varattname, context->attname)) + return true; + } + + return expression_tree_walker(node, find_attr_reference_walker, context); +} Hrm. The comment says we're matching on attnum, but the code says we're matching on attname. is_partition_attr() has the same confusion between comments and code. Maybe instead of this whole approach it would be better to use pull_varattnos(), then get_attnum() to find the attribute number for the one you want, then bms_is_member(). +static PartitionBy * +transformPartitionBy(Relation rel, PartitionBy *partitionby) +{ + PartitionBy *partby; + ParseState *pstate; + RangeTblEntry *rte; + ListCell *l; + + partby = (PartitionBy *) makeNode(PartitionBy); + + partby->strategy = partitionby->strategy; + partby->location = partitionby->location; + partby->partParams = NIL; + + /* + * Create a dummy ParseState and insert the target relation as its sole + * rangetable entry. We need a ParseState for transformExpr. + */ + pstate = make_parsestate(NULL); Why isn't this logic being invoked from transformCreateStmt()? Then we could use the actual parseState for the query instead of a fake one. + if (IsA(expr, CollateExpr)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot use COLLATE in partition key expression"))); I assume there is a good reason for this seemingly-arbitrary restriction, but there's no comment saying what it is. One thing that's odd is that this will only prohibit a CollateExpr at the top level, not in some more-deeply nested position. That seems suspicious. + /* + * User wrote "(column)" or "(column COLLATE something)". + * Treat it like simple attribute anyway. + */ Evidently, the user did not do that, because you just prohibited the second one of those. + if (IsA(expr, Const)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("cannot use a constant expression as partition key"))); + + if (contain_mutable_functions(expr)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("functions in partition key expression must be marked IMMUTABLE"))); Do these checks parallel what we do for CHECK constraints? It might be good to apply about the same level of rigor in both cases. + exprsrc = deparse_expression(expr, + deparse_context_for(RelationGetRelationName(rel), + RelationGetRelid(rel)), + false, false); Why bother? The output of this doesn't seem like a useful thing to store. The fact that we've done similar things elsewhere doesn't make it a good idea. I think we did it in other cases because we used to be dumber than we are now. + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("data type %s has no default btree operator class", + format_type_be(atttype)), + errhint("You must specify an existing btree operator class or define one for the type."))); The hint is not really accurate, because the type may well have a btree operator class. Just not a default one. + char relkind = ((CreateStmt *) stmt)->partby != NULL + ? RELKIND_PARTITIONED_TABLE + : RELKIND_RELATION; Let's push this down into DefineRelation(). i.e. if (stmt->partby != NULL) { if (relkind != RELKIND_RELATION) ereport(...); relkind = RELKIND_PARTITION_TABLE; } + RelationBuildPartitionKey(relation); I wonder if RelationBuildPartitionKey should really be in relcache.c. What do we do in similar cases? +} PartitionBy; Maybe PartitionSpec? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: