Re: Declarative partitioning - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Declarative partitioning |
Date | |
Msg-id | CA+TgmoZZMfcf16YaHuhP1Vk=j8PDFeHCvfj+FJQd+eFhs+7P8A@mail.gmail.com Whole thread Raw |
In response to | Re: Declarative partitioning (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses |
Re: Declarative partitioning
|
List | pgsql-hackers |
On Mon, Dec 14, 2015 at 2:14 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote: > Syntax to create a partitioned table (up to 2 levels of partitioning): > > CREATE TABLE foo ( > ... > ) > PARTITION BY R/L ON (key0) > SUBPARTITION BY R/L ON (key1) > [(PARTITION foo_1 FOR VALUES <val> [<storage_params>] [<tblspc>] > [(SUBPARTITION foo_1_1 FOR VALUES <val> [<storage_params>] [<tblspc>], > ...)], ...)]; > > The above creates two pg_partitioned_rel entries for foo with partlevel 0 > and 1, for key0 and key1, respectively. For foo_1 and foo_1_1, this > creates pg_partition entries, with foo and foo_1 as partparent, > respectively. > > Why just 2 levels? - it seems commonplace and makes the syntax more > intuitive? I guess it might be possible to generalize the syntax for > multi-level partitioning. Ideas? If we want to support the notion of > sub-partition template in future, that would require some thought, more > importantly proper catalog organization for the same. I do not think this is a particularly good idea. You're going to need to dump each partition separately at least in --binary-upgrade mode, because each is going to have its own magic OIDs that need to be restored, and also because there will most likely be at least some properties that are going to vary between partitions. You could require that every partition have exactly the same set of columns, constraints, rules, triggers, policies, attribute defaults, comments, column comments, and everything else that might be different from one partition to another, and further require that they have exactly matching indexes. It would take a fair amount of code to prohibit all that, but it could be done. However, do we really want that? There may well be some things were we want to enforce that the parent and the child are exactly identical, but I doubt we want that for absolutely every property, current and future, of the partition. And even if you did, because of the --binary-upgrade stuff, you still need to to be able to dump them separately. Therefore, I believe it is a whole lot better to make the primary syntax for table partitioning something where you issue a CREATE statement for the parent and then a CREATE statement for each child. If we want to also have a convenience syntax so that people who want to create a parent and a bunch of children in one fell swoop can do so, fine. I would not choose to model the syntax for creating partitions on Oracle. I don't find that syntax particularly nice or easy to remember. I say PARTITION BY RANGE, and then inside the parentheses I use the PARTITION keyword for each partition? Really? But I think copying the style while having the details be incompatible is an even worse idea. > What about ALTER TABLE? - Instead of allowing ALTER TABLE to be applied > directly to partitions on case-by-case basis (they are tables under the > hood after all), we should restrict AT to the master table. Most of the AT > changes implicitly propagate from the master table to its partitions. Some > of them could be directly applied to partitions and/or sub-partitions such > as rename, storage manipulations like - changing tablespace, storage > parameters (reloptions), etc.: > > ALTER TABLE foo > RENAME PARTITION <partition-name> TO <new-name>; > > ALTER TABLE foo > RENAME SUBPARTITION <sub-partition-name> TO <new-name>; > > ALTER TABLE foo > SET TABLESPACE ... [DEFAULT] FOR PARTITION <partition-name>; > > ALTER TABLE foo > SET TABLESPACE ... FOR SUBPARTITION <sub-partition-name>; > > ALTER TABLE foo > SET (storage_parameter = value) [DEFAULT] FOR PARTITION <partition-name>; > > ALTER TABLE foo > SET (storage_parameter = value) FOR SUBPARTITION <sub-partition-name>; I don't think this is a very good idea. This is basically proposing that for every DDL command that you can apply to a table, you have to spell it differently for a partition. That seems like a lot of extra work for no additional functionality. > By the way, should we also allow changing the logging of > partitions/sub-partitions as follows? Again, I think you're coming at this from the wrong direction. Instead of saying we're going to disallow all changes to the partitions and then deciding we need to allow certain changes after all, I think we should allow everything that is currently allowed for an inherited table and then decide which of those things we need to prohibit, and why. For example, if you insist that a child table has to have a tuple descriptor that matches the parent, that can improve efficiency: Append won't need to project, and so on. But it now becomes very difficult to support taking a stand-alone table and making it a partition of an existing partitioned table, because the set of dropped columns might not match. Having to give an error in that case amounts to "we're sorry, we can't attach your partition to the partitioning hierarchy because of some invisible state that you can't see" isn't very nice. Now I'm not saying that isn't the right decision, but I think the design choices here need to be carefully thought about. Stepping away from that particular example, a blanket prohibition on changing any attribute of a child table seems like it will prohibit a lot of useful things that really ought to work. And again, I don't think it's a good idea to implement separate syntax for changing a partition vs. changing a table. If I want to set a partition as unlogged, I should be able to say ALTER TABLE partition_name UNLOGGED or maybe ALTER PARTITION partition_name UNLOGGED, not be forced to use some new grammar production that looks completely different. > What about index constraints, ie, PRIMARY KEY, UNIQUE and EXCLUSION > constraints - 2 things must be clear here: cannot create these constraints > on individual partitions and all partition columns (key0 + key1) must be > the leading columns of the key. On a related note, creating index on the > master table should create the index on all "leaf" partitions. The index > on the mater table itself would be just a logical index. Should we allow > creating or dropping indexes on partitions directly? I don't find this to be particularly clear. You are assuming that nobody wants to create a constraint that a certain value is unique within a partition. That might not be a real common thing to want to do, but it could certainly be useful to somebody, and the current system with table inheritance allows it. For example, suppose that we have orders partitioned on the order_date column, by month. The user might want to create a UNIQUE index on order_id on each partition. Maybe they start over with order_id 1 at the beginning of each month. But even if, as is more likely, the order IDs keep counting up from month to month, they don't want to be forced to include the whole partitioning key in the index in order to have it marked UNIQUE. That may be enough, in practice, to ensure the global uniqueness of order IDs even though the system doesn't technically enforce it in all cases. If you want an index created on the parent to cascade down to all children, that's a big project to get right. Suppose I create an index on the parent table. After a while, I notice that it's getting bloated, so I created another index with the same definition. Now, I drop one of the two indexes. One of the two indexes from each child table needs to go away, and moreover it can't be picked arbitrarily - it has to be the one that was created at the same time as the parent index I'm dropping. If you want it to behave like this, you need a whole system of bookkeeping to make it work right. For version 1, I'd go the other way and prohibit index definitions on the empty parent rels. Let people create indexes on the children as they wish. In a future version, we can add stuff to cascade from parents to children. > It would be interesting to talk about features like SPLIT, MERGE, EXCHANGE > but I'm inclined to leave them as future enhancements. For a functionality > similar to EXCHANGE, there are commands like ATTACH/DETACH in the latest > patch. We could extend them to also consider sub-partitions: We don't need to have these in the first version, but we have to make some architectural decisions that affect how feasible they are to implement and in which cases, as noted above. > One cannot define rules, triggers, and RLS policies on them. Although, > AR triggers defined on a partitioned master table are propagated to the > "leaf" partitions. What value do you see us getting out of restricting these particular things? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: