Re: Declarative partitioning - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Declarative partitioning |
Date | |
Msg-id | 55E9317C.60705@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Declarative partitioning (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: Declarative partitioning
|
List | pgsql-hackers |
Sorry about the long delay in replying, to this message or the others posted in the last few days. I should have notified in advance of my vacation with rather limited Internet access. On 2015-08-26 PM 11:00, Simon Riggs wrote: > On 18 August 2015 at 11:30, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> > >> 3. Multi-level partitioning >> >> CREATE TABLE table_name >> PARTITION OF partitioned_table_name >> FOR VALUES values_spec >> PARTITION BY {RANGE|LIST} ON (columns_list) >> >> This variant implements a form of so called composite or sub-partitioning >> with arbitrarily deep partitioning structure. A table created using this >> form has both the relkind RELKIND_PARTITIONED_REL and >> pg_class.relispartition set to true. >> > > Multi-level partitioning is probably going to complicate things beyond > sanity. > > One RELKIND_PARTITIONED_REL with lots of partitions sounds best to me. We > can still have N dimensions of partitioning (or partitioning and > subpartitioning, if you prefer that term) > Implementation in the patch lets RELKIND_PARTITIONED_REL under another and so on to which appears to give a capability to have arbitrarily deep partitioning structure. That might feel unnecessary. Initial motivation to go for such design was to not have to complicate the partition key catalog with details of how to accommodate some notion of sub-partition template. ISTM, most usecases (as covered by other databases) would not require to use more than 2 levels. Maybe, we should consider catering to only that set of usecases using some sub-partition template based design (and then some specialized syntax). Or, we could just not handle sub-partitioning initially, by which I mean composite partitioning where you can have schemes such as range-list, list-range, range-hash, etc. We do still have multi-column range partitioning. In any case, we have to have a design that will cater to the multi-level partitioning. > >> The patch does not yet implement any planner changes for partitioned >> tables, although I'm working on the same and post updates as soon as >> possible. That means, it is not possible to run SELECT/UPDATE/DELETE >> queries on partitioned tables without getting: >> >> postgres=# SELECT * FROM persons; >> ERROR: could not open file "base/13244/106975": No such file or directory >> >> Given that there would be more direct ways of performing partition pruning >> decisions with the proposed, it would be nice to utilize them. >> Specifically, I would like to avoid having to rely on constraint exclusion >> for partition pruning whereby subquery_planner() builds append_rel_list >> and the later steps exclude useless partitions. >> > > This is really the heart of this patch/design. You can work for months on > all the rest of this, but you will live or die by how the optimization > works because that is the thing we really need to work well. Previous > attempts ignored this aspect and didn't get committed. It's hard, perhaps > even scary, but its critical. It's the 80/20 rule in reverse - 20% of the > code is 80% of the difficulty. > > I suggest you write a partition query test script .sql and work towards > making this work. Not exhaustive and weird tests, but 5-10 key queries that > need to be optimized precisely and quickly. I'm sure that's been done > before. > Yes, I am working on this and hope to have something to show soon. > > I couldn't see why you invented a new form of Alter Table recursion. > It was intended to keep the ALTER TABLE considerations for inherited tables (and typed tables) separate from those for partitioned tables. But... This begs a larger question that I did not try to answer in this design/patch - for partitions, do we need to have any catalog entries other than the pg_class tuple? If we manage to not require them, we would not need any AT recursion business at all except for cases that require pg_class tuple updates. The pg_class tuple would be the only authoritative catalog entry for partitions. Everything else belongs with the master table. That includes catalog entries for attributes, constraints, triggers, etc. Alvaro had mentioned something like this in his proposal. Although, I hope that such radical design is not very difficult to realize/code. Also, we might have to go back to the slightly controversial question of whether partitions share the same namespace as normal tables (partitioned or not). In the patch, partitions are created using CREATE TABLE, altered with ALTER TABLE (albeit with several restrictions). How about rather partitions are created/altered using: ALTER TABLE master CREATE PARTITION <name> ..., ALTER TABLE master MODIFY PARTITION <name> ... (as mentioned above) AT commands covered by the latter should only ever require updating the pg_class tuple for the named partition. ALTER TABLE <name> directly on a partition can be made to say something like the following under this scheme: ERROR: <name> is not a table Although, I'm not thinking of hiding partitions from everyone. For example, maintenance commands like VACUUM/ANALYZE (including autovacuum, of course) would be able to see them. > We will need to support multi-row batched COPY. Currently, tuple-routing may switch the ResultRelInfo for every consecutive tuple. So, at once I abandoned any hope of adapting the routing mechanism to heap_multi_insert() API. But, perhaps there is a smarter way. Let me think about that. > > I'm pleased to see this patch and will stay with it to completion, perhaps > others also. We have 3 more CFs in this release, Nov, Jan, Mar - so this > has a great chance of making it into 9.6. The current patch implements a > bunch of stuff, but its hard to say what, how or why it does it and without > the planner stuff its all moot. My recommendation is we say "Returned with > Feedback" on this now, looking forward to next patch. > > If you submit another patch before Nov, I will review it without waiting > for Nov 1. > > There will be much discussion on syntax, but that is not the key point. DDL > Support routines are usually pretty straightforward too, so that can be > left for now. > Sure, let me address number of points you have raised. In the upcoming versions, I will try to provide extensive documentation (both internal and user). As for the syntax and DDL changes you think can be dealt with later, I tend to think let's keep the bare essentials necessary. Thanks, Amit
pgsql-hackers by date: