Re: Dynamic Partitioning using Segment Visibility Maps - Mailing list pgsql-hackers
From:           Simon Riggs
Subject:        Re: Dynamic Partitioning using Segment Visibility Maps
Date:
Msg-id:         1199891288.4266.300.camel@ebony.site
In response to: Re: Dynamic Partitioning using Segment Visibility Maps (Gavin Sherry <swm@alcove.com.au>)
Responses:      Re: Dynamic Partitioning using Segment Visibility Maps
List:           pgsql-hackers
Gavin and all,

This is quite a long reply, so apologies for that.

On Wed, 2008-01-09 at 07:28 +0100, Gavin Sherry wrote:
> On Wed, Jan 02, 2008 at 05:56:14PM +0000, Simon Riggs wrote:
> > This technique would be useful for any table with historical data keyed
> > by date or timestamp. It would also be useful for data where a
> > time-of-insert component is implicit, such as many major entity tables
> > where the object ids are assigned by a sequence, e.g. an Orders table
> > with an OrderId as PK. Once all orders placed in a period have been
> > shipped/resolved/closed then the segments will be marked read-only.
>
> A novel approach to the problem. For me, this proposal addresses some
> of the other problems in postgres (visibility in the heap vs. index)
> rather than the problems with partitioning itself.

I think you're right that it isn't attempting to address the problems with partitioning directly; it is attempting to address the core use case itself. I think we have an opportunity to bypass the legacy of thought that Oracle has left us and implement something more usable. There are some low-level technical issues associated with declarative partitioning that I'll address last, which are in many ways the big reason to want to go to non-declarative partitioning.

> It might seem otherwise, but for me partitioning is a tool for managing
> large volumes of data. It allows the user to encode what they know about
> the nature of their data, and that's often about management. In this
> way, your proposal actually makes partitioning too smart: the user wants
> to tell the system how to organise the data, as opposed to the other way
> around.
>
> At Greenplum, we've been discussing this in depth. Interestingly, we
> also felt that the storage layer could be much smarter with large tables
> with predictable layouts and predictable data patterns. But the thing
> is, people with large tables like partitioning, putting different
> partitions on different storage; they like the ability to merge and
> split partitions; they like truncating old partitions. In a word, they
> seem to like the manageability partitions give them -- as well as the
> performance that comes with this.

Do people really "like" running all that DDL? There is a significant manpower cost in implementing and maintaining a partitioning scheme, plus significant costs in getting it wrong (a sketch of the kind of DDL involved is below). If people with large tables like partitioning, why is Oracle moving towards automated partitioning in 11g? Automated partitioning was one of the major goals for this next set of Postgres partitioning functionality also, whether or not we have declarative partitioning. My thinking is: let's go for the best ideas and skip over the stuff that will be (is) obsolete before it has even left the design stage.

I see many more systems in the Terabyte range now than I did 10 years ago, but I see about the same number of DBAs. We'll always need experts, but I feel we should be using our expertise to simplify the standard cases, not just maintain the same level of difficulty in managing them. One of the big benefits of the dynamic partitioning approach is that it needs no DDL, so it will work out of the box, for anybody.

Deleting older data would be optimised under the proposed scheme, so that's not really a problem. Loading data is actually slightly harder and slower with declarative partitioning (see below). Merging and splitting partitions are tools for fine-tuning a very complex partitioning scheme. They do also allow a non-linear segmentation scheme, which might aid performance in some cases.
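To make the "all that DDL" point concrete, here is a rough sketch of the kind of inheritance-based scheme people maintain by hand today; the table, columns and monthly granularity are purely illustrative, and it is cut down to two partitions:

    -- Parent table; each child carries a CHECK constraint so that
    -- constraint_exclusion can skip irrelevant children at plan time.
    CREATE TABLE orders (
        orderid    bigint NOT NULL,
        orderdate  date   NOT NULL,
        status     text
    );

    CREATE TABLE orders_2008_01 (
        CHECK (orderdate >= DATE '2008-01-01' AND orderdate < DATE '2008-02-01')
    ) INHERITS (orders);

    CREATE TABLE orders_2008_02 (
        CHECK (orderdate >= DATE '2008-02-01' AND orderdate < DATE '2008-03-01')
    ) INHERITS (orders);

    -- Row routing has to be maintained by hand as well, e.g. with a trigger
    -- that must be edited every time a partition is added or dropped.
    CREATE FUNCTION orders_insert() RETURNS trigger AS $$
    BEGIN
        IF NEW.orderdate < DATE '2008-02-01' THEN
            INSERT INTO orders_2008_01 VALUES (NEW.*);
        ELSE
            INSERT INTO orders_2008_02 VALUES (NEW.*);
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_insert_trig
        BEFORE INSERT ON orders
        FOR EACH ROW EXECUTE PROCEDURE orders_insert();

Every new month means another CREATE TABLE, another CHECK constraint and an edit to the routing function, and getting any of it wrong quietly breaks either exclusion or loading. That is the manpower cost I mean.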
(On a different note, I'm strongly in favour of a declarative approach to other optimizer information, to allow the DBA to say what they know about how a table will be used. But that's off-topic for now.)

One major advantage of the dynamic approach is that it can work on multiple dimensions simultaneously, which isn't possible with declarative partitioning. For example, if you have a table of Orders then you will be able to benefit from Segment Exclusion on all of these columns, rather than just one of them: OrderId, OrderDate, RequiredByDate, LastModifiedDate. This will result in some "sloppiness" in the partitioning. For example, if we fill one partition a day with Orders, the OrderId and OrderDate columns will start out perfectly arranged, while any particular RequiredByDate will probably be spread out over 7 partitions; but that's way better than being spread out over 365+ partitions. When we look at the data in a partition we can look at any number of columns. When we partition declaratively, you get only one connected set of columns, which is one of the reasons you want multi-dimensional partitioning in the first place.

> To this end, we (well, Jeff Cohen) looked at the syntax and semantics of
> partitioning in leading databases (Oracle, Informix, DB2) and came up
> with a highly expressive grammar which takes the best of each, I think
> (I'll post details on the grammar in a separate thread). The idea is
> that range (for example, a date range), list (a list of distinct values)
> and hash partitioning be supported on multiple columns. Partitions can
> be added, merged, truncated. Partitions can reside on different
> tablespaces. The existing issues with the rewriter, COPY support and the
> like go away by smartening up the backend.
>
> To explore the grammar and semantics, Jeff and I (to a small extent) have
> implemented the parsing of such a grammar in the backend. At the moment,
> this just results in the generation of a bunch of rules (i.e., it
> produces the current behaviour, as if someone had written the rules
> themselves) and debugging output.
>
> The fully fledged approach will see partitioning rules stored in
> a new system catalog which can be interrogated by the planner or COPY to
> determine which partition a given tuple belongs in or which partition(s)
> a WHERE clause matches.

When loading data we would need to examine each incoming tuple and distribute it to the correct place in the partitioned heap. That's a non-trivial overhead on data loading that I would like to avoid, since if we rely on the natural locality of loaded data then that overhead is zero (a strawman sketch of what that per-tuple lookup might look like follows below).

> ...partitioning rules stored in
> a new system catalog which can be interrogated...

Agreed. I think this would be the best approach whatever we do; pg_class would eventually impose a limit on the number of partitions for us.

> Yes, this is the traditional approach but I think it still has merit. We
> shall continue working on this approach because it is what our customers
> have asked for. We would also like to see it get into postgres too.

I think there is very likely some merit hiding in the "traditional approach", but we should be clear about what that is and is not. (I still have half my brain in that way of doing things, so I'm working through everything to make sure I miss as little as possible.)
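Purely as an illustration of that routing cost (none of these names exist, and this is not Jeff's design, just a strawman), a partition-rule catalog that COPY or the planner could interrogate might boil down to something like this, with every loaded tuple paying for a lookup:

    -- Hypothetical rule catalog: one row per range partition.
    CREATE TABLE partition_rule (
        parentrelid  oid  NOT NULL,   -- the partitioned table
        partrelid    oid  NOT NULL,   -- the heap holding this partition
        lowerbound   date NOT NULL,   -- inclusive lower bound
        upperbound   date NOT NULL    -- exclusive upper bound
    );

    -- The routing step COPY would have to run, in effect, for each tuple:
    SELECT partrelid
      FROM partition_rule
     WHERE parentrelid = 'orders'::regclass
       AND DATE '2008-01-09' >= lowerbound
       AND DATE '2008-01-09' <  upperbound;

With natural locality and segment exclusion there is no equivalent per-tuple step at load time, which is exactly the overhead I want to avoid.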
I'd like to hear from anybody with descriptions of use cases where the differences between what has been proposed and what is missing from it provide a significant gain. If nobody has ever run such tests or had those thoughts, let's examine them now to make sure we're doing the right thing. I don't want to copy what others have done without considering whether it is correct for us.

After thinking about what you've said, ISTM that the declarative partition approach has these advantages over the dynamic approach:

- allows partitions of different sizes, for when a non-linear partitioning scheme gives considerable performance gain
- allows partitioning to continue to work whilst UPDATEs continue within a particular partition

and these disadvantages:

- only works with one set of columns; cannot exclude partitions by other columns
- requires manual definition
- requires partition names and numbers to be user visible
- declarative partitioning can improve the situation, or make it worse, depending upon whether the definitions match the reality of the data
- acquires table locks during DDL execution, to enforce validity of the rules, possibly for extended periods while data moves or is checked
- requires loaded data to execute partitioning code to identify where it should insert itself

My assessment is that the dynamic partitioning approach wins in most common cases for large growing tables, though there are additional use cases where more control is desirable. (The two advantages of declarative partitioning might be something we could change with the dynamic approach, but not in the first phase. For example, we might see a way to break down segments into 4 sub-segments recursively, using a quad-tree approach. Thus when segments are being updated we simply isolate the issue around the changing data.)

I would like to proceed in a way that allows dynamic partitioning to occur naturally as the default, with the option to add explicit partitioning at a later stage, if required and possible. At this stage I'm not sure whether those additional cases apply to truly growing tables, nor whether there is sufficient additional advantage to make declarative partitioning worth considering.

Technical Issues
----------------

The main problems I see occur in the storage layer. If we can discuss what you think is needed at the storage layer then we might stand a chance of laying the foundations for both options. The catalog changes and executor changes are probably fairly similar for both.

My first problem sounds quite mundane: expandability. If you define a partition manually and logically then it needs to cope with an arbitrary number of rows. That means that the view of a partition as a range of contiguous blocks disappears and is replaced by a more complex definition. When you turn partitioning on its head and say "What data is in this range of blocks?", you don't need expandability and the partition can be a fixed set of contiguous blocks. That works well with any shape of data, so merging and splitting operations are not required because we never have fat or thin partitions to cope with. You can also redefine partitioning simply by changing the segment size and re-VACUUMing the table.
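As a strawman only (the names and layout here are invented for illustration, not part of the actual design), the per-segment metadata needed to answer that question could be as simple as observed min/max values per column for each fixed-size run of blocks, recorded once a segment becomes read-only:

    -- Hypothetical per-segment bounds: one row per (segment, column).
    CREATE TABLE segment_bounds (
        relid    oid     NOT NULL,
        segno    integer NOT NULL,   -- which fixed-size run of blocks
        attname  name    NOT NULL,
        minval   text,               -- observed minimum, stored as text
        maxval   text                -- observed maximum, stored as text
    );

    -- Segments that cannot possibly contain requiredbydate = '2008-03-01',
    -- and so can be skipped without reading a single block:
    SELECT segno
      FROM segment_bounds
     WHERE relid   = 'orders'::regclass
       AND attname = 'requiredbydate'
       AND ('2008-03-01' < minval OR '2008-03-01' > maxval);

Because the same map can carry bounds for OrderId, OrderDate, RequiredByDate and LastModifiedDate at once, exclusion works on whichever of them a query happens to constrain, which is the multi-dimensional behaviour described earlier.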
At the moment, smgr provides the executor with a single uniform view of a table as a single contiguous range of blocks. The current implementation of md.c allows a table to actually consist of a range of files, without the executor knowing. If we violate that, then we'd need to change all code that accepts a BlockNumber: indexes, the buffer manager, etc. So anything we do that is of any complexity must be below the smgr layer.

Allowing expandable partitions eventually implies a non-contiguous range of blocks. In the worst case, coping with a non-contiguous block range means you need to consult a block index for each block, which would be horrible. One better solution might be to have logical partitions broken up into physical partitions, all of the same size. Various solutions are possible, but most seem to detract from the thing we were seeking in the first place and/or seem to restrict the range of block addresses available, unless we go for variable-size BlockNumbers also. Urgh.

If anybody thinks we can get around this by putting a maximum size on partitions, then I'd say you're mad. The last thing we would want is a mechanism that stops you from adding new data if you get the partition definitions wrong, especially since my experience is that most people set up partitioning in a way that proves sub-optimal after some time in real operation. The same comments also apply to Oracle's Hash Cluster and DB2's Multi-Dimensional Clustering techniques: they're great performance features when declared correctly, but almost unusable in real applications that can't control what business and the real world will throw at them.

So it seems likely that we'd be heading for a level of change within the storage layer that would not be easily accepted, given that not everybody needs partitioning in Postgres. We're probably caught on the question of whether this is a general-purpose or special-purpose database system. Realistically, I have to set limits, so I'd describe my current goal as being able to manage 16TB easily and quickly with Postgres, without too much additional thought on the part of the DBA, when they have one or more naturally growing tables. Special-purpose requirements for parallelism or larger databases can use another database system, e.g. GP, XDB, ParAccel etc.

So, there are specific disadvantages that I see to declarative partitioning with respect to its deeper implementation for us. I see no need to rule it out completely, but I do see reasons to de-prioritise it.

Detailed comments and thoughts most welcome.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com