Re: Dynamic Partitioning using Segment Visibility Maps - Mailing list pgsql-hackers
From | Chris Browne |
---|---|
Subject | Re: Dynamic Partitioning using Segment Visibility Maps |
Date | |
Msg-id | 60abnfymho.fsf@dba2.int.libertyrms.com |
In response to | Dynamic Partitioning using Segment Visibility Maps (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Dynamic Partitioning using Segment Visibility Maps
Re: Dynamic Partitioning using Segment Visibility Maps |
List | pgsql-hackers |
simon@2ndquadrant.com (Simon Riggs) writes:
> I think we have an opportunity to bypass the legacy-of-thought that
> Oracle has left us and implement something more usable.

This seems like a *very* good thing to me, from a couple of
perspectives.

1. I think you're right on in terms of the issue of the cost of
   "running all that DDL" in managing partitioning schemes.

   When I was working as a DBA, I was decidedly *NOT* interested in
   doing a lot of low level partition management work, and those that
   are in that role now would, I'm quite sure, agree that they are
   not keen on spending a lot of their time trying to figure out what
   tablespace to shift a particular table into, or what tablespace
   filesystem to get sysadmins to set up.

2. Blindly following what Oracle does has always been a dangerous
   sort of thing to do.

   There are two typical risks:

   a) There's always the worry that they may have patented some part
      of how they implement things, and if you follow too closely,
      There Be Dragons...

   b) They have enough billion$ of development dollar$ and
      development re$ource$ that they can follow strategies that are
      too expensive for us to even try to follow.

3. If, rather than blindly following, we create something at least
   quasi-new, there is the chance of doing fundamentally better.

   This very thing happened when it was discovered that IBM had a
   patent on the ARC caching scheme; the "clock" system that emerged
   was a lot better than ARC ever was.

> One major advantage of the dynamic approach is that it can work on
> multiple dimensions simultaneously, which isn't possible with
> declarative partitioning. For example if you have a table of Orders then
> you will be able to benefit from Segment Exclusion on all of these
> columns, rather than just one of them: OrderId, OrderDate,
> RequiredByDate, LastModifiedDate. This will result in some "sloppiness"
> in the partitioning, e.g. if we fill 1 partition a day of Orders, then
> the OrderId and OrderData columns will start out perfectly arranged. Any
> particular RequiredByDate will probably be spread out over 7 partitions,
> but thats way better than being spread out over 365+ partitions.

I think it's worth observing both the advantages and demerits of this
together.

In effect, with the dynamic approach, Segment Exclusion provides its
benefits as an emergent property of the patterns of how INSERTs get
drawn into segments.

The tendency will correspondingly be that Segment Exclusion will be
able to provide useful constraints for those patterns that can
naturally emerge from the INSERTs.

We can therefore expect useful constraints for attributes that are
assigned in some kind of more or less chronological order. Such
attributes will include:

 - Object ID, if set by a sequence
 - Processing dates

There may be a bit of sloppiness, but the constraints may still be
useful enough to exclude enough segments to improve efficiency.

_On The Other Hand_, there will be attributes that are *NOT* set in a
more-or-less chronological order, and Segment Exclusion will be pretty
useless for these attributes.

In order to do any sort of "Exclusion" for non-"chronological"
attributes, it will be necessary to use some mechanism other than the
patterns that fall out of "natural chronological insertions." If you
want exclusion on such attributes, then there needs to be some sort of
rule system to spread such items across additional partitions. Mind
you, if you do such, that will weaken the usefulness of Segment
Exclusion. For instance, suppose you have 4 regions, and scatter
insertions by region. In that case, there will be more segments that
overlap any given chronological range.

> When we look at the data in the partition we can look at any number of
> columns. When we declaratively partition, you get only one connected set
> of columns, which is one of the the reasons you want multi-dimensional
> partitioning in the first place.

Upside: Yes, you get to exclude based on examining any number of
columns.

Downside: You only get the exclusions that are "emergent properties"
of the data...

The more I'm looking at the dynamic approach, the more I'm liking
it...
--
"cbbrowne","@","cbbrowne.com"
http://linuxfinances.info/info/linuxxian.html
"Feel free to contribute build files. Or work on your motivational
skills, and maybe someone somewhere will write them for you..."
-- "Fredrik Lundh" <effbot@telia.com>