Re: Dynamic Partitioning using Segment Visibility Maps - Mailing list pgsql-hackers
| From | Andrew Sullivan |
|---|---|
| Subject | Re: Dynamic Partitioning using Segment Visibility Maps |
| Date | |
| Msg-id | 20080107154146.GA18581@crankycanuck.ca |
| In response to | Re: Dynamic Partitioning using Segment Visibility Maps (Markus Schiltknecht <markus@bluegap.ch>) |
| Responses | Re: Dynamic Partitioning using Segment Visibility Maps |
| List | pgsql-hackers |
On Sat, Jan 05, 2008 at 08:02:41PM +0100, Markus Schiltknecht wrote:

> Well, management of relations is easy enough, known to the DBA and most
> importantly: it already exists. Having to set up something which is
> *not* tied to a relation complicates things just because it's an
> additional concept.

But we're already dealing with some complicated concepts. There isn't anything that will prevent current-style partitioning strategies from continuing to work in the face of Simon's proposal. But let me see if I can outline the sort of cases where I see real value in what he's outlined.

There is a tendency in data systems to gather all manner of data that, in retrospect, _might_ turn out to be valuable; but which, at the time, is not really valuable at all. Moreover, the value later on might be relatively low: if you can learn something much later from that data, and do so easily, then it will be worth doing. But if the work involved passes some threshold (say 1/2 a day), it's suddenly not worth it any more. It's simple economics: below a certain cost, the data is valuable. Above a certain cost, you simply shouldn't keep the data in the first place, because the cost of using it is higher than any value you'll likely be able to extract.

Simon's proposal changes the calculations you have to do. If keeping some data online longer does not impose administrative or operational overhead (you have it marked read only, so there's no I/O for vacuum; you don't need to do anything to get the data marked read only; &c.), then all it costs is a little more disk, which is relatively cheap these days. More importantly, if the longer-term effect of this strategy is to make it possible to move such data offline _without imposing a big cost_ when moving it back online, then the value is potentially very high.

Without even trying, I can think of a dozen examples in the past 5 years where I could have used that sort of functionality. Because the cost of data retrieval was high enough, we had to decide that the question wasn't worth answering. Some of those answers might have been quite valuable indeed to the Internet community, to be frank; but because I had to pay the cost without getting much direct benefit, it just wasn't worth the effort.

The thing about Simon's proposal that is beguiling is that it is aimed at a very common use pattern. The potential for automatic management under such a use pattern makes it seem to me to be worth exploring in some detail.

> Agreed. I'd say that's why the DBA needs to be able to define the split
> point between partitions: only he knows the meaning of the data.

I think this is only partly true. A casual glance at the -general list will reveal all manner of false assumptions on the parts of administrators about how their data is structured. My experience is that, given that the computer has way more information about the data than I do, it is more likely to make the right choice. To the extent it doesn't do so, that's a problem in the planning (or whatever) algorithms, and it ought to be fixed there.

A
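[Editor's note: for readers outside the thread, here is a minimal sketch of the "current-style partitioning" the message refers to, i.e. the manual inheritance-plus-CHECK-constraint setup PostgreSQL offered at the time. Table and column names are purely illustrative and not from the thread.]

```sql
-- Illustrative only: the hand-managed, inheritance-based partitioning
-- scheme that Simon's proposal would sit alongside.
CREATE TABLE measurements (
    logged_at   timestamptz NOT NULL,
    device_id   integer     NOT NULL,
    reading     numeric
);

-- One child table per month, each with a CHECK constraint so that
-- constraint exclusion can skip partitions that cannot match a query.
CREATE TABLE measurements_2008_01 (
    CHECK (logged_at >= '2008-01-01' AND logged_at < '2008-02-01')
) INHERITS (measurements);

CREATE TABLE measurements_2008_02 (
    CHECK (logged_at >= '2008-02-01' AND logged_at < '2008-03-01')
) INHERITS (measurements);

-- Routing inserts to the right child (via a trigger or rule on the
-- parent) and retiring or archiving old children are the DBA's job.
SET constraint_exclusion = on;

SELECT device_id, avg(reading)
  FROM measurements
 WHERE logged_at >= '2008-02-10' AND logged_at < '2008-02-11'
 GROUP BY device_id;
```

Everything in that setup (insert routing, constraint maintenance, dropping or archiving old children) is manual administrative work, which is the kind of overhead the message argues Simon's segment-visibility-map approach would remove for read-mostly historical data.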