Re: Allow "snapshot too old" error, to prevent bloat - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Allow "snapshot too old" error, to prevent bloat |
Date | |
Msg-id | 20150219011401.GG6717@tamriel.snowman.net |
In response to | Re: Allow "snapshot too old" error, to prevent bloat (Kevin Grittner <kgrittn@ymail.com>) |
Responses | Re: Allow "snapshot too old" error, to prevent bloat |
List | pgsql-hackers |
* Kevin Grittner (kgrittn@ymail.com) wrote:
> Stephen Frost <sfrost@snowman.net> wrote:
> > I also agree with the general idea that it makes sense to provide a way
> > to control bloat, but I think you've missed what Andres was getting at
> > with his suggestion (if I understand correctly, apologies if I don't).
> >
> > The problem is that we're only looking at the overall xmin / xmax
> > horizon when it comes to deciding if a given tuple is dead. That's
> > not quite right- the tuple might actually be dead to all *current*
> > transactions by being newer than the oldest transaction but dead for all
> > later transactions. Basically, there exist gaps between our cluster
> > wide xmin / xmax where we might find actually dead rows. Marking those
> > rows dead and reusable *would* stop the bloat, not just slow it down.
> >
> > In the end, with a single long-running transaction, the worst bloat you
> > would have is double the size of the system at the time the long-running
> > transaction started.
>
> I agree that limiting bloat to one dead tuple for every live one
> for each old snapshot is a limit that has value, and it was unfair
> of me to characterize that as not being a limit. Sorry for that.
>
> This possible solution was discussed with the user whose feedback
> caused me to write this patch, but there were several reasons they
> dismissed that as a viable total solution for them, two of which I
> can share:
>
> (1) They have a pool of connections each of which can have several
> long-running cursors, so the limit from that isn't just doubling
> the size of their database, it is limiting it to some two-or-three
> digit multiple of the necessary size.

This strikes me as a bit off-the-cuff; was an analysis done which
determined that would be the result? If there is overlap between the
long-running cursors then there would be less bloat, and most systems
which I'm familiar with don't turn the entire database over in 20
minutes, 20 hours, or even 20 days except in pretty specific cases.
Perhaps this is one of those, and if so then I'm all wet, but the
feeling I get is that this is a way to dismiss this solution because
it's not what's wanted, which is "what Oracle did."

> (2) They are already prepared to deal with "snapshot too old"
> errors on queries that run more than about 20 minutes and which
> access tables which are modified. They would rather do that than
> suffer the bloat beyond that point.

That, really, is the crux here- they've already got support for dealing
with it the way Oracle did and they'd like PG to do that too.
Unfortunately, that, by itself, isn't a good reason for a particular
capability (we certainly aren't going to be trying to duplicate PL/SQL
in PG any time soon). That said, there are capabilities in other
RDBMS's which are valuable and which we *do* want, so the fact that
Oracle does this also isn't a reason to not include it.

> IMO all of these changes people are working are very valuable, and
> complement each other. This particular approach is likely to be
> especially appealing to those moving from Oracle because it is a
> familiar mechanism, and one which some of them have written their
> software to be aware of and deal with gracefully.

For my 2c, I'd much rather provide them with a system where they don't
have to deal with broken snapshots than give them a way to have them the
way Oracle provided them. :) That said, even the approach Andres
outlined will cause bloat and it may be beyond what's acceptable in some
environments, and it's certainly more complicated and unlikely to get
done in the short term.

> > I'm not against having a knob like this, which is defaulted to off,
>
> Thanks! I'm not sure that amounts to a +1, but at least it doesn't
> sound like a -1. :-)

So, at the time I wrote that, I wasn't sure if it was a +1 or not
myself. I've been thinking about it since then, however, and I'm
leaning more towards having the capability than not, so perhaps that's a
+1, but it doesn't excuse the need to come up with an implementation
that everyone can be happy with and what you've come up with so far
doesn't have a lot of appeal, based on the feedback (I've only glanced
through it myself, but I agree with Andres and Tom that it's a larger
footprint than we'd want for this and the number of places having to be
touched is concerning as it could lead to future bugs).

A lot of that would go away if there was a way to avoid having to mess
with the index AMs, I'd think, but I wonder if we'd actually need more
done there- it's not immediately obvious to me how an index-only scan is
safe with this. Whenever an actual page is visited, we can check the
LSN, and the relation can't be truncated by vacuum since the transaction
will still have a lock on the table which prevents it, but does the
visibility-map update check make sure to never mark pages all-visible
when one of these old transactions is running around? On what basis?

> > but I do think we'd be a lot better off with a system that could
> > realize when rows are not visible to any currently running transaction
> > and clean them up.
>
> +1
>
> But they are not mutually exclusive; I see them as complementary.

I can see how they would be, provided we can be confident that we're
going to actually throw an error when the snapshot is out of date and
not end up returning incorrect results. We need to be darn sure of
that, both now and in a few years from now when many of us may have
forgotten about this knob.. ;)

> > If this knob's default is off then I don't think
> > we'd be likely to get the complaints which are being discussed (or, if
> > we did, we could point the individual at the admin who set the knob...).
>
> That's how I see it, too. I would not suggest making the default
> anything other than "off", but there are situations where it would
> be a nice tool to have in the toolbox.

Agreed.

Thanks!

Stephen
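
Editorial illustration (not part of the original message): the "gaps between xmin / xmax" idea quoted above can be sketched with a toy model. This is only a sketch under loud assumptions and is not PostgreSQL's visibility code: a snapshot is reduced to a single integer xid that sees every transaction with a smaller xid as committed, all deleting transactions are assumed committed, and every future snapshot is assumed to be newer than any existing xmax. The names `TupleVersion`, `visible_to`, `removable_by_horizon`, and `removable_by_gaps` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TupleVersion:
    xmin: int            # xid that created this row version
    xmax: Optional[int]  # xid that deleted it, or None if still live


def visible_to(t: TupleVersion, snap: int) -> bool:
    """Toy rule: created before the snapshot and not yet deleted as of it."""
    return t.xmin < snap and (t.xmax is None or t.xmax >= snap)


def removable_by_horizon(t: TupleVersion, snapshots: Sequence[int]) -> bool:
    """Horizon-only rule: removable only if deleted before the OLDEST snapshot."""
    if t.xmax is None:
        return False
    return not snapshots or t.xmax < min(snapshots)


def removable_by_gaps(t: TupleVersion, snapshots: Sequence[int]) -> bool:
    """Gap-aware rule: removable if deleted and no active snapshot can see it.

    Assumes future snapshots are newer than t.xmax, so they cannot see it either.
    """
    return t.xmax is not None and not any(visible_to(t, s) for s in snapshots)


# One long-running snapshot (100) and one recent snapshot (500).
snapshots = [100, 500]
old_version = TupleVersion(xmin=50, xmax=200)   # still visible to snapshot 100
gap_version = TupleVersion(xmin=200, xmax=300)  # created and deleted in the gap
live_version = TupleVersion(xmin=300, xmax=None)
```

In this toy model `gap_version` is kept by the horizon-only rule but reclaimed by the gap-aware rule, while `old_version` must be kept under both. With a single old snapshot, at most one extra version per row stays pinned, which is the "double the size" bound mentioned in the quoted text.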