Re: decoupling table and index vacuum - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: decoupling table and index vacuum |
Date | |
Msg-id | CAH2-Wzm2LFd=1v3JxXL8d0SHhMSXdyoJRQO0tn0H2iT5pzC_ug@mail.gmail.com |
In response to | Re: decoupling table and index vacuum (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: decoupling table and index vacuum |
List | pgsql-hackers |
On Thu, Apr 22, 2021 at 12:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I agree strongly with this. In fact, I seem to remember saying similar
> things to you in the past. If something wins $1 in 90% of cases and
> loses $5 in 10% of cases, is it a good idea? Well, it depends on how
> the losses are distributed. If every user can be expected to hit both
> winning and losing cases with approximately those frequencies, then
> yes, it's a good idea, because everyone will come out ahead on
> average. But if 90% of users will see only wins and 10% of users will
> see only losses, it sucks.

Right. It's essential that we not disadvantage any workload by more
than a small fixed amount (and only with a huge upside elsewhere).

The even more general version is this: the average probably doesn't
even exist in any meaningful sense.

Bottom-up index deletion tends to be effective either 100% of the time
or 0% of the time, which varies on an index by index basis. Does that
mean we should split the difference, and assume that it's effective
50% of the time? Clearly not. Clearly that particular framing is just
wrong. And clearly it basically doesn't matter if it's half of all
indexes, or a quarter, or none, whatever. Because it's all of those
proportions, and also because who cares.

> That being said, I don't know what this really has to do with the
> proposal on the table, except in the most general sense. If you're
> just saying that decoupling stuff is good because different indexes
> have different needs, I am in agreement, as I said in my OP.

Mostly what I'm saying is that I would like to put together a rough
list of things that we could do to improve VACUUM along the lines
we've discussed -- all of which stem from $SUBJECT. There are
literally dozens of goals (some of which are quite disparate) that we
could conceivably set out to pursue under the banner of $SUBJECT.
Ideally there would be soft agreement about which ideas were more
promising.
Ideally we'd avoid painting ourselves into a corner with respect to
one of these goals, in pursuit of any other goal.

I suspect that we'll need somewhat more of a top-down approach to this
work, which is something that we as a community don't have much
experience with. It might be useful to set the parameters of the
discussion up-front, which seems weird to me too, but might actually
help. (A lot of the current problems with VACUUM seem like they might
be consequences of pgsql-hackers not usually working like this.)

> It sort
> of sounded like you were saying that it's not important to try to
> estimate the number of undeleted dead tuples in each index, which
> puzzled me, because while knowing doesn't mean everything is
> wonderful, not knowing it sure seems worse. But I guess maybe that's
> not what you were saying, so I don't know.

I agree that it matters that we are able to characterize how bloated a
partial index is, because an improved VACUUM implementation will need
to know that. My main point about that was that it's complicated in
surprising ways that actually matter. An approximate solution seems
quite possible to me, but I think that that will probably have to
involve the index AM directly.

Sometimes 10% - 30% of the extant physical index tuples will be dead
and it'll be totally fine in every practical sense -- the index won't
have grown by even one page since the last VACUUM! Other times it
might be as few as 2% - 5% that are now dead when VACUUM is
considered, which will in fact be a serious problem (e.g., it's
concentrated in one part of the keyspace, say). I would say that
having some rough idea of which case we have on our hands is extremely
important here. Even if the distinction only arises in rare cases
(though FWIW I don't think that these differences will be rare at
all).

(I also tried to clarify what I mean about qualitative bloat in
passing in my response about the case of a bloated partial index.)
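
As an illustration only, the contrast between those two cases can be
sketched with a toy model in Python. Nothing here comes from
PostgreSQL: the page count, the tuples-per-page figure, and every
function name are hypothetical, and the model ignores everything about
how a real index AM lays out or deletes tuples. It only shows that one
global dead-tuple fraction cannot tell "20% dead, spread evenly" apart
from "3% dead, packed into one part of the keyspace".

```python
# Toy model, all names and constants hypothetical: an index is a list of
# leaf pages in key order, each holding a fixed number of tuples.
TUPLES_PER_PAGE = 100
N_PAGES = 1000


def spread_evenly(dead_fraction):
    """Per-page dead-tuple counts with dead tuples spread uniformly."""
    per_page = round(TUPLES_PER_PAGE * dead_fraction)
    return [per_page] * N_PAGES


def concentrate(dead_fraction):
    """Per-page dead-tuple counts with every dead tuple packed into one
    contiguous part of the keyspace (the lowest-numbered pages)."""
    remaining = round(TUPLES_PER_PAGE * N_PAGES * dead_fraction)
    pages = []
    for _ in range(N_PAGES):
        dead_here = min(TUPLES_PER_PAGE, remaining)
        pages.append(dead_here)
        remaining -= dead_here
    return pages


def summarize(pages):
    """Return (global dead fraction, worst per-page dead fraction,
    number of pages whose tuples are all dead)."""
    total = TUPLES_PER_PAGE * len(pages)
    global_fraction = sum(pages) / total
    worst_page = max(pages) / TUPLES_PER_PAGE
    all_dead_pages = sum(1 for p in pages if p == TUPLES_PER_PAGE)
    return global_fraction, worst_page, all_dead_pages


even = summarize(spread_evenly(0.20))
skewed = summarize(concentrate(0.03))
print("even, 20% dead:  ", even)
print("skewed, 3% dead: ", skewed)
```

In this toy model the skewed index has the far lower global fraction,
yet it is the one where a whole run of adjacent pages is entirely dead.
A per-index total alone would rank the two cases the wrong way round,
which is one reason an estimate might need locality information that
only the index AM has.
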
> I feel like we're in danger
> of drifting into discussions about whether we're disagreeing with each
> other rather than, as I would like, focusing on how to design a system
> for $SUBJECT.

While I am certainly guilty of being kind of hand-wavy and talking
about lots of stuff all at once here, it's still kind of unclear what
practical benefits you hope to attain through $SUBJECT. Apart from the
thing about global indexes, which matters but is hardly the
overwhelming reason to do all this. I myself don't expect your goals
to be super crisp just yet. As I said, I'm happy to talk about it in
very general terms at first -- isn't that what you were doing
yourself?

Or did I misunderstand -- are global indexes mostly all that you're
thinking about here? (Even if they are all you care about, it still
seems like you're still somewhat obligated to generalize the dead TID
fork/map thing to help with a bunch of other things, just to justify
the complexity of adding a dead TID relfork.)

--
Peter Geoghegan