Re: decoupling table and index vacuum - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: decoupling table and index vacuum |
Date | |
Msg-id | CAH2-Wz=8E5QecDmzVcEWhwCyVhc2wsGRzviDZq0CyCwiv=zgLw@mail.gmail.com |
In response to | Re: decoupling table and index vacuum (Andres Freund <andres@anarazel.de>) |
Responses |
Re: decoupling table and index vacuum
|
List | pgsql-hackers |
On Thu, Apr 22, 2021 at 11:44 AM Andres Freund <andres@anarazel.de> wrote:
> I'm honestly getting a bit annoyed about this stuff.

You're easily annoyed.

> Yes it's a cool
> improvement, but no, it doesn't mean that there aren't still relevant
> issues in important cases. It doesn't help that you repeatedly imply
> that people that don't see it your way need to have their view "cleared
> up".

I don't think that anything that I've said about it contradicts
anything that you or Robert said. What I said is that you're missing a
couple of important subtleties (or that you seem to be). It's not
really about the optimization in particular -- it's about the
subtleties that it exploits. I think that they're generalizable. Even
if there was only a 1% chance of that being true, it would still be
worth exploring in depth.

I think that everybody's beliefs about VACUUM tend to be correct. It
almost doesn't matter if scenario A is the problem in 90% of cases
versus 10% of cases for scenario B (or vice-versa). What actually
matters is that we have good handling for both. (It's probably some
weird combination of scenario A and scenario B in any case.)

> "Bottom up index deletion" is practically *irrelevant* for a significant
> set of workloads.

You're missing the broader point. Which is that we don't know how much
it helps in each case, just as we don't know how much some other
complementary optimization helps. It's important to develop
complementary techniques precisely because (say) bottom-up index
deletion only solves one class of problem. And because it's so hard to
predict.

I actually went on at length about the cases that the optimization
*doesn't* help. Because that'll be a disproportionate source of
problems now. And you really need to avoid all of the big sources of
trouble to get a really good outcome. Avoiding each and every source
of trouble might be much much more useful than avoiding all but one.

> > You both seem to be assuming that everything would be fine if you
> > could somehow inexpensively know the total number of undeleted dead
> > tuples in each index at all times.
>
> I don't think we'd need an exact number. Just a reasonable approximation
> so we know whether it's worth spending time vacuuming some index.

I agree.

> You also have to assume that you have roughly evenly distributed index
> insertions and deletions. But workloads that insert into some parts of a
> value range and delete from another range are common.
>
> I even would say that *precisely* because "Bottom up index deletion" can
> be very efficient in some workloads it is useful to have per-index stats
> determining whether an index should be vacuumed or not.

Exactly!

> Except that heap bloat not index bloat might be the more pressing
> concern. Or that there will be no meaningful amount of bottom-up
> deletions. Or ...

Exactly!

--
Peter Geoghegan
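
[Illustrative note, not part of the original message.] The per-index decision discussed above, using "a reasonable approximation" of dead index tuples rather than an exact count, could be sketched roughly as follows. Everything here is an assumption made for illustration: `IndexStats`, `should_vacuum_index`, `indexes_to_vacuum`, and the 0.2 threshold are hypothetical names and values, not PostgreSQL structures or settings. The sketch also deliberately ignores the caveats raised in the thread, such as unevenly distributed insertions and deletions, or heap bloat being the more pressing concern.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IndexStats:
    """Hypothetical per-index counters (not a PostgreSQL structure)."""
    name: str
    live_tuples: int
    est_dead_tuples: int  # approximate number of undeleted dead index tuples


def should_vacuum_index(stats: IndexStats,
                        dead_fraction_threshold: float = 0.2) -> bool:
    """Return True when the estimated dead fraction reaches the threshold.

    The threshold is an arbitrary illustrative value, not a real setting.
    """
    total = stats.live_tuples + stats.est_dead_tuples
    if total == 0:
        return False  # empty index: nothing worth vacuuming
    return stats.est_dead_tuples / total >= dead_fraction_threshold


def indexes_to_vacuum(all_stats: Iterable[IndexStats],
                      dead_fraction_threshold: float = 0.2) -> List[str]:
    """Pick, index by index, which ones look worth vacuuming."""
    return [s.name for s in all_stats
            if should_vacuum_index(s, dead_fraction_threshold)]
```

The point of deciding per index is that an index kept clean by bottom-up deletion would report a low estimate and be skipped, while another index on the same table could still be selected.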