Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
| From | Peter Geoghegan |
|---|---|
| Subject | Re: Deleting older versions in unique indexes to avoid page splits |
| Date | |
| Msg-id | CAH2-Wzno9q1Wj7a9rk5G4jQe7BupRe5y6NVobVw8EiVLDcad8w@mail.gmail.com |
| In response to | Re: Deleting older versions in unique indexes to avoid page splits (Victor Yegorov <vyegorov@gmail.com>) |
| Responses | Re: Deleting older versions in unique indexes to avoid page splits |
| List | pgsql-hackers |
On Mon, Jan 18, 2021 at 6:11 AM Victor Yegorov <vyegorov@gmail.com> wrote:
> If I understand you correctly, you suggest to track _all_ the hints that came
> from the executor for possible non-HOT logical duplicates somewhere on
> the page. And when we hit the no-space case, we'll check not only the last
> item being hinted, but all items on the page, which makes it more probable
> to kick in and do something.
> Also, I'm not sure where to put it. We've deprecated the BTP_HAS_GARBAGE
> flag, maybe it can be reused for this purpose?

There actually was a similar flag (named BTP_UNCHANGED_UPDATE and later
BTP_HAS_DUPS) that appeared in earlier versions of the patch (several
versions in total, up to and including v6). This was never discussed on
the list because I assumed that it wouldn't be helpful (I was already
writing too many overlong e-mails). The question was unsettled in my mind
at the time, so it didn't make sense to start discussing it. I changed my
mind at some point, and so it never came up until now, when Amit raised
the question.
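To make the idea concrete, the page-level hint amounts to something like
this (a rough sketch, not the actual code from v6 -- remember_dup_hint()
and P_HAS_DUPS() are made-up names, and BTP_HAS_DUPS simply reclaims the
bit that the deprecated BTP_HAS_GARBAGE flag used to occupy):

/* Reclaim the bit vacated by the deprecated BTP_HAS_GARBAGE flag */
#define BTP_HAS_DUPS        (1 << 6)
#define P_HAS_DUPS(opaque)  (((opaque)->btpo_flags & BTP_HAS_DUPS) != 0)

/*
 * Executor hint about a possible non-HOT logical duplicate: remember it
 * for the leaf page as a whole, rather than only for the last item that
 * happened to be hinted.
 */
static void
remember_dup_hint(Page page)
{
    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

    opaque->btpo_flags |= BTP_HAS_DUPS;
}

When the no-space case then hits a page with the flag set, every item on
the page becomes a candidate for bottom-up deletion, not just the last
item that was hinted.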
Looking back on my personal notes, I am reminded that I debated this
exact question with myself at length. The argument for keeping the nbtree
flag (i.e. what Amit is arguing for now) involved an isolated example
that seems very similar to the much more recent example from Amit (which
he arrived at independently). I am at least sympathetic to this view of
things, even now. Let me go into why I changed my mind after a couple of
months of on-and-off deliberation.

In general, the highly speculative extra work that heapam.c does for
index deleters in the bottom-up caller case can only be justified because
the cost is paid by non-HOT updaters that are definitely about to split
the page just to fit another version, and because we only risk wasting
one heap page access at any given point of the entire process. (The
latter point about risk/cost is not 100% true -- there are additional
fixed CPU costs and the like -- but it's at least true in spirit.) We can
justify "gambling" like this only because the game is rigged in our favor
to an *absurd* degree: there are many ways to win big (relative to the
likely baseline case of a version churn page split), and we only have to
put down relatively small "bets" at any given point -- heapam.c will give
up everything when it encounters one whole heap page that lacks a single
deletable entry, no matter what the reason is (see the pseudocode sketch
at the end of this mail). The algorithm *appears* to behave very
intelligently when seen from afar, but is in fact stupid and
opportunistic when you look at it up close.

It's possible to be so permissive about the upside benefit only by being
very conservative about the downside cost. Almost all of our individual
inferences can be wrong, and yet we still win in the end. And the effect
is robust over time. You could say that it is an organic approach: it
adapts to the workload, rather than trying to make the workload adapt to
it. This is less radical than you'd think -- in some sense this is how
B-Trees have always worked.

In the end, I couldn't justify imposing this cost on anything other than
a non-HOT updater -- which is exactly what the flag proposal would
require me to do. Once you do that, it's not 100% clear that the relative
cost of each "bet" placed in heapam.c really is extremely low (we can no
longer say for sure that we have very little to lose, given the certain
alternative of a version churn page split). Besides, the fact is that
"version chains in indexes" still cannot get very long. Plus there are
other subtle ways in which this is unlikely to be a problem (e.g. the
LP_DEAD deletion stuff also got much better, deduplication can combine
with bottom-up deletion so that we'll trigger a useful bottom-up deletion
pass sooner or later without much downside, and possibly other things I
haven't even thought of).

It's possible that I have been too conservative. I admit that my decision
on this point is at least a little arbitrary, but I stand by it for now.
I cannot feel too bad about theoretically leaving some gains on the
table, given that we're only talking about deleting a group of related
versions a little later than we would otherwise, and only in some
circumstances, and in a way that seems likely to be imperceptible to
users. I reserve the right to change my mind about it, but for now it
doesn't look like an absurdly good deal, and those are the kinds of deals
that it makes sense to focus on here. I am very happy about the fact that
it is relatively easy to understand why the worst case for bottom-up
index deletion cannot be that bad.
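Here is a rough pseudocode sketch of that give-up rule, in case it helps
to see it spelled out (the variable and helper names are invented for
illustration -- they're not the actual heapam.c routines):

/*
 * Bottom-up caller case: visit heap blocks in an order chosen to favor
 * likely wins, and quit at the first block that yields nothing at all,
 * capping the downside at roughly one wasted heap page access.
 */
for (int i = 0; i < nblocksfavorable; i++)
{
    BlockNumber blk = favorableblocks[i];

    /* Try to delete the promising index entries that point at blk */
    int ndeleted = try_delete_entries_for_block(blk);

    if (ndeleted == 0)
        break;      /* one dud heap page ends the whole game */

    if (free_space_regained() >= space_needed_to_avoid_split())
        break;      /* we won -- the version churn page split is averted */
}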
--
Peter Geoghegan