Re: Do we need so many hint bits? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Do we need so many hint bits? |
Date | |
Msg-id | CA+Tgmoag0BYeEinvNWxFpSiAMTWjK6XiCQiP-U6+GHVn9UdqWw@mail.gmail.com |
In response to | Do we need so many hint bits? (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Do we need so many hint bits? |
List | pgsql-hackers |
On Thu, Nov 15, 2012 at 7:42 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> But the other tuple hint bits seem to be there just for symmetry,
> because they shouldn't last long. If HEAP_XMIN_INVALID or
> HEAP_XMAX_COMMITTED is set, then it's (hopefully) going to be vacuumed
> soon, and gone completely. And if HEAP_XMAX_INVALID is set, then it
> should just be changed to InvalidTransactionId.

"Soon" is a relative term. I doubt that we can really rely on vacuum to be timely enough to avoid pain here - you can easily have tens of thousands of hits on the same tuple before vacuum gets around to dealing with it.

Now, we might be able to rejigger things to avoid that. For example, maybe it'd be possible to arrange things so that when we see an invalid xmin, we set the flag that triggers a HOT prune instead of setting the hint bit. That would probably be good enough to dispense with the hint bit, and maybe altogether better than the current system, because now the next time someone (including us) locks the buffer we'll nuke the entire tuple, which would not only make it cheaper to scan but also free up space in the buffer sooner.

However, that solution only works for invalid-xmin. For committed-xmax, there could actually be quite a long time before the page can be pruned, because there can be some other backend holding an old snapshot open. A one-minute reporting query in another database, which is hardly an unreasonable scenario, could result in many, many additional CLOG lookups, which are already a major contention point at high concurrencies. I think that bit is probably pretty important, and I don't see a viable way to get rid of it, though maybe someone can think of one.

For invalid-xmax, I agree that we could probably just change xmax to InvalidTransactionId, if we need to save bit-space.
In the past, Tom and, I think, also Alvaro have been skeptical about anything that would overwrite xmin/xmax values too quickly, for forensic reasons, but maybe it's worth considering.

> Also, I am wondering about PD_ALL_VISIBLE. It was originally introduced
> in the visibility map patch, apparently as a way to know when to clear
> the VM bit when doing an update. It was then also used for scans, which
> showed a significant speedup. But I wonder: why not just use the
> visibilitymap directly from those places?

Well, you'd have to look up, lock and pin the visibility map page to do that. I suspect that overhead is pretty significant. The benefit of noticing that the flag is set is that you need not call HeapTupleSatisfiesMVCC for each tuple on the page: checking one bit in the page header is a lot cheaper than calling that function for every tuple. However, if you had to lock and pin a second page in order to check whether the page is all-visible, I suspect it wouldn't be a win; you'd probably be better off just doing the HeapTupleSatisfiesMVCC checks for each tuple.

One of the main advantages of PD_ALL_VISIBLE is that if you do an insert, update, or delete on a page where that bit isn't set, you need not lock and pin the visibility map page, because you already know that the bit will be clear in the visibility map. If the data is being rapidly modified, you'll get the benefit of this optimization most of the time, only losing it when vacuum has visited recently. I hope that's not premature optimization, because I sure sweat a lot of blood last release cycle to keep it working like that. I had a few doubts at the time about how much we were winning there, but I don't actually have any hard data either way, so I would be reluctant to assume it doesn't matter. Even if it doesn't, the sequential-scan optimization definitely matters a LOT, as you can easily verify.

One approach that I've been hoping to pursue is to find a way to make CLOG lookups cheaper and more concurrent.
I started to work on some concurrent hash table code, which you can find here:

http://git.postgresql.org/gitweb/?p=users/rhaas/postgres.git;a=shortlog;h=refs/heads/chash

The concurrency properties of this code are vastly better than what we have now, but there are cases where it loses vs. dynahash when there's no concurrency. That might be fixable or just not a big deal, though. A bigger problem is that I got sucked off into other things before I was able to get as far with it as I wanted to; in particular, I have only unit test results for it, and haven't tried to integrate it into the SLRU code yet.

But I'm not sure any of this is going to fundamentally chip away at the need for hint bits all that much. Making CLOG lookups cheaper or less frequent is all to the good, but the prognosis for improving things enough that we can dump some or all of the hint bits completely seems uncertain at best. Even if we COULD dump everything but heap-xmin-committed, how much would that really help with the disk-write problem? I bet heap-xmin-committed gets set far more often than the other three put together.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
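The committed-xmax argument in the message above comes down to a hint bit acting as a cache for a CLOG lookup: the first visitor pays for the lookup and later visitors read the bit. The following is only a rough sketch of that idea, not PostgreSQL code. Every name in it (SketchTuple, the SK_ flags, sketch_clog_did_commit, sketch_xmax_committed, sketch_count_lookups) is hypothetical, and the "CLOG" is a counter plus a made-up rule that even transaction IDs committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only; they are not
 * PostgreSQL's real infomask bits. */
#define SK_XMAX_COMMITTED 0x01  /* deleting transaction known committed */
#define SK_XMAX_INVALID   0x02  /* deleting transaction known aborted */

/* Simplified stand-in for a heap tuple header (assumption). */
typedef struct
{
    uint32_t xmax;      /* id of the deleting transaction */
    uint16_t infomask;  /* hint bits */
} SketchTuple;

/* Counts how often the pretend commit log is consulted. */
static long clog_lookups = 0;

/* Stand-in for a CLOG lookup. Assumption: even xids committed. */
static bool
sketch_clog_did_commit(uint32_t xid)
{
    clog_lookups++;
    return (xid % 2) == 0;
}

/* Decide whether xmax committed. With use_hints, the answer is cached
 * in the tuple after the first lookup and reused afterwards. */
static bool
sketch_xmax_committed(SketchTuple *tup, bool use_hints)
{
    bool committed;

    if (use_hints)
    {
        if (tup->infomask & SK_XMAX_COMMITTED)
            return true;
        if (tup->infomask & SK_XMAX_INVALID)
            return false;
    }

    committed = sketch_clog_did_commit(tup->xmax);

    if (use_hints)
        tup->infomask |= committed ? SK_XMAX_COMMITTED : SK_XMAX_INVALID;

    return committed;
}

/* Visit one tuple `visits` times and report how many lookups occurred. */
static long
sketch_count_lookups(int visits, bool use_hints)
{
    SketchTuple tup = { .xmax = 42, .infomask = 0 };

    assert(visits >= 0);
    clog_lookups = 0;
    for (int i = 0; i < visits; i++)
        (void) sketch_xmax_committed(&tup, use_hints);
    return clog_lookups;
}
```

In this toy model, sketch_count_lookups(10000, false) performs 10000 lookups, while sketch_count_lookups(10000, true) performs one. It leaves out locking, WAL, and snapshot checks entirely, so it says nothing about the real cost of either path; it only illustrates why repeated hits on a tuple that cannot yet be pruned are where the hint bit pays off.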