Re: Turning off HOT/Cleanup sometimes - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Turning off HOT/Cleanup sometimes |
Date | |
Msg-id | CA+TgmoZ73sAe-MFFg7f6OUHq4yfXxzQWrOhkmt+5G=qUhWSZXw@mail.gmail.com Whole thread Raw |
In response to | Re: Turning off HOT/Cleanup sometimes (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Turning off HOT/Cleanup sometimes
|
List | pgsql-hackers |
On Thu, Jan 9, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> The problem with saying that we should let VACUUM do this work is the >> same as the problem with saying that if you're late for your Concorde >> flight, you should go running across the tarmac and try to catch it. >> The cost of dead tuples is related in a linear fashion to the rate at >> which pages are accessed. Not coincidentally, the number of >> opportunities for HOT pruning is *also* related in a linear fashion to >> the rate at which pages are accessed. This is why it works so well. > > That seems like a large oversimplification. Some (most?) of the costs of > dead tuples are proportional to the rate of dead tuple creation. I grant > that there are also some costs proportional to the rate at which scans > visit dead tuples, but I really don't believe that the latter are > dominant. So I think it's bogus to claim that the current behavior is > somehow optimal. This doesn't make any sense to me at all. What costs are proportional to the rate of dead tuple creation? I'm referring specifically to the performance penalty that scans incur for having to skip over dead tuples, and those costs aren't incurred when the tuples are created, but rather when you try to access the still-live data afterwards. If anything, our system is a big WIN at the time tuples are created, precisely because we leave the old tuples around to be cleaned up later rather than getting rid of them at once. That's why, for example, we tend to win delete-heavy benchmarks vs. other database systems. > One more time: the sole reason it works the way it does now is that that > was the path of least resistance back in 2007, and we never yet got around > to trying to optimize that. I'm glad to see someone wanting to revisit > the issue, but I don't think that we necessarily have to go as far as > creating user-visible knobs in order to make it better. Sure, I'm not denying that. The fact that it was the path of least resistance doesn't mean it was a bad idea. I'm happy to see it improved, too, but I think it's important to understand what happens now. And at least on the pgbench tests I've done, what happens is that VACUUM makes no significant contribution to pruning; IIRC, it would have to visit pages at least 1000 times more often to be relevant. So when somebody says "relying on vacuum instead of doing HOT pruning" what I hear is "flush performance down the toilet"... but of course the real way to resolve this is to test whatever patch Simon or someone else eventually posts, not to speculate without data. >> The rate at which vacuuming happens does not ramp up in the same way; >> it's limited by autovacuum cost settings (which people tend not have >> set correctly, and don't adjust themselves on the fly) > > True, but that seems like a pretty well-defined improvement project right > there (as well as an argument against user-visible knobs in general ;-)). > Nasby's speculations just upthread could be useful here, too. Fair point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: