Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Eager page freeze criteria clarification
Msg-id: CAH2-WzmwKS-h4TzXmGz2-8FKu8YSyRMPvqA5pdNZoL4MRNoKEg@mail.gmail.com
In response to: Re: Eager page freeze criteria clarification (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Fri, Sep 29, 2023 at 11:27 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I think that's true. For me, the issue is what a user is practically
> likely to notice and care about. I submit that on a
> not-particularly-busy system, it would probably be fine to freeze
> aggressively in almost every situation, because you're only incurring
> costs you can afford to pay. On a busy system, it's more important to
> be right, or at least not too badly wrong. But even on a busy system,
> I think that when the time between data being written and being frozen
> is more than a few tens of minutes, it's very doubtful that anyone is
> going to notice the contribution that freezing makes to the overall
> workload. They're much more likely to notice an annoying autovacuum
> than they are to notice a bit of excess freezing that ends up getting
> reversed. But when you start cranking the time between writing data
> and freezing it down into the single-digit numbers of minutes, and
> even more if you push down to tens of seconds or less, now I think
> people are going to care more about useless freezing work than about
> long-term autovacuum risks. Because now their database is really busy
> so they care a lot about performance, and seemingly most of the data
> involved is ephemeral anyway.

I think that that's all true. As I believe I pointed out at least once in the past, my thinking about the practical requirements from users was influenced by this paper/talk:

https://www.microsoft.com/en-us/research/video/cost-performance-in-modern-data-stores-how-data-cashing-systems-succeed/

For the types of applications that Postgres is a natural fit for, most of the data is cold -- very cold ("freezing" cold, even). If you have a 50GB pgbench_accounts table, Postgres will always perform suboptimally. But so will SQL Server, Oracle, and MySQL. While pgbench makes a fine stress test, its workload is for the most part highly unrealistic.
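The contrast being drawn here -- an all-hot pgbench-style workload where freezing is almost always undone, versus a realistic mostly-cold workload where eager freezing sticks -- can be illustrated with a toy simulation. This is a minimal sketch of the argument, not anything resembling VACUUM's actual logic; the function name and parameters are invented for illustration:

```python
import random

def simulate_wasted_freezes(n_rows, n_updates, rng, hot_fraction=1.0):
    """Toy model: freeze every row up front, then apply random updates.
    An update to a frozen row "unfreezes" it, wasting that freeze.
    hot_fraction < 1 confines updates to a hot subset of rows,
    approximating a workload where most data is cold."""
    frozen = [True] * n_rows
    wasted = 0
    hot_rows = max(1, int(n_rows * hot_fraction))
    for _ in range(n_updates):
        r = rng.randrange(hot_rows)
        if frozen[r]:
            wasted += 1
            frozen[r] = False
    # Fraction of up-front freezes that were later undone by an update
    return wasted / n_rows

rng = random.Random(42)
# pgbench-like: updates can hit any row, so nearly every freeze is wasted
print(simulate_wasted_freezes(10_000, 100_000, rng, hot_fraction=1.0))
# mostly-cold app: updates confined to 1% of rows, so almost no freeze is wasted
print(simulate_wasted_freezes(10_000, 100_000, rng, hot_fraction=0.01))
```

Under a uniform update distribution the wasted fraction approaches 1.0, matching the observation that the "optimal" strategy for pgbench_accounts is to never freeze; with a small hot set, eager freezing is nearly free.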
And yet we seem to think that it's just about the most important benchmark of all. If we're not willing to get over even small regressions in pgbench, I fear we'll never make significant progress in this area. It's at least partly a cultural problem IMV.

The optimal freezing strategy for pgbench_accounts is to never freeze, even once. It doesn't matter if you vary the distributions of the accounts table updates -- it's still inevitable that each and every row will get updated before too long. In fact, it's not just freezing that should always be avoided here -- the same applies to pruning by VACUUM. As I suggested in my last email, the lesson offered by the pgbench_accounts table seems to be "never VACUUM at all, except perhaps to advance relfrozenxid" (which shouldn't actually require freezing even one page). If you haven't tuned heap fill factor, then you might want to VACUUM a bit, at first. But, overall, vacuuming is bad.

That is the logical though absurd conclusion. It completely flies in the face of practical experience.

--
Peter Geoghegan