Re: New strategies for freezing, advancing relfrozenxid early - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: New strategies for freezing, advancing relfrozenxid early |
Date | |
Msg-id | CA+TgmoZLj5SBboFQrz-a+dhFdaT6JaMbaZczYJA1c6Oqy+iEnw@mail.gmail.com Whole thread Raw |
In response to | Re: New strategies for freezing, advancing relfrozenxid early (Andres Freund <andres@anarazel.de>) |
Responses |
Re: New strategies for freezing, advancing relfrozenxid early
|
List | pgsql-hackers |
On Fri, Jan 27, 2023 at 12:58 AM Andres Freund <andres@anarazel.de> wrote: > Essentially the "any fpi" logic is a very coarse grained way of using the page > LSN as a measurement. As I said, I don't think "has a checkpoint occurred > since the last write" is a good metric to avoid unnecessary freezing - it's > too coarse. But I think using the LSN is the right thought. What about > something like > > lsn_threshold = insert_lsn - (insert_lsn - lsn_of_last_vacuum) * 0.1 > if (/* other conds */ && PageGetLSN(page) <= lsn_threshold) > FreezeMe(); > > I probably got some details wrong, what I am going for with lsn_threshold is > that we'd freeze an already dirty page if it's not been updated within 10% of > the LSN distance to the last VACUUM. I think this might not be quite the right idea for a couple of reasons. First, suppose that the table is being processed just by autovacuum (no manual VACUUM operations) and that the rate of WAL generation is pretty even, so that LSN age is a good proxy for time. If autovacuum processes the table once per hour, this will freeze if it hasn't been updated in the last six minutes. That sounds good. But if autovacuum processes the table once per day, then this will freeze if it hasn't been updated in 2.4 hours. That might be OK, but it sounds a little on the long side. If autovacuum processes the table once per week, then this will freeze if it hasn't been updated in 16.8 hours. That sounds too conservative. Conversely, if autovacuum processes the table every 3 minutes, then this will freeze the data if it hasn't been updated in the last 18 seconds, which sounds awfully aggressive. Maybe I'm wrong here, but I feel like the absolute amount of wall-clock time we're talking about here probably matters to some degree. I'm not sure whether a strict time-based threshold like, say, 10 minutes would be a good idea, leaving aside the difficulties of implementation. It might be right to think that if the table is being vacuumed a lot, freezing more aggressively is smart, and if it's being vacuumed infrequently, freezing less aggressively is smart, because if the table has enough activity that it's being vacuumed frequently, that might also be a sign that we need to freeze more aggressively in order to avoid having things go sideways. However, I'm not completely sure about that, and I think it's possible that we need some guardrails to avoid going too far in either direction. Second, and more seriously, I think this would, in some circumstances, lead to tremendously unstable behavior. Suppose somebody does a bunch of work on a table and then they're like "oh, we should clean up, VACUUM" and it completes quickly because it's been a while since the last vacuum and so it doesn't freeze much. Then, for whatever reason, they decide to run it one more time, and it goes bananas and starts freezing all kinds of stuff because the LSN distance since the last vacuum is basically zero. Or equally, you run a manual VACUUM, and you get completely different behavior depending on how long it's been since the last autovacuum ran. In some ways, I think this proposal has many of the same problems as vacuum_freeze_min_age. In both cases, the instinct is that we should use something on the page to let us know how long it's been since the page was modified, and proceed on the theory that if the page has not been modified recently, it probably isn't about to be modified again. That's a reasonable instinct, but the rate of XID advancement and the rate of LSN advancement are both highly variable, even on a system that's always under some load. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: