Re: Eager page freeze criteria clarification - Mailing list pgsql-hackers

From:           Melanie Plageman <melanieplageman@gmail.com>
Subject:        Re: Eager page freeze criteria clarification
Msg-id:         CAAKRu_b8BTWKxtTyayRmdjbh+3VQ-fnFYc2kDRKH62wyp8HUGA@mail.gmail.com
In response to: Re: Eager page freeze criteria clarification
                (Melanie Plageman <melanieplageman@gmail.com>)
List:           pgsql-hackers
On Fri, Jul 28, 2023 at 3:27 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> On Fri, Jul 28, 2023 at 3:00 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > Is this test meant to guard against unnecessary freezing or to
> > > avoid freezing when the cost is too high? That is, are we trying
> > > to determine how likely it is that the page has been recently
> > > modified and avoid eager freezing when it would be pointless
> > > (because the page will soon be modified again)?
> >
> > Sort of. This cost of freezing over time is weirdly nonlinear, so
> > it's hard to give a simple answer.
> >
> > The justification for the FPI trigger optimization is that FPIs are
> > overwhelmingly the cost that really matters when it comes to
> > freezing (and vacuuming in general) -- so we might as well make the
> > best out of a bad situation when pruning happens to get an FPI.
> > There can easily be a 10x or more cost difference (measured in
> > total WAL volume) between freezing without an FPI and freezing with
> > an FPI.
> ...
> > In 16 VACUUM just "makes the best out of a bad situation" when an
> > FPI was already required during pruning. We have already "paid for
> > the privilege" of writing some WAL for the page at that point, so
> > it's reasonable to not squander a window of opportunity to avoid
> > future FPIs in future VACUUM operations, by freezing early.
> >
> > We're "taking a chance" on being able to get freezing out of the
> > way early when an FPI triggers freezing. It's not guaranteed to
> > work out in each individual case, of course, but even if we assume
> > it's fairly unlikely to work out (which is very pessimistic) it's
> > still very likely a good deal.
> >
> > This strategy (the 16 strategy of freezing eagerly because we
> > already got an FPI) seems safer than a strategy involving freezing
> > eagerly because we won't get an FPI as a result. If for no other
> > reason than this: with the approach in 16 we already know for sure
> > that we'll have written an FPI anyway. It's hard to imagine
> > somebody being okay with the FPIs, but not being okay with the
> > other extra WAL.
>
> I see. I don't have an opinion on the "best of a bad situation"
> argument. Though, I think it is worth amending the comment in the
> code to include this explanation.
>
> But, ISTM that there should also be some independent heuristic to
> determine whether or not it makes sense to freeze the page. That
> could be related to whether or not it will be cheap to do so (in
> which case we can check if we will have to emit an FPI as part of the
> freeze record) or it could be related to whether or not the freezing
> is likely to be pointless (we are likely to update the page again
> soon).
>
> It sounds like it was discussed before, but I'd be interested in
> revisiting it and happy to test out various ideas.

Hi,

In service of "testing various ideas", I've benchmarked the heuristics
proposed in this thread, as well as a few others that Andres and I
considered, for determining whether or not to opportunistically freeze
a page during vacuum. Note that this heuristic would be in addition to
the existing criterion that we only opportunistically freeze pages
that can be subsequently set all frozen in the visibility map.

I believe that there are two goals that should dictate whether or not
we should perform opportunistic freezing:

1. Is it cheap? For example, if the buffer is already dirty, then no
   write amplification occurs, since it must be written out anyway.
   Freezing is also less expensive if we can do it without emitting an
   FPI.

2. Will it be effective; that is, will it stay frozen?
   Opportunistically freezing a page that will immediately be modified
   is a waste.

The current heuristic on master meets neither of these goals: it
freezes a page if pruning emitted an FPI for it.
This doesn't evaluate whether or not freezing itself would be cheap,
but rather attempts to hide freezing behind an expensive operation.
Furthermore, it often fails to freeze cold data and may
indiscriminately freeze hot data.

For the second goal, I've relied on past data to predict future
behavior, so I tried several criteria to estimate the likelihood that
a page will not be imminently modified. What was most effective was
Andres' suggestion of comparing the page LSN to the insert LSN at the
end of the last vacuum of that table; this approximates whether the
page has been recently modified, which is a decent proxy for whether
it'll be modified in the future. To do this, we need to save that
insert LSN somewhere. In the attached WIP patch, I saved it in the
table stats, for now -- knowing that those are not crash-safe.

Other discarded heuristic ideas included comparing the next
transaction ID at the end of the vacuum of a relation to the
visibility cutoff xid in the page -- but that wasn't effective for
freezing data from bulk loads.

The algorithms I evaluated all attempt to satisfy goal (1) by freezing
only if the buffer is already dirty and also by considering whether or
not an FPI would be emitted. Those that attempt to satisfy goal (2) do
so using the LSN comparison with varying thresholds. I ended up
testing master and the following five alternatives:

1. Dirty buffer, no FPI required
2. Dirty buffer, no FPI required OR page LSN is older than 10% of the
   LSNs since the last vacuum of the table
3. Dirty buffer, no FPI required AND page LSN is older than 10% of the
   LSNs since the last vacuum of the table
4. Dirty buffer, no FPI required OR page LSN is older than 33% of the
   LSNs since the last vacuum of the table
5. Dirty buffer, no FPI required AND page LSN is older than 33% of the
   LSNs since the last vacuum of the table

I ran several benchmarks and compared these based on two metrics:

1. Percentage of pages frozen at the end of the benchmark. For
   workloads with a working set much smaller than their data set, this
   metric should be high. Conversely, for workloads with a working set
   that is more-or-less their entire data set, this metric should be
   low.

2. Page freezes per page frozen at the end of the benchmark. This
   should be as low as possible. Since each benchmark starts with zero
   frozen pages, a metric of 1 indicates that each frozen page was
   frozen only once.

Some of the benchmarks were run for a fixed number of transactions.
Those that do not specify a number of transactions were run for 45
minutes. I collected metrics from OS utilities and Postgres statistics
to examine throughput, FPIs emitted, and many other performance
metrics over the course of the benchmark. Below, I've summarized the
results and pointed out any notable negative performance impacts.

Overall, the two algorithms that seem to strike the best balance are
(4) and (5). The OR condition in algorithm (4), as you might expect,
results in freezing much more of the cold data in workloads with a
smaller working set than data set. It tends to cause more FPIs to be
emitted, since the age criterion alone can trigger freezing -- even
when the freeze record would contain an FPI. Though, these FPIs may
happen anyway when the cold data is eventually frozen in a wraparound
vacuum. That is, the absence of FPIs tracks the absence of frozen data
quite closely.

For a workload in which only 10% of the data is being updated, master
often freezes the wrong data and still emits FPIs. For this kind of
workload, algorithms (1) and (2) also did not perform well and emitted
more FPIs than the other algorithms. In my examples, I found that the
10% cutoff froze data too aggressively -- freezing data that was
modified soon after.

The workloads I benchmarked were as follows:

A. gaussian tpcb-like + select-only
   pgbench scale 600 (DB < SB) with indexes on updated columns
   WL 1: 2 clients, tpcb-like pgbench with gaussian access distribution
   WL 2: 16 clients, select-only pgbench
   freezing more is better

B. tpcb-like
   pgbench scale 600, 16 clients
   freezing less is better

C. shifting hot set, autovacuum off, vacuum at end
   1 client inserting a single row and updating an indexed column of
   that row; 2 million transactions
   freezing more is better

D. shifting hot set, delete old data
   10 MB table with index on updated column
   WL 1: 1 client inserting one row, updating that row
   WL 2: 1 client, rate-limited to 0.02 TPS, deleting old data to keep
   the table at 5000 rows
   freezing less is better

E. shifting hot set, delete new data, access new data
   WL 1: 1 client cycling through 2 inserts of a single row each,
   updating an indexed column of the most recently inserted row, and
   then deleting that row
   WL 2: rate-limited to 0.2 TPS, selecting data from the last 300
   seconds
   freezing more is better

F. many COPYs, autovacuum on
   1 client copying a total of 50 GB of data; autovacuum will run ~2x,
   ~2 checkpoints
   freezing more is better

G. several COPYs, autovacuum off, vacuum at end
   1 client copying a total of 10 GB of data, no checkpoints
   freezing more is better

H. append-only table, autovacuum off, vacuum at end
   1 client inserting a single row at a time for 3 million transactions
   freezing more is better

I. work queue
   1 client inserting a row, sleeping for half a second, then deleting
   that row, for 5000 transactions
   freezing less is better

Note that the page freezes/page frozen metric can be misleading when
the overall number of page freezes is low. This is the case for
master: it did few page freezes, but those tended to be pages that
were modified again soon after.
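For concreteness, here is a small Python sketch (not the patch itself)
of how the five alternatives could be expressed as predicates over
per-page state. All names are mine, LSNs are modeled as plain integers
(byte positions in the WAL), and "page LSN is older than N% of the
LSNs since the last vacuum" is my reading of the criterion: the page
was last modified before the most recent N% of WAL generated since the
last vacuum of the table.

```python
def page_is_old(page_lsn: int, last_vacuum_insert_lsn: int,
                current_insert_lsn: int, frac: float) -> bool:
    """True if the page was last modified before the most recent
    `frac` fraction of WAL generated since the last vacuum.
    (Hypothetical interpretation of the threshold arithmetic.)"""
    span = current_insert_lsn - last_vacuum_insert_lsn
    return page_lsn < current_insert_lsn - frac * span

def should_freeze(alg: int, dirty: bool, fpi_required: bool,
                  page_lsn: int, last_vacuum_insert_lsn: int,
                  current_insert_lsn: int) -> bool:
    # Goal (1): freezing is cheap -- buffer already dirty and no FPI
    # would be emitted by the freeze record.
    cheap = dirty and not fpi_required

    # Goal (2): freezing is likely to stick -- page not recently
    # modified, judged by the LSN comparison above.
    def old(frac: float) -> bool:
        return page_is_old(page_lsn, last_vacuum_insert_lsn,
                           current_insert_lsn, frac)

    if alg == 1:
        return cheap
    if alg == 2:
        return cheap or old(0.10)
    if alg == 3:
        return cheap and old(0.10)
    if alg == 4:
        return cheap or old(0.33)
    if alg == 5:
        return cheap and old(0.33)
    raise ValueError(f"unknown algorithm {alg}")
```

Note that under this reading a smaller percentage cutoff freezes more
(only pages touched in the newest 10% of the WAL range are exempt),
which matches the observation above that the 10% cutoff was too
aggressive.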
Page Freezes/Page Frozen (less is better)

|   | Master |     (1) |     (2) |     (3) |     (4) |     (5) |
|---+--------+---------+---------+---------+---------+---------|
| A |  28.50 |    3.89 |    1.08 |    1.15 |    1.10 |    1.10 |
| B |   1.00 |    1.06 |    1.65 |    1.03 |    1.59 |    1.00 |
| C |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| D |   2.00 | 5199.15 | 5276.85 | 4830.45 | 5234.55 | 2193.55 |
| E |   7.90 |    3.21 |    2.73 |    2.70 |    2.69 |    2.43 |
| F |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| G |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| H |    N/A |    1.00 |    1.00 |    1.00 |    1.00 |    1.00 |
| I |    N/A |   42.00 |   42.00 |     N/A |   41.00 |     N/A |

% Frozen at end of run

|   | Master | (1) | (2) | (3) | (4) | (5) |
|---+--------+-----+-----+-----+-----+-----|
| A |      0 |   1 |  99 |   0 |  81 |   0 |
| B |     71 |  96 |  99 |   3 |  98 |   2 |
| C |      0 |   9 | 100 |   6 |  92 |   5 |
| D |      0 |   1 |   1 |   1 |   1 |   1 |
| E |      0 |  63 | 100 |  68 | 100 |  67 |
| F |      0 |   5 |  14 |   6 |  14 |   5 |
| G |      0 | 100 | 100 |  92 | 100 |  67 |
| H |      0 |  11 | 100 |   9 |  86 |   5 |
| I |      0 | 100 | 100 |   0 | 100 |   0 |

I can provide exact pgbench commands, configurations, or detailed
results upon request.

- Melanie