Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule? - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
Date:
Msg-id: 566F4BFB.7060802@2ndquadrant.com
In response to: Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule? (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses: Re: Let PostgreSQL's On Schedule checkpoint write buffer smooth spread cycle by tuning IsCheckpointOnSchedule?
List: pgsql-hackers
Hi,

I was planning to do some review/testing on this patch, but then I noticed it was rejected with feedback in 2015-07 and never resubmitted into another CF. So I won't waste time testing it unless someone shouts that I should do that anyway. Instead I'll just post some ideas about how we might improve the patch, because otherwise I'd forget them.

On 07/05/2015 09:48 AM, Heikki Linnakangas wrote:
>
> The ideal correction formula f(x), would be such that f(g(X)) = X, where:
>
> X is time, 0 = beginning of checkpoint, 1.0 = targeted end of
> checkpoint (checkpoint_segments), and
>
> g(X) is the amount of WAL generated. 0 = beginning of checkpoint, 1.0
> = targeted end of checkpoint (derived from max_wal_size).
>
> Unfortunately, we don't know the shape of g(X), as that depends on the
> workload. It might be linear, if there is no effect at all from
> full_page_writes. Or it could be a step-function, where every write
> causes a full page write, until all pages have been touched, and after
> that none do (something like an UPDATE without a where-clause might
> cause that). In pgbench-like workloads, it's something like sqrt(x). I
> picked X^1.5 as a reasonable guess. It's close enough to linear that it
> shouldn't hurt too much if g(x) is linear. But it cuts the worst spike
> at the very beginning, if g(x) is more like sqrt(x).

Exactly. I think the main "problem" here is that we mix two types of WAL records with quite different characteristics:

(a) full-page writes (FPW) - very high volume right after a checkpoint, then usually dropping to a much lower volume

(b) regular records - roughly constant volume over time (well, somewhat lower right after the checkpoint, as that's where the FPWs happen)

We completely ignore this when computing elapsed_xlogs, because we compute it (roughly) like this:

    elapsed_xlogs = wal_since_checkpoint / CheckPointSegments;

which of course gets confused when we write a lot of WAL right after a checkpoint because of FPWs.

But what if we actually tracked the amount of WAL produced by FPWs in a checkpoint (which we currently don't, AFAIK)? Then we could compute the expected *remaining* amount of WAL to be produced within the checkpoint interval, and use that to compute a better progress estimate, based on:

    wal_bytes          - WAL (total)
    wal_fpw_bytes      - WAL (due to FPWs)
    prev_wal_bytes     - WAL (total) in the previous checkpoint
    prev_wal_fpw_bytes - WAL (due to FPWs) in the previous checkpoint

So we know that we should expect about

    (prev_wal_bytes - wal_bytes) + (prev_wal_fpw_bytes - wal_fpw_bytes)
          ( regular WAL )        +          ( FPW WAL )

to be produced until the end of the current checkpoint.

I don't have a clear idea how to transform this into the 'progress' yet, but I'm pretty sure tracking the two types of WAL is a key to a better solution. The x^1.5 correction is probably a step in the right direction, but I don't feel particularly confident about the 1.5 exponent, which is rather arbitrary. (Rough C sketches of both the x^1.5 correction and the FPW bookkeeping follow below the signature.)

regards

--
Tomas Vondra                   http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
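To make the quoted x^1.5 correction concrete, here is a hypothetical, simplified sketch in C. The function name checkpoint_on_schedule, its arguments, and the exact shape of the comparison are assumptions for illustration only; this is neither PostgreSQL's IsCheckpointOnSchedule() nor the rejected patch.

    #include <math.h>
    #include <stdio.h>

    /*
     * Hypothetical, simplified illustration of the x^1.5 correction from the
     * quoted text.  'progress' is the fraction of the checkpoint's buffer
     * writes already done; 'elapsed_xlogs' is the fraction of the WAL budget
     * consumed since the checkpoint started.
     */
    static int
    checkpoint_on_schedule(double progress, double elapsed_xlogs)
    {
        /*
         * A raw comparison (progress >= elapsed_xlogs) treats WAL consumed as
         * elapsed time.  Right after a checkpoint, FPWs inflate elapsed_xlogs,
         * the checkpoint looks badly behind schedule, and the checkpointer
         * causes an I/O spike.
         *
         * Since elapsed_xlogs is in [0, 1], pow(elapsed_xlogs, 1.5) is smaller
         * than elapsed_xlogs, which dampens that early FPW-driven spike while
         * staying close to linear for the rest of the checkpoint.
         */
        return progress >= pow(elapsed_xlogs, 1.5);
    }

    int
    main(void)
    {
        /* Early in the checkpoint: 10% of buffers written, 20% of WAL used. */
        double progress = 0.10;
        double elapsed_xlogs = 0.20;

        printf("raw comparison:   %s\n",
               progress >= elapsed_xlogs ? "on schedule" : "behind");
        printf("x^1.5 comparison: %s\n",
               checkpoint_on_schedule(progress, elapsed_xlogs)
                   ? "on schedule" : "behind");
        return 0;
    }

Built with something like "cc sketch.c -lm", this prints "behind" for the raw comparison and "on schedule" for the corrected one, which is the early-spike dampening the quoted text describes.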
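And a minimal sketch of the FPW bookkeeping described in the message. The counter names come from the message itself; the function name CheckpointWALProgress, how the counters would be maintained, and the final division into a progress fraction are assumptions, since the message explicitly leaves the mapping to 'progress' open.

    #include <stdint.h>
    #include <stdio.h>

    /*
     * WAL accounting as proposed above.  In a real implementation these would
     * be maintained from the WAL insertion path; here they are plain counters.
     */
    static uint64_t wal_bytes;          /* WAL since the current checkpoint started */
    static uint64_t wal_fpw_bytes;      /* portion of that caused by full-page writes */
    static uint64_t prev_wal_bytes;     /* total WAL during the previous checkpoint */
    static uint64_t prev_wal_fpw_bytes; /* FPW WAL during the previous checkpoint */

    /*
     * One possible (assumed, not from the message) way to turn the expected
     * remaining WAL into a progress fraction in [0.0, 1.0].
     */
    static double
    CheckpointWALProgress(void)
    {
        /*
         * Expected WAL still to be produced before the end of the current
         * checkpoint, mirroring the expression in the message:
         * (regular WAL remaining) + (FPW WAL remaining).
         */
        double remaining =
            ((double) prev_wal_bytes - (double) wal_bytes) +
            ((double) prev_wal_fpw_bytes - (double) wal_fpw_bytes);
        double expected_total;

        if (remaining < 0.0)
            remaining = 0.0;    /* already past the previous checkpoint's volume */

        expected_total = (double) wal_bytes + remaining;
        if (expected_total <= 0.0)
            return 0.0;

        return (double) wal_bytes / expected_total;
    }

    int
    main(void)
    {
        /* Made-up numbers, just to exercise the function. */
        prev_wal_bytes = 1600u * 1024 * 1024;     /* previous checkpoint: 1.6 GB total */
        prev_wal_fpw_bytes = 900u * 1024 * 1024;  /* ... of which 900 MB were FPWs */
        wal_bytes = 400u * 1024 * 1024;           /* current checkpoint so far: 400 MB */
        wal_fpw_bytes = 350u * 1024 * 1024;       /* ... of which 350 MB were FPWs */

        printf("WAL-based progress: %.3f\n", CheckpointWALProgress());
        return 0;
    }

Whether this particular division is the right mapping is exactly the open question in the message; the sketch only shows where the two extra counters would plug in.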