Re: Redesigning checkpoint_segments - Mailing list pgsql-hackers
| From | Greg Smith |
|---|---|
| Subject | Re: Redesigning checkpoint_segments |
| Date | |
| Msg-id | 51B148D3.9010208@2ndQuadrant.com |
| In response to | Re: Redesigning checkpoint_segments (Heikki Linnakangas <hlinnakangas@vmware.com>) |
| Responses | Re: Redesigning checkpoint_segments |
| List | pgsql-hackers |
On 6/6/13 4:41 AM, Heikki Linnakangas wrote:
> I was thinking of letting the estimate decrease like a moving average,
> but react to any increases immediately. Same thing we do in bgwriter
> to track buffer allocations:

Combine what your submitted patch does and this idea, and you'll have something I prototyped a few years ago. I took the logic and tested it out in user space, by parsing the output from log_checkpoints to see how many segments were being used. That approach coughed out a value about as good for checkpoint_segments as the one I picked by hand. The main problem was that it liked to over-tune the segments based on small bursts of activity, leaving a value higher than you might want to use the rest of the time. The background writer didn't worry about this very much because the cost of making a mistake for one 200ms cycle was pretty low. Setting checkpoint_segments too high is a more expensive mistake. When I set these by hand, I'll aim more to cover the 99th percentile of the maximum segment count rather than every worst case seen. I don't think that improvement is worth spending very much effort on though; the moving average approach is more than good enough in most cases. (A rough sketch of that asymmetric estimator is at the end of this message.)

I've wanted checkpoint_segments to go away in exactly this fashion for a while. The general complaint the last time I suggested a change in this area, making checkpoint_segments larger for the average user, was that some people had seen workloads where that was counterproductive. Pretty sure Kevin Grittner said he'd seen that happen. That's how I remember this general idea dying the last time, and I still don't have enough data to refute the claim that it happens.

As far as the UI goes, if it's a soft limit I'd suggest wal_size_target for the name. What I would like to see is a single number here, in memory units, that replaces both checkpoint_segments and wal_keep_segments. If you're willing to use a large chunk of disk space to handle either activity spikes or the class of replication issues wal_keep_segments targets, I don't see why you'd want to ban using that space for the other one too.

To put some perspective on how far we've been able to push this in the field with minimal gripes, the repmgr tool requires wal_keep_segments be >=5000, which at 16MB per segment works out to 78GB. I still see some people use 73GB SAS drives in production servers for their WAL files, but that's the only time I've seen that number become scary when deploying repmgr. Meanwhile, the highest value for checkpoint_segments I've set based on real activity levels was 1024, on a server where checkpoint_timeout is 15 minutes (and can be no shorter without causing checkpoint spikes). At no point during that fairly difficult bit of tuning work did checkpoint_segments do anything but get in the way.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
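[Sketch referenced above: a minimal, standalone illustration of the asymmetric smoothing Heikki describes, modeled on the moving-average style the bgwriter uses for buffer allocations. The names (update_wal_estimate, wal_estimate) and the smoothing constant are made up for illustration, not taken from the PostgreSQL source.]

    /*
     * Track an estimate of WAL segment usage that jumps up immediately
     * on any increase, but only decays slowly on decreases.
     */
    #include <stdio.h>

    #define SMOOTHING_SAMPLES 16        /* assumed decay constant */

    static double wal_estimate = 0;

    static void
    update_wal_estimate(int segments_used)
    {
        if (segments_used > wal_estimate)
        {
            /* react to any increase immediately */
            wal_estimate = segments_used;
        }
        else
        {
            /* otherwise decay toward recent usage like a moving average */
            wal_estimate += (segments_used - wal_estimate) / SMOOTHING_SAMPLES;
        }
    }

    int
    main(void)
    {
        /* a short burst of 40 segments in an otherwise quiet workload */
        int     samples[] = {10, 12, 40, 11, 10, 10, 10, 10};
        int     i;

        for (i = 0; i < 8; i++)
        {
            update_wal_estimate(samples[i]);
            printf("used=%d estimate=%.1f\n", samples[i], wal_estimate);
        }
        return 0;
    }

Run against a burst like the one in main(), the estimate jumps to the peak immediately and then takes many cycles to drift back down, which is exactly the over-tuning-on-bursts behavior described above; capping at a percentile rather than the peak would be one way to blunt that.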