Thread: Optimal checkpoint_setting
Hello All,

I have a brand new machine to tune: x3750 M4, 4x E5-4640 v2 2.2GHz; 512GB RAM (32x16GB), 4x300GB SAS + SSD (Easy Tier) in RAID 10.

What's particularly important now is to choose an optimal configuration for write operations. We are discussing the checkpoint_segments and checkpoint_timeout parameters. A test based on pg_replay showed that the largest amount of data is written when checkpoint_segments is set to 10 000 and checkpoint_timeout to 30 min, but I'm worried about the amount of time needed for crash recovery. On the other hand, it's a mighty machine :) and I don't want to go back to 30 segments. I'd appreciate any helpful remarks on that.

The rest of the parameters from the test:

checkpoint_segments=10000
checkpoint_timeout=30min
shared_buffers=2GB
temp_buffers=128MB
max_files_per_process=500
work_mem=256MB
maintenance_work_mem=128MB
effective_io_concurrency=50
synchronous_commit=local
wal_buffers=512kB
wal_writer_delay=1ms
commit_delay=100
commit_siblings=1000
random_page_cost=1.0
cpu_tuple_cost=2.0
effective_cache_size=500GB
geqo_threshold=20
geqo_effort=10
geqo_selection_bias=2.0
join_collapse_limit=16
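During a run like that it is worth checking whether checkpoints are being forced by filling checkpoint_segments or by checkpoint_timeout expiring. A minimal sketch using pg_stat_bgwriter, assuming the test database is reachable with psql as the postgres user:

    # Compare these counters before and after the pg_replay run:
    #   checkpoints_req   = checkpoints forced by filling checkpoint_segments
    #   checkpoints_timed = checkpoints triggered by checkpoint_timeout
    psql -U postgres -c "
        SELECT checkpoints_timed, checkpoints_req,
               buffers_checkpoint, buffers_backend
          FROM pg_stat_bgwriter;"

    # log_checkpoints = on in postgresql.conf additionally logs each
    # checkpoint's trigger, duration and number of buffers written.

A rising checkpoints_req count means checkpoint_segments is still the limiting factor; if almost all checkpoints are timed, the timeout is what drives the write pattern.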
On Thu, Oct 9, 2014 at 3:52 AM, pinker <pinker@onet.eu> wrote:
> Hello All,
> I have a brand new machine to tune:
> x3750 M4, 4x E5-4640 v2 2.2GHz; 512GB RAM (32x16GB), 4x300GB
> SAS + SSD (Easy Tier) in RAID 10
> What's particularly important now is to choose optimal configuration for
> write operations. We have discussion about checkpoint_segments and
> checkpoint_timeout parameters. Test, which was based on pg_replay, has shown
> that the biggest amount of data is written when checkpoint_segments are set
> to 10 000 and checkpoint_timeout to 30 min, but I'm worrying about amount of
> time needed for crash recovery.
Since you already have pg_replay running, kill -9 some backend (my favorite victim is the bgwriter) in the middle of the pg_replay run, and see how long it takes to recover.
You might want to try it with and without clobbering the FS cache, or simply rebooting the whole machine, depending on what kind of crash you think is more likely.
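A rough way to time that recovery, sketched here on the assumptions that the background writer shows up in ps as "postgres: writer process" and that psql can connect as the postgres user:

    # Find and kill the background writer mid-run to force crash recovery.
    kill -9 "$(pgrep -f 'postgres: writer process')"
    sleep 2   # give the postmaster a moment to notice and start recovery

    # Time how long it takes until the server accepts connections again.
    START=$(date +%s)
    until psql -U postgres -c 'SELECT 1' >/dev/null 2>&1; do sleep 1; done
    echo "recovery took $(( $(date +%s) - START )) seconds"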
Recovering into a cold cache can be painfully slow. If your database mostly fits in memory, you can speed it up by using something (like "tar -cf - pgdata | wc -c") to read the entire pg data directory in sequential fashion and hopefully cache it. If you find recovery too slow, you might want to try this trick (and put it in your init scripts) rather than lowering checkpoint_segments.
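Spelled out as an init-script-style snippet, that pre-warm might look something like this (the data directory path is only a placeholder; substitute your own):

    #!/bin/sh
    # Read the whole data directory sequentially once so the OS page cache
    # is warm before the first queries hit it; the byte count is discarded.
    PGDATA=/var/lib/pgsql/9.3/data   # placeholder path - adjust to your install
    tar -cf - "$PGDATA" | wc -c > /dev/null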
Cheers,
Jeff
Hi, a little off the general topic, but I am just wondering if the “tar -cf - pgdata | wc -c” trick can be used as a general trick to pre-warm the cache?
Thanks.
On 10/9/2014 1:13 PM, Dennis wrote:
> Hi, a little off the general topic but I am just wondering if the “tar
> -cf - pgdata | wc -c” trick can be used as a general trick to pre-warm
> the cache?

I wouldn't do this if your pgdata is larger than about 50% of your physical ram.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
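A quick way to check that rule of thumb before wiring the pre-warm into an init script, assuming $PGDATA points at the data directory:

    # Rough check of the 50% rule of thumb: cluster size vs. physical RAM.
    du -sh "$PGDATA"
    free -g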
John R Pierce <pierce@hogranch.com> wrote:
> On 10/9/2014 1:13 PM, Dennis wrote:
>> Hi, a little off the general topic but I am just wondering if the “tar
>> -cf - pgdata | wc -c” trick can be used as a general trick to pre-warm
>> the cache?
>
> I wouldn't do this if your pgdata is larger than about 50% of your
> physical ram.

Nor would I do this if pgdata is anywhere near the size of a memory segment on a NUMA machine. I have seen cache pre-warming through a single process on NUMA absolutely kill performance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
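Whether that NUMA concern applies can be checked up front, assuming the numactl package is installed:

    # Show how many memory nodes the box has and how much RAM sits on each;
    # a single-process pre-warm can end up caching most of the data on just
    # one node of a multi-socket machine like this 4-socket x3750.
    numactl --hardware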
Thank you for your answer, but I don't have access to this server any more and have to interpret and pick parameters from the test results. Your advice about the tar trick is something I'm going to try for sure in the next test.

So, since I don't have any chance to do more tests: should I change the checkpoint_segments parameter? My colleagues prefer the old setting shown below, for maintenance reasons, but I would still like to convince them to use a much higher setting; 30 segments for a machine like that seems too humble.

checkpoint_segments = 30
checkpoint_timeout = 8min

The rest of the config looks like this:

shared_buffers=2GB
temp_buffers=128MB
max_files_per_process=1000
work_mem=384MB
maintenance_work_mem=10240MB
effective_io_concurrency=1
synchronous_commit=on
wal_buffers=16MB
wal_writer_delay=200ms
commit_delay=0
commit_siblings=5
random_page_cost=1.0
cpu_tuple_cost=0.01
effective_cache_size=450GB
geqo_threshold=12
geqo_effort=5
geqo_selection_bias=2.0
join_collapse_limit=8

Any ideas about the rest of the config? Maybe the parameters connected with write operations?
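For weighing the two settings, the worst-case pg_xlog footprint implied by checkpoint_segments can be estimated as roughly (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16MB each. A small sketch, assuming checkpoint_completion_target = 0.5:

    # Worst-case pg_xlog size for a given checkpoint_segments setting
    # (16MB segments; checkpoint_completion_target assumed to be 0.5).
    for SEGS in 30 10000; do
        MB=$(( (2 * SEGS + SEGS / 2 + 1) * 16 ))
        echo "checkpoint_segments=$SEGS -> ~${MB} MB of pg_xlog"
    done

So the high setting costs on the order of 400GB of pg_xlog and lets far more WAL accumulate between checkpoints, which is what stretches crash recovery; the kill -9 test described earlier is the most direct way to see whether that recovery time is acceptable.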