Thread: Optimal checkpoint_setting
Hello All,

I have a brand new machine to tune: x3750 M4, 4x E5-4640 v2 2.2GHz; 512GB RAM (32x16GB), 4x300GB SAS + SSD (Easy Tier) in RAID 10.

What's particularly important now is to choose an optimal configuration for write operations. We are discussing the checkpoint_segments and checkpoint_timeout parameters. A test based on pg_replay showed that the largest amount of data is written when checkpoint_segments is set to 10 000 and checkpoint_timeout to 30 min, but I'm worried about the amount of time needed for crash recovery. On the other hand, it's a mighty machine :) and I don't want to go back to 30 segments. I'd appreciate any helpful remarks on that.

The rest of the parameters from the test:

checkpoint_segments=10000
checkpoint_timeout=30min
shared_buffers=2GB
temp_buffers=128MB
max_files_per_process=500
work_mem=256MB
maintenance_work_mem=128MB
effective_io_concurrency=50
synchronous_commit=local
wal_buffers=512kB
wal_writer_delay=1ms
commit_delay=100
commit_siblings=1000
random_page_cost=1.0
cpu_tuple_cost=2.0
effective_cache_size=500GB
geqo_threshold=20
geqo_effort=10
geqo_selection_bias=2.0
join_collapse_limit=16
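During a run like that it is worth checking whether checkpoints are being forced by filling checkpoint_segments or by checkpoint_timeout expiring. A minimal sketch using pg_stat_bgwriter, assuming the test database is reachable with psql as the postgres user:

    # Compare these counters before and after the pg_replay run:
    #   checkpoints_req   = checkpoints forced by filling checkpoint_segments
    #   checkpoints_timed = checkpoints triggered by checkpoint_timeout
    psql -U postgres -c "
        SELECT checkpoints_timed, checkpoints_req,
               buffers_checkpoint, buffers_backend
          FROM pg_stat_bgwriter;"

    # log_checkpoints = on in postgresql.conf additionally logs each
    # checkpoint's trigger, duration and number of buffers written.

A rising checkpoints_req count means checkpoint_segments is still the limiting factor; if almost all checkpoints are timed, the timeout is what drives the write pattern.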
On Thu, Oct 9, 2014 at 3:52 AM, pinker <pinker@onet.eu> wrote:
> Hello All,
> I have a brand new machine to tune:
> x3750 M4, 4x E5-4640 v2 2.2GHz; 512GB RAM (32x16GB), 4x300GB
> SAS + SSD (Easy Tier) in RAID 10
> What's particularly important now is to choose optimal configuration for
> write operations. We have discussion about checkpoint_segments and
> checkpoint_timeout parameters. Test, which was based on pg_replay, has shown
> that the biggest amount of data is written when checkpoint_segments are set
> to 10 000 and checkpoint_timeout to 30 min, but I'm worrying about amount of
> time needed for crash recovery.
Since you already have pg_replay running, kill -9 some backend (my favorite victim is the bgwriter) in the middle of the pg_replay run, and see how long it takes to recover.
You might want to try it with and without clobbering the FS cache, or simply rebooting the whole machine, depending on what kind of crash you think is more likely.
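A rough way to time that recovery, sketched here on the assumptions that the background writer shows up in ps as "postgres: writer process" and that psql can connect as the postgres user:

    # Find and kill the background writer mid-run to force crash recovery.
    kill -9 "$(pgrep -f 'postgres: writer process')"
    sleep 2   # give the postmaster a moment to notice and start recovery

    # Time how long it takes until the server accepts connections again.
    START=$(date +%s)
    until psql -U postgres -c 'SELECT 1' >/dev/null 2>&1; do sleep 1; done
    echo "recovery took $(( $(date +%s) - START )) seconds"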
Recovering into a cold cache can be painfully slow. If your database mostly fits in memory, you can speed it up by using something (like "tar -cf - pgdata | wc -c") to read the entire pg data directory in sequential fashion and hopefully cache it. If you find recovery too slow, you might want to try this trick (and put it in your init scripts) rather than lowering checkpoint_segments.
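Spelled out as an init-script-style snippet, that pre-warm might look something like this (the data directory path is only a placeholder; substitute your own):

    #!/bin/sh
    # Read the whole data directory sequentially once so the OS page cache
    # is warm before the first queries hit it; the byte count is discarded.
    PGDATA=/var/lib/pgsql/9.3/data   # placeholder path - adjust to your install
    tar -cf - "$PGDATA" | wc -c > /dev/null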
Cheers,
Jeff
Hi, a little off the general topic, but I am just wondering if the “tar -cf - pgdata | wc -c” trick can be used as a general trick to pre-warm the cache?
Thanks.
On 10/9/2014 1:13 PM, Dennis wrote:
> Hi, a little off the general topic but I am just wondering if the “tar
> -cf - pgdata | wc -c” trick can be used as a general trick to pre-warm
> the cache?

I wouldn't do this if your pgdata is larger than about 50% of your physical ram.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
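A quick way to check that rule of thumb before wiring the pre-warm into an init script, assuming $PGDATA points at the data directory:

    # Rough check of the 50% rule of thumb: cluster size vs. physical RAM.
    du -sh "$PGDATA"
    free -g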
John R Pierce <pierce@hogranch.com> wrote:
> On 10/9/2014 1:13 PM, Dennis wrote:
>> Hi, a little off the general topic but I am just wondering if the “tar
>> -cf - pgdata | wc -c” trick can be used as a general trick to pre-warm
>> the cache?
>
> I wouldn't do this if your pgdata is larger than about 50% of your
> physical ram.

Nor would I do this if pgdata is anywhere near the size of a memory segment on a NUMA machine. I have seen cache pre-warming through a single process on NUMA absolutely kill performance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
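Whether that NUMA concern applies can be checked up front, assuming the numactl package is installed:

    # Show how many memory nodes the box has and how much RAM sits on each;
    # a single-process pre-warm can end up caching most of the data on just
    # one node of a multi-socket machine like this 4-socket x3750.
    numactl --hardware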
Thank you for your answer, but I don't have access to this server any more and have to interpret and pick parameters from the test results. Your advice about the tar trick is something I'm going to try for sure in the next test.

So, since I don't have any chance to do more tests: should I change the checkpoint_segments parameter? My colleagues prefer the old setting shown below, for maintenance reasons, but I would still like to convince them to use a much higher setting; 30 segments for a machine like that seems too humble.

checkpoint_segments = 30
checkpoint_timeout = 8min

The rest of the config looks like this:

shared_buffers=2GB
temp_buffers=128MB
max_files_per_process=1000
work_mem=384MB
maintenance_work_mem=10240MB
effective_io_concurrency=1
synchronous_commit=on
wal_buffers=16MB
wal_writer_delay=200ms
commit_delay=0
commit_siblings=5
random_page_cost=1.0
cpu_tuple_cost=0.01
effective_cache_size=450GB
geqo_threshold=12
geqo_effort=5
geqo_selection_bias=2.0
join_collapse_limit=8

Any ideas about the rest of the config? Maybe the parameters connected with write operations?
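For weighing the two settings, the worst-case pg_xlog footprint implied by checkpoint_segments can be estimated as roughly (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16MB each. A small sketch, assuming checkpoint_completion_target = 0.5:

    # Worst-case pg_xlog size for a given checkpoint_segments setting
    # (16MB segments; checkpoint_completion_target assumed to be 0.5).
    for SEGS in 30 10000; do
        MB=$(( (2 * SEGS + SEGS / 2 + 1) * 16 ))
        echo "checkpoint_segments=$SEGS -> ~${MB} MB of pg_xlog"
    done

So the high setting costs on the order of 400GB of pg_xlog and lets far more WAL accumulate between checkpoints, which is what stretches crash recovery; the kill -9 test described earlier is the most direct way to see whether that recovery time is acceptable.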