Re: Changing default value of wal_sync_method to open_datasync on Linux - Mailing list pgsql-hackers
From | Mark Kirkwood |
---|---|
Subject | Re: Changing default value of wal_sync_method to open_datasync on Linux |
Date | |
Msg-id | b7422f6a-91dc-c562-7315-aa6ca64cab5c@catalyst.net.nz |
In response to | Changing default value of wal_sync_method to open_datasync on Linux ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>) |
Responses | RE: Changing default value of wal_sync_method to open_datasync on Linux |
List | pgsql-hackers |
On 20/02/18 13:27, Tsunakawa, Takayuki wrote:
> Hello,
>
> I propose changing the default value of wal_sync_method from fdatasync to open_datasync on Linux. The patch is attached. I'm feeling this may be controversial, so I'd like to hear your opinions.
>
> The reason for change is better performance. Robert Haas said open_datasync was much faster than fdatasync with NVRAM in this thread:
>
> https://www.postgresql.org/message-id/flat/C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp#C20D38E97BCB33DAD59E3A1@lab.ntt.co.jp
>
> pg_test_fsync shows higher figures for open_datasync:
>
> [SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
> --------------------------------------------------
> 5 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync is Linux's default)
>         open_datasync         50829.597 ops/sec      20 usecs/op
>         fdatasync             42094.381 ops/sec      24 usecs/op
>         fsync                 42209.972 ops/sec      24 usecs/op
>         fsync_writethrough                  n/a
>         open_sync             48669.605 ops/sec      21 usecs/op
> --------------------------------------------------
>
>
> [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
> (the figures seem oddly high, though; this may be due to some VM configuration)
> --------------------------------------------------
> 5 seconds per test
> O_DIRECT supported on this platform for open_datasync and open_sync.
>
> Compare file sync methods using one 8kB write:
> (in wal_sync_method preference order, except fdatasync is Linux's default)
>         open_datasync         34648.778 ops/sec      29 usecs/op
>         fdatasync             31570.947 ops/sec      32 usecs/op
>         fsync                 27783.283 ops/sec      36 usecs/op
>         fsync_writethrough                  n/a
>         open_sync             35238.866 ops/sec      28 usecs/op
> --------------------------------------------------
>
>
> pgbench only shows marginally better results, although the difference is within an error range. The following is the tps of the default read/write workload of pgbench. I ran the test with all the tables and indexes preloaded with pg_prewarm (except pgbench_history), and the checkpoint not happening. I ran a write workload before running the benchmark so that no new WAL file would be created during the benchmark run.
>
> [SSD on bare metal, ext4 volume mounted with noatime,nobarrier,data=ordered]
> --------------------------------------------------
>                    1      2      3    avg
> fdatasync      17610  17164  16678  17150
> open_datasync  17847  17457  17958  17754 (+3%)
>
> [HDD on VM, ext4 volume mounted with noatime,nobarrier,data=writeback]
> (the figures seem oddly high, though; this may be due to some VM configuration)
> --------------------------------------------------
>                    1      2      3    avg
> fdatasync       4911   5225   5198   5111
> open_datasync   4996   5284   5317   5199 (+1%)
>
>
> As the removed comment describes, when wal_sync_method is open_datasync (or open_sync), open() fails with errno=EINVAL if the ext4 volume is mounted with data=journal. That's because open() specifies O_DIRECT in that case. I don't think that's a problem in practice, because data=journal will not be used for performance, and wal_level needs to be changed from its default replica to minimal and max_wal_senders must be set to 0 for O_DIRECT to be used.
>
>

I think the use of 'nobarrier' is probably disabling most/all reliable writing to the devices. What do the numbers look like if you remove this option?

regards

Mark
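
For readers without pg_test_fsync at hand, the fdatasync versus open_datasync comparison discussed in this message can be approximated with a rough sketch like the one below. It is not how pg_test_fsync is implemented: the function names, the iteration count and the scratch-file path are all hypothetical, and it deliberately leaves out O_DIRECT (which needs aligned buffers and is rejected by some filesystems), so it only contrasts a write followed by fdatasync() with a write through an O_DSYNC descriptor. It assumes Linux, where os.fdatasync and os.O_DSYNC are available.

```python
import os
import time

BLOCK_SIZE = 8192  # one 8kB write per operation, as in the quoted figures


def _ops_per_sec(path, extra_flags, sync_after_write, iterations):
    """Rewrite the first 8kB block of `path` `iterations` times and
    return the observed operations per second."""
    buf = b"\0" * BLOCK_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT | extra_flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.pwrite(fd, buf, 0)      # always overwrite offset 0
            if sync_after_write:
                os.fdatasync(fd)       # explicit flush (fdatasync style)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / elapsed


def compare_sync_methods(path, iterations=200):
    """Return ops/sec for the two approaches; `path` is a scratch file
    (hypothetical name) on the volume being tested."""
    return {
        # plain descriptor, flushed with fdatasync() after every write
        "fdatasync": _ops_per_sec(path, 0, True, iterations),
        # O_DSYNC descriptor: each write returns only after the data is synced
        "open_datasync": _ops_per_sec(path, os.O_DSYNC, False, iterations),
    }
```

Calling compare_sync_methods() on a scratch file placed on the ext4 volume, once with nobarrier and once without, is one way to check how much of the measured rate depends on that mount option. The scratch file is left in place and has to be removed by hand.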