Re: New Linux xfs/reiser file systems - Mailing list pgsql-hackers
From | teg@redhat.com (Trond Eivind Glomsrød) |
---|---|
Subject | Re: New Linux xfs/reiser file systems |
Date | |
Msg-id | xuyhez1p341.fsf@halden.devel.redhat.com |
In response to | New Linux xfs/reiser file systems (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: New Linux xfs/reiser file systems |
List | pgsql-hackers |
I got some information from Stephen Tweedie on this - please keep him
"Cc:" as he's not on this list

************************************************************************

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> I was talking to a Linux user yesterday, and he said that performance
> using the xfs file system is pretty bad. He believes it has to do with
> the fact that fsync() on log-based file systems requires more writes.

Performance doing what? XFS has known performance problems doing
unlinks and truncates, but not synchronous IO. The user should be
using fdatasync() for databases, btw, not fsync().

First, XFS, ext3 and reiserfs are *NOT* log-based filesystems. They
are journaling filesystems. They have a log, but they are not
log-based because they do not store data permanently in a log
structure. Berkeley LFS, Sprite and Spiralog are log-based
filesystems.

> With a standard BSD/ext2 file system, WAL writes can stay on the same
> cylinder to perform fsync. Is that true of log-based file systems?

Not true on ext2 or BSD. Write-aheads are _usually_ close to the
inode, but not always. For true log-based filesystems, writes are
always completely sequential, so the issue just goes away. For
journaling filesystems, depending on the setup there may be a seek to
the journal involved, but some journaling filesystems can use a
separate disk for the journal so no seek is required.

> I know xfs and reiser are both log based. Do we need to be concerned
> about PostgreSQL performance on these file systems? I use BSD FFS with
> soft updates here, so it doesn't affect me.

A database normally preallocates its data files and then performs most
of its writes using update-in-place. In such cases, fsync() is almost
always the wrong thing to be doing --- the data writes have changed
nothing in the inode except for the timestamps, and there's no need to
flush the timestamps to disk for every write.

fdatasync() is designed for this --- if the only inode change is
timestamps, fdatasync() will skip the seek to the inode and will only
update the data. If any significant inode fields have been changed,
then a full flush is done. Using fdatasync, most filesystems will
incur no seeks for data flush, regardless of whether the filesystem is
journaling or not.

Cheers,
 Stephen

************************************************************************

-- 
Trond Eivind Glomsrød
Red Hat, Inc.
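
For readers who want to see the preallocate-then-update-in-place pattern described above, here is a minimal Python sketch. It is not code from this thread or from PostgreSQL. The names `BLOCK`, `preallocate` and `update_in_place` are hypothetical, and the 8192-byte block size is an assumption chosen only for illustration. It relies on `os.fdatasync()`, which Python exposes on Linux and most other Unix systems but not on every platform.

```python
import os
import tempfile

BLOCK = 8192  # assumed block size, for illustration only


def preallocate(path, nblocks):
    """Create a file and fill it with zero blocks so its size is fixed up front."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    zeros = bytes(BLOCK)
    for _ in range(nblocks):
        os.write(fd, zeros)
    # The file size (a significant inode field) changed, so do one full flush.
    os.fsync(fd)
    return fd


def update_in_place(fd, blockno, payload):
    """Overwrite one already-allocated block, then flush the file data."""
    if len(payload) != BLOCK:
        raise ValueError("payload must be exactly one block")
    os.pwrite(fd, payload, blockno * BLOCK)
    # The write stays inside the preallocated range, so only timestamps
    # change in the inode; fdatasync() is allowed to skip flushing those.
    os.fdatasync(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        fd = preallocate(os.path.join(tmpdir, "datafile"), 4)
        update_in_place(fd, 2, b"x" * BLOCK)
        os.close(fd)
```

Whether the `fdatasync()` call actually avoids a seek depends on the filesystem and kernel in use; the sketch only shows where each call fits in the pattern.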