Re: New Linux xfs/reiser file systems - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: New Linux xfs/reiser file systems |
Date | |
Msg-id | 200105041749.f44HnsJ29002@candle.pha.pa.us |
In response to | Re: New Linux xfs/reiser file systems (teg@redhat.com (Trond Eivind Glomsrød)) |
Responses | Re: New Linux xfs/reiser file systems |
List | pgsql-hackers |
> I got some information from Stephen Tweedie on this - please keep him
> "Cc:" as he's not on this list
>
> ************************************************************************
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> > I was talking to a Linux user yesterday, and he said that performance
> > using the xfs file system is pretty bad. He believes it has to do with
> > the fact that fsync() on log-based file systems requires more writes.
>
> Performance doing what? XFS has known performance problems doing
> unlinks and truncates, but not synchronous IO. The user should be
> using fdatasync() for databases, btw, not fsync().

This is hugely helpful. In PostgreSQL 7.1, we do use fdatasync() by
default if it is available on a platform.

> First, XFS, ext3 and reiserfs are *NOT* log-based filesystems. They
> are journaling filesystems. They have a log, but they are not
> log-based because they do not store data permanently in a log
> structure. Berkeley LFS, Sprite and Spiralog are log-based
> filesystems.

Sorry, I get those mixed up.

> > With a standard BSD/ext2 file system, WAL writes can stay on the same
> > cylinder to perform fsync. Is that true of log-based file systems?
>
> Not true on ext2 or BSD. Write-aheads are _usually_ close to the
> inode, but not always. For true log-based filesystems, writes are
> always completely sequential, so the issue just goes away. For
> journaling filesystems, depending on the setup there may be a seek to
> the journal involved, but some journaling filesystems can use a
> separate disk for the journal so no seek is required.
>
> > I know xfs and reiser are both log based. Do we need to be concerned
> > about PostgreSQL performance on these file systems? I use BSD FFS with
> > soft updates here, so it doesn't affect me.
>
> A database normally preallocates its data files and then performs most
> of its writes using update-in-place. In such cases, fsync() is almost
> always the wrong thing to be doing --- the data writes have changed
> nothing in the inode except for the timestamps, and there's no need to
> flush the timestamps to disk for every write. fdatasync() is
> designed for this --- if the only inode change is timestamps,
> fdatasync() will skip the seek to the inode and will only update the
> data. If any significant inode fields have been changed, then a full
> flush is done.

We do pre-allocate our log file space in chunks to avoid inode/block
index writes.

> Using fdatasync, most filesystems will incur no seeks for data flush,
> regardless of whether the filesystem is journaling or not.

Thanks. That is a big help. I wonder if people reporting performance
problems were using 7.0.3. We only added fdatasync() in 7.1.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
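
For readers who want to see the pattern discussed above in concrete form, here is a minimal sketch of "preallocate once, then update in place and flush with fdatasync()". It is not PostgreSQL's code: it is written in Python for brevity, assumes a POSIX system, and the names `preallocate`, `write_in_place`, `SEGMENT_SIZE` and `BLOCK` are hypothetical, chosen only for illustration. It falls back to fsync() where fdatasync() is not available, in the spirit of the "if it is available on a platform" remark.

```python
import os
import tempfile

# Prefer fdatasync() where the platform provides it; otherwise use fsync().
flush_data = getattr(os, "fdatasync", os.fsync)

SEGMENT_SIZE = 1024 * 1024  # hypothetical segment size, for illustration only
BLOCK = 8192                # hypothetical block size, for illustration only


def preallocate(path, size=SEGMENT_SIZE):
    """Zero-fill the file up front so later writes do not change its size."""
    zeros = bytes(BLOCK)
    with open(path, "wb") as f:
        for _ in range(size // BLOCK):
            f.write(zeros)
        f.flush()
        # One full flush here: the file size and block map have changed.
        os.fsync(f.fileno())


def write_in_place(fd, offset, payload):
    """Overwrite bytes inside the preallocated region, then flush the data."""
    os.pwrite(fd, payload, offset)
    # Only timestamps changed in the inode, so a data-only flush is enough.
    flush_data(fd)


def demo():
    """Preallocate a segment, update it in place, and report what happened."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "segment")
        preallocate(path)
        size_before = os.path.getsize(path)
        fd = os.open(path, os.O_RDWR)
        try:
            write_in_place(fd, 3 * BLOCK, b"commit record")
            data = os.pread(fd, 13, 3 * BLOCK)
        finally:
            os.close(fd)
        return size_before, os.path.getsize(path), data
```

Calling `demo()` returns the file size before and after the in-place write together with the bytes read back. The size staying the same is the point: nothing significant in the inode changes, which is the case the quoted explanation says fdatasync() is designed for.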