Thread: Filesystem options for storing pg_data
Hello all,

I'm torn between using ext2 and ext3 to hold the pg_data, pg_xlog, and pg_clog contents.

My main concern is that ext2 will not respond well to an improper shutdown or power loss.

My question: what is the preferred filesystem for this data, to optimize performance while still having some fault tolerance?

-Joe
On Wed, 2005-04-20 at 11:07, Joe Maldonado wrote:
> Hello all,
>
> I am in a position where I'm torn between using ext2 vs ext3 to keep the
> pg_data, pg_xlog, and pg_clog contents.
>
> The main concern is that switching to ext2 will not respond well to an
> improper shutdown, power loss.
>
> My question is what is the preferred filesystem to keep this data to be
> able to optimize performance and still have some fault tolerance.

Generally XFS and JFS are considered superior to ext2/3.

ext3, in my experience, isn't much slower than ext2. Plus the decreased time required to bring up a server after a power outage is worth something too.

Having used ext3 quite a bit, I'd say it's fairly stable and reliable, but I have seen references here to known, possibly unfixable bugs.

I used XFS a few years back, and there was no great gain for what we were doing at the time, as we were CPU bound, not I/O bound.
On Wed, 2005-04-20 at 11:18, Scott Marlowe wrote:
> On Wed, 2005-04-20 at 11:07, Joe Maldonado wrote:
> > Hello all,
> >
> > I am in a position where I'm torn between using ext2 vs ext3 to keep the
> > pg_data, pg_xlog, and pg_clog contents.
> >
> > The main concern is that switching to ext2 will not respond well to an
> > improper shutdown, power loss.
> >
> > My question is what is the preferred filesystem to keep this data to be
> > able to optimize performance and still have some fault tolerance.
>
> Generally XFS and JFS are considered superior to ext2/3.
>
> ext3, in my experience, isn't much slower than ext2. Plus the decreased
> time required to bring up a server after a power outage is worth
> something too.
>
> Having used ext3 quite a bit, I'd say it's fairly stable and reliable,
> but I have seen references here to known, possibly unfixable bugs.
>
> I've used XFS a few years back, and there was no great gain for what we
> were doing at the time, as we were CPU, not I/O bound.

Oh, and if you use ext3, definitely turn off atime (use the noatime option at mount time).
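A minimal sketch for checking whether noatime actually took effect on a mounted filesystem, assuming the Linux /proc/mounts layout (device, mount point, type, comma-separated options). The function name mount_options, the mount point /var/lib/pgsql, and the sample line are hypothetical and only there for illustration:

```python
def mount_options(mounts_text, mount_point):
    """Return the option set for mount_point from /proc/mounts-style text.

    Returns None if the mount point is not listed. If it appears more
    than once, the last entry wins, since later mounts shadow earlier ones.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))
    return found


# Hypothetical sample entry; not taken from any real system.
SAMPLE = "/dev/sda3 /var/lib/pgsql ext3 rw,noatime,data=ordered 0 0\n"

opts = mount_options(SAMPLE, "/var/lib/pgsql")
print("noatime set:", opts is not None and "noatime" in opts)
```

On a live Linux system the same function could be fed the contents of /proc/mounts instead of the sample string.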
[I got a private reply from Scott, which I won't quote here, and which can be fairly (I hope) summarized as "search the pgsql-performance list". Well, I've done that, and I feel it's right to bring the issue back to the public list. So if it seems I'm replying to myself, I'm not: I'm replying to Scott. I realized the reply was private only just before sending this out.]

> > On Wed, 2005-04-20 at 12:07, Marco Colombo wrote:
> > > On Wed, 2005-04-20 at 11:18 -0500, Scott Marlowe wrote:
> > >
> > > Generally XFS and JFS are considered superior to ext2/3.
> >
> > Do you mind posting a reference? I'm really interested in the comparison
> > but every time I asked for a pointer, I got no valid resource, so far.
> [...]

Well, my point is that the ones I find lead to the conclusion that EXT3 is "considered superior" to XFS and JFS. One for all:

http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html

"It's reassuring when various industry-standard benchmarks yield similar results. In case you're wondering, I obtained similar results with Benchmark Factory's other half dozen or so database benchmarks - so for me, it'll be ext3."

Have a look at the graphs: EXT3 is almost twice as fast in these (database) benchmarks.

Another one is:

http://www.kerneltraffic.org/kernel-traffic/kt20020401_160.html#8

Again ext3 is the winner (among journaled filesystems), but only by a small margin. And again, there are a lot of variables. For example, using data=journal with a big journal file on a different disk would be extremely interesting, just as using a different disk for the WAL is at the PostgreSQL level (the result might be the same).

All the other benchmarks I've found, with a simple search for 'filesystem benchmark' on the pgsql-performance list, are either the usual irrelevant Bonnie/iozone benchmarks, or don't seem to care to tune ext3 mount options and just use the defaults (thus comparing apples to oranges).

I'm not stating that EXT3 is better.
My opinion on the matter is that you shouldn't care much about the filesystem (EXT3, JFS, and XFS being the same for _most_ purposes with PostgreSQL). That is, it's one small spot in the big picture of performance tuning. You'd better look at the big picture.

I'm only countering your claim: "Generally XFS and JFS are considered superior to ext2/3". There's no general agreement on the lists about that, so just handwaving and saying "look at the lists" isn't enough. Mind posting a pointer to _any_ serious PostgreSQL-based (or at least database-based) benchmark that consistently shows XFS and JFS as superior? One that cares to show ext3/noatime/data=ordered,data=writeback,data=journal results, too?

If I were to choose based on the results posted on the list (those I've managed to find), ext3 would be the winner. Maybe I've missed something.

> > > Having used ext3 quite a bit, I'd say it's fairly stable and reliable,
> > > but I have seen references here to known, possibly unfixable bugs.
> >
> > Again, mind posting a reference?
> [...]

I've searched for 'EXT3 bug' but got nothing. I'm (loosely) following l-k, and have never heard any developer mention "possibly unfixable bugs" in EXT3. Care to post any real reference? There have been bugs of course, but that holds true for everything, XFS and JFS included.

Having re-read many, many messages just now, I'm under an even stronger impression that _all_ negative comments on both the stability and the performance of EXT3 start with "I've heard that...", with almost no one providing direct experience. Many comments display little understanding of the subject: some don't know about the data= mount option (there's little point in comparing to XFS if you don't use data=writeback), and some have misconceptions about what the option does and what difference it makes when the application keeps _syncing_ the files (I don't know it well either). See the data=journal case.

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it
On 4/21/05, Marco Colombo <pgsql@esiway.net> wrote:
> > > > Generally XFS and JFS are considered superior to ext2/3.
> > > Do you mind posting a reference? I'm really interested in the comparison
> > > but every time I asked for a pointer, I got no valid resource, so far.
> Well, my point being the ones I find lead to the conclusion that EXT3 is
> "considered superior" to XFS and JFS. One for all:

First of all, my workload is not IO bound, so don't take what I write as a solution for IO-heavy setups.

Personally I use ext3 (with a ratio of ~128 KB per inode, to save some space and keep inodes closer together), with the noatime option.

I tried JFS some time ago and moved away from it soon after. The reasons were:

1. JFS dynamic inode allocation left less free space for apps than ext3 (I usually decrease the inode ratio to some reasonable limit, like 4 times the current ratio for the given directory set). (Yeah, not a serious issue, yet I admit I tend to consider it.)

2. FSCK. Back then JFS had an ugly feature of mounting only 'clean' filesystems, i.e. fsck had to be done in userspace (unlike ext3, which does it as part of the mount process). I don't know if it is still that way.

3. Performance. For my workload, mostly single-threaded and bursty, ext3 appeared a bit faster. Yet that was a good while ago; JFS might have changed a good bit since then.

I have no experience with XFS, but I've heard a lot of good things about it.

> Again ext3 is the winner (among journalled fs), but by a small edge
> only. And again, there are a lot of variables. Using for example
> data=journal with a big journal file on a different disk would
> be extremely interesting, just as using a different disk for WALs
> is at PostgreSQL level (the result might be the same).

Some time ago I thought it could be a nice thought experiment to 'tune' ext3 for PostgreSQL's needs. (Mark WAL files for immediate updates, journal other updates (file size changes, creations, etc.), and keep the journal close to the WAL files... ;)

> I'm not stating that EXT3 is better. My opinion on the matter is that
> you shouldn't care about the filesystem much (EXT3, JFS, XFS being the
> same for _most_ purposes with PostgreSQL). That is, it's a small little
> spot in the big picture of performance tuning. You'd better look at the
> big picture.
>
> I'm only countering your claim:
> "Generally XFS and JFS are considered superior to ext2/3".

You can certainly say that XFS/JFS are more complex and were engineered to better handle high workloads. Ext3 is relatively simple, and its simplicity may also be a big advantage when handling high load.

Summary: I'm not arguing that JFS/XFS are worse or the same. All I want to say is that ext3 is a decent filesystem. Ext3's greatest advantage, I guess, is the ease of deployment -- it comes "out of the box" with most distributions. With a little tuning it can perform reasonably well for most needs.

Regards,
Dawid
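A rough sketch of the arithmetic behind the "~128 KB per inode" remark above: the ratio (what mke2fs's -i bytes-per-inode option controls) determines how many inodes are created and therefore how much space the inode tables take. The 100 GiB filesystem size, the 128-byte on-disk inode size, and the 8 KiB baseline ratio are assumptions made here for illustration, not figures from the thread, and a real mke2fs rounds per block group, so treat the results as estimates:

```python
def inode_estimate(fs_bytes, bytes_per_inode, inode_size=128):
    """Approximate inode count and inode-table size for a given ratio.

    inode_size=128 is an assumed on-disk inode size; real filesystems
    also round the count per block group, which is ignored here.
    """
    inodes = fs_bytes // bytes_per_inode
    return inodes, inodes * inode_size


GIB = 1024 ** 3
MIB = 1024 ** 2
FS_SIZE = 100 * GIB  # hypothetical filesystem size

# Compare an assumed dense baseline (8 KiB) with the ~128 KiB ratio.
for ratio in (8 * 1024, 128 * 1024):
    count, table_bytes = inode_estimate(FS_SIZE, ratio)
    print(f"{ratio // 1024:>4} KiB per inode: {count:>10} inodes, "
          f"{table_bytes / MIB:.0f} MiB of inode tables")
```

Under these assumptions the sparser ratio cuts the inode tables from about 1600 MiB to about 100 MiB, at the cost of a much lower limit on the number of files.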
References:

http://archives.postgresql.org/pgsql-performance/2005-01/msg00131.php
http://archives.postgresql.org/pgsql-performance/2004-05/msg00130.php
http://archives.postgresql.org/pgsql-performance/2003-08/msg00191.php
http://groups-beta.google.com/group/comp.os.linux.misc/msg/b299a71fd540c2b8?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&rnum=9
http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
http://jamesthornton.com/hotlist/linux-filesystems/

It took me all of about 10 minutes to find all of those. But I've got work to do, so I'll leave further research here to the rest of the list.
On Thu, 21 Apr 2005, Scott Marlowe wrote:
> References:
>
> http://archives.postgresql.org/pgsql-performance/2005-01/msg00131.php
> http://archives.postgresql.org/pgsql-performance/2004-05/msg00130.php
> http://archives.postgresql.org/pgsql-performance/2003-08/msg00191.php
> http://groups-beta.google.com/group/comp.os.linux.misc/msg/b299a71fd540c2b8?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&rnum=9
> http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
> http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
> http://jamesthornton.com/hotlist/linux-filesystems/
>
> It took me all of about 10 minutes to find all of those. But I've got
> work to do, so I'll leave further research here to the rest of the list.

Thanks for your precious time, but when I say I searched the archives I really mean it. If you had cared to read _my_ message, I was looking for any benchmark (or comment) meeting the following conditions:

1) PostgreSQL load - that is, a benchmark based on PostgreSQL or, alternatively, on another database, or on an artificial write+fsync load. Any other (cached) write load is _meaningless_ for our purposes.

2) The author was aware of mount options, and actually used them. I think there's enough evidence that the ext3 default mount options are on the safe side (_safer_ than other filesystems, it seems), so there's no point in comparing default ext3 alone (comparing all modes _is_ interesting, though).

I've spent much more than 10 minutes of my time, and found nothing but the links that _I_ posted. I'll invest more time, and comment on the links you posted (which I had already read, of course):

http://archives.postgresql.org/pgsql-performance/2005-01/msg00131.php
It's not clear at all; it possibly fails both 1) and 2). The author says nothing about a write+fsync benchmark or about ext3 mount options.

http://archives.postgresql.org/pgsql-performance/2004-05/msg00130.php
That's the one I got Bert Scalzo's article from.
The other links there fail to meet 1), and some fail 2) too. Note that fsync is likely to disrupt most optimizations. The fact that a filesystem "scales better" under normal (cached) load means nothing when it comes to fsyncing.

http://archives.postgresql.org/pgsql-performance/2003-08/msg00191.php
This one _defends_ ext2 from the accusation of being buggy. The author prefers XFS, "but I only have fuzzy reasons, as opposed to metrics." I was looking for metrics. It says nothing about ext3, so it does not apply.

These are not from the postgresql lists, but anyway:

http://groups-beta.google.com/group/comp.os.linux.misc/msg/b299a71fd540c2b8?q=ext2+corrupt+%22power+failure%22&hl=en&lr=&ie=UTF-8&rnum=9
"People are referring to the old ext2 filesystem here. The new ext3 is very resistant to this issue." If you're referring to what "Jinny" said, well, all the evidence is "...recently I have come to know from a reliable group that Linux is not so stable". Does not meet 1) and 2), sorry.

http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf
Yes, surprisingly enough I've read this one, too. The only interesting part is "[XFS] Performance features include asynchronous write ahead logging (similar to Ext2 " - no, ext3 - " with data=writeback), ...". This confirms my comment about comparing apples and oranges, and completely justifies my requirement 2) - and it comes from an XFS paper! It's not clear at all whether what they call the OLTP Workload really performs an fsync after each write. Anyway, there's only _one_ graph in the results (how weird) and all filesystems are pretty close. No tests with data=journal. All the other graphs in the Appendix fail requirement 1).

http://www.oracle.com/technology/oramag/webcolumns/2002/techarticles/scalzo_linux02.html
Thanks, this is the link that _I_ posted. Have _you_ read it? It shows that EXT3 is almost twice as fast as JFS. Too bad there's no XFS here. BTW, this meets 1); I'm not sure about 2), but the options they used seem enough to outperform JFS.
http://jamesthornton.com/hotlist/linux-filesystems/
This is just a collection of links. It's not clear which one would back up your claim of XFS and JFS being _generally_ considered superior for PostgreSQL or other database usage. Let's see:

http://www-106.ibm.com/developerworks/linux/library/l-fs8.html
"data=ordered mode effectively solves the corruption problem found in data=writeback mode and _most other journaled filesystems_, and it does so without requiring full data journaling" (emphasis mine)
Interestingly enough, most journaled filesystems do have a corruption problem, and ext3 in default mode doesn't. But this does not really apply to us: it refers to normal writes, not write+fsync. I think any filesystem has to be badly broken if it loses data after fsync, anyway.

http://www-106.ibm.com/developerworks/library/l-fs9.html
"Other than that, XFS performance was very close to that of ReiserFS and generally surpasses that of ext3... "
Uh, this sounds interesting... but wait...
"... One of the nicest things about XFS is that, like ReiserFS, it doesn't generate a lot of unnecessary disk activity. XFS tries to cache as much data in memory as possible, and generally only writes things out to disk when memory pressure dictates that it do so."
So, if a benchmark shows XFS is faster, it's a matter of better caching, right? But it comes at the price of possible (data) corruption... Thankfully, it's pretty useless to us, with every write followed by a sync.

I'm sorry, but with the links _you_ selected, applying my filter conditions 1) and 2), which are necessary for a fair comparison, one could say there's general consensus on EXT3 being far superior to other filesystems, not the opposite. Note that I'm not interested in supporting such a claim. As I already wrote, I think FS selection generally has a minimal impact on PostgreSQL performance. But again, what was your original claim "Generally XFS and JFS are considered superior to ext2/3." based upon?
I apologize if I sound somewhat harsh; it's not really intended. But next time please assume that:

- I'm able to do a 10 minute search;
- I've got some work to do, too, but I'm willing to spend more than 10 minutes on this research (it has already taken me more than 2 hours of my spare time, actually);
- if I say I've searched the lists and read many messages, I've really done so.

You're absolutely entitled to your opinion. If you like XFS and JFS, go ahead and use them, because of their name, the names of their sponsors (IBM and SGI), their features, your personal experience, or whatever. Just please don't claim that's the general consensus on the pgsql lists. There's _no_ general consensus. There's _no_ clear winner. And if you do want a winner anyway, it's ext3, so far.

This "ext3 is not as good as XFS or JFS" is a recurring subject, as is "ext3 is buggy". _Every single time_ I've asked for references to back up such claims, nothing valuable was presented. On the contrary, the only references I've found are on the "ext3 is equal or better" side.

Now, feel free to prove me wrong.

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@ESI.it
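A minimal sketch of the "artificial write+fsync load" mentioned in condition 1), to illustrate why cached-write numbers say little about a load where every write is followed by a sync. This is not a PostgreSQL benchmark and not any of the benchmarks cited in the thread; the function names, the block count, and the use of the system temp directory are assumptions chosen for illustration. The 8192-byte block size is picked to mirror PostgreSQL's default page size:

```python
import os
import tempfile
import time


def timed_writes(path, blocks, block_size=8192, sync_each=True):
    """Write `blocks` blocks to `path`; optionally fsync after each one.

    Returns the elapsed wall-clock time in seconds.
    """
    payload = b"\0" * block_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(blocks):
            os.write(fd, payload)
            if sync_each:
                os.fsync(fd)  # force the data to stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory, blocks=200):
    """Time the same writes with and without a per-write fsync."""
    results = {}
    for label, sync_each in (("cached", False), ("fsync", True)):
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            results[label] = timed_writes(path, blocks, sync_each=sync_each)
        finally:
            os.unlink(path)
    return results


if __name__ == "__main__":
    # Point this at a directory on the filesystem under test.
    for label, seconds in compare(tempfile.gettempdir()).items():
        print(f"{label}: {seconds:.4f}s")
```

To compare filesystems or ext3 data= modes, the same script would be run against a directory on each mount; the absolute numbers depend heavily on the hardware and on whether the disk's write cache honors the sync.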
Whoa, hold on. My original post was this:

QUOTE:

Generally XFS and JFS are considered superior to ext2/3.

ext3, in my experience, isn't much slower than ext2. Plus the decreased time required to bring up a server after a power outage is worth something too.

Having used ext3 quite a bit, I'd say it's fairly stable and reliable, but I have seen references here to known, possibly unfixable bugs.

I've used XFS a few years back, and there was no great gain for what we were doing at the time, as we were CPU, not I/O bound.

ENDQUOTE:

So where do you get off saying I'm such a big fan of XFS and am trashing ext3?

You do the research; I'm tired of trying to have a civilized conversation with you. If you wanna argue, go pay someone a quarter to do it. I'm done.