Re: wal-size limited to 16MB - Performance issue for subsequent backup - Mailing list pgsql-hackers
From | Craig Ringer
---|---
Subject | Re: wal-size limited to 16MB - Performance issue for subsequent backup
Date |
Msg-id | 5446FDA8.5050102@2ndquadrant.com
In response to | wal-size limited to 16MB - Performance issue for subsequent backup (jesper@krogh.cc)
List | pgsql-hackers
On 10/21/2014 03:03 AM, jesper@krogh.cc wrote:
> That being said, along comes the backup, scheduled once a day, which
> tries to read off these WAL files. To the backup they look like "an
> awful lot of small files"; our backup uses a single thread to read
> those files and levels off at 30-40MB/s from a 21-drive RAID 50 of
> rotating drives, which is quite bad. That causes a daily incremental
> run to take on the order of 24h. Differentials, which pick up larger
> deltas, and fulls are even worse.

What's the backup system? 151952 files should be a trivial matter for
any backup system. I'm very surprised you're seeing that kind of run
time for 2TB of WAL, and I think it's worth investigating just why the
backup system is behaving this way.

What does 'filefrag' say about the WAL segments? Are they generally a
single extent each? If not, how many extents? (A bulk check is sketched
at the end of this mail.)

It'd be useful to know the kernel version, file system, RAID
controller, whether you use LVM, and other relevant details. What's
your RAID array's stripe size?

> A short test like:
>   find . -type f -ctime -1 | tail -n 50 | xargs cat | pipebench > /dev/null
> confirms the backup speed to be roughly the same as seen by the backup
> software. Another test from the same volume:
>   find . -type f -ctime -1 | tail -n 50 | xargs cat > largefile
> then, after waiting for the fs to drop the file from cache,
>   cat largefile | pipebench > /dev/null
> confirms that the disk subsystem can do much better (150-200MB/s) on
> larger files.

OK, so a larger contiguously allocated file looks like it's probably
read faster. That doesn't mean there's any guarantee that a big WAL
segment would be allocated contiguously if there are lots of other
writes interspersed, but the FS will try. (What does 'filefrag' say
about your 'largefile'?)

I'm wondering if you're having issues related to a RAID stripe size
that is close to, or bigger than, your WAL segment size, so that each
segment is only being read from one disk or a couple of disks. If
that's the case you're probably not getting ideal write performance
either.

That said, I don't see any particular reason why readahead wouldn't
give you similar results from multiple smaller WAL segments that're
allocated contiguously, and they usually would be if they were created
one after the other. What are your readahead settings? (There are often
several at different levels; what exists depends on how your storage is
configured, use of LVM, use of SW RAID, etc.)

In my opinion RAID 50, or RAID 5, is generally a pretty poor option for
a database file system in performance terms anyway, especially for
transaction logs. RAID 50 is also not wonderfully durable for arrays of
many large disks, given modern disk sizes, even with low block error
rates and relatively low disk failure rates. I personally consider two
parity disks the minimum acceptable for arrays of more than four or
five disks, and I'd certainly want continuous archiving or streaming
replication in place if I was running RAID 50 on a big array.
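To expand on the filefrag question: here's a quick way to check a whole
batch of segments at once. This is only a sketch, and the archive path
is a placeholder; substitute your own:

    # Extent counts for the 50 most recently changed WAL files
    # (/path/to/wal-archive is a placeholder, not your real path):
    find /path/to/wal-archive -type f -ctime -1 | tail -n 50 | xargs filefrag

A freshly written 16MB segment on a reasonably healthy ext4 or xfs
volume usually reports "1 extent found". If you're seeing tens of
extents per file, fragmentation rather than file count may be what's
hurting sequential reads.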
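If pipebench isn't available everywhere, you can get the same sort of
numbers from plain dd, and dropping the page cache between runs keeps
them honest:

    # Needs root; forces the next read to actually come from the disks:
    sync && echo 3 > /proc/sys/vm/drop_caches
    # dd prints throughput on completion:
    find . -type f -ctime -1 | tail -n 50 | xargs cat | dd of=/dev/null bs=1M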
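On stripe size: for Linux software RAID you can read it straight from
mdadm; a hardware controller will report it through its own CLI
(MegaCli, hpacucli, or whatever matches your card). The device name
here is a placeholder:

    # Chunk size of an md array (md0 is a placeholder):
    mdadm --detail /dev/md0 | grep -i chunk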
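And for the readahead settings, something like the following shows the
values at each layer and lets you experiment. Device names are
placeholders, and remember to check every layer you actually have
(partition, md device, logical volume):

    # Readahead in 512-byte sectors for a block device:
    blockdev --getra /dev/sda
    # The same knob via sysfs, in KiB:
    cat /sys/block/sda/queue/read_ahead_kb
    # An LVM logical volume has its own setting (vg0-lv0 is a placeholder):
    blockdev --getra /dev/mapper/vg0-lv0
    # Try a larger value (16384 sectors = 8MiB), then re-run the read test:
    blockdev --setra 16384 /dev/sda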
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services