Re: Shared buffers, db transactions commited, and write IO on Solaris - Mailing list pgsql-performance
From | Erik Jones |
---|---|
Subject | Re: Shared buffers, db transactions commited, and write IO on Solaris |
Date | |
Msg-id | 606E8491-AAFB-493E-8FFA-FF26D6DD5ACE@myemma.com Whole thread Raw |
In response to | Re: Shared buffers, db transactions commited, and write IO on Solaris ("dimitri k" <dimitrik.fr@gmail.com>) |
Responses |
Re: Shared buffers, db transactions commited, and write IO on Solaris
|
List | pgsql-performance |
On Mar 29, 2007, at 12:41 PM, dimitri k wrote:
On 3/29/07, Erik Jones <erik@myemma.com> wrote:On Mar 29, 2007, at 11:16 AM, Tom Lane wrote:> Erik Jones <erik@myemma.com> writes:>> We've recently made a couple changes to our system that have resulted>> in a drastic increase in performance as well as some very confusing>> changes to the database statistics, specifically>> pg_stat_database.xact_commit. Here's the details:>> I'm kinda boggled too. I can see how increasing shared buffers could> result in a drastic reduction in write rate, if the working set of> your> queries fits in the new space but didn't fit in the old. I have no> idea> how that leads to a drop in number of transactions committed though.> It doesn't make sense that autovac would run less frequently, because> it's driven by number of tuples changed not number of disk writes; and> that could hardly account for a 10x drop anyway.>> Did you by any chance take note of exactly which processes were> generating all the I/O or the CPU load?Well, wrt to the CPU load, as I said, we're pretty sure that'sautovac as we still get spikes that hit about the same threshold,after which cache hits go up dramatically and the spikes just don'tlast two days anymore.As far as the procs responsible for the writes go, we were unable tosee that from the OS level as the guy we had as a systems admin lastyear totally screwed us with the way he set up the SunCluster on theboxes and we have been unable to run Dtrace which has left uswatching a lot of iostat. However, we did notice a directcorrelation between write spikes and "write intensive" queries likelarge COPYs, UPDATEs, and INSERTs.One very important thing to note here is that the number, or ratherrate, of disk writes has not changed. It's the volume of data inthose writes that has dropped, along with those transactionmysterious counts. Could the bgwriter be the culprit here? Doesanything it does get logged as a transaction?erik jones <erik@myemma.com>software developer615-296-0838emma(r)
Erik,using 'forcedirectio' simply brings your write operations to the*real* volume - means while you need to write 10 bytes you'll write 10bytes (instead of UFS block size (8K)). So it explains me why yourwrite volume became slower.
"forcedirectio
Even if they aren't exact, being off by a factor of 10 wouldn't be believable. the forcedirectio mount setting for ufs can definitely explain the drop in data written volume, but doesn't do much to explain the difference in xact commits.
The forcedirectio (read "force direct IO") UFS option causes data to be buffered in kernel address whenever data is transferred between user address space and the disk. In other words, it bypasses the file system cache. For certain types of applications -- primarily database systems -- this option can dramatically improve performance. In fact, some database experts have argued that a file using the forcedirectio option will outperform a raw partition, though this opinion seems fairly controversial.
The forcedirectio improves file system performance by eliminating double buffering, providing a small, efficient code path for file system reads and writes and removing pressure on memory."
However, what this does mean is that writes will be at the actual filesystem block size and not the cache block size (8K v. 512K).
Now, why TX number is reduced - is a small mystery :)Options:- you really do 10 times less commits, means you work 10 times slower? ;)what about users? how do you measure your work performance?
We are an email marketing service provider with a web front end application. We measure work performance via web requests (counts, types, etc...), mailer activity and the resulting database activity. We are doing as much or more work now than previously, and faster.
- TX reported in pg_* tables are not exact, but I don't believe at all :)
erik jones <erik@myemma.com>
software developer
615-296-0838
emma(r)
pgsql-performance by date: