Re: O_DIRECT in freebsd - Mailing list pgsql-hackers
From | Sean Chittenden |
---|---|
Subject | Re: O_DIRECT in freebsd |
Date | |
Msg-id | 20030623040135.GO97131@perrin.int.nxad.com |
In response to | Re: O_DIRECT in freebsd (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | pgsql-hackers@postgresql.org |
List | pgsql-hackers |
> >> it doesn't seem totally out of the question.  I'd kinda like to
> >> see some experimental evidence that it's worth doing though.
> >> Anyone care to make a quick-hack prototype and do some
> >> measurements?
>
> > What would you like to measure?  Overall system performance when a
> > query is using O_DIRECT or are you looking for negative/positive
> > impact of read() not using the FS cache?  The latter is much
> > easier to do than the former...  recreating a valid load
> > environment that'd let any O_DIRECT benchmark be useful isn't
> > trivial.
>
> If this stuff were easy, we'd have done it already ;-).

What do you mean?  Bits don't just hit the tree randomly because of a
possible speed improvement hinted at by a man page reference?  :-]

> The first problem is to figure out what makes sense to measure.

Egh, yeah, and this isn't trivial either....  Benchmarking around vfs
caching makes it hard to get good results (been down that primrose
path before with sendfile() happiness).

> Given that the request is for a quick-and-dirty test, I'd be willing
> to cut you some slack on the measurement process.  That is, it's
> okay to pick something easier to measure over something harder to
> measure, as long as you can make a fair argument that what you're
> measuring is of any interest at all...

hrm, well, given the easy part is thumping out the code, how's the
following sound as a test procedure:

1) Write out several files at varying sizes (512KB, 1MB, 5MB, 10MB,
   50MB, 100MB, 512MB, 1GB) using O_DIRECT, so the writes don't
   pollute the FS cache.

2) Start two new processes that read the files created above, one
   with and one without O_DIRECT (each test iteration must rewrite
   the files above).

3) Before and after each read() call (does PostgreSQL use fread(3)
   or read(2)?), use gettimeofday(2) to get high-resolution timing
   of each system call.

4) Perform each of the tests above 4 times, averaging the last three
   and throwing out the 1st run (though reporting its value may be of
   interest).

I'm not that wild about writing anything threaded unless there's
strong enough interest in seeing what happens on a write() to an
O_DIRECT'ed fd.  I'm not convinced we'll see anything worthwhile
unless I set up an example that does a ton of write disk I/O.

As things stand, because O_DIRECT is a fast execution path through the
vfs subsystem, I expect the speed difference to be greater on fast,
high-RPM drives than on slower 5400RPM IDE machines... which
trivializes any benchmark I'll do on my laptop.

And actually, if the app can't keep up with the disk, I bet the FS
cache case will be faster.  If the read()s are able to keep up with
the rate of the HDD, however, this could be a big win in the speed
dept; but if things lag for an instant, the platter has to make
another rotation before the call comes back to userland.

Now that I think about it, the optimal case would be to anonymously
mmap() a private buffer for the read() to write into; that way the HDD
could just DMA the data into the mmap()'ed buffer, making it a
zero-copy read operation...  though whatever interest my mmap()
benchmarks from a while back stirred up seems to have been lost in
the fray.  :)

-sc

-- 
Sean Chittenden
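A minimal C sketch of steps 2 through 4 of the test procedure described in this message, offered as an illustration rather than as Sean's actual prototype. It times every read(2) call with gettimeofday(2), with or without O_DIRECT, then averages the last three of four runs. Assumptions not in the post: the helper names time_reads() and average_of_last_three() are hypothetical; transfer sizes are 4 KB multiples; O_DIRECT comes from <fcntl.h> (FreeBSD, or Linux with _GNU_SOURCE). Step 1 and the per-iteration file rewrite are left out. The read buffer is an anonymous private mmap(), which echoes the mmap() idea and yields the page alignment that O_DIRECT usually needs. Whether the kernel actually DMAs straight into that buffer depends on the OS and filesystem, and this sketch does not measure it.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT on glibc; harmless elsewhere */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

/* Microseconds elapsed between two gettimeofday(2) samples. */
static long elapsed_usec(const struct timeval *a, const struct timeval *b)
{
    return (b->tv_sec - a->tv_sec) * 1000000L + (b->tv_usec - a->tv_usec);
}

/*
 * Hypothetical helper: read `path` to EOF in `bufsize` chunks.
 * Returns the total microseconds spent inside read(2), or -1 if the
 * open failed (e.g. the filesystem rejects O_DIRECT) or a read failed.
 * *nbytes receives the number of bytes read.
 */
static long time_reads(const char *path, int use_direct, size_t bufsize,
                       long long *nbytes)
{
    /* Assumption: keep sizes block-aligned, since O_DIRECT on Linux
     * generally rejects unaligned transfer sizes. */
    assert(bufsize > 0 && bufsize % 4096 == 0);

    int fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));
    if (fd < 0)
        return -1;

    /* Anonymous private mapping: page-aligned memory for the reads. */
    char *buf = mmap(NULL, bufsize, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE, -1, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }

    long total_usec = 0;
    *nbytes = 0;
    for (;;) {
        struct timeval t0, t1;
        gettimeofday(&t0, NULL);          /* just before the syscall */
        ssize_t n = read(fd, buf, bufsize);
        gettimeofday(&t1, NULL);          /* just after it returns */
        if (n < 0) {
            total_usec = -1;
            break;
        }
        if (n == 0)
            break;                        /* EOF */
        total_usec += elapsed_usec(&t0, &t1);
        *nbytes += n;
    }

    munmap(buf, bufsize);
    close(fd);
    return total_usec;
}

/* Step 4: run the test 4 times, report the first (cold) run separately,
 * and return the average of the last three, or -1.0 on failure. */
static double average_of_last_three(const char *path, int use_direct,
                                    size_t bufsize, long *first_run)
{
    long sum = 0;
    for (int i = 0; i < 4; i++) {
        long long nbytes;
        long usec = time_reads(path, use_direct, bufsize, &nbytes);
        if (usec < 0)
            return -1.0;
        if (i == 0)
            *first_run = usec;            /* reported, not averaged */
        else
            sum += usec;
    }
    return sum / 3.0;
}
```

Running the two processes from step 2 would then amount to one process calling average_of_last_three(path, 0, ...) and another calling it with use_direct set to 1. Falling back to -1 instead of aborting keeps the harness usable on filesystems where open() rejects O_DIRECT.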