Synchronized Scan update - Mailing list pgsql-hackers
From:     Jeff Davis
Subject:  Synchronized Scan update
Msg-id:   1172618058.10824.418.camel@dogma.v10.wvs
Responses: Re: Synchronized Scan update
List:     pgsql-hackers
I have found some interesting results from my tests with the Synchronized Scan patch I'm working on. The two benefits that I hope to achieve with the patch are:

(1) Better caching behavior with multiple sequential scans running in parallel
(2) Faster sequential reads from disk and less seeking

I have consistently seen #1 to be true. There is still more testing to be done (hopefully soon), but I haven't found a problem yet. And the benefits I've seen are very substantial, which isn't hard, since in the typical case a large sequential scan will have a 0% cache hit rate. These numbers were retrieved using log_executor_stats=on.

#2, however, is a little trickier. IIRC, Tom was the first to point out that the I/O system might not recognize that reads coming from different processes are indeed one sequential read. At first I never saw the problem actually happen, and I assumed that the OS was being smart enough. However, recently I noticed this problem on my home machine, which showed great caching behavior but poor I/O throughput (as measured by iostat). My home machine was using the Linux CFQ I/O scheduler, and when I swapped CFQ for the anticipatory scheduler (AS), it worked great. When I sent Josh my patch (per his request) I mentioned the problem I had experienced.

Then I started investigating, and found some mixed results. My test was basically to use iostat (or zpool iostat) to measure disk throughput, and N processes of "dd if=bigfile of=/dev/null" (started simultaneously) to run the test. I consider the test to be "passed" if the additional processes did not interfere (i.e. each process finished as though it were the only one running). Of course, all tests were I/O bound. (Rough sketches of these commands are appended below.)

My home machine (Core 2 Duo, single SATA disk, Intel controller):
  Linux/ext3/AS:        passed
  Linux/ext3/CFQ:       failed
  Linux/ext3/noop:      passed
  Linux/ext3/deadline:  passed

Machine 2 (old ThinkPad, IDE disk):
  Solaris/UFS:          failed
  Solaris/ZFS:          passed

Machine 3 (Dell 2950, LSI PERC/5i controller, 6 SAS disks, RAID-10, adaptive read ahead):
  FreeBSD/UFS:          failed

(I suspect the last test would be fine with read ahead always on; it may just be a problem with the adaptive read ahead feature.)

There are a lot of factors involved, because several components of the I/O system have the ability to reorder requests or read ahead, such as the block layer and the controller. The block request ordering isn't the only factor, because Solaris/UFS only orders the requests by cylinder and moves in only one direction (i.e. it looks like a simple elevator algorithm that isn't affected by process id). At least, that's how I understand it. Readahead can't be the only factor either, because replacing the I/O scheduler in Linux solved the problem, even when the replacement was the noop scheduler.

Anyway, back to the patch: it looks like there are some complications if you try to use it with the wrong combination of filesystem, I/O scheduler, and controller. The patch is designed for certain query patterns anyway, so I don't think this is a show-stopper. Given the better cache behavior, it seems like it's really the job of the I/O system to deliver a single, sequential stream of blocks efficiently. The alternative would be to have a single block-reader process, which I don't think we want to do. However, I/O systems don't really seem to handle multiple processes reading from the same file very well.

Comments?

Regards,
	Jeff Davis
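For anyone who wants to reproduce the cache-hit numbers, a minimal sketch of collecting them with log_executor_stats; "bigtable" is only a placeholder, and the GUC requires superuser privileges to set:

    # Hypothetical table name; the SET and the scan run in one session.
    psql -c "SET log_executor_stats = on; SELECT count(*) FROM bigtable;"
    # The executor statistics (shared-buffer reads vs. hits) are written
    # to the PostgreSQL server log, not returned to the client.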
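A rough sketch of the dd/iostat test described above, assuming a scratch file named bigfile that is larger than RAM and N=4 readers (the file name and N are arbitrary):

    # In one terminal, watch aggregate disk throughput:
    iostat 5                 # or: zpool iostat 5 for ZFS

    # In another, start N readers at (nearly) the same time; the test
    # "passes" if each finishes about as fast as a lone reader would:
    for i in 1 2 3 4; do
        dd if=bigfile of=/dev/null &
    done
    wait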
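For reference, the Linux I/O scheduler can be switched per block device at runtime through sysfs; this sketch assumes the disk is sda (adjust the device name) and must be run as root:

    # Show the available schedulers (the active one is in brackets):
    cat /sys/block/sda/queue/scheduler

    # Switch from CFQ to the anticipatory scheduler, then rerun the test:
    echo anticipatory > /sys/block/sda/queue/scheduler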