Thread: Possible performance improvement: buffer replacement policy
Those of you with long memories may recall a benchmark that Edmund Mergl drew our attention to back in May '99. That test showed extremely slow performance for updating a table with many indexes (about 20). At the time, it seemed the problem was due to bad performance of btree with many equal keys, so I thought I'd go back and retry the benchmark after this latest round of btree hackery.

The good news is that btree itself seems to be pretty well fixed; the bad news is that the benchmark is still slow for large numbers of rows. The problem is I/O: the CPU mostly sits idle waiting for the disk. As best I can tell, the difficulty is that the working set of pages needed to update this many indexes is too large compared to the number of disk buffers Postgres is using. (I was running with -B 1000 and looking at behavior for a 100000-row test table. This gave me a table size of 3876 pages, plus 11526 pages in 20 indexes.)

Of course, there's only so much we can do when the number of buffers is too small, but I still started to wonder if we are using the buffers as effectively as we can. Some tracing showed that most of the pages of the indexes were being read and written multiple times within a single UPDATE query, while most of the pages of the table proper were fetched and written only once. That says we're not using the buffers as well as we could; the index pages are not being kept in memory when they should be. In a query like this, we should displace main-table pages sooner to allow keeping more index pages in cache --- but with the simple LRU replacement method we use, once a page has been loaded it will stay in cache for at least the next NBuffers (-B) page references, no matter what. With a large NBuffers that's a long time.

I've come across an interesting article:

	The LRU-K Page Replacement Algorithm For Database Disk Buffering
	Elizabeth J. O'Neil, Patrick E. O'Neil, Gerhard Weikum
	Proceedings of the 1993 ACM SIGMOD international conference
	on Management of Data, May 1993

(If you subscribe to the ACM digital library, you can get a PDF of this from there.) This article argues that standard LRU buffer management is inherently not great for database caches, and that it's much better to replace pages on the basis of time since the K'th most recent reference, not just time since the most recent one. K=2 is enough to get most of the benefit. The big win is that you are measuring an actual page interreference time (between the last two references) and not just dealing with a lower-bound guess on the interreference time. Frequently used pages are thus much more likely to stay in cache.

It looks like it wouldn't take too much work to replace shared buffers on the basis of LRU-2 instead of LRU, so I'm thinking about trying it.

Has anyone looked into this area? Is there a better method to try?

			regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> It looks like it wouldn't take too much work to replace shared buffers
>> on the basis of LRU-2 instead of LRU, so I'm thinking about trying it.
>>
>> Has anyone looked into this area? Is there a better method to try?

> Sounds like a perfect idea. Good luck. :-)

Actually, the idea went down in flames :-(, but I neglected to report back to pghackers about it. I did do some code to manage buffers as LRU-2. I didn't have any good performance test cases to try it with, but Richard Brosnahan was kind enough to re-run the TPC tests previously published by Great Bridge with that code in place. Wasn't any faster, in fact possibly a little slower, likely due to the extra CPU time spent on buffer freelist management. It's possible that other scenarios might show a better result, but right now I feel pretty discouraged about the LRU-2 idea and am not pursuing it.

			regards, tom lane
> (If you subscribe to the ACM digital library, you can get a PDF of this
> from there.) This article argues that standard LRU buffer management is
> inherently not great for database caches, and that it's much better to
> replace pages on the basis of time since the K'th most recent reference,
> not just time since the most recent one. K=2 is enough to get most of
> the benefit. The big win is that you are measuring an actual page
> interreference time (between the last two references) and not just
> dealing with a lower-bound guess on the interreference time. Frequently
> used pages are thus much more likely to stay in cache.
>
> It looks like it wouldn't take too much work to replace shared buffers
> on the basis of LRU-2 instead of LRU, so I'm thinking about trying it.
>
> Has anyone looked into this area? Is there a better method to try?

Sounds like a perfect idea. Good luck. :-)

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom, did we ever test this? I think we did and found that it was the same or worse, right?

> Those of you with long memories may recall a benchmark that Edmund Mergl
> drew our attention to back in May '99. That test showed extremely slow
> performance for updating a table with many indexes (about 20). At the
> time, it seemed the problem was due to bad performance of btree with
> many equal keys, so I thought I'd go back and retry the benchmark after
> this latest round of btree hackery.
>
> The good news is that btree itself seems to be pretty well fixed; the
> bad news is that the benchmark is still slow for large numbers of rows.
> The problem is I/O: the CPU mostly sits idle waiting for the disk.
> As best I can tell, the difficulty is that the working set of pages
> needed to update this many indexes is too large compared to the number
> of disk buffers Postgres is using. (I was running with -B 1000 and
> looking at behavior for a 100000-row test table. This gave me a table
> size of 3876 pages, plus 11526 pages in 20 indexes.)
>
> Of course, there's only so much we can do when the number of buffers
> is too small, but I still started to wonder if we are using the buffers
> as effectively as we can. Some tracing showed that most of the pages
> of the indexes were being read and written multiple times within a
> single UPDATE query, while most of the pages of the table proper were
> fetched and written only once. That says we're not using the buffers
> as well as we could; the index pages are not being kept in memory when
> they should be. In a query like this, we should displace main-table
> pages sooner to allow keeping more index pages in cache --- but with
> the simple LRU replacement method we use, once a page has been loaded
> it will stay in cache for at least the next NBuffers (-B) page
> references, no matter what. With a large NBuffers that's a long time.
>
> I've come across an interesting article:
>
>	The LRU-K Page Replacement Algorithm For Database Disk Buffering
>	Elizabeth J. O'Neil, Patrick E. O'Neil, Gerhard Weikum
>	Proceedings of the 1993 ACM SIGMOD international conference
>	on Management of Data, May 1993
>
> (If you subscribe to the ACM digital library, you can get a PDF of this
> from there.) This article argues that standard LRU buffer management is
> inherently not great for database caches, and that it's much better to
> replace pages on the basis of time since the K'th most recent reference,
> not just time since the most recent one. K=2 is enough to get most of
> the benefit. The big win is that you are measuring an actual page
> interreference time (between the last two references) and not just
> dealing with a lower-bound guess on the interreference time. Frequently
> used pages are thus much more likely to stay in cache.
>
> It looks like it wouldn't take too much work to replace shared buffers
> on the basis of LRU-2 instead of LRU, so I'm thinking about trying it.
>
> Has anyone looked into this area? Is there a better method to try?
>
> 			regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom, did we ever test this? I think we did and found that it was the
> same or worse, right?

I tried it and didn't see any noticeable improvement on the particular test case I was using, so I got discouraged and didn't pursue the idea further. I'd like to come back to it someday, though.

			regards, tom lane
I added this too to TODO.detail/performance.

> On Fri, Jan 19, 2001 at 12:03:58PM -0500, Bruce Momjian wrote:
> >
> > Tom, did we ever test this? I think we did and found that it was the
> > same or worse, right?
>
> (Funnily enough, I just read that message:)
>
> To: Bruce Momjian <pgman@candle.pha.pa.us>
> cc: pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Possible performance improvement: buffer replacement policy
> In-reply-to: <200010161541.LAA06653@candle.pha.pa.us>
> References: <200010161541.LAA06653@candle.pha.pa.us>
> Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
> 	message dated "Mon, 16 Oct 2000 11:41:41 -0400"
> Date: Mon, 16 Oct 2000 11:49:52 -0400
> Message-ID: <26100.971711392@sss.pgh.pa.us>
> From: Tom Lane <tgl@sss.pgh.pa.us>
> X-Mailing-List: pgsql-hackers@postgresql.org
> Precedence: bulk
> Sender: pgsql-hackers-owner@hub.org
> Status: RO
> Content-Length: 947
> Lines: 19
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> It looks like it wouldn't take too much work to replace shared buffers
> >> on the basis of LRU-2 instead of LRU, so I'm thinking about trying it.
> >>
> >> Has anyone looked into this area? Is there a better method to try?
>
> > Sounds like a perfect idea. Good luck. :-)
>
> Actually, the idea went down in flames :-(, but I neglected to report
> back to pghackers about it. I did do some code to manage buffers as
> LRU-2. I didn't have any good performance test cases to try it with,
> but Richard Brosnahan was kind enough to re-run the TPC tests previously
> published by Great Bridge with that code in place. Wasn't any faster,
> in fact possibly a little slower, likely due to the extra CPU time spent
> on buffer freelist management. It's possible that other scenarios might
> show a better result, but right now I feel pretty discouraged about the
> LRU-2 idea and am not pursuing it.
>
> 			regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Threw it into TODO.detail/performance, not optimizer.

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom, did we ever test this? I think we did and found that it was the
> > same or worse, right?
>
> I tried it and didn't see any noticeable improvement on the particular
> test case I was using, so I got discouraged and didn't pursue the idea
> further. I'd like to come back to it someday, though.
>
> 			regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
I will throw the email into the optimizer TODO.detail file.

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom, did we ever test this? I think we did and found that it was the
> > same or worse, right?
>
> I tried it and didn't see any noticeable improvement on the particular
> test case I was using, so I got discouraged and didn't pursue the idea
> further. I'd like to come back to it someday, though.
>
> 			regards, tom lane

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Fri, Jan 19, 2001 at 12:03:58PM -0500, Bruce Momjian wrote:
>
> Tom, did we ever test this? I think we did and found that it was the
> same or worse, right?

(Funnily enough, I just read that message:)

To: Bruce Momjian <pgman@candle.pha.pa.us>
cc: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Possible performance improvement: buffer replacement policy
In-reply-to: <200010161541.LAA06653@candle.pha.pa.us>
References: <200010161541.LAA06653@candle.pha.pa.us>
Comments: In-reply-to Bruce Momjian <pgman@candle.pha.pa.us>
	message dated "Mon, 16 Oct 2000 11:41:41 -0400"
Date: Mon, 16 Oct 2000 11:49:52 -0400
Message-ID: <26100.971711392@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
X-Mailing-List: pgsql-hackers@postgresql.org
Precedence: bulk
Sender: pgsql-hackers-owner@hub.org
Status: RO
Content-Length: 947
Lines: 19

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> It looks like it wouldn't take too much work to replace shared buffers
>> on the basis of LRU-2 instead of LRU, so I'm thinking about trying it.
>>
>> Has anyone looked into this area? Is there a better method to try?

> Sounds like a perfect idea. Good luck. :-)

Actually, the idea went down in flames :-(, but I neglected to report back to pghackers about it. I did do some code to manage buffers as LRU-2. I didn't have any good performance test cases to try it with, but Richard Brosnahan was kind enough to re-run the TPC tests previously published by Great Bridge with that code in place. Wasn't any faster, in fact possibly a little slower, likely due to the extra CPU time spent on buffer freelist management. It's possible that other scenarios might show a better result, but right now I feel pretty discouraged about the LRU-2 idea and am not pursuing it.

			regards, tom lane