Thread: shared_buffers documentation
TFM says: Sets the amount of memory the database server uses for shared memory buffers. The default is typically 32 megabytes (32MB), but might be less if your kernel settings will not support it (as determined during initdb). This setting must be at least 128 kilobytes. (Non-default values of BLCKSZ change the minimum.) However, settings significantly higher than the minimum are usually needed for good performance. Several tens of megabytes are recommended for production installations. So if the default is 32MB, and what I'm actually supposed to have is "several tens of megabytes", isn't that pretty much the same thing? I think this advice is badly outdated. s/tens/hundreds/ might be a good idea at a minimum, but I'm thinking we might want to also mention the one-quarter-of-system-memory heuristic. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > I think this advice is badly outdated. Yeah. > s/tens/hundreds/ might be a good idea at a minimum, +1 > but I'm thinking we might want to also mention the > one-quarter-of-system-memory heuristic. Given how many people seem to find that a good guideline, it seems like we should. I wonder if we should add any hints telling people what they might see as problems if they are too far one way or the other. (Or does that go beyond the scope of what makes sense in TFM?) -Kevin
On Wed, Apr 14, 2010 at 11:15 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> I think this advice is badly outdated. > > Yeah. > >> s/tens/hundreds/ might be a good idea at a minimum, > > +1 > >> but I'm thinking we might want to also mention the >> one-quarter-of-system-memory heuristic. > > Given how many people seem to find that a good guideline, it seems > like we should. I wonder if we should add any hints telling people > what they might see as problems if they are too far one way or the > other. (Or does that go beyond the scope of what makes sense in > TFM?) No, I think that would be reasonable provided someone can come up with some appropriate wording. My understanding is that if you have a really small system then you might need >25% and if you have a really big system you might need <25%, but I'm not sure where the edges are. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > I think that would be reasonable provided someone can come up > with some appropriate wording. My understanding is that if you > have a really small system then you might need >25% and if you > have a really big system you might need <25%, but I'm not sure > where the edges are. Yeah, I remember a study which showed 40% as optimal, but I think that was on a server with 2GB RAM, which is smaller than my desktop workstation. (Heck, digital cameras with that much aren't that rare.) We might want to advise that if there are periods of irregular response time, particularly ones related to the checkpoint cycle, and adjustments to the checkpoint and background writer settings don't completely resolve it, they might want to try reducing shared_buffers. The improvements to the checkpoint and background writer areas in 8.3 helped a lot with this issue, but I don't believe it's been totally eliminated (yet). Perhaps that's now infrequent enough that it's not necessary to mention it. Dunno. I guess I'd be interested to hear Greg Smith weigh in on this one. I've generally stopped tweaking when our web support folks say I've got it to the point where we're not getting any timeouts against our 20 second limit for queries which normally run in less than 1 ms. -Kevin
Kevin Grittner wrote: > I wonder if we should add any hints telling people > what they might see as problems if they are too far one way or the > other. (Or does that go beyond the scope of what makes sense in TFM?) > It's hard to figure that out. One of the talks I'm doing at PGCon next month is focusing on how to monitor things when increasing shared_buffers and the related checkpoint parameters, so that you don't make things worse. It's going to take a solid 45 minutes to cover that, and a section of the manual covering this bit of trivia would be a few pages long and hard to follow. Maybe I'll get that in shape to insert into TFM eventually, but it's a bit bleeding edge to put into there now. Trying to explain it live to other people a couple of times should make it clearer how to describe what I do. As for updating the size recommendations, the text at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been beaten into the status quo by a number of people. Here's what might make sense from there to insert into the docs, removing the bits referring to older versions, rewriting a bit for manual tone, and noting the checkpoint issues: If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers on a dedicated database server is 25% of the memory in your system. If you have less RAM, you'll have to account more carefully for how much memory the operating system is taking up, allocating a fraction of the free memory instead. There are some workloads where even larger settings for shared_buffers are effective. But given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount. On Windows, large values for shared_buffers aren't as effective. You may find better results keeping the setting relatively low and using the OS cache more instead. The useful size range for shared_buffers on Windows systems is generally from 64MB to 512MB of RAM. Larger settings for shared_buffers usually require a corresponding increase in checkpoint_segments, in order to spread out writing large quantities of changed or new data in the cache over a longer period of time. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
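As a concrete illustration of the settings in that proposed text (the numbers here are assumptions for a hypothetical dedicated server with 8GB of RAM, not figures from this thread), a quick way to see where an installation currently stands is to query pg_settings; shared_buffers itself still has to be changed in postgresql.conf and needs a server restart to take effect:

    -- Sketch: check the parameters discussed above. The comments show where
    -- a hypothetical dedicated 8GB server might start, per the 25% guideline.
    SELECT name, setting, unit, source
      FROM pg_settings
     WHERE name IN ('shared_buffers',       -- e.g. 2GB, i.e. 25% of 8GB RAM
                    'checkpoint_segments'); -- usually raised along with shared_buffers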
On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: > As for updating the size recommendations, the text at > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been > beaten into the status quo by a number of people. Here's what might make > sense from there to insert into the docs, removing the bits referring to > older versions, rewriting a bit for manual tone, and noting the checkpoint > issues: This is good text. I will incorporate it with slight copy editing if no one objects. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: >> As for updating the size recommendations, the text at >> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been >> beaten into the status quo by a number of people. Here's what might make >> sense from there to insert into the docs, removing the bits referring to >> older versions, rewriting a bit for manual tone, and noting the checkpoint >> issues: > This is good text. I will incorporate it with slight copy editing if > no one objects. Looks good to me too, although perhaps more than the single use of "dedicated" is needed to remind people that these numbers are only appropriate if the machine is not doing anything other than running (one instance of) Postgres. Should we expend a whole sentence on that? regards, tom lane
On Wed, Apr 14, 2010 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: >>> As for updating the size recommendations, the text at >>> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been >>> beaten into the status quo by a number of people. Here's what might make >>> sense from there to insert into the docs, removing the bits referring to >>> older versions, rewriting a bit for manual tone, and noting the checkpoint >>> issues: > >> This is good text. I will incorporate it with slight copy editing if >> no one objects. > > Looks good to me too, although perhaps more than the single use of > "dedicated" is needed to remind people that these numbers are only > appropriate if the machine is not doing anything else than running > (one instance of) Postgres. Should we expend a whole sentence > on that? IMHO that would be overkill, but that's just MHO. Other opinions? ...Robert
On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Kevin Grittner wrote: >> I wonder if we should add any hints telling people >> what they might see as problems if they are too far one way or the >> other. (Or does that go beyond the scope of what makes sense in TFM?) > > It's hard to figure that out. One of the talks I'm doing at PGCon next > month is focusing on how to monitor things when increasing shared_buffers > and the related checkpoint parameters, so that you don't make things worse. > It's going to take a solid 45 minutes to cover that, and a section of the > manual covering this bit of trivia would be a few pages long and hard to > follow. Maybe I'll get that in shape to insert into TFM eventually, but > it's a bit bleeding edge to put into there now. Trying to explain it live > to other people a couple of times should make it clearer how to describe > what I do. > > As for updating the size recommendations, the text at > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been > beaten into the status quo by a number of people. I've incorporated most of this text, with some further editing, into the documentation. I'm halfway tempted to backpatch it to 8.4 and maybe even 8.3, but have refrained from so doing for now. From reading this and other threads, I think I generally understand the perils of setting shared_buffers too high: memory is needed for other things, like work_mem, a problem which is exacerbated by the fact that there is some double buffering going on. Also, if the buffer cache gets too large, checkpoints can involve writing out enormous amounts of dirty data, which can be bad. It seems intuitive to me that setting shared_buffers too small will also cause a performance problem, especially for write-heavy workloads, but I'm less sure I can clearly explain why. And I'm curious why the correct setting is different on Windows than it is on other platforms. Can anyone shed some light on this? ...Robert
Robert Haas wrote: > It seems intuitive to me that setting shared_buffers too small will > also cause a performance problem, especially for write-heavy > workloads, but I'm less sure I can clearly explain why. More text to add: When the server needs to allocate more space for reading or writing blocks, and the next space available is a block that's been modified but not used recently, that block will be written out to the operating system. With large settings for shared_buffers, that process preferentially evicts blocks that are used infrequently from the cache. The main downside to tuning in that direction is that all recently modified blocks not already written must be flushed to disk during each checkpoint, which can cause large amounts of disk writes grouped together. But if shared_buffers is set too low instead, and therefore only a portion of the active working set can be kept in the buffer cache at once, that can cause the same block to be written out more frequently than is optimal. > And I'm curious why the correct setting is different on Windows than it is on > other platforms. Can anyone shed some light on this? > No one has ever come up with a good explanation for why this is other than "Windows doesn't seem to like large amounts of shared memory". But we've seen it show up in too many benchmarks to dismiss. Dave and Greg Stark did benchmarks focused on this: http://archives.postgresql.org/pgsql-hackers/2008-12/msg00003.php that Magnus concurred with last time I tried to dig for more info about this specific subject. And the last time I remember this coming up, it was with someone who suggested 8MB (!) worked best on their Windows system: http://archives.postgresql.org/pgsql-general/2009-12/msg00475.php -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Fri, Apr 16, 2010 at 7:24 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Robert Haas wrote: >> It seems intuitive to me that setting shared_buffers too small will >> also cause a performance problem, especially for write-heavy >> workloads, but I'm less sure I can clearly explain why. > > More text to add: > > When the server needs to allocate more space for reading or writing blocks, > and the next available space available is a block that's been modified but > not used recently, that block will be written out to the operating system. > With large settings for shared_buffers, that prefers evicting blocks that > are used infrequently from the cache. The main downside to tuning in that > direction is that all recently modified blocks not already written must be > flushed to disk during each checkpoint, which can cause large amounts of > disk writes grouped together. But if shared_buffers is set too low instead, > and therefore only a portion of the active working set can be kept in the > buffer cache at once, that can cause the same block to be written out more > frequently than is optimal. Well, why can't they just hang out as dirty buffers in the OS cache, which is also designed to solve this problem? >> And I'm curious why the correct setting is different on Windows than it is >> on >> other platforms. Can anyone shed some light on this? >> > > No one has ever come up with a good explanation for why this is other than > "Windows doesn't seem to like large amounts of shared memory". But we've > seen it show up in too many benchmarks to dismiss. Dave and Greg Stark > did benchmarks focused on this: > http://archives.postgresql.org/pgsql-hackers/2008-12/msg00003.php that > Magnus concurred with last time I tried to dig for more info about this > specific subject. And the last time I remember this caming up it was with > someone who suggested 8MB (!) worked best on their Windows system: > http://archives.postgresql.org/pgsql-general/2009-12/msg00475.php I guess the obvious question is whether Windows "doesn't need" more shared memory than that, or whether it "can't effectively use" more memory than that. ...Robert
Robert Haas wrote: > Well, why can't they just hang out as dirty buffers in the OS cache, > which is also designed to solve this problem? > If the OS were guaranteed to be as suitable for this purpose as the approach taken in the database, this might work. But much like the clock sweep approach should outperform a simpler OS caching implementation in many common workloads, there are a couple of spots where making dirty writes the OS's problem can fall down: 1) That presumes that OS write coalescing will solve the problem for you by merging repeat writes, which, depending on the implementation, it might not. 2) On some filesystems, such as ext3, any write with an fsync behind it will flush the whole write cache out and defeat this optimization. Since the spread checkpoint design has some such writes going to the data disk in the middle of the currently processing checkpoint, in those situations that's likely to push the first write of that block to disk before it can be combined with a second. If you'd kept it in the buffer cache, it might survive as long as a full checkpoint cycle longer. 3) The "timeout" as it were for shared buffers is driven by the distance between checkpoints, typically as long as 5 minutes. The longest a filesystem will hold onto a write is probably less. On Linux it's typically 30 seconds before the OS considers a write important to get out to disk, longest case; if you've already filled a lot of RAM with writes it can be substantially less. > I guess the obvious question is whether Windows "doesn't need" more > shared memory than that, or whether it "can't effectively use" more > memory than that. > It's probably "can't effectively use". We know for a fact that applications where blocks regularly accumulate high usage counts and have repeat read/writes to them, which includes pgbench, benefit in several easy to measure ways from using larger amounts of database buffer cache. There's just plain old less churn of buffers going in and out of there. The alternate explanation of "Windows is just so much better at read/write caching that you should give it most of the RAM anyway" doesn't really sound as probable as the more commonly proposed theory "Windows doesn't handle large blocks of shared memory well". Note that there's no discussion of the why behind this in the commit you just did, just the description of what happens. The reasons why are left undefined, which I feel is appropriate given we really don't know for sure. Still waiting for somebody to let loose the Visual Studio profiler and measure what's causing the degradation at larger sizes. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
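To make the clock sweep and usage count discussion above concrete, here is a minimal sketch assuming the contrib pg_buffercache module has been installed (it is not there by default); it shows how the buffer cache breaks down by usage count and dirty state, which is roughly the population a checkpoint will eventually have to flush:

    -- Sketch, assuming contrib/pg_buffercache is installed: distribution of
    -- shared_buffers by clock-sweep usage count and dirty flag.
    SELECT usagecount, isdirty, count(*) AS buffers
      FROM pg_buffercache
     GROUP BY usagecount, isdirty
     ORDER BY usagecount, isdirty;

A cache dominated by high usage counts suggests a larger shared_buffers setting is actually being put to work, while a large number of dirty buffers hints at how much writing each checkpoint will have to spread out.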
On Fri, Apr 16, 2010 at 9:47 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Robert Haas wrote: >> Well, why can't they just hang out as dirty buffers in the OS cache, >> which is also designed to solve this problem? > > If the OS were guaranteed to be as suitable for this purpose as the approach > taken in the database, this might work. But much like the clock sweep > approach should outperform a simpler OS caching implementation in many > common workloads, there are a couple of spots where making dirty writes the > OS's problem can fall down: > > 1) That presumes that OS write coalescing will solve the problem for you by > merging repeat writes, which depending on implementation it might not. > > 2) On some filesystems, such as ext3, any write with an fsync behind it will > flush the whole write cache out and defeat this optimization. Since the > spread checkpoint design has some such writes going to the data disk in the > middle of the currently processing checkpoing, in those situations that's > likely to push the first write of that block to disk before it can be > combined with a second. If you'd have kept it in the buffer cache it might > survive as long as a full checkpoint cycle longer.. > > 3) The "timeout" as it were for shared buffers is driven by the distance > between checkpoints, typically as long as 5 minutes. The longest a > filesystem will hold onto a write is probably less. On Linux it's typically > 30 seconds before the OS considers a write important to get out to disk, > longest case; if you've already filled a lot of RAM with writes it can be > substantially less. Thanks for the explanation. That makes sense. Does this imply that the problems with shared_buffers being too small are going to be less with a read-mostly load? >> I guess the obvious question is whether Windows "doesn't need" more >> shared memory than that, or whether it "can't effectively use" more >> memory than that. > > It's probably can't effectively use. We know for a fact that applications > where blocks regularly accumulate high usage counts and have repeat > read/writes to them, which includes pgbench, benefit in several easy to > measure ways from using larger amounts of database buffer cache. There's > just plain old less churn of buffers going in and out of there. The > alternate explanation of "Windows is just so much better at read/write > caching that you should give it most of the RAM anyway" doesn't really sound > as probable as the more commonly proposed theory "Windows doesn't handle > large blocks of shared memory well". > > Note that there's no discussion of the why behind this is in the commit you > just did, just the description of what happens. The reasons why are left > undefined, which I feel is appropriate given we really don't know for sure. > Still waiting for somebody to let loose the Visual Studio profiler and > measure what's causing the degradation at larger sizes. Right - my purpose in wanting to revise the documentation was not to give a complete tutorial, which is obviously not practical, but to give people some guidelines that are better than our previous suggestion to use "a few tens of megabytes", which I think we've accomplished. The follow-up questions are mostly for my own benefit rather than the docs... ...Robert
On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: > As for updating the size recommendations, the text at > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been > beaten into the status quo by a number of people. A few other random thoughts on this document: 1. The section on default_statistics_target needs some updating - the default is 100 in 8.4+. 2. Reading the section on checkpoint_segments reminds me, again, that the current value seems extremely conservative on modern hardware. It's quite easy to hit this when doing large bulk data loads or even a big ol' CTAS. I think we should consider raising this for 9.1. I don't have a real strong opinion on what we should raise it TO - I think it's basically a question of how much temporary disk storage we think we can use during a large bulk data load without having users come back and say "wtf?" - but it seems to me that we're not doing ourselves any favors by having this set to a value where the first advice we give our users is "try tripling it". ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > 2. Reading the section on checkpoint_segments reminds me, again, > that the current value seems extremely conservative on modern > hardware. It's quite easy to hit this when doing large bulk data > loads or even a big ol' CTAS. I think we should consider raising > this for 9.1. Perhaps, but be aware the current default benchmarked better than a larger setting in bulk loads. http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php The apparent reason is that when there were fewer of them the WAL files were re-used before the RAID controller flushed them from BBU cache, causing an overall reduction in disk writes. I have little doubt that *without* a good BBU cached controller a larger setting is better, but it's not universally true that bigger is better on this one. -Kevin
On Mon, Apr 19, 2010 at 10:21 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> 2. Reading the section on checkpoint_segments reminds me, again, >> that the current value seems extremely conservative on modern >> hardware. It's quite easy to hit this when doing large bulk data >> loads or even a big ol' CTAS. I think we should consider raising >> this for 9.1. > > Perhaps, but be aware the current default benchmarked better > than a larger setting in bulk loads. > > http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php > > The apparent reason is that when there were fewer of them the WAL > files were re-used before the RAID controller flushed them from BBU > cache, causing an overall reduction in disk writes. I have little > doubt that *without* a good BBU cached controller a larger setting > is better, but it's not universally true that bigger is better on > this one. I don't actually know what's best. I'm just concerned that we have a default in postgresql.conf and a tuning guide that says "don't do that". Maybe the tuning guide needs to be more nuanced, or maybe postgresql.conf needs to be changed, but it makes no sense to have them saying contradictory things. ...Robert
Robert Haas wrote: > On Mon, Apr 19, 2010 at 10:21 AM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: > > Robert Haas <robertmhaas@gmail.com> wrote: > > > >> 2. Reading the section on checkpoint_segments reminds me, again, > >> that the current value seems extremely conservative on modern > >> hardware. It's quite easy to hit this when doing large bulk data > >> loads or even a big ol' CTAS. I think we should consider raising > >> this for 9.1. > > > > Perhaps, but be aware the current default benchmarked better > > than a larger setting in bulk loads. > > > > http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php > > > > The apparent reason is that when there were fewer of them the WAL > > files were re-used before the RAID controller flushed them from BBU > > cache, causing an overall reduction in disk writes. I have little > > doubt that *without* a good BBU cached controller a larger setting > > is better, but it's not universally true that bigger is better on > > this one. > > I don't actually know what's best. I'm just concerned that we have a > default in postgresql.conf and a tuning guide that says "don't do > that". Maybe the tuning guide needs to be more nuanced, or maybe > postgresql.conf needs to be changed, but it makes no sense to have > them saying contradictory things. The good news about checkpoint_segments is that you get a log file warning message if the value should be increased, i.e. you are checkpointing more often than every 30 seconds. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, Apr 19, 2010 at 5:36 PM, Bruce Momjian <bruce@momjian.us> wrote: >> I don't actually know what's best. I'm just concerned that we have a >> default in postgresql.conf and a tuning guide that says "don't do >> that". Maybe the tuning guide needs to be more nuanced, or maybe >> postgresql.conf needs to be changed, but it makes no sense to have >> them saying contradictory things. > > The good news about checkpoint_segments is that you get a log file > warning message if the value should be increased, i.e. you are > checkpointing more often than every 30 seconds. Yeah. I get that warning frequently when I'm creating test tables of dummy data for PG devel purposes. That's actually the main thing that makes me think the default may be too low. ...Robert
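Besides watching the log for that warning, the pg_stat_bgwriter view (available since 8.3) gives a rough indication of how checkpoints are being triggered; this is only a sketch of one way to look at it:

    -- Sketch: checkpoints_req counts checkpoints forced early, typically by
    -- filling checkpoint_segments, while checkpoints_timed counts those driven
    -- by checkpoint_timeout. A high requested-to-timed ratio is another hint
    -- that checkpoint_segments may be too low for the workload.
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_backend
      FROM pg_stat_bgwriter;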
Robert Haas wrote: > On Mon, Apr 19, 2010 at 5:36 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> I don't actually know what's best. I'm just concerned that we have a > >> default in postgresql.conf and a tuning guide that says "don't do > >> that". Maybe the tuning guide needs to be more nuanced, or maybe > >> postgresql.conf needs to be changed, but it makes no sense to have > >> them saying contradictory things. > > > > The good news about checkpoint_segments is that you get a log file > > warning message if the value should be increased, i.e. you are > > checkpointing more often than every 30 seconds. > > Yeah. I get that warning frequently when I'm creating test tables of > dummy data for PG devel purposes. That's actually the main thing that > makes me think the default may be too low. Well, the point is that you are getting it for _unusual_ circumstances. Seems it is only when you are getting it for typical workloads that it should be increased. However, this is the first time I am hearing that battery-backed cache favors the default value. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, Apr 19, 2010 at 6:06 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Mon, Apr 19, 2010 at 5:36 PM, Bruce Momjian <bruce@momjian.us> wrote: >> >> I don't actually know what's best. I'm just concerned that we have a >> >> default in postgresql.conf and a tuning guide that says "don't do >> >> that". Maybe the tuning guide needs to be more nuanced, or maybe >> >> postgresql.conf needs to be changed, but it makes no sense to have >> >> them saying contradictory things. >> > >> > The good news about checkpoint_segments is that you get a log file >> > warning message if the value should be increased, i.e. you are >> > checkpointing more often than every 30 seconds. >> >> Yeah. I get that warning frequently when I'm creating test tables of >> dummy data for PG devel purposes. That's actually the main thing that >> makes me think the default may be too low. > > Well, the point is that you are getting it for _unusual_ circumstances. > Seems it is only when you are getting it for typical workloads that it > should be increased. I guess. I am not sure we should consider "doing a large CTAS" to be an unusual workload, though. Sure, most of us don't do that every day, but what do we get out of having it be slow when we do decide to do it? Up until today, I had never heard anyone say that there was any possible performance trade-off, and... > However, this is the first time I am hearing that > battery-backed cache favors the default value. ...if that's as bad as it gets, I'm still not sure we shouldn't increase the default. Most people will not have their first experience of PG on a server with a battery-backed RAID controller, I'm thinking. And people who do have battery-backed RAID controllers can tune the value down if need be. I have never yet heard anyone justify why all the values in postgresql.conf should be defined as "the lowest value that works best for at least 1 user". Then again, I don't really know what I'm talking about. I think we should be listening very carefully to people who have spent a lot of time tuning this and taking their advice on how it should be set by default. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: >> However, this is the first time I am hearing that >> battery-backed cache favors the default value. Well, it was discussed on the lists during a CommitFest. > ...if that's as bad as it gets, I'm still not sure we shouldn't > increase the default. Most people will not have their first > experience of PG on a server with a battery-backed RAID > controller, I'm thinking. And people who do have battery-backed > RAID controllers can tune the value down if need be. I have never > yet heard anyone justify why all the values in postgresql.conf > should be defined as "the lowest value that works best for at > least 1 user". > > Then again, I don't really know what I'm talking about. I think > we should be listening very carefully to people who have spent a > lot of time tuning this and taking their advice on how it should > be set by default. I'm not sure we shouldn't change the default either. There seems to be a wealth of experience showing where a bigger value can help, and a fairly narrow use case where (much to my surprise) the lower value helped. Perhaps this just fits under the "be sure to test and tune your own environment" heading, although it is a direction people might not even think to try without some hint that it can help. FWIW, we use very different configurations for bulk loading (like pg_dump piped to psql) than we do for production usage afterward. This has become part of my bag of tricks for bulk loads, but we still use a larger number after the load. Also, I haven't heard of any independent confirmation -- it could be a quirk of our hardware and configuration? Has anyone else benchmarked this to see the impact on bulk loads with BBU cache? -Kevin
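For what it's worth, a sketch of the kind of split Kevin describes might look like the following; the values are illustrative assumptions rather than his actual configuration, and checkpoint_segments itself cannot be set per session, so it has to be raised in postgresql.conf (followed by a reload) for the duration of the load:

    -- Hypothetical per-session settings for a bulk load; values are examples only.
    SET maintenance_work_mem = '1GB';  -- speeds up the index builds after loading
    SET synchronous_commit = off;      -- acceptable when a failed load can simply be rerun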
Kevin Grittner wrote: > Perhaps, but be aware the current default benchmarked better > than a larger setting in bulk loads. > > http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php > > The apparent reason is that when there were fewer of them the WAL > files were re-used before the RAID controller flushed them from BBU > cache, causing an overall reduction in disk writes. I have little > doubt that *without* a good BBU cached controller a larger setting > is better, but it's not universally true that bigger is better on > this one. After running some tests, I believe what you observed is more universal than that, because I've been able to replicate a performance drop from a checkpoint_segments increase on a system without a BBWC (laptop with write caching turned off) where I really expected it to help. My working theory is that there is a broader set of situations where limiting the working set of WAL files to a small number in order to decrease cache disruption helps, not just the case where you've got hardware caching involved. However, I believe the guideline to increase this parameter along with shared_buffers still applies. The real case for wins with more segments is when you also have a large buffer cache, because that's where the write savings from postponed database writes to often-used blocks become easy to measure. I've found it difficult today to demonstrate a slam-dunk bulk loading improvement through a checkpoint_segments increase when shared_buffers is fixed at its default of ~32MB. If that keeps up, I might soon have enough data to bust the idea that it alone improves bulk loading performance when you haven't touched anything else in the default config, which was unexpected to me. Will report back once I've got a full handle on it. Thanks for reminding me about this counterexample; it slipped by in that broader thread before, and I didn't try it myself until today to see that you're onto something there. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Robert Haas wrote: > > Well, the point is that you are getting it for _unusual_ circumstances. > > Seems it is only when you are getting it for typical workloads that it > > should be increased. > > I guess. I am not sure we should consider "doing a large CTAS" to be > an unusual workload, though. Sure, most of us don't do that every > day, but what do we get out of having it be slow when we do decide to > do it? Up until today, I had never heard anyone say that there was > any possible performance trade-off, and... > > > However, this is the first time I am hearing that > > battery-backed cache favors the default value. > > ...if that's as bad as it gets, I'm still not sure we shouldn't > increase the default. Most people will not have their first > experience of PG on a server with a battery-backed RAID controller, > I'm thinking. And people who do have battery-backed RAID controllers > can tune the value down if need be. I have never yet heard anyone > justify why all the values in postgresql.conf should be defined as > "the lowest value that works best for at least 1 user". > > Then again, I don't really know what I'm talking about. I think we > should be listening very carefully to people who have spent a lot of > time tuning this and taking their advice on how it should be set by > default. The current default was just chosen to reduce the PG disk footprint. It probably should be increased, unless we find that the smaller working set is a win in many cases. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, Apr 19, 2010 at 9:23 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> > Well, the point is that you are getting it for _unusual_ circumstances. >> > Seems it is only when you are getting it for typical workloads that it >> > should be increased. >> >> I guess. I am not sure we should consider "doing a large CTAS" to be >> an unusual workload, though. Sure, most of us don't do that every >> day, but what do we get out of having it be slow when we do decide to >> do it? Up until today, I had never heard anyone say that there was >> any possible performance trade-off, and... >> >> > However, this is the first time I am hearing that >> > battery-backed cache favors the default value. >> >> ...if that's as bad as it gets, I'm still not sure we shouldn't >> increase the default. Most people will not have their first >> experience of PG on a server with a battery-backed RAID controller, >> I'm thinking. And people who do have battery-backed RAID controllers >> can tune the value down if need be. I have never yet heard anyone >> justify why all the values in postgresql.conf should be defined as >> "the lowest value that works best for at least 1 user". >> >> Then again, I don't really know what I'm talking about. I think we >> should be listening very carefully to people who have spent a lot of >> time tuning this and taking their advice on how it should be set by >> default. > > The current default was just chosen to reduce the PG disk footprint. It > probably should be increased, unless we find that the smaller working > set is a win in many cases. Yeah. 48MB is not much these days. ...Robert
On Apr 16, 2010, at 4:56 PM, Robert Haas wrote: > From reading this and other threads, I think I generally understand > the perils of setting shared_buffers too high: memory is needed > for other things, like work_mem, a problem which is exacerbated by the > fact that there is some double buffering going on. Also, if the > buffer cache gets too large, checkpoints can involve writing out > enormous amounts of dirty data, which can be bad. I've also seen large shared buffer settings perform poorly outside of IO issues, presumably due to some kind of internal lock contention. I tried running 8.3 with 24G for a while, but dropped it back down to our default of 8G after noticing some performance problems. Unfortunately I don't remember the exact details, let alone having a repeatable test case. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Tue, Apr 20, 2010 at 12:07 PM, Jim Nasby <decibel@decibel.org> wrote: > On Apr 16, 2010, at 4:56 PM, Robert Haas wrote: >> From reading this and other threads, I think I generally understand >> the perils of setting shared_buffers too high: memory is needed >> for other things, like work_mem, a problem which is exacerbated by the >> fact that there is some double buffering going on. Also, if the >> buffer cache gets too large, checkpoints can involve writing out >> enormous amounts of dirty data, which can be bad. > > I've also seen large shared buffer settings perform poorly outside of IO issues, presumably due to some kind of internal lock > contention. I tried running 8.3 with 24G for a while, but dropped it back down to our default of 8G after noticing some performance > problems. Unfortunately I don't remember the exact details, let alone having a repeatable test case. > I have heard this before. Sadly, I don't have a machine for that kind of test and can't use my customers' production servers for such things :) so I always set shared_buffers lower than 8GB even if I have RAM for more... Can someone confirm the lock contention theory? This should be noticeable at checkpoint time, right? -- Regards, Jaime Casanova PostgreSQL support and training Systems consulting and development Guayaquil - Ecuador Cel. +59387171157
Jim Nasby wrote: > I've also seen large shared buffer settings perform poorly outside of IO issues, presumably due to some kind of internal lock contention. I tried running 8.3 with 24G for a while, but dropped it back down to our default of 8G after noticing some performance problems. Unfortunately I don't remember the exact details, let alone having a repeatable test case. We got a report from Jignesh at Sun once that he had a benchmark workload where there was a clear performance wall at around 10GB of shared_buffers. At http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best he says: "Shared Bufferpool getting better in 8.2, worth to increase it to 3GB (for 32-bit PostgreSQL) but still not great to increase it more than 10GB (for 64-bit PostgreSQL)" So your running into the same wall at around the same amount just fuels the existing idea that there's an underlying scalability issue in there. Nobody with the right hardware has put it under the light of a profiler yet as far as I know. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Wed, Apr 21, 2010 at 2:54 AM, Greg Smith <greg@2ndquadrant.com> wrote: > Jim Nasby wrote: >> >> I've also seen large shared buffer settings perform poorly outside of IO >> issues, presumably due to some kind of internal lock contention. I tried >> running 8.3 with 24G for a while, but dropped it back down to our default of >> 8G after noticing some performance problems. Unfortunately I don't remember >> the exact details, let alone having a repeatable test case > > We got a report for Jignesh at Sun once that he had a benchmark workload > where there was a clear performance wall at around 10GB of shared_buffers. > At http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best he > says: > "Shared Bufferpool getting better in 8.2, worth to increase it to 3GB (for > 32-bit PostgreSQL) but still > not great to increase it more than 10GB (for 64-bit PostgreSQL)" > > So you running into the same wall around the same amount just fuels the > existing idea there's an underlying scalablity issue in there. Nobody with > that right hardware has put it under the light of a profiler yet as far as I > know. It might be interesting to see whether increasing NUM_BUFFER_PARTITIONS, LOG2_NUM_LOCK_PARTITIONS, and NUM_LOCK_PARTITIONS alleviates this problem at all. ...Robert