Thread: shared_buffers documentation
TFM says: Sets the amount of memory the database server uses for shared memory buffers. The default is typically 32 megabytes (32MB), but might be less if your kernel settings will not support it (as determined during initdb). This setting must be at least 128 kilobytes. (Non-default values of BLCKSZ change the minimum.) However, settings significantly higher than the minimum are usually needed for good performance. Several tens of megabytes are recommended for production installations. So if the default is 32MB, and what I'm actually supposed to have is "several tens of megabytes", isn't that pretty much the same thing? I think this advice is badly outdated. s/tens/hundreds/ might be a good idea at a minimum, but I'm thinking we might want to also mention the one-quarter-of-system-memory heuristic. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > I think this advice is badly outdated. Yeah. > s/tens/hundreds/ might be a good idea at a minimum, +1 > but I'm thinking we might want to also mention the > one-quarter-of-system-memory heuristic. Given how many people seem to find that a good guideline, it seems like we should. I wonder if we should add any hints telling people what they might see as problems if they are too far one way or the other. (Or does that go beyond the scope of what makes sense in TFM?) -Kevin
On Wed, Apr 14, 2010 at 11:15 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> I think this advice is badly outdated. > > Yeah. > >> s/tens/hundreds/ might be a good idea at a minimum, > > +1 > >> but I'm thinking we might want to also mention the >> one-quarter-of-system-memory heuristic. > > Given how many people seem to find that a good guideline, it seems > like we should. I wonder if we should add any hints telling people > what they might see as problems if they are too far one way or the > other. (Or does that go beyond the scope of what makes sense in > TFM?) No, I think that would be reasonable provided someone can come up with some appropriate wording. My understanding is that if you have a really small system then you might need >25% and if you have a really big system you might need <25%, but I'm not sure where the edges are. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > I think that would be reasonable provided someone can come up > with some appropriate wording. My understanding is that if you > have a really small system then you might need >25% and if you > have a really big system you might need <25%, but I'm not sure > where the edges are. Yeah, I remember a study which showed 40% as optimal, but I think that was on a server with 2GB RAM, which is smaller than my desktop workstation. (Heck, digital cameras with that much aren't that rare.) We might want to advise that if there are periods of irregular response time, particularly ones related to the checkpoint cycle, and adjustments to the checkpoint and background writer settings don't completely resolve it, they might want to try reducing shared_buffers. The improvements to the checkpoint and background writer areas in 8.3 helped a lot with this issue, but I don't believe it's been totally eliminated (yet). Perhaps that's now infrequent enough that it's not necessary to mention it. Dunno. I guess I'd be interested to hear Greg Smith weigh in on this one. I've generally stopped tweaking when our web support folks say I've got it to the point where we're not getting any timeouts against our 20 second limit for queries which normally run in less than 1 ms. -Kevin
Kevin Grittner wrote: > I wonder if we should add any hints telling people > what they might see as problems if they are too far one way or the > other. (Or does that go beyond the scope of what makes sense in TFM?) > It's hard to figure that out. One of the talks I'm doing at PGCon next month is focusing on how to monitor things when increasing shared_buffers and the related checkpoint parameters, so that you don't make things worse. It's going to take a solid 45 minutes to cover that, and a section of the manual covering this bit of trivia would be a few pages long and hard to follow. Maybe I'll get that in shape to insert into TFM eventually, but it's a bit bleeding edge to put into there now. Trying to explain it live to other people a couple of times should make it clearer how to describe what I do. As for updating the size recommendations, the text at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been beaten into the status quo by a number of people. Here's what might make sense from there to insert into the docs, removing the bits referring to older versions, rewriting a bit for manual tone, and noting the checkpoint issues: If you have a system with 1GB or more of RAM, a reasonable starting value for shared_buffers on a dedicated database server is 25% of the memory in your system. If you have less RAM, you'll have to account more carefully for how much memory the operating system is taking up, allocating a fraction of the free memory instead. There are some workloads where even larger settings for shared_buffers are effective. But given the way PostgreSQL also relies on the operating system cache, it's unlikely you'll find using more than 40% of RAM to work better than a smaller amount. On Windows, large values for shared_buffers aren't as effective. You may find better results keeping the setting relatively low and using the OS cache more instead. The useful size range for shared_buffers on Windows systems is generally from 64MB to 512MB of RAM. Larger settings for shared_buffers usually require a corresponding increase in checkpoint_segments, in order to spread out writing large quantities of changed or new data in the cache over a longer period of time. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
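As a concrete illustration of the settings in that proposed text (the numbers here are assumptions for a hypothetical dedicated server with 8GB of RAM, not figures from this thread), a quick way to see where an installation currently stands is to query pg_settings; shared_buffers itself still has to be changed in postgresql.conf and needs a server restart to take effect:

    -- Sketch: check the parameters discussed above. The comments show where
    -- a hypothetical dedicated 8GB server might start, per the 25% guideline.
    SELECT name, setting, unit, source
      FROM pg_settings
     WHERE name IN ('shared_buffers',       -- e.g. 2GB, i.e. 25% of 8GB RAM
                    'checkpoint_segments'); -- usually raised along with shared_buffers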
On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: > As for updating the size recommendations, the text at > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been > beaten into the status quo by a number of people. Here's what might make > sense from there to insert into the docs, removing the bits referring to > older versions, rewriting a bit for manual tone, and noting the checkpoint > issues: This is good text. I will incorporate it with slight copy editing if no one objects. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: >> As for updating the size recommendations, the text at >> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been >> beaten into the status quo by a number of people. Here's what might make >> sense from there to insert into the docs, removing the bits referring to >> older versions, rewriting a bit for manual tone, and noting the checkpoint >> issues: > This is good text. I will incorporate it with slight copy editing if > no one objects. Looks good to me too, although perhaps more than the single use of "dedicated" is needed to remind people that these numbers are only appropriate if the machine is not doing anything other than running (one instance of) Postgres. Should we expend a whole sentence on that? regards, tom lane
On Wed, Apr 14, 2010 at 6:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: >>> As for updating the size recommendations, the text at >>> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been >>> beaten into the status quo by a number of people. Here's what might make >>> sense from there to insert into the docs, removing the bits referring to >>> older versions, rewriting a bit for manual tone, and noting the checkpoint >>> issues: > >> This is good text. I will incorporate it with slight copy editing if >> no one objects. > > Looks good to me too, although perhaps more than the single use of > "dedicated" is needed to remind people that these numbers are only > appropriate if the machine is not doing anything else than running > (one instance of) Postgres. Should we expend a whole sentence > on that? IMHO that would be overkill, but that's just MHO. Other opinions? ...Robert
On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Kevin Grittner wrote: >> I wonder if we should add any hints telling people >> what they might see as problems if they are too far one way or the >> other. (Or does that go beyond the scope of what makes sense in TFM?) > > It's hard to figure that out. One of the talks I'm doing at PGCon next > month is focusing on how to monitor things when increasing shared_buffers > and the related checkpoint parameters, so that you don't make things worse. > It's going to take a solid 45 minutes to cover that, and a section of the > manual covering this bit of trivia would be a few pages long and hard to > follow. Maybe I'll get that in shape to insert into TFM eventually, but > it's a bit bleeding edge to put into there now. Trying to explain it live > to other people a couple of times should make it clearer how to describe > what I do. > > As for updating the size recommendations, the text at > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been > beaten into the status quo by a number of people. I've incorporated most of this text, with some further editing, into the documentation. I'm halfway tempted to backpatch it to 8.4 and maybe even 8.3, but have refrained from so doing for now. From reading this and other threads, I think I generally understand the perils of setting shared_buffers too high: memory is needed for other things, like work_mem, a problem which is exacerbated by the fact that there is some double buffering going on. Also, if the buffer cache gets too large, checkpoints can involve writing out enormous amounts of dirty data, which can be bad. It seems intuitive to me that setting shared_buffers too small will also cause a performance problem, especially for write-heavy workloads, but I'm less sure I can clearly explain why. And I'm curious why the correct setting is different on Windows than it is on other platforms. Can anyone shed some light on this? ...Robert
Robert Haas wrote: > It seems intuitive to me that setting shared_buffers too small will > also cause a performance problem, especially for write-heavy > workloads, but I'm less sure I can clearly explain why. More text to add: When the server needs to allocate more space for reading or writing blocks, and the next space available is a block that's been modified but not used recently, that block will be written out to the operating system. With large settings for shared_buffers, that process preferentially evicts blocks that are used infrequently from the cache. The main downside to tuning in that direction is that all recently modified blocks not already written must be flushed to disk during each checkpoint, which can cause large amounts of disk writes grouped together. But if shared_buffers is set too low instead, and therefore only a portion of the active working set can be kept in the buffer cache at once, that can cause the same block to be written out more frequently than is optimal. > And I'm curious why the correct setting is different on Windows than it is on > other platforms. Can anyone shed some light on this? > No one has ever come up with a good explanation for why this is other than "Windows doesn't seem to like large amounts of shared memory". But we've seen it show up in too many benchmarks to dismiss. Dave and Greg Stark did benchmarks focused on this: http://archives.postgresql.org/pgsql-hackers/2008-12/msg00003.php that Magnus concurred with last time I tried to dig for more info about this specific subject. And the last time I remember this coming up, it was with someone who suggested 8MB (!) worked best on their Windows system: http://archives.postgresql.org/pgsql-general/2009-12/msg00475.php -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Fri, Apr 16, 2010 at 7:24 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Robert Haas wrote: >> It seems intuitive to me that setting shared_buffers too small will >> also cause a performance problem, especially for write-heavy >> workloads, but I'm less sure I can clearly explain why. > > More text to add: > > When the server needs to allocate more space for reading or writing blocks, > and the next available space available is a block that's been modified but > not used recently, that block will be written out to the operating system. > With large settings for shared_buffers, that prefers evicting blocks that > are used infrequently from the cache. The main downside to tuning in that > direction is that all recently modified blocks not already written must be > flushed to disk during each checkpoint, which can cause large amounts of > disk writes grouped together. But if shared_buffers is set too low instead, > and therefore only a portion of the active working set can be kept in the > buffer cache at once, that can cause the same block to be written out more > frequently than is optimal. Well, why can't they just hang out as dirty buffers in the OS cache, which is also designed to solve this problem? >> And I'm curious why the correct setting is different on Windows than it is >> on >> other platforms. Can anyone shed some light on this? >> > > No one has ever come up with a good explanation for why this is other than > "Windows doesn't seem to like large amounts of shared memory". But we've > seen it show up in too many benchmarks to dismiss. Dave and Greg Stark > did benchmarks focused on this: > http://archives.postgresql.org/pgsql-hackers/2008-12/msg00003.php that > Magnus concurred with last time I tried to dig for more info about this > specific subject. And the last time I remember this caming up it was with > someone who suggested 8MB (!) worked best on their Windows system: > http://archives.postgresql.org/pgsql-general/2009-12/msg00475.php I guess the obvious question is whether Windows "doesn't need" more shared memory than that, or whether it "can't effectively use" more memory than that. ...Robert
Robert Haas wrote: > Well, why can't they just hang out as dirty buffers in the OS cache, > which is also designed to solve this problem? > If the OS were guaranteed to be as suitable for this purpose as the approach taken in the database, this might work. But much like the clock sweep approach should outperform a simpler OS caching implementation in many common workloads, there are a couple of spots where making dirty writes the OS's problem can fall down: 1) That presumes that OS write coalescing will solve the problem for you by merging repeat writes, which, depending on the implementation, it might not. 2) On some filesystems, such as ext3, any write with an fsync behind it will flush the whole write cache out and defeat this optimization. Since the spread checkpoint design has some such writes going to the data disk in the middle of the currently processing checkpoint, in those situations that's likely to push the first write of that block to disk before it can be combined with a second. If you'd kept it in the buffer cache, it might survive as long as a full checkpoint cycle longer. 3) The "timeout" as it were for shared buffers is driven by the distance between checkpoints, typically as long as 5 minutes. The longest a filesystem will hold onto a write is probably less. On Linux it's typically 30 seconds before the OS considers a write important to get out to disk, longest case; if you've already filled a lot of RAM with writes it can be substantially less. > I guess the obvious question is whether Windows "doesn't need" more > shared memory than that, or whether it "can't effectively use" more > memory than that. > It's probably "can't effectively use". We know for a fact that applications where blocks regularly accumulate high usage counts and have repeat read/writes to them, which includes pgbench, benefit in several easy to measure ways from using larger amounts of database buffer cache. There's just plain old less churn of buffers going in and out of there. The alternate explanation of "Windows is just so much better at read/write caching that you should give it most of the RAM anyway" doesn't really sound as probable as the more commonly proposed theory "Windows doesn't handle large blocks of shared memory well". Note that there's no discussion of the why behind this in the commit you just did, just the description of what happens. The reasons why are left undefined, which I feel is appropriate given we really don't know for sure. Still waiting for somebody to let loose the Visual Studio profiler and measure what's causing the degradation at larger sizes. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
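To make the clock sweep and usage count discussion above concrete, here is a minimal sketch assuming the contrib pg_buffercache module has been installed (it is not there by default); it shows how the buffer cache breaks down by usage count and dirty state, which is roughly the population a checkpoint will eventually have to flush:

    -- Sketch, assuming contrib/pg_buffercache is installed: distribution of
    -- shared_buffers by clock-sweep usage count and dirty flag.
    SELECT usagecount, isdirty, count(*) AS buffers
      FROM pg_buffercache
     GROUP BY usagecount, isdirty
     ORDER BY usagecount, isdirty;

A cache dominated by high usage counts suggests a larger shared_buffers setting is actually being put to work, while a large number of dirty buffers hints at how much writing each checkpoint will have to spread out.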
On Fri, Apr 16, 2010 at 9:47 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Robert Haas wrote: >> Well, why can't they just hang out as dirty buffers in the OS cache, >> which is also designed to solve this problem? > > If the OS were guaranteed to be as suitable for this purpose as the approach > taken in the database, this might work. But much like the clock sweep > approach should outperform a simpler OS caching implementation in many > common workloads, there are a couple of spots where making dirty writes the > OS's problem can fall down: > > 1) That presumes that OS write coalescing will solve the problem for you by > merging repeat writes, which depending on implementation it might not. > > 2) On some filesystems, such as ext3, any write with an fsync behind it will > flush the whole write cache out and defeat this optimization. Since the > spread checkpoint design has some such writes going to the data disk in the > middle of the currently processing checkpoing, in those situations that's > likely to push the first write of that block to disk before it can be > combined with a second. If you'd have kept it in the buffer cache it might > survive as long as a full checkpoint cycle longer.. > > 3) The "timeout" as it were for shared buffers is driven by the distance > between checkpoints, typically as long as 5 minutes. The longest a > filesystem will hold onto a write is probably less. On Linux it's typically > 30 seconds before the OS considers a write important to get out to disk, > longest case; if you've already filled a lot of RAM with writes it can be > substantially less. Thanks for the explanation. That makes sense. Does this imply that the problems with shared_buffers being too small are going to be less with a read-mostly load? >> I guess the obvious question is whether Windows "doesn't need" more >> shared memory than that, or whether it "can't effectively use" more >> memory than that. > > It's probably can't effectively use. We know for a fact that applications > where blocks regularly accumulate high usage counts and have repeat > read/writes to them, which includes pgbench, benefit in several easy to > measure ways from using larger amounts of database buffer cache. There's > just plain old less churn of buffers going in and out of there. The > alternate explanation of "Windows is just so much better at read/write > caching that you should give it most of the RAM anyway" doesn't really sound > as probable as the more commonly proposed theory "Windows doesn't handle > large blocks of shared memory well". > > Note that there's no discussion of the why behind this is in the commit you > just did, just the description of what happens. The reasons why are left > undefined, which I feel is appropriate given we really don't know for sure. > Still waiting for somebody to let loose the Visual Studio profiler and > measure what's causing the degradation at larger sizes. Right - my purpose in wanting to revise the documentation was not to give a complete tutorial, which is obviously not practical, but to give people some guidelines that are better than our previous suggestion to use "a few tens of megabytes", which I think we've accomplished. The follow-up questions are mostly for my own benefit rather than the docs... ...Robert
On Wed, Apr 14, 2010 at 4:18 PM, Greg Smith <greg@2ndquadrant.com> wrote: > As for updating the size recommendations, the text at > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server has been > beaten into the status quo by a number of people. A few other random thoughts on this document: 1. The section on default_statistics_target needs some updating - the default is 100 in 8.4+. 2. Reading the section on checkpoint_segments reminds me, again, that the current value seems extremely conservative on modern hardware. It's quite easy to hit this when doing large bulk data loads or even a big ol' CTAS. I think we should consider raising this for 9.1. I don't have a real strong opinion on what we should raise it TO - I think it's basically a question of how much temporary disk storage we think we can use during a large bulk data load without having users come back and say "wtf?" - but it seems to me that we're not doing ourselves any favors by having this set to a value where the first advice we give our users is "try tripling it". ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: > 2. Reading the section on checkpoint_segments reminds me, again, > that the current value seems extremely conservative on modern > hardware. It's quite easy to hit this when doing large bulk data > loads or even a big ol' CTAS. I think we should consider raising > this for 9.1. Perhaps, but be aware the current default benchmarked better than a larger setting in bulk loads. http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php The apparent reason is that when there were fewer of them the WAL files were re-used before the RAID controller flushed them from BBU cache, causing an overall reduction in disk writes. I have little doubt that *without* a good BBU cached controller a larger setting is better, but it's not universally true that bigger is better on this one. -Kevin
On Mon, Apr 19, 2010 at 10:21 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Robert Haas <robertmhaas@gmail.com> wrote: > >> 2. Reading the section on checkpoint_segments reminds me, again, >> that the current value seems extremely conservative on modern >> hardware. It's quite easy to hit this when doing large bulk data >> loads or even a big ol' CTAS. I think we should consider raising >> this for 9.1. > > Perhaps, but be aware the current default benchmarked better > than a larger setting in bulk loads. > > http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php > > The apparent reason is that when there were fewer of them the WAL > files were re-used before the RAID controller flushed them from BBU > cache, causing an overall reduction in disk writes. I have little > doubt that *without* a good BBU cached controller a larger setting > is better, but it's not universally true that bigger is better on > this one. I don't actually know what's best. I'm just concerned that we have a default in postgresql.conf and a tuning guide that says "don't do that". Maybe the tuning guide needs to be more nuanced, or maybe postgresql.conf needs to be changed, but it makes no sense to have them saying contradictory things. ...Robert
Robert Haas wrote: > On Mon, Apr 19, 2010 at 10:21 AM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: > > Robert Haas <robertmhaas@gmail.com> wrote: > > > >> 2. Reading the section on checkpoint_segments reminds me, again, > >> that the current value seems extremely conservative on modern > >> hardware. It's quite easy to hit this when doing large bulk data > >> loads or even a big ol' CTAS. I think we should consider raising > >> this for 9.1. > > > > Perhaps, but be aware the current default benchmarked better > > than a larger setting in bulk loads. > > > > http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php > > > > The apparent reason is that when there were fewer of them the WAL > > files were re-used before the RAID controller flushed them from BBU > > cache, causing an overall reduction in disk writes. I have little > > doubt that *without* a good BBU cached controller a larger setting > > is better, but it's not universally true that bigger is better on > > this one. > > I don't actually know what's best. I'm just concerned that we have a > default in postgresql.conf and a tuning guide that says "don't do > that". Maybe the tuning guide needs to be more nuanced, or maybe > postgresql.conf needs to be changed, but it makes no sense to have > them saying contradictory things. The good news about checkpoint_segments is that you get a log file warning message if the value should be increased, i.e. you are checkpointing more often than every 30 seconds. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, Apr 19, 2010 at 5:36 PM, Bruce Momjian <bruce@momjian.us> wrote: >> I don't actually know what's best. I'm just concerned that we have a >> default in postgresql.conf and a tuning guide that says "don't do >> that". Maybe the tuning guide needs to be more nuanced, or maybe >> postgresql.conf needs to be changed, but it makes no sense to have >> them saying contradictory things. > > The good news about checkpoint_segments is that you get a log file > warning message if the value should be increased, i.e. you are > checkpointing more often than every 30 seconds. Yeah. I get that warning frequently when I'm creating test tables of dummy data for PG devel purposes. That's actually the main thing that makes me think the default may be too low. ...Robert
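Besides watching the log for that warning, the pg_stat_bgwriter view (available since 8.3) gives a rough indication of how checkpoints are being triggered; this is only a sketch of one way to look at it:

    -- Sketch: checkpoints_req counts checkpoints forced early, typically by
    -- filling checkpoint_segments, while checkpoints_timed counts those driven
    -- by checkpoint_timeout. A high requested-to-timed ratio is another hint
    -- that checkpoint_segments may be too low for the workload.
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_backend
      FROM pg_stat_bgwriter;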
Robert Haas wrote: > On Mon, Apr 19, 2010 at 5:36 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> I don't actually know what's best. I'm just concerned that we have a > >> default in postgresql.conf and a tuning guide that says "don't do > >> that". Maybe the tuning guide needs to be more nuanced, or maybe > >> postgresql.conf needs to be changed, but it makes no sense to have > >> them saying contradictory things. > > > > The good news about checkpoint_segments is that you get a log file > > warning message if the value should be increased, i.e. you are > > checkpointing more often than every 30 seconds. > > Yeah. I get that warning frequently when I'm creating test tables of > dummy data for PG devel purposes. That's actually the main thing that > makes me think the default may be too low. Well, the point is that you are getting it for _unusual_ circumstances. Seems it is only when you are getting it for typical workloads that it should be increased. However, this is the first time I am hearing that battery-backed cache favors the default value. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, Apr 19, 2010 at 6:06 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Mon, Apr 19, 2010 at 5:36 PM, Bruce Momjian <bruce@momjian.us> wrote: >> >> I don't actually know what's best. I'm just concerned that we have a >> >> default in postgresql.conf and a tuning guide that says "don't do >> >> that". Maybe the tuning guide needs to be more nuanced, or maybe >> >> postgresql.conf needs to be changed, but it makes no sense to have >> >> them saying contradictory things. >> > >> > The good news about checkpoint_segments is that you get a log file >> > warning message if the value should be increased, i.e. you are >> > checkpointing more often than every 30 seconds. >> >> Yeah. I get that warning frequently when I'm creating test tables of >> dummy data for PG devel purposes. That's actually the main thing that >> makes me think the default may be too low. > > Well, the point is that you are getting it for _unusual_ circumstances. > Seems it is only when you are getting it for typical workloads that it > should be increased. I guess. I am not sure we should consider "doing a large CTAS" to be an unusual workload, though. Sure, most of us don't do that every day, but what do we get out of having it be slow when we do decide to do it? Up until today, I had never heard anyone say that there was any possible performance trade-off, and... > However, this is the first time I am hearing that > battery-backed cache favors the default value. ...if that's as bad as it gets, I'm still not sure we shouldn't increase the default. Most people will not have their first experience of PG on a server with a battery-backed RAID controller, I'm thinking. And people who do have battery-backed RAID controllers can tune the value down if need be. I have never yet heard anyone justify why all the values in postgresql.conf should be defined as "the lowest value that works best for at least 1 user". Then again, I don't really know what I'm talking about. I think we should be listening very carefully to people who have spent a lot of time tuning this and taking their advice on how it should be set by default. ...Robert
Robert Haas <robertmhaas@gmail.com> wrote: >> However, this is the first time I am hearing that >> battery-backed cache favors the default value. Well, it was discussed on the lists during a CommitFest. > ...if that's as bad as it gets, I'm still not sure we shouldn't > increase the default. Most people will not have their first > experience of PG on a server with a battery-backed RAID > controller, I'm thinking. And people who do have battery-backed > RAID controllers can tune the value down if need be. I have never > yet heard anyone justify why all the values in postgresql.conf > should be defined as "the lowest value that works best for at > least 1 user". > > Then again, I don't really know what I'm talking about. I think > we should be listening very carefully to people who have spent a > lot of time tuning this and taking their advice on how it should > be set by default. I'm not sure we shouldn't change the default either. There seems to be a wealth of experience showing where a bigger value can help, and a fairly narrow use case where (much to my surprise) the lower value helped. Perhaps this just fits under the "be sure to test and tune your own environment" heading, although it is a direction people might not even think to try without some hint that it can help. FWIW, we use very different configurations for bulk loading (like pg_dump piped to psql) than we do for production usage afterward. This has become part of my bag of tricks for bulk loads, but we still use a larger number after the load. Also, I haven't heard of any independent confirmation -- it could be a quirk of our hardware and configuration? Has anyone else benchmarked this to see the impact on bulk loads with BBU cache? -Kevin
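For what it's worth, a sketch of the kind of split Kevin describes might look like the following; the values are illustrative assumptions rather than his actual configuration, and checkpoint_segments itself cannot be set per session, so it has to be raised in postgresql.conf (followed by a reload) for the duration of the load:

    -- Hypothetical per-session settings for a bulk load; values are examples only.
    SET maintenance_work_mem = '1GB';  -- speeds up the index builds after loading
    SET synchronous_commit = off;      -- acceptable when a failed load can simply be rerun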
Kevin Grittner wrote: > Perhaps, but be aware the current default benchmarked better > than a larger setting in bulk loads. > > http://archives.postgresql.org/pgsql-hackers/2009-06/msg01382.php > > The apparent reason is that when there were fewer of them the WAL > files were re-used before the RAID controller flushed them from BBU > cache, causing an overall reduction in disk writes. I have little > doubt that *without* a good BBU cached controller a larger setting > is better, but it's not universally true that bigger is better on > this one. After running some tests, I believe what you observed is more universal than that, because I've been able to replicate a performance drop from a checkpoint_segments increase on a system without a BBWC (laptop with write caching turned off) where I really expected it to help. My working theory is that there is a broader set of situations where limiting the working set of WAL files to a small number in order to decrease cache disruption helps, not just the case where you've got hardware caching involved. However, I believe the guideline to increase this parameter along with shared_buffers still applies. The real case for wins with more segments is when you also have a large buffer cache, because that's where the write savings from postponed database writes to often-used blocks become easy to measure. I've found it difficult today to demonstrate a slam-dunk bulk loading improvement through a checkpoint_segments increase when shared_buffers is fixed at its default of ~32MB. If that keeps up, I might soon have enough data to bust the idea that it alone improves bulk loading performance when you haven't touched anything else in the default config, which was unexpected to me. Will report back once I've got a full handle on it. Thanks for reminding me about this counterexample; it slipped by in that broader thread before, and I didn't try it myself until today to see that you're onto something there. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Robert Haas wrote: > > Well, the point is that you are getting it for _unusual_ circumstances. > > Seems it is only when you are getting it for typical workloads that it > > should be increased. > > I guess. I am not sure we should consider "doing a large CTAS" to be > an unusual workload, though. Sure, most of us don't do that every > day, but what do we get out of having it be slow when we do decide to > do it? Up until today, I had never heard anyone say that there was > any possible performance trade-off, and... > > > However, this is the first time I am hearing that > > battery-backed cache favors the default value. > > ...if that's as bad as it gets, I'm still not sure we shouldn't > increase the default. Most people will not have their first > experience of PG on a server with a battery-backed RAID controller, > I'm thinking. And people who do have battery-backed RAID controllers > can tune the value down if need be. I have never yet heard anyone > justify why all the values in postgresql.conf should be defined as > "the lowest value that works best for at least 1 user". > > Then again, I don't really know what I'm talking about. I think we > should be listening very carefully to people who have spent a lot of > time tuning this and taking their advice on how it should be set by > default. The current default was just chosen to reduce the PG disk footprint. It probably should be increased, unless we find that the smaller working set is a win in many cases. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Mon, Apr 19, 2010 at 9:23 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> > Well, the point is that you are getting it for _unusual_ circumstances. >> > Seems it is only when you are getting it for typical workloads that it >> > should be increased. >> >> I guess. I am not sure we should consider "doing a large CTAS" to be >> an unusual workload, though. Sure, most of us don't do that every >> day, but what do we get out of having it be slow when we do decide to >> do it? Up until today, I had never heard anyone say that there was >> any possible performance trade-off, and... >> >> > However, this is the first time I am hearing that >> > battery-backed cache favors the default value. >> >> ...if that's as bad as it gets, I'm still not sure we shouldn't >> increase the default. Most people will not have their first >> experience of PG on a server with a battery-backed RAID controller, >> I'm thinking. And people who do have battery-backed RAID controllers >> can tune the value down if need be. I have never yet heard anyone >> justify why all the values in postgresql.conf should be defined as >> "the lowest value that works best for at least 1 user". >> >> Then again, I don't really know what I'm talking about. I think we >> should be listening very carefully to people who have spent a lot of >> time tuning this and taking their advice on how it should be set by >> default. > > The current default was just chosen to reduce the PG disk footprint. It > probably should be increased, unless we find that the smaller working > set is a win in many cases. Yeah. 48MB is not much these days. ...Robert
On Apr 16, 2010, at 4:56 PM, Robert Haas wrote: > From reading this and other threads, I think I generally understand > the perils of setting shared_buffers too high: memory is needed > for other things, like work_mem, a problem which is exacerbated by the > fact that there is some double buffering going on. Also, if the > buffer cache gets too large, checkpoints can involve writing out > enormous amounts of dirty data, which can be bad. I've also seen large shared buffer settings perform poorly outside of IO issues, presumably due to some kind of internal lock contention. I tried running 8.3 with 24G for a while, but dropped it back down to our default of 8G after noticing some performance problems. Unfortunately I don't remember the exact details, let alone having a repeatable test case. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Tue, Apr 20, 2010 at 12:07 PM, Jim Nasby <decibel@decibel.org> wrote: > On Apr 16, 2010, at 4:56 PM, Robert Haas wrote: >> From reading this and other threads, I think I generally understand >> the perils of setting shared_buffers too high: memory is needed >> for other things, like work_mem, a problem which is exacerbated by the >> fact that there is some double buffering going on. Also, if the >> buffer cache gets too large, checkpoints can involve writing out >> enormous amounts of dirty data, which can be bad. > > I've also seen large shared buffer settings perform poorly outside of IO issues, presumably due to some kind of internal lock > contention. I tried running 8.3 with 24G for a while, but dropped it back down to our default of 8G after noticing some performance > problems. Unfortunately I don't remember the exact details, let alone having a repeatable test case. > I have heard this before. Sadly, I don't have a machine for that kind of test and can't use my customers' production servers for such things :) so I always set shared_buffers lower than 8GB even if I have RAM for more... Can someone confirm the lock contention theory? This should be noticeable at checkpoint time, right? -- Regards, Jaime Casanova PostgreSQL support and training Systems consulting and development Guayaquil - Ecuador Cel. +59387171157
Jim Nasby wrote: > I've also seen large shared buffer settings perform poorly outside of IO issues, presumably due to some kind of internal lock contention. I tried running 8.3 with 24G for a while, but dropped it back down to our default of 8G after noticing some performance problems. Unfortunately I don't remember the exact details, let alone having a repeatable test case. We got a report from Jignesh at Sun once that he had a benchmark workload where there was a clear performance wall at around 10GB of shared_buffers. At http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best he says: "Shared Bufferpool getting better in 8.2, worth to increase it to 3GB (for 32-bit PostgreSQL) but still not great to increase it more than 10GB (for 64-bit PostgreSQL)" So your running into the same wall at around the same amount just fuels the existing idea that there's an underlying scalability issue in there. Nobody with the right hardware has put it under the light of a profiler yet as far as I know. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Wed, Apr 21, 2010 at 2:54 AM, Greg Smith <greg@2ndquadrant.com> wrote: > Jim Nasby wrote: >> >> I've also seen large shared buffer settings perform poorly outside of IO >> issues, presumably due to some kind of internal lock contention. I tried >> running 8.3 with 24G for a while, but dropped it back down to our default of >> 8G after noticing some performance problems. Unfortunately I don't remember >> the exact details, let alone having a repeatable test case > > We got a report for Jignesh at Sun once that he had a benchmark workload > where there was a clear performance wall at around 10GB of shared_buffers. > At http://blogs.sun.com/jkshah/entry/postgresql_east_2008_talk_best he > says: > "Shared Bufferpool getting better in 8.2, worth to increase it to 3GB (for > 32-bit PostgreSQL) but still > not great to increase it more than 10GB (for 64-bit PostgreSQL)" > > So you running into the same wall around the same amount just fuels the > existing idea there's an underlying scalablity issue in there. Nobody with > that right hardware has put it under the light of a profiler yet as far as I > know. It might be interesting to see whether increasing NUM_BUFFER_PARTITIONS, LOG2_NUM_LOCK_PARTITIONS, and NUM_LOCK_PARTITIONS alleviates this problem at all. ...Robert