Home > mailing lists

Re: limiting hint bit I/O - Mailing list pgsql-hackers

From	Cédric Villemain
Subject	Re: limiting hint bit I/O
Date	February 7, 2011 15:03:25
Msg-id	AANLkTi==bJdFbNr0GgHXX-paT1f4_SudX4s18OPuW86q@mail.gmail.com Whole thread Raw
In response to	Re: limiting hint bit I/O (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: limiting hint bit I/O
List	pgsql-hackers

Tree view

2011/2/7 Robert Haas <robertmhaas@gmail.com>:
> On Mon, Feb 7, 2011 at 10:48 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> Robert Haas wrote:
>>> On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> > Uh, in this C comment:
>>> >
>>> > + ? ? ? ?* or not we want to take the time to write it. ?We allow up to 5% of
>>> > + ? ? ? ?* otherwise-not-dirty pages to be written due to hint bit changes,
>>> >
>>> > 5% of what? ?5% of all buffers? ?5% of all hint-bit-dirty ones? ?Can you
>>> > clarify this in the patch?
>>>
>>> 5% of buffers that are hint-bit-dirty but not otherwise dirty.  ISTM
>>> that's exactly what the comment you just quoted says on its face, but
>>> I'm open to some other wording you want to propose.
>>
>> How about:
>>
>>        otherwise-not-dirty -> only-hint-bit-dirty
>>
>> So 95% of your hint bit modificates are discarded if the pages is not
>> otherwise dirtied?  That seems pretty radical.
>
> No, it's more subtle than that, although I admit it *is* radical.
> There are three ways that pages can get written out to disk:
>
> 1. Checkpoints.
> 2. Background writer activity.
> 3. Backends writing out dirty buffers because there are no clean
> buffers available to allocate.
>
> What the latest version of the patch implements is:
>
> 1. Checkpoints no longer write only-hint-bit-dirty pages to disk.
> Since a checkpoint doesn't evict pages from memory, the hint bits are
> still there to be written out (or not) by (2) or (3), below.
>
> 2. When the background writer's cleaning scan hits an
> only-hint-bit-dirty page, it writes it, same as before.  This
> definitely doesn't result in the loss of any hint bits.
>
> 3. When a backend writes out a dirty buffer itself, because there are
> no clean buffers available to allocate, it initially writes them.  But
> if there are more than 100 such pages per block of 2000 allocations,
> it recycles any after the first 100 without writing them.
>
> In normal operation, I suspect that there will be very little impact
> from this change.  The change described in #1 may slightly reduce the
> size of some checkpoints, but it's unclear that it will be enough to
> be material.  The change described in #3 will probably also not
> matter, because, in a well-tuned system, the background writer should
> be set aggressively enough to provide a supply of clean pages, and
> therefore backends shouldn't be doing many writes themselves, and
> therefore most buffer allocations will be of already-clean pages, and
> the logic described in #3 will probably never kick in.  Even if they
> are writing a lot of buffers themselves, the logic in #3 still won't
> kick in if many of the pages being written are actually dirty - it
> will only matter if the backends are writing out lots and lots of
> pages *solely because they are only-hint-bit-dirty*.
>
> Where I expect this to make a big difference is on sequential scans of
> just-loaded tables.  In that case, the BufferAccessStrategy machinery
> will force the backend to reuse the same buffers over and over again,
> and all of those pages will be only-hint-bit-dirty.  So the backend
> has to do a write for every page it allocates, and even though those
> writes are being absorbed by the OS cache, it's still slow.  With this
> patch, what will happen is that the backend will write about 100
> pages, then perform the next 1900 allocations without writing, then
> write another 100 pages, etc.  So at the end of the scan, instead of
> having written an amount of data equal to the size of the table, we
> will have written 5% of that amount, and 5% of the hint bits will be
> on disk.  Each subsequent scan will get another 5% of the hint bits on
> disk until after 20 scans they are all set.  So the work of setting
> the hint bits is spread out across the first 20 table scans instead of
> all being done the first time through.
>
> Clearly, there's further jiggering that can be done here.  But the
> overall goal is simply that some of our users don't seem to like it
> when the first scan of a newly loaded table generates a huge storm of
> *write* traffic.  Given that the hint bits appear to be quite
> important from a performance perspective (see benchmark numbers
> upthread),

those are not real benchmarks, just quick guess to check behavior.
(and I agree it looks good, but I also got inconsistent results, the
patched postgresql hardly reach the same speed of the original
9.1devel even after 200 hundreds select of your testcase)


> we don't really have the option of just not writing them -
> but we can try to not to do it all at once, if we think that's an
> improvement, which I think is likely.
>
> Overall, I'm inclined to move this patch to the next CommitFest and
> forget about it for now.  I don't think we're going to get enough
> testing of this in the next week to be really confident that it's
> right.  I might be willing to commit with some more moderate amount of
> testing if we were right at the beginning of a development cycle,
> figuring that we'd shake out any warts as the cycle went along, but
> this isn't seeming like the right time for this kind of a change.

I agree.
I think it might be better to do the hint_bit_allowance decrement when
we write something (dirty or dirtyhint).
And so we can have something like :

100% writte :  write dirty + hint
5 % write : write 5 % of (dirty + hint) (instead of write 5% of the hint only).

So come a simple Bandwith/IOrequest limiter.
Open for next commitfest :)


--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

pgsql-hackers by date:

From: Dimitri Fontaine
Date: 07 February 2011, 14:58:45
Subject: Re: More extension issues: ownership and search_path

From: Cédric Villemain
Date: 07 February 2011, 15:04:59
Subject: Re: limiting hint bit I/O

Re: limiting hint bit I/O - Mailing list pgsql-hackers

Previous

Next