Thread: Set hint bits upon eviction from BufMgr

Set hint bits upon eviction from BufMgr

From

Merlin Moncure

Date:

25 March 2011, 12:25:56

Maybe I'm being overly simplistic or incorrect here, but I was
thinking that there might be a route to reducing hint bit impact on
the main sufferers of the feature without adding too much pain in the
general case.  I'm unfortunately convinced there is no getting rid of
them -- in fact their utility will become even more apparent as
storage gets faster and the pendulum of optimization swings back to
the CPU side.

My idea is to reserve a bit in the page header, say PD_ALL_SAME_XMIN,
that indicates all the tuples are from the same transaction.  Set it
when the first inserted tuple hits the page and unset it when a tuple
from another xmin is added or any tuple is touched/deleted.  The point
here is to set up a cheap check at the page level that we can make
when a page is getting evicted from the bufmgr.  If the bit is set, we
grab the xmin of the first tuple on the page and test it for
visibility (assuming the hint bit is not already set).  If we get a
thumbs up on the transaction, we can go through the page and set all
tuple hints during the page evict/sync process.  We don't worry about
logging/crash safety on the 'all same' hint because it's only
interesting to this bufmgr check (it can even be cleared when the page
is loaded).
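
A minimal sketch of the page-level check described above, using toy
structures.  ToyPage, ToyTuple, toy_clog and the flag values are
illustrative assumptions, not PostgreSQL's real definitions, and a
plain "committed" lookup stands in for the visibility test:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define PD_ALL_SAME_XMIN    0x0001u  /* page flag from the proposal */
#define HINT_XMIN_COMMITTED 0x0001u  /* stands in for the tuple hint bit */
#define TOY_MAX_TUPLES 32
#define TOY_MAX_XIDS   32

typedef struct { uint32_t xmin; uint16_t infomask; } ToyTuple;

typedef struct {
    uint16_t flags;
    size_t   ntuples;
    ToyTuple tuples[TOY_MAX_TUPLES];
} ToyPage;

/* Toy commit log: toy_clog[xid] is true once xid has committed. */
bool toy_clog[TOY_MAX_XIDS];

/* Insert a tuple and maintain the all-same-xmin flag. */
void toy_page_insert(ToyPage *page, uint32_t xmin)
{
    assert(page != NULL && page->ntuples < TOY_MAX_TUPLES);
    if (page->ntuples == 0)
        page->flags |= PD_ALL_SAME_XMIN;              /* first tuple sets it */
    else if (page->tuples[0].xmin != xmin)
        page->flags &= (uint16_t) ~PD_ALL_SAME_XMIN;  /* other xmin clears it */
    page->tuples[page->ntuples].xmin = xmin;
    page->tuples[page->ntuples].infomask = 0;
    page->ntuples++;
}

/* Eviction-time check: one commit lookup per page, not per tuple.
 * Returns the number of tuple hints that were set. */
size_t toy_hint_on_evict(ToyPage *page)
{
    assert(page != NULL);
    if (!(page->flags & PD_ALL_SAME_XMIN) || page->ntuples == 0)
        return 0;
    if (page->tuples[0].infomask & HINT_XMIN_COMMITTED)
        return 0;                                     /* already hinted */
    uint32_t xmin = page->tuples[0].xmin;
    assert(xmin < TOY_MAX_XIDS);
    if (!toy_clog[xmin])
        return 0;                                     /* not committed yet */
    for (size_t i = 0; i < page->ntuples; i++)
        page->tuples[i].infomask |= HINT_XMIN_COMMITTED;
    return page->ntuples;
}
```

The sketch leaves out updates and deletes, which per the proposal
would also clear the flag.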

Without this bit, the only way to get hint bits set during bufmgr
eviction is to do a visibility check on every tuple, which would
probably be prohibitively expensive.  Since OLTP environments would
rarely see this bit, they would not have to pay for the check.

Also, we can maybe tweak the bufmgr to prefer not to evict pages with
this bit set if it's known they are not yet written out to primary
storage.  Maybe this is impossible or not logical...just thinking out
loud.  Anyways, if this actually works, shared buffers can start to
play a role in mitigating hint bit i/o as long as the transaction
resolves before pages start jumping out into storage.  If you couple
this with a facility to do bulk loads that break up transactions at
regular intervals, you have a good shot at getting all your hint bits
written out properly in large load situations.

You might be able to do similar tricks with deletes -- I haven't
thought about that.  Also there might be some interplay with vacuum or
some other deal breaker -- curious to see if I have something worth
further thought here.

merlin

Re: Set hint bits upon eviction from BufMgr

From

Jim Nasby

Date:

25 March 2011, 12:35:09

On Mar 25, 2011, at 9:52 AM, Merlin Moncure wrote:
> Without this bit, the only way to set hint bits going during bufmgr
> eviction is to do a visibility check on every tuple, which would
> probably be prohibitively expensive.  Since OLTP environments would
> rarely see this bit, they would not have to pay for the check.

IIRC one of the biggest costs is accessing the CLOG, but what if the bufmgr.c/bgwriter didn't use the same CLOG lookup
mechanism as backends did? Unlike when a backend is inspecting visibility, it's not necessary for something like
bgwriter to know exact visibility as long as it doesn't mark something as visible when it shouldn't. If it uses a
different CLOG caching/accessing method that lags behind the real CLOG then the worst-case scenario is that there's a
delay on setting hint bits. But getting bgwriter to do this would likely still be a huge win over forcing backends to
worry about it. It's also possible that the visibility check itself could be simplified.
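
A rough sketch of the lagging-lookup idea, assuming a toy in-memory commit log. LaggingClog, toy_real_clog and the
function names are hypothetical, and the sketch ignores xid wraparound and clog truncation. It relies on the assumption
that a transaction's status only moves from in-progress to a final state, so a stale copy can under-report commits but
never over-report them:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NXIDS 32

typedef enum { XS_IN_PROGRESS = 0, XS_COMMITTED, XS_ABORTED } XactStatus;

/* The authoritative (toy) commit log that backends would update. */
XactStatus toy_real_clog[TOY_NXIDS];

/* A private copy that is only refreshed now and then. */
typedef struct { XactStatus snap[TOY_NXIDS]; } LaggingClog;

/* Take a fresh copy of the authoritative log. */
void lagging_clog_refresh(LaggingClog *c)
{
    assert(c != NULL);
    memcpy(c->snap, toy_real_clog, sizeof c->snap);
}

/* True only if the (possibly stale) copy already shows a commit.
 * A stale copy may answer false for a committed xid, which merely
 * delays setting the hint bit. */
bool lagging_clog_known_committed(const LaggingClog *c, uint32_t xid)
{
    assert(c != NULL && xid < TOY_NXIDS);
    return c->snap[xid] == XS_COMMITTED;
}
```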

BTW, I don't think you want to play these games when a backend is evicting a page because you'll be slowing a real
backend down.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Set hint bits upon eviction from BufMgr

From

Merlin Moncure

Date:

25 March 2011, 13:40:18

On Fri, Mar 25, 2011 at 10:34 AM, Jim Nasby <jim@nasby.net> wrote:
> On Mar 25, 2011, at 9:52 AM, Merlin Moncure wrote:
>> Without this bit, the only way to set hint bits going during bufmgr
>> eviction is to do a visibility check on every tuple, which would
>> probably be prohibitively expensive.  Since OLTP environments would
>> rarely see this bit, they would not have to pay for the check.
>
> IIRC one of the biggest costs is accessing the CLOG, but what if the bufmgr.c/bgwriter didn't use the same CLOG
> lookup mechanism as backends did? Unlike when a backend is inspecting visibility, it's not necessary for something like
> bgwriter to know exact visibility as long as it doesn't mark something as visible when it shouldn't. If it uses a
> different CLOG caching/accessing method that lags behind the real CLOG then the worst-case scenario is that there's a
> delay on setting hint bits. But getting bgwriter to do this would likely still be a huge win over forcing backends to
> worry about it. It's also possible that the visibility check itself could be simplified.
>
> BTW, I don't think you want to play these games when a backend is evicting a page because you'll be slowing a real
> backend down.

Well, I'm not so sure -- as noted above, you only pay for the check
above when all the records in a page are new, and only once per page,
not once per tuple.  Basically, only when you are bulk jamming records
through the buffers.  The amortized cost of the clog lookup is going
to be near zero (maybe you could put a fuse in that would get tripped
if there weren't enough tuples in the page to justify the check).
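
To make the amortization argument concrete, here is a small sketch of
such a fuse.  The threshold TOY_MIN_TUPLES_FOR_CHECK and both function
names are made up for illustration; a real value would have to come
from benchmarking:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fuse: skip the page-level check on sparse pages. */
#define TOY_MIN_TUPLES_FOR_CHECK 8

/* Is one clog lookup worth it for a page holding ntuples tuples? */
bool toy_worth_checking(size_t ntuples)
{
    return ntuples >= TOY_MIN_TUPLES_FOR_CHECK;
}

/* One lookup per page, spread over every tuple on that page. */
double toy_lookups_per_tuple(size_t ntuples)
{
    assert(ntuples > 0);
    return 1.0 / (double) ntuples;
}
```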

If you are bulk loading more data than you have shared buffers, then
you get zero benefit.  However, you might have the makings of a
strategy for dealing with hint bit i/o in user land (by breaking up
transactions, tweaking shared buffers, etc).

merlin

Re: Set hint bits upon eviction from BufMgr

From

Heikki Linnakangas

Date:

25 March 2011, 16:32:29

On 25.03.2011 16:52, Merlin Moncure wrote:
> Without this bit, the only way to set hint bits going during bufmgr
> eviction is to do a visibility check on every tuple, which would
> probably be prohibitively expensive.

I don't think the naive approach of scanning all tuples would be too 
bad, actually. The hint bits only need to be set once, and it'd be 
bgwriter shouldering the overhead.

The problem with setting hint bits when a buffer is evicted is that it
doesn't help with the bulk load case. The hint bits can't be set for a 
bulk load until the load is finished and the transaction commits.

Maybe it would still be worthwhile to have bgwriter set hint bits, to 
reduce I/O caused by hint bit updates in an OLTP workload, but that's 
not what people usually complain about.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Set hint bits upon eviction from BufMgr

From

Merlin Moncure

Date:

25 March 2011, 16:44:49

On Fri, Mar 25, 2011 at 2:32 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 25.03.2011 16:52, Merlin Moncure wrote:
>>
>> Without this bit, the only way to set hint bits going during bufmgr
>> eviction is to do a visibility check on every tuple, which would
>> probably be prohibitively expensive.
>
> I don't think the naive approach of scanning all tuples would be too bad,
> actually. The hint bits only need to be set once, and it'd be bgwriter
> shouldering the overhead.
>
> The problem with setting hint bits when a buffer is evicted is that it
> doesn't help with the bulk load case. The hint bits can't be set for a bulk
> load until the load is finished and the transaction commits.

Not the true bulk load case.  However, if you can break up a load into
multiple transactions and sneak out 10-100MB of pages into the buffer
per transaction, you have a good chance of getting most/all the bits
out correct before bgwriter eats them up.  I was thinking to also
teach bgwriter to keep xmin flagged pages in a separate lower priority
pool so that it didn't race to them before the transaction had a
chance to go in.

Long term, I'm imagining more direct transaction control in the
backend, either via autonomous transactions, or stored procedures with
explicit transaction control, so we don't have to load N gigabytes in
a single transaction.

> Maybe it would still be worthwhile to have bgwriter set hint bits, to reduce
> I/O caused by hint bit updates in an OLTP workload, but that's not what
> people usually complain about.

well, if bgwriter does it, you lose the ability to bail the clog check
via TransactionIdIsCurrentTransactionId, right? If it's done in the
bufmgr you at least have a chance to not have to go all the way out.
Either way though, you at least have to teach bgwriter to be more
cooperative.

merlin

Re: Set hint bits upon eviction from BufMgr

From

Robert Haas

Date:

25 March 2011, 17:18:10

On Fri, Mar 25, 2011 at 3:32 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 25.03.2011 16:52, Merlin Moncure wrote:
>>
>> Without this bit, the only way to set hint bits going during bufmgr
>> eviction is to do a visibility check on every tuple, which would
>> probably be prohibitively expensive.
>
> I don't think the naive approach of scanning all tuples would be too bad,
> actually. The hint bits only need to be set once, and it'd be bgwriter
> shouldering the overhead.

I was thinking the same thing.  The only thing I'm worried about is
whether it'd make the bgwriter less responsive; we already have some
issues in that department.

> The problem with setting hint bits when a buffer is evicted is that it
> doesn't help with the bulk load case. The hint bits can't be set for a bulk
> load until the load is finished and the transaction commits.
>
> Maybe it would still be worthwhile to have bgwriter set hint bits, to reduce
> I/O caused by hint bit updates in an OLTP workload, but that's not what
> people usually complain about.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Set hint bits upon eviction from BufMgr

From

Merlin Moncure

Date:

28 March 2011, 10:48:46

On Fri, Mar 25, 2011 at 3:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 25, 2011 at 3:32 PM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> On 25.03.2011 16:52, Merlin Moncure wrote:
>>>
>>> Without this bit, the only way to set hint bits going during bufmgr
>>> eviction is to do a visibility check on every tuple, which would
>>> probably be prohibitively expensive.
>>
>> I don't think the naive approach of scanning all tuples would be too bad,
>> actually. The hint bits only need to be set once, and it'd be bgwriter
>> shouldering the overhead.
>
> I was thinking the same thing.  The only thing I'm worried about is
> whether it'd make the bgwriter less responsive; we already have some
> issues in that department.

I'd like to experiment on this and see what comes out.  If the
bgwriter was to be granted the ability to inspect buffers and set
hints, it needs to be able to peek in and inspect the buffer itself
which it currently doesn't do FWICT.  I was thinking about setting a
flag in the buffer (BM_HEAP) that gets set by the loader which flags
the buffer for later inspection.  Is there a simpler way to do this?

It may turn out to be a dud, but I'd still like to play with the all
visible bit and see how that interacts with data loading, both with
and without special bgwriter logic (I'm going to kludge in a crude
mechanism to try to prefer non-all-visible pages).  The reason why I
like it is the optimization is narrow and the risk of downside is low,
although it's up a notch on the complexity level.  If you do end up
retooling the bgwriter to set hint bits broadly, there are some tricks
you can do to reduce the number of useless clog checks you do (that
is, you fault through to an in progress transaction).  They involve
changing the way the scan works, maybe even organizing buffers into
multiple priority pools, so it's complicated and has to be done very
carefully.

I think you guys are correct: the logic belongs in the bgwriter.
Generally speaking, it looks like the best route to minimizing hint
bit pain is, if at all possible, to write them out already set so they
don't have to be rewritten later (Stephen's approach of leveraging
in-transaction table creation is another way of attempting to do that).

merlin

Re: Set hint bits upon eviction from BufMgr

From

Robert Haas

Date:

28 March 2011, 11:10:05

On Mon, Mar 28, 2011 at 9:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> I'd like to experiment on this and see what comes out.

Great!

> If the
> bgwriter was to be granted the ability to inspect buffers and set
> hints, it needs to be able to peek in and inspect the buffer itself
> which it currently doesn't do FWICT.

That matches my understanding.

> I was thinking about setting a
> flag in the buffer (BM_HEAP) that gets set by the loader which flags
> the buffer for later inspection.  Is there a simpler way to do this?

Hmm.  That's slightly crufty, but it might be OK.  At least, I don't
have a better idea.

> I think you guys are correct: the logic belongs in the bgwriter.
> Generally speaking, it looks like the best route to minimizing hint
> bit pain is to if at all possible write them out set so they don't
> have to be rewritten later (Stephen's approach to leverage in
> transaction table creation is another way of attempting to do that).

Yeah.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Set hint bits upon eviction from BufMgr

From

Tom Lane

Date:

28 March 2011, 11:19:35

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Mar 28, 2011 at 9:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> I was thinking about setting a
>> flag in the buffer (BM_HEAP) that gets set by the loader which flags
>> the buffer for later inspection.  Is there a simpler way to do this?

> Hmm.  That's slightly crufty, but it might be OK.  At least, I don't
> have a better idea.

The major problem with all of this is that the bgwriter has no idea
which buffers contain heap pages.  And I'm not convinced it's a good
idea to try to let it know that.  If we get to the point where bgwriter
is trying to do catalog accesses, we are in for a world of pain.
(Can you say "modularity violation"?  How about "deadlock"?)
        regards, tom lane

Re: Set hint bits upon eviction from BufMgr

From

"Kevin Grittner"

Date:

28 March 2011, 11:29:57

Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The major problem with all of this is that the bgwriter has no
> idea which buffers contain heap pages.  And I'm not convinced it's
> a good idea to try to let it know that.  If we get to the point
> where bgwriter is trying to do catalog accesses, we are in for a
> world of pain. (Can you say "modularity violation"?  How about
> "deadlock"?)

How about having a BackgroundPrepareForWriteFunction variable
associated with each page the bgwriter might see, which would be a
pointer to a function to call (if the variable is not NULL) before
writing?  The bgwriter would still have no idea what kind of page it
was or what the function did....
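
A minimal sketch of that hook, with a toy buffer type.  ToyBuffer,
toy_heap_prepare and toy_bgwriter_flush are illustrative names, not
existing PostgreSQL code; only the typedef name comes from the
suggestion above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyBuffer ToyBuffer;

/* Per-page callback; the writer never learns what it does. */
typedef void (*BackgroundPrepareForWriteFunction)(ToyBuffer *buf);

struct ToyBuffer {
    int  hints_set;   /* toy payload touched by the heap callback */
    bool written;
    BackgroundPrepareForWriteFunction prepare;  /* may be NULL */
};

/* What a heap page might register: set hint bits before the write. */
void toy_heap_prepare(ToyBuffer *buf)
{
    buf->hints_set++;
}

/* The writer only checks for NULL, calls the hook, then writes. */
void toy_bgwriter_flush(ToyBuffer *buf)
{
    assert(buf != NULL);
    if (buf->prepare != NULL)
        buf->prepare(buf);
    buf->written = true;
}
```
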
-Kevin

Re: Set hint bits upon eviction from BufMgr

From

Robert Haas

Date:

28 March 2011, 11:44:06

On Mon, Mar 28, 2011 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Mar 28, 2011 at 9:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> I was thinking about setting a
>>> flag in the buffer (BM_HEAP) that gets set by the loader which flags
>>> the buffer for later inspection.  Is there a simpler way to do this?
>
>> Hmm.  That's slightly crufty, but it might be OK.  At least, I don't
>> have a better idea.
>
> The major problem with all of this is that the bgwriter has no idea
> which buffers contain heap pages.  And I'm not convinced it's a good
> idea to try to let it know that.  If we get to the point where bgwriter
> is trying to do catalog accesses, we are in for a world of pain.
> (Can you say "modularity violation"?  How about "deadlock"?)

Well, that's why Merlin was suggesting having the backends that read
the buffers in flag the heap pages as BM_HEAP.  Then the background
writer can just examine that bit.
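
Roughly, and only as a sketch: the flag value, ToyBufferDesc and both
function names below are assumptions for illustration (BM_HEAP does
not exist today), but the division of labour would look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit for this sketch only. */
#define TOY_BM_HEAP (1u << 0)

typedef struct { uint32_t flags; } ToyBufferDesc;

/* Backend side: it knows the relation kind, so it tags the buffer
 * when it reads a heap page in. */
void toy_backend_mark_heap(ToyBufferDesc *desc)
{
    assert(desc != NULL);
    desc->flags |= TOY_BM_HEAP;
}

/* Background writer side: no catalog access, just a bit test. */
bool toy_bgwriter_is_heap(const ToyBufferDesc *desc)
{
    assert(desc != NULL);
    return (desc->flags & TOY_BM_HEAP) != 0;
}
```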

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Set hint bits upon eviction from BufMgr

From

Merlin Moncure

Date:

28 March 2011, 11:49:10

On Mon, Mar 28, 2011 at 9:29 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> The major problem with all of this is that the bgwriter has no
>> idea which buffers contain heap pages.  And I'm not convinced it's
>> a good idea to try to let it know that.  If we get to the point
>> where bgwriter is trying to do catalog accesses, we are in for a
>> world of pain. (Can you say "modularity violation"?  How about
>> "deadlock"?)
>
> How about having a BackgroundPrepareForWriteFunction variable
> associated with each page the bgwriter might see, which would be a
> pointer to a function to call (if the variable is not NULL) before
> writing?  The bgwriter would still have no idea what kind of page it
> was or what the function did....

Well, that is much cleaner from an abstraction point of view, but you
lose the ability to adjust scan priority before flushing out the
page...I'm assuming by the time this function is called, you've
already made the decision to write it out.  (Maybe priority is
necessary and maybe it isn't, but I don't like losing the ability to
tune at that level.)

You could, though, put a priority inspection facility behind a similar
abstraction fence (BackgroundGetWritePriority).  Maybe that's more
trouble than it's worth.
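
A sketch of what that second hook could look like, again with toy
types.  ToyBuf, the priority values and toy_pick_next are all
assumptions made for illustration; only the hook name comes from the
paragraph above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyBuf ToyBuf;

/* Hook that reports how eager the writer should be to flush a page. */
typedef int (*BackgroundGetWritePriority)(const ToyBuf *buf);

struct ToyBuf {
    bool xmin_pending;  /* toy: inserting transaction not resolved yet */
    BackgroundGetWritePriority get_priority;  /* may be NULL */
};

#define TOY_DEFAULT_PRIORITY 10

/* A heap page asks to be written late while its xmin is unresolved. */
int toy_heap_priority(const ToyBuf *buf)
{
    return buf->xmin_pending ? 1 : TOY_DEFAULT_PRIORITY;
}

/* Pages without a hook get the default priority. */
static int toy_priority_of(const ToyBuf *buf)
{
    return buf->get_priority != NULL ? buf->get_priority(buf)
                                     : TOY_DEFAULT_PRIORITY;
}

/* Index of the buffer to write next: highest priority wins,
 * and the earliest buffer wins ties. */
size_t toy_pick_next(const ToyBuf *bufs, size_t n)
{
    assert(bufs != NULL && n > 0);
    size_t best = 0;
    int best_prio = toy_priority_of(&bufs[0]);
    for (size_t i = 1; i < n; i++) {
        int p = toy_priority_of(&bufs[i]);
        if (p > best_prio) {
            best = i;
            best_prio = p;
        }
    }
    return best;
}
```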

merlin

Re: Set hint bits upon eviction from BufMgr

From

Jim Nasby

Date:

05 April 2011, 12:24:14

On Mar 28, 2011, at 9:48 AM, Merlin Moncure wrote:
> On Mon, Mar 28, 2011 at 9:29 AM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>>> The major problem with all of this is that the bgwriter has no
>>> idea which buffers contain heap pages.  And I'm not convinced it's
>>> a good idea to try to let it know that.  If we get to the point
>>> where bgwriter is trying to do catalog accesses, we are in for a
>>> world of pain. (Can you say "modularity violation"?  How about
>>> "deadlock"?)
>>
>> How about having a BackgroundPrepareForWriteFunction variable
>> associated with each page the bgwriter might see, which would be a
>> pointer to a function to call (if the variable is not NULL) before
>> writing?  The bgwriter would still have no idea what kind of page it
>> was or what the function did....
>
> Well, that is much cleaner from abstraction point of view but you lose
> the ability to adjust scan priority before flushing out the page...I'm
> assuming by the time this function is called, you've already made the
> decision to write it out.  (maybe priority is necessary and maybe it
> isn't, but I don't like losing the ability to tune at that level).
>
> You could though put a priority inspection facility behind a similar
> abstraction fence (BackgroundGetWritePriority) though.  Maybe that's
> more trouble than it's worth though.

Merlin, does your new work on CLOG caching negate anything in this thread? I think there are some ideas here worth
further investigation and want to make sure they don't get lost.
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

Re: Set hint bits upon eviction from BufMgr

From

Merlin Moncure

Date:

05 April 2011, 12:59:23

On Tue, Apr 5, 2011 at 9:49 AM, Jim Nasby <jim@nasby.net> wrote:
> On Mar 28, 2011, at 9:48 AM, Merlin Moncure wrote:
>> On Mon, Mar 28, 2011 at 9:29 AM, Kevin Grittner
>> <Kevin.Grittner@wicourts.gov> wrote:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>
>>>> The major problem with all of this is that the bgwriter has no
>>>> idea which buffers contain heap pages.  And I'm not convinced it's
>>>> a good idea to try to let it know that.  If we get to the point
>>>> where bgwriter is trying to do catalog accesses, we are in for a
>>>> world of pain. (Can you say "modularity violation"?  How about
>>>> "deadlock"?)
>>>
>>> How about having a BackgroundPrepareForWriteFunction variable
>>> associated with each page the bgwriter might see, which would be a
>>> pointer to a function to call (if the variable is not NULL) before
>>> writing?  The bgwriter would still have no idea what kind of page it
>>> was or what the function did....
>>
>> Well, that is much cleaner from abstraction point of view but you lose
>> the ability to adjust scan priority before flushing out the page...I'm
>> assuming by the time this function is called, you've already made the
>> decision to write it out.  (maybe priority is necessary and maybe it
>> isn't, but I don't like losing the ability to tune at that level).
>>
>> You could though put a priority inspection facility behind a similar
>> abstraction fence (BackgroundGetWritePriority) though.  Maybe that's
>> more trouble than it's worth though.
>
> Merlin, does your new work on CLOG caching negate anything in this thread? I think there are some ideas here worth
> further investigation and want to make sure they don't get lost.

No, it doesn't -- and I plan to work on this independently.

The performance tradeoffs here are much more complicated and will
require extensive benchmarking to analyze.  A process-local clog
cache, if it can be made to work (and that's by no means certain), is
going to affect how this is put together.  In particular, I'd be even
more disinclined to adjust scan priority or do anything fancy like
that -- and more amenable to checking every tuple.  I'm particularly
interested in setting the PD_ALL_VISIBLE bit at eviction time if it's
available to be set and the page is already dirty.

merlin