Re: Turning off HOT/Cleanup sometimes - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Turning off HOT/Cleanup sometimes
Msg-id: CA+U5nMLDUPsK2B737GWJGbEuZ9Bhkv_r_deBz04uvcY6FmhYMA@mail.gmail.com
In response to: Re: Turning off HOT/Cleanup sometimes (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Turning off HOT/Cleanup sometimes
List: pgsql-hackers
On 15 January 2014 16:47, Robert Haas <robertmhaas@gmail.com> wrote:

>>> There would be a postgresql.conf parameter prune_cost_limit, as well
>>> as a table level parameter that would prevent pruning except via
>>> VACUUM.
>>>
>>> This will help in these ways
>>> * Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
>>> * Allow finer grained control over Hot Standby conflicts
>>> * Potentially allow diagnostic inspection of older data via SeqScan
>>
>> Patch attached, implemented to reduce writes by SELECTs only.
>
> I am still not sure whether we want this, but I think it's definitely
> an improvement over the previous version. Assorted comments:
>
> - Naming consistency seems to me to dictate that there should be more
> similarity between the reloption name (allow_buffer_cleanup) and the
> GUC (prune_page_dirty_limit).

Now that I've written the patch, I'm seeing those as two different things, but YMMV and I am very open to naming suggestions.

> - The documentation doesn't describe the use case where suppressing
> cleanup on a per-table basis would be desirable, and I can't think of
> one, either.

We already know that HOT is ineffective in areas of high contention (previous thread by me). Prior experience was that smaller tables didn't show much apparent benefit from using HOT either; its effectiveness was limited to medium and large tables being updated.

The two already stated use cases that would apply are these:
* Allow finer grained control over Hot Standby conflicts
* Potentially allow diagnostic inspection of older data via SeqScan

So the use cases for the two parameters seem quite different, and we may decide we want one but not the other.

> - There are a variety of ways to limit pruning; here, you've chosen to
> limit it to a particular number of pruning operations per executor
> invocation.
> But the flag is global, not part of the executor state,
> so a query that calls a PL/pgsql function during execution will reset
> the counter for the parent query also, which doesn't seem very
> principled.

That is a subtle point in this patch, and I agree the potential problem exists. The limit is set according to the currently executing statement, but the running total is not reset until the start of the top-level statement. So the behaviour is not reset during statements executed within a PL/pgSQL function.

> In a patch I posted a few years ago to set hint bits only sometimes, I
> settled on an algorithm where I dirtied the first 50 pages per scan
> and then skipped the next 950, or something like that. The idea was
> that you wanted the pages that did get dirtied to be clustered
> together to avoid random I/O; and also that you wanted a table of
> arbitrary size to get hinted within a certain number of scans (e.g.
> 20). The limiting here is much more aggressive, so on large tables it
> will amount to basically no pruning at all. I dunno whether that's a
> good idea or not. But if the idea of making this an integer rather
> than a boolean is to allow some pruning to still happen while keeping
> it checked within reasonable bounds, I'm not sure it will succeed.

It sounds like you're in favour of the overall concept of limiting writes, which is good. The behaviour I think we need, based on listening to everybody so far, is:
* OLTP is unaffected
* Large SELECTs and pg_dump don't cause lots of write I/O

Hence "prune_page_dirty_limit" offers a change in behaviour at a certain point. Reducing cleanup to "only 5%" reduces the problem but doesn't remove it. If the data is stored on very poor I/O infrastructure, any significant volume of writes can adversely affect performance. And as we reduce the percentage, we also reduce the benefit of inducing writes in the first place, so I would question why bother with a percentage at all.
For me, a parameter that gives you absolute rather than relative control is more desirable. The current behaviour assumes it is OK for the first/next user to touch the data to be the one that rewrites everything. In time-critical applications, the first/next user could well have a very urgent need to access the data quickly and not want to pay this price. In seldom-accessed data applications, VACUUM has lots of time to run out of hours, so users are OK to defer this work. Some applications exist where we literally want zero write I/O from reads.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services