Re: Turning off HOT/Cleanup sometimes - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: Turning off HOT/Cleanup sometimes
Msg-id: CA+U5nMLDUPsK2B737GWJGbEuZ9Bhkv_r_deBz04uvcY6FmhYMA@mail.gmail.com
In response to: Re: Turning off HOT/Cleanup sometimes (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Turning off HOT/Cleanup sometimes
List: pgsql-hackers
On 15 January 2014 16:47, Robert Haas <robertmhaas@gmail.com> wrote:

>>> There would be a postgresql.conf parameter prune_cost_limit, as well
>>> as a table level parameter that would prevent pruning except via
>>> VACUUM.
>>>
>>> This will help in these ways
>>> * Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
>>> * Allow finer grained control over Hot Standby conflicts
>>> * Potentially allow diagnostic inspection of older data via SeqScan
>>
>> Patch attached, implemented to reduce writes by SELECTs only.
>
> I am still not sure whether we want this, but I think it's definitely
> an improvement over the previous version. Assorted comments:
>
> - Naming consistency seems to me to dictate that there should be more
> similarity between the reloption name (allow_buffer_cleanup) and the
> GUC (prune_page_dirty_limit).

Now that I've written the patch, I'm seeing those as two different things, but YMMV and I am very open to naming suggestions.

> - The documentation doesn't describe the use case where suppressing
> cleanup on a per-table basis would be desirable, and I can't think of
> one, either.

We already know that HOT is ineffective in areas of high contention (previous thread by me). Prior experience was that smaller tables didn't show much apparent benefit from using HOT either; its effectiveness was limited to medium and large tables being updated.

The two already stated use cases that would apply are these:
* Allow finer grained control over Hot Standby conflicts
* Potentially allow diagnostic inspection of older data via SeqScan

So the use cases for the two parameters seem quite different, and we may decide we want one but not the other.

> - There are a variety of ways to limit pruning; here, you've chosen to
> limit it to a particular number of pruning operations per executor
> invocation.
> But the flag is global, not part of the executor state,
> so a query that calls a PL/pgsql function during execution will reset
> the counter for the parent query also, which doesn't seem very
> principled.

That is a subtle point in this patch, and I agree the potential problem exists. The limit is set according to the currently executing statement, but the running total is not reset until the start of the top-level statement. So the behaviour is not reset during statements executed within a PL/pgSQL function.

> In a patch I posted a few years ago to set hint bits only sometimes, I
> settled on an algorithm where I dirtied the first 50 pages per scan
> and then skipped the next 950, or something like that. The idea was
> that you wanted the pages that did get dirtied to be clustered
> together to avoid random I/O; and also that you wanted a table of
> arbitrary size to get hinted within a certain number of scans (e.g.
> 20). The limiting here is much more aggressive, so on large tables it
> will amount to basically no pruning at all. I dunno whether that's a
> good idea or not. But if the idea of making this an integer rather
> than a boolean is to allow some pruning to still happen while keeping
> it checked within reasonable bounds, I'm not sure it will succeed.

It sounds like you're in favour of the overall concept of limiting writes, which is good. The behaviour I think we need, based on listening to everybody so far, is:
* OLTP is unaffected
* Large SELECTs and pg_dump don't cause lots of write I/O

Hence "prune_page_dirty_limit" offers a change in behaviour at a certain point. Reducing cleanup to "only 5%" reduces the problem but doesn't remove it. If the data is stored on very poor I/O infrastructure, any significant volume of writes can adversely affect performance. And as we reduce the percentage, we also reduce the benefit of inducing writes in the first place, so I would question why bother with a percentage at all.
For me, a parameter that gives you absolute rather than relative control is more desirable. The current behaviour assumes it is OK for the first/next user to touch the data to be the one that rewrites everything. In time-critical applications, the first/next user could well have a very urgent need to access the data quickly and not want to pay this price. In seldom-accessed data applications, VACUUM has lots of time to run out of hours, so users are OK to defer this work. Some applications exist where we literally want zero write I/O from reads.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services