Thread: Concurrent HOT Update interference
Currently, when we access a buffer for a HOT update, we check to see if it's possible to get a cleanup lock so we can clean the buffer.

UPDATEs and DELETEs pin buffers during the scan phase and then re-lock the buffer to perform the update.

So what we have is that multiple UPDATEs repeatedly accessing the same block will prevent each other from successful cleanup, since while one session is performing the update, the second session is pinning the block with an indexscan.

This effect has been noted for some time during pgbench runs, where running with more sessions than scale factors causes contention. We've never done anything about it because that's been seen as a poorly executed test, whereas it does actually match the real situation we experience at "hot spots" in the table.

Holding the buffer pin across both scan and update saves effort for a single session, but it also causes bloat in the concurrent case. Or, put another way, HOT is not effective at "hot spots" in a table!

I thought I'd raise the problem first before attempting to propose a solution.

(And also: why is index_fetch_heap() in indexam.c, yet bitgetpage() in executor/nodeBitmapHeapscan.c?)

Comments?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 10, 2013 at 11:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> This effect has been noted for some time during pgbench runs, where
> running with more sessions than scale factors causes contention. We've
> never done anything about it because that's been seen as a poorly
> executed test, whereas it does actually match the real situation we
> experience at "hot spots" in the table.

Without prejudice to the rest of the argument, just a point of history here. Running pgbench with more clients than the scale factor has been considered a test of contention, not I/O scaling, for a lot longer than we've had HOT. As far back as I can remember, the recommendation was to run pgbench with fewer sessions than the scale factor. At times some people (Robert and Greg, I think?) have used the reverse specifically to test contention, but that's something programmers are concerned about, not users.

pgbench isn't great at testing "hot spots" because it uses uniform random numbers. We've talked about having an option to do a 90/10 distribution to better emulate real usage patterns, but I don't think anyone's done it.

In any case, I don't think now is a great time to be bringing up new ideas like this. Once 9.3 is out the door it'll be a better time for this kind of out-of-the-box brainstorming.

--
greg
Simon Riggs <simon@2ndQuadrant.com> writes:
> So what we have is that multiple UPDATEs repeatedly accessing the same
> block will prevent each other from successful cleanup, since while one
> session is performing the update, the second session is pinning the
> block with an indexscan.

> This effect has been noted for some time during pgbench runs, where
> running with more sessions than scale factors causes contention. We've
> never done anything about it because that's been seen as a poorly
> executed test, whereas it does actually match the real situation we
> experience at "hot spots" in the table.

Uh, no. pgbench's problem at small scale factors is that multiple sessions want to update *the same row*, not just different rows on the same page. That contention is unavoidable.

You may in fact have a good point, but you can't prove it by reference to pgbench.

			regards, tom lane
On Fri, May 10, 2013 at 5:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Currently, when we access a buffer for a HOT update we check to see if
> its possible to get a cleanup lock so we can clean the buffer.
>
> Currently, UPDATEs and DELETEs pin buffers during the scan phase and
> then re-lock the buffer to update.
>
> So what we have is that multiple UPDATEs repeatedly accessing the same
> block will prevent each other from successful cleanup, since while one
> session is performing the update, the second session is pinning the
> block with an indexscan.

wait -- you can't acquire a cleanup lock if the buffer is pinned by at least one other session?  yeah -- that would defeat HOT for many important cases.  this should be pretty easy to demonstrate in simulated testing.

merlin
On 2013-05-10 08:28:24 -0500, Merlin Moncure wrote:
> On Fri, May 10, 2013 at 5:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Currently, when we access a buffer for a HOT update we check to see if
> > its possible to get a cleanup lock so we can clean the buffer.
> >
> > Currently, UPDATEs and DELETEs pin buffers during the scan phase and
> > then re-lock the buffer to update.
> >
> > So what we have is that multiple UPDATEs repeatedly accessing the same
> > block will prevent each other from successful cleanup, since while one
> > session is performing the update, the second session is pinning the
> > block with an indexscan.
>
> wait -- you can't acquire a cleanup lock if the buffer is pinned by at
> least one other session?

Correct. When you have a pin you are allowed to hold pointers into the buffer, while a cleanup lock allows you to rearrange the contents of the page. So those two don't work well together.

> yeah -- that would defeat HOT for many
> important cases. this should be pretty easy to demonstrate in
> simulated testing.

Well, HOT itself works without getting a cleanup lock. It's just HOT pruning that doesn't.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 10, 2013 at 8:33 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-05-10 08:28:24 -0500, Merlin Moncure wrote:
>> wait -- you can't acquire a cleanup lock if the buffer is pinned by at
>> least one other session?
>
> Correct. When you have a pin you are allowed to point into the buffer
> and a cleanup lock allows you to rearange the contents of a page. So
> that doesn't work well together.
>
>> yeah -- that would defeat HOT for many
>> important cases. this should be pretty easy to demonstrate in
>> simulated testing.
>
> Well, HOT itself works without getting a cleanup lock. Its just HOT
> pruning that doesn't.

right.  hm, I guess this is something to keep in mind if you start going down the path of 'keep frequently accessed buffers pinned for longer durations -- possibly even forever'.

merlin
On 10 May 2013 14:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> So what we have is that multiple UPDATEs repeatedly accessing the same
>> block will prevent each other from successful cleanup, since while one
>> session is performing the update, the second session is pinning the
>> block with an indexscan.
>
>> This effect has been noted for some time during pgbench runs, where
>> running with more sessions than scale factors causes contention. We've
>> never done anything about it because that's been seen as a poorly
>> executed test, whereas it does actually match the real situation we
>> experience at "hot spots" in the table.
>
> Uh, no. pgbench's problem at small scale factors is that multiple
> sessions want to update *the same row*, not just different rows on the
> same page. That contention is unavoidable.
>
> You may in fact have a good point, but you can't prove it by reference
> to pgbench.

I wasn't dissing pgbench, just saying that we've all witnessed the case I'm discussing many times and looked past it because we were looking at general scalability, which we have now done a good job on (well done, team).

There are two related use cases that demonstrate poor behaviour:

a) Updating two separate rows that happen to be on the same block will eventually cause one or both of the rows to migrate to separate blocks, because of 1) the inability to clean the existing block and 2) the way our FSM algorithm gives you a clean new block away from other people. That leads to a one-block-per-row situation, or in other words quite bad bloating, which seems to be avoidable, hence this thread.

b) Updating the same row is also relevant, since concurrent updates cannot escape each other, so we get contention and bloating because HOT page cleanup is ineffective. The best we can achieve in this case is to make HOT page cleanup work, though we cannot ever escape the contention.

Updating the same row is easier to reproduce with standard pgbench; the effect is shown very well by using -s 1 -c 4, though even more purely by running the attached test:

  pgbench -i
  pgbench -c 4 -t 10000 -f update_same_row.pgb

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 10, 2013 at 3:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> a) Updating two separate rows that happen to be on the same block will
> eventually cause one or both of the rows to migrate to separate blocks
> because of 1) the inability to clean the existing block and 2) the way
> our fsm algorithm gives you a clean new block away from other people.
> That leads to a one-block-per-row situation, or in other words quite
> bad bloating, which seems to be avoidable, hence this thread.

This seems like a good behaviour to me. If you have N busy rows then having each row in its own block minimizes contention and minimizes the frequency of cleanups. You can't be worried about both bloating *and* contention: either you have relatively few busy rows per processor, in which case the bloat is minor and the contention is an issue, or you have many rows, in which case the contention can't be an issue and the bloat becomes important.

--
greg
On 10 May 2013 15:04, Merlin Moncure <mmoncure@gmail.com> wrote:
> right. hm, I guess this is something to keep in mind if you start
> going down the path of 'keep frequently accessed buffers pinned for
> longer durations -- possibly even forever'.

Just to mention that this scenario effectively starves anybody wanting a cleanup lock, which was the reason we put logic into VACUUM to skip busy pages. We just need to extend that thought to page-level cleanup also.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10 May 2013 15:47, Greg Stark <stark@mit.edu> wrote:
> On Fri, May 10, 2013 at 3:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> a) Updating two separate rows that happen to be on the same block will
>> eventually cause one or both of the rows to migrate to separate blocks
>> because of 1) the inability to clean the existing block and 2) the way
>> our fsm algorithm gives you a clean new block away from other people.
>> That leads to a one-block-per-row situation, or in other words quite
>> bad bloating, which seems to be avoidable, hence this thread.
>
> This seems like a good behaviour to me. If you have N busy rows then
> having each row in its own block minimizes contention and minimizes
> the frequency of cleanups. You can't be both worried about bloating
> *and* contention -- either you have relatively few busy rows per
> processor in which case the bloat is minor and the contention is an
> issue or you have many rows in which case the contention can't be an
> issue and the bloat becomes important.

For very small tables, yes; for anything else, no. Bloat and contention both slow you down. The difference is that bloat slows other people down as well, by filling up RAM and causing extra I/O. If you have roving contention, your table quickly spreads out to one row per block, which ain't great.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services