Re: Page-at-a-time Locking Considerations - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Page-at-a-time Locking Considerations |
Date | |
Msg-id | 200803230037.m2N0b6c19764@momjian.us |
In response to | Page-at-a-time Locking Considerations (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: Page-at-a-time Locking Considerations |
List | pgsql-hackers |
With no concrete patch or performance numbers, this thread has been removed from the patches queue.

---------------------------------------------------------------------------

Simon Riggs wrote:
>
> In heapgetpage() we hold the buffer locked while we look for visible
> tuples. That works well in most cases since the visibility check is fast
> if we have status bits set. If we don't have visibility bits set we have
> to do things like scan the snapshot and confirm things via clog lookups.
> All of that takes time and can lead to long buffer lock times, possibly
> across multiple I/Os in the very worst cases.
>
> This doesn't just happen for old transactions. Accessing very recent
> TransactionIds is prone to rare but long waits when we ExtendClog().
>
> Such problems are numerically rare, but the buffers with long lock times
> are also the ones that have concurrent or at least recent write
> operations on them. So all SeqScans have the potential to induce long
> wait times for write transactions, even if they are scans on 1 block
> tables. Tables with heavy write activity on them from multiple backends
> have their work spread across multiple blocks, so a SeqScan will hit
> this issue repeatedly as it encounters each current insertion point in a
> table and so greatly increases the chances of it occurring.
>
> It seems possible to just memcpy() the whole block away and then drop
> the lock quickly. That gives a consistent lock time in all cases and
> allows us to do the visibility checks in our own time. It might seem
> that we would end up copying irrelevant data, which is true. But the
> greatest cost is memory access time. If hardware memory pre-fetch cuts
> in we will find that the memory is retrieved en masse anyway; if it
> doesn't we will have to wait for each cache line.
> So the best case is actually an en masse retrieval of cache lines, in
> the common case where blocks are fairly full (vague cutoff is determined
> by exact mechanism of hardware/compiler induced memory prefetch).
>
> The copied block would be used only for visibility checks. The main
> buffer would retain its pin and we would pass references to the block
> through the executor as normal. So this would be a change completely
> isolated to heapgetpage().
>
> Was the copy-aside method considered when we introduced page at a time
> mode? Any reasons to think it would be dangerous or infeasible? If not,
> I'll give it a bash and get some test results.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
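The copy-aside proposal can be illustrated with a small, self-contained C simulation. This is a sketch under stated assumptions, not PostgreSQL code: `SimPage`, `SimBuffer`, `sim_add_tuple()`, `check_visible()`, `scan_locked()` and `scan_copy_aside()` are hypothetical names, the "lock" is a plain flag rather than a real buffer content lock, and the visibility rule (a tuple is visible if its xmin precedes a snapshot horizon) is a deliberate simplification. It contrasts the behaviour described above, where the slow visibility path for tuples without status bits runs while the lock is held, with copying the whole block under the lock and checking visibility on the private copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulation only: none of these names are PostgreSQL APIs. */
#define SIM_BLCKSZ     8192   /* block size, chosen to match PostgreSQL's default */
#define SIM_MAX_TUPLES 64

typedef struct {
    uint32_t xmin;      /* inserting transaction id */
    bool     hint_set;  /* "status bits set": the visibility check is cheap */
} SimTupleHeader;

/* A page is a raw block with a toy tuple directory overlaid at its start. */
typedef union {
    struct {
        uint16_t       ntuples;
        SimTupleHeader tuples[SIM_MAX_TUPLES];
    } hdr;
    char raw[SIM_BLCKSZ];
} SimPage;

typedef struct {
    SimPage  page;
    bool     locked;          /* stand-in for the buffer content lock */
    unsigned slow_under_lock; /* slow-path checks performed while locked */
} SimBuffer;

static void sim_add_tuple(SimBuffer *buf, uint32_t xmin, bool hint_set)
{
    uint16_t i = buf->page.hdr.ntuples;
    assert(i < SIM_MAX_TUPLES);
    buf->page.hdr.tuples[i].xmin = xmin;
    buf->page.hdr.tuples[i].hint_set = hint_set;
    buf->page.hdr.ntuples = (uint16_t) (i + 1);
}

/* Simplified visibility rule (assumption): visible if xmin precedes the
 * snapshot horizon. Tuples without hint bits take the "slow path"; real code
 * would consult the snapshot and clog there, this sketch only counts it. */
static bool check_visible(SimBuffer *buf, const SimTupleHeader *t, uint32_t snap)
{
    if (!t->hint_set && buf->locked)
        buf->slow_under_lock++;
    return t->xmin < snap;
}

/* Current behaviour: the lock is held for the whole visibility pass. */
static int scan_locked(SimBuffer *buf, uint32_t snap, uint16_t *visible)
{
    int n = 0;
    buf->locked = true;
    for (uint16_t i = 0; i < buf->page.hdr.ntuples; i++)
        if (check_visible(buf, &buf->page.hdr.tuples[i], snap))
            visible[n++] = i;
    buf->locked = false;
    return n;
}

/* Copy-aside: memcpy() the whole block under the lock, release it, then run
 * the visibility checks against the private copy. */
static int scan_copy_aside(SimBuffer *buf, uint32_t snap, uint16_t *visible)
{
    SimPage copy;
    int n = 0;
    buf->locked = true;
    memcpy(&copy, &buf->page, sizeof copy);
    buf->locked = false;
    for (uint16_t i = 0; i < copy.hdr.ntuples; i++)
        if (check_visible(buf, &copy.hdr.tuples[i], snap))
            visible[n++] = i;   /* offsets still index the shared, pinned page */
    return n;
}
```

For example, with three tuples whose xmin values are 100 (hinted), 205 and 150 (both unhinted) and a horizon of 200, both scans return offsets 0 and 2, but only `scan_locked()` performs its two slow-path checks while the lock is held. The cost of the copy-aside variant is one block-sized `memcpy()` per page; the returned offsets refer to the shared page, which the proposal keeps pinned, so the copy is used only for the visibility decisions. The sketch does not model writing hint bits back to the shared page and says nothing about real performance.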