Re: Page-at-a-time Locking Considerations - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Page-at-a-time Locking Considerations
Date	March 22, 2008 21:37:15
Msg-id	200803230037.m2N0b6c19764@momjian.us
In response to	Page-at-a-time Locking Considerations (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: Page-at-a-time Locking Considerations
List	pgsql-hackers

With no concrete patch or performance numbers, this thread has been
removed from the patches queue.

---------------------------------------------------------------------------

Simon Riggs wrote:
> 
> In heapgetpage() we hold the buffer locked while we look for visible
> tuples. That works well in most cases since the visibility check is fast
> if we have status bits set. If we don't have visibility bits set we have
> to do things like scan the snapshot and confirm things via clog lookups.
> All of that takes time and can lead to long buffer lock times, possibly
> across multiple I/Os in the very worst cases.
> 
> This doesn't just happen for old transactions. Accessing very recent
> TransactionIds is prone to rare but long waits when we ExtendClog(). 
> 
> Such problems are numerically rare, but the buffers with long lock times
> are also the ones that have concurrent or at least recent write
> operations on them. So all SeqScans have the potential to induce long
> wait times for write transactions, even if they are scans on one-block
> tables. Tables with heavy write activity on them from multiple backends
> have their work spread across multiple blocks, so a SeqScan will hit
> this issue repeatedly as it encounters each current insertion point in a
> table and so greatly increases the chances of it occurring.
> 
> It seems possible to just memcpy() the whole block away and then drop
> the lock quickly. That gives a consistent lock time in all cases and
> allows us to do the visibility checks in our own time. It might seem
> that we would end up copying irrelevant data, which is true. But the
> greatest cost is memory access time. If hardware memory pre-fetch cuts
> in we will find that the memory is retrieved en masse anyway; if it
> doesn't we will have to wait for each cache line. So the best case is
> actually an en masse retrieval of cache lines, in the common case where
> blocks are fairly full (vague cutoff is determined by exact mechanism of
> hardware/compiler induced memory prefetch).
> 
> The copied block would be used only for visibility checks. The main
> buffer would retain its pin and we would pass references to the block
> through the executor as normal. So this would be a change completely
> isolated to heapgetpage().
> 
> Was the copy-aside method considered when we introduced page-at-a-time
> mode? Any reasons to think it would be dangerous or infeasible? If not,
> I'll give it a bash and get some test results.
> 
> -- 
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com 
> 

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
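
For readers following the thread, the copy-aside idea in the quoted message can be illustrated with a small C sketch. This is not PostgreSQL code and not a patch from the thread: ToyBuffer, ToyPage, ToyTuple and every toy_* function are hypothetical names invented here, the spinlock only stands in for the buffer content lock, and the visibility test is a trivial placeholder for the snapshot and clog work described above. The point it shows is that the lock is held only for the duration of one memcpy(), while the per-tuple checks run against the private copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_TUPLES 64   /* hypothetical capacity of the toy page */

/* Hypothetical, heavily simplified tuple: only the inserting xid matters. */
typedef struct ToyTuple {
    uint32_t xmin;          /* transaction id that inserted the tuple */
    uint32_t payload;
} ToyTuple;

typedef struct ToyPage {
    uint32_t ntuples;
    ToyTuple tuples[TOY_MAX_TUPLES];
} ToyPage;

/* Stand-in for a shared buffer. The spinlock plays the role of the
 * buffer content lock (exclusive-only here, to keep the sketch short). */
typedef struct ToyBuffer {
    atomic_flag content_lock;
    ToyPage     page;
} ToyBuffer;

static void toy_lock(ToyBuffer *buf)
{
    while (atomic_flag_test_and_set_explicit(&buf->content_lock,
                                             memory_order_acquire))
        ;                   /* spin until the current holder releases */
}

static void toy_unlock(ToyBuffer *buf)
{
    atomic_flag_clear_explicit(&buf->content_lock, memory_order_release);
}

static void toy_buffer_init(ToyBuffer *buf)
{
    memset(&buf->page, 0, sizeof(buf->page));
    atomic_flag_clear(&buf->content_lock);
}

/* Writer path: append a tuple under the lock; returns false when full. */
static bool toy_insert(ToyBuffer *buf, uint32_t xmin, uint32_t payload)
{
    bool ok = false;

    toy_lock(buf);
    if (buf->page.ntuples < TOY_MAX_TUPLES) {
        buf->page.tuples[buf->page.ntuples++] = (ToyTuple){ xmin, payload };
        ok = true;
    }
    toy_unlock(buf);
    return ok;
}

/* Placeholder for the potentially slow check (snapshot scan, clog lookup). */
static bool toy_tuple_visible(const ToyTuple *tup, uint32_t snapshot_xmax)
{
    return tup->xmin < snapshot_xmax;
}

/* Copy-aside scan: hold the lock only for the memcpy(), then run the
 * visibility checks on the private copy. Writes the indexes of visible
 * tuples into visible[] and returns how many were found. */
static int toy_getpage_copy_aside(ToyBuffer *buf, uint32_t snapshot_xmax,
                                  int visible[TOY_MAX_TUPLES])
{
    ToyPage local;
    int     nvisible = 0;

    toy_lock(buf);
    memcpy(&local, &buf->page, sizeof(local));   /* constant-time hold */
    toy_unlock(buf);

    assert(local.ntuples <= TOY_MAX_TUPLES);
    for (uint32_t i = 0; i < local.ntuples; i++) {
        if (toy_tuple_visible(&local.tuples[i], snapshot_xmax))
            visible[nvisible++] = (int) i;
    }
    return nvisible;
}
```

A caller would initialize a ToyBuffer with toy_buffer_init(), add tuples with toy_insert(), and then call toy_getpage_copy_aside() with a snapshot bound. The sketch deliberately leaves out anything that writes back to the shared page during the check (such as setting hint bits), which a real implementation would have to handle separately.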
