Re: Thinking about breaking up the BufMgrLock - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Thinking about breaking up the BufMgrLock |
Date | |
Msg-id | KGEFLMPJFBNNLNOOOPLGEEHLCIAA.simon@2ndquadrant.com Whole thread Raw |
In response to | Thinking about breaking up the BufMgrLock (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Thinking about breaking up the BufMgrLock
|
List | pgsql-hackers |
>Tom Lane > But without a smart idea about data structures I don't see how to do > better. > > Thoughts? > Hmm, seems like you summed up the lack of algorithmic choices very well. After much thought, I believe the best way is to implement bufferpools (BPs). That is, we don't just have one bufferhash and one LRU, we have many pairs. We then work out some mapping by which a block can be known to be in a particular BP, then acquire the lock for that BP. Using BPs means that each block is only ever associated with one BP. Hence, there is still only a single BPBufMgrLock to Acquire/Release, but there could be considerably less contention for a specific lock. BPs can be implemented manually or automatically. Commercial systems provide facilities for manual placement of specific tables into specific bufferpools. That is more work, so I would favour automatically assigned BPs, though leaving the door open to further extension. It seems simple to use one or more of the numbers associated with a block as input to a hash algorithm to identify the BP in which to locate the block. After some thinking, using relationId seems to be the best candidate. That would mean an index or sequential scan would be more likely to reAcquire a lock. There is some possibility that the multiple BP LRUs could become unbalanced, so very short LRUs would not benefit as much from splitting up in this way. So, I suggest linking the number of BPs to the number of shared buffers, say 1 BP per 10,000 shared_buffers, up to a maximum of 8. I believe a self-balancing algorithm between the multiple lists would be fairly straightforward addition also. ISTM that there is no chance for deadlock in using BPs. The current code assumes that all blocks are accessed via a single BufMgrLock. There is never an attempt to acquire BufMgrLock when it is already held, so no chance therefore of deadlock. Overall, this idea would reduce BufMgrLock contention by as many multiples as you choose, as well as allowing more practical use of larger shared_buffers settings which would further drive up performance. The costs need not be incurred at all for smaller shared_buffers and the current situation (num_bufferpools = 1) can remain the default. Also, the current code already allows for multiple lists, so extending the number from 4 (or 3) to multiples seems more straightforward. One other consideration is the eviction of dirty buffers, but more detail on that in another post. What should be observed there is that dirty buffer evictions perform Acquire/Release of the BufMgrLock at least twice. Improving the bgwriter will also have an effect on BufMgrLock contention. Best Regards, Simon Riggs
pgsql-hackers by date: