Re: Changing shared_buffers without restart - Mailing list pgsql-hackers
From | Dmitry Dolgov |
---|---|
Subject | Re: Changing shared_buffers without restart |
Date | |
Msg-id | jhcrkdzetmni4ojbrppiyy2lyo322tvsy2z4iv4aeuie7idufb@i2xp7tkyaatl |
In response to | Re: Changing shared_buffers without restart (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses |
Re: Changing shared_buffers without restart
|
List | pgsql-hackers |
> On Fri, Jul 04, 2025 at 04:41:51PM +0200, Dmitry Dolgov wrote:
> > v5-0003-Introduce-pss_barrierReceivedGeneration.patch
> >
> > 1) Do we actually need this? Isn't it enough to just have two barriers?
> > Or a barrier + condition variable, or something like that.
>
> The issue with two barriers is that they do not prevent disjoint groups,
> i.e. one backend joins the barrier, finishes the work and detaches from
> the barrier, then another backend joins. I'm not familiar with how this
> was solved for online checksums patch though, will take a look. Having a
> barrier and a condition variable would be possible, but it's hard to
> figure out for how many backends to wait. All in all, a small extension
> to the ProcSignalBarrier feels to me much more elegant.

After quickly checking how the online checksums patch deals with the
coordination, I've realized my answer here about the disjoint groups is
not quite correct. You were asking about ProcSignalBarrier, I was
answering about the barrier within the resizing logic. Here is how it
looks to me:

* We could follow the same approach as the online checksums patch:
  launch a coordinator worker (Ashutosh was suggesting that, but no
  implementation has materialized yet) and fire two ProcSignalBarriers,
  one to kick off resizing and another one to finish it. Maybe it could
  even be three phases, with an extra one telling backends not to pull
  new buffers into the pool, to help the buffer eviction process.

* This way any backend between the ProcSignalBarriers will be able to
  proceed with whatever it's doing, and there is a need to make sure it
  will not access buffers that will soon disappear. A suggestion so far
  was to get all backends to agree not to allocate any new buffers in
  the to-be-truncated range, but accessing already existing buffers that
  will soon go away is a problem as well. As far as I can tell there is
  no rock solid method to make sure a backend doesn't have a reference
  to such a buffer somewhere (this was discussed earlier in the thread),
  meaning that either a backend has to wait or buffers have to be
  checked on every access.

* Since the latter adds a performance overhead, we went with the former
  (making backends wait). And here is where all the complexity comes
  from, because waiting backends cannot reply to a ProcSignalBarrier and
  thus require some other approach.

If I've overlooked any other alternative to backends waiting, let me
know.

> It also seems a bit strange that the "switch" gets to be driven by
> a randomly selected backend (unless I'm misunderstanding this bit). It
> seems to be true for the buffer eviction during shrinking, at least.

But it looks like the eviction could indeed be improved via a new
coordinator worker. Before resizing shared memory, such a worker would
first tell all the backends, via a ProcSignalBarrier, not to allocate
new buffers, and would then do the buffer eviction. Since backends don't
need to wait after this type of ProcSignalBarrier, it should work, and
it leaves a single worker responsible for the eviction. But the second
ProcSignalBarrier, for the resizing itself, would still follow the
current procedure with everybody waiting.

Does it make sense to you folks?
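To make the ordering of the coordinator-driven shrink concrete, here is a toy, single-process Python model of it. It is only a sketch under stated assumptions, not PostgreSQL code: `Pool`, `Backend`, `absorb_barrier` and `shrink` are hypothetical names invented for illustration, the ProcSignalBarrier is reduced to a plain method call, and neither the waiting during the actual resize nor the problem of backends holding references to soon-to-disappear buffers is modelled.

```python
from dataclasses import dataclass


class Pool:
    """Toy buffer pool (hypothetical): buffer ids are 0 .. nbuffers-1."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.in_use = set()


@dataclass
class Backend:
    """Hypothetical stand-in for a backend process."""
    name: str
    alloc_limit: int          # may only allocate buffer ids below this
    seen_generation: int = 0  # last barrier generation acknowledged

    def absorb_barrier(self, generation, new_limit):
        # Reply to the first barrier: lower the allocation limit,
        # acknowledge, and keep running without waiting.
        self.alloc_limit = new_limit
        self.seen_generation = generation

    def allocate(self, pool):
        # Take the lowest free buffer below this backend's limit, if any.
        for buf in range(min(self.alloc_limit, pool.nbuffers)):
            if buf not in pool.in_use:
                pool.in_use.add(buf)
                return buf
        return None


def shrink(pool, backends, new_size):
    """Coordinator side of the sketch; returns the evicted buffer ids."""
    generation = max((b.seen_generation for b in backends), default=0) + 1

    # Step 1: tell every backend not to allocate in the to-be-truncated
    # range. Backends acknowledge and continue their work.
    for b in backends:
        b.absorb_barrier(generation, new_size)

    # Step 2: only the coordinator evicts buffers from the doomed range.
    evicted = sorted(buf for buf in pool.in_use if buf >= new_size)
    pool.in_use.difference_update(evicted)

    # Step 3: the resize itself. In the proposal this is the second
    # barrier, during which all backends wait; the model just shrinks.
    pool.nbuffers = new_size
    return evicted
```

Calling `shrink(pool, backends, new_size)` on a pool with some buffers in use above `new_size` returns those buffer ids as evicted, after which `allocate` on any backend only hands out ids below the new size.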