Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers
| From | Joel Jacobson |
|---|---|
| Subject | Re: Optimize LISTEN/NOTIFY |
| Date | |
| Msg-id | 34ccde90-2041-4388-90d5-f8519f3104dd@app.fastmail.com Whole thread Raw |
| In response to | Re: Optimize LISTEN/NOTIFY (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses |
Re: Optimize LISTEN/NOTIFY
|
| List | pgsql-hackers |
On Wed, Jan 7, 2026, at 21:29, Tom Lane wrote:
> "Joel Jacobson" <joel@compiler.org> writes:
>> I've reworked the staging mechanism for LISTEN/UNLISTEN. The new design
>> tracks LISTEN state at three levels:
>> * pendingListenChannels: per-transaction pending changes
>> * listenChannelsHash: per-backend committed state cache
>> * channelHash: cluster-wide shared state
>
> I spent some time looking through this version.
Many thanks for a great review with awesome ideas.
>> The main benefit is that pendingListenChannels is now a hash table
>> instead of a simple List. In the old design, LISTEN foo; UNLISTEN foo;
>> LISTEN foo would create three list entries that all had to be processed
>> at commit. The new design collapses this to one hash entry storing the
>> final state, which we just apply at commit.
>
> That seems a little weird: surely such usage is not something we need
> to optimize. Maybe it can be justified because it makes the code
> simpler, but that's not immediately obvious to me.
The reason for not sticking to the two-boolean approach (staged/current)
like in v32 was to minimize shared dshash operations in favor of local
hash table operations.
Consider `LISTEN foo; UNLISTEN foo; LISTEN foo:`
With two booleans and pendingListenActions as a List (as in v32):
Each LISTEN/UNLISTEN touches the shared dshash during PreCommit to set
the staged value, then commit iterates the entire list (including
duplicates) to finalize. That's 6 dshash operations in total.
With a local hash table (current design, v34):
UNLISTEN during PreCommit doesn't touch the shared dshash at all, it
just records the intent locally. LISTEN pre-allocates once per channel.
At commit we iterate the de-duplicated hash table, so we do one dshash
operation per unique channel rather than per action. That's 2 dshash
operations in total.
Since dshash operations involve shared memory access and potential
contention, doing more work locally with a cheap hash table seemed like
the right trade-off to me.
I also tried making the current v34 approach work with
pendingListenActions as a List. The problem is that at commit, we need
to know the final intent per channel. With a List containing
duplicates, we'd iterate in reverse for "last action wins" semantics,
but then we'd need to track which channels we've already processed,
essentially reimplementing a hash table manually. The hash table gives
us this for free: inserting the same channel just overwrites the
previous action, and at commit each channel appears exactly once with
its final intent.
> I found the terminology pretty confusing, in particular there's too
> little cognitive distance between "channel hash table" and "local
> listen channels hash table" and "pending listen channels hash table".
> I think you could improve matters by renaming, perhaps along the
> lines of
I really struggled trying to come up with meaningful names, so big
thanks for great naming improvements.
Renames per suggestions:
channelHash -> globalChannelTable
listenChannelsHash -> localChannelTable
pendingListenChannels -> pendingListenActions
NotificationList.channelSet -> NotificationList.uniqueChannelNames
Exec_ListenPreCommit -> BecomeRegisteredListener
Exec_ListenPreCommitStage -> PrepareTableEntriesForListen
Exec_UnlistenPreCommitStage -> PrepareTableEntriesForUnlisten
Exec_UnlistenAllPreCommitStage -> PrepareTableEntriesForUnlistenAll
I also renamed the following to make their purpose clear:
ChannelNameKey -> GlobalChannelKey
ChannelHashKey -> GlobalChannelKey
ChannelListeners -> GlobalChannelEntry
ChannelListeners.ChannelHashKey -> GlobalChannelEntry.GlobalChannelKey
AsyncQueueControl.channelHashDSA -> AsyncQueueControl.globalChannelTableDSA
AsyncQueueControl.channelHashDSH -> AsyncQueueControl.globalChannelTableDSH
channelDSA -> globalChannelDSA
I also renamed PendingListenEntry.listening to
PendingListenEntry.action, and changed it from a bool to a new enum
PendingListenAction. I didn't like how this field was a bool named
"listening", since ListenerEntry's also has a bool field named
"listening". It felt better to make it an enum with the explicit values
{PENDING_LISTEN, PENDING_UNLISTEN}.
> I noted while looking at code coverage results that AtAbort_Notify
> seems almost entirely unreached, and soon realized that that's because
> none of it runs unless we are cleaning up a transaction that fails
> after Exec_ListenPreCommit has created pendingListenChannels.
> That's obviously extremely difficult to reach in a test, so it's a bit
> sad that we have that big chunk of unreachable code, especially when
> the coverage elsewhere in this file is pretty stellar. I wonder if
> there is a way to merge that chunk with the code in AtCommit_Notify
> that also has to scan pendingListenChannels, so as to reduce the
> amount of untested code. Also, the comments for AtAbort_Notify
> don't really make it clear that there's nothing to do unless we're
> reverting an almost-committed transaction.
Great idea, thanks! I also found it troubling not being able to test
AtAbort_Notify without e.g. manually adding code to simulate OOM, which
is how I manually tested the old code.
Two new helper-functions have been introduced:
ApplyPendingListenActions
-> RemoveListenerFromChannel
RemoveListenerFromChannel is now used both by UNLISTEN being committed...
AtCommit_Notify
-> ApplyPendingListenActions
-> RemoveListenerFromChannel
...as well as the harder-to-hit case of a staged LISTEN being aborted:
AtAbort_Notify
-> ApplyPendingListenActions
-> RemoveListenerFromChannel
> Speaking of AtCommit_Notify:
>
> entry = dshash_find(channelHash, &key, true);
> if (entry == NULL)
> continue;
>
> How can it be okay to just silently "continue" here? Certainly
> this case should be unreachable, but that doesn't make it okay
> to silently ignore the failure if one were to occur. Given
> that this is post-commit processing, probably we can do no
> better than
The silent continue was handling UNLISTEN of a channel we weren't
listening on (which is allowed as a no-op). However, I've now moved
that check earlier: PrepareTableEntriesForUnlisten now returns early if
the channel isn't in localChannelTable. This means
ApplyPendingListenActions can safely PANIC on entry == NULL, since that
case is now truly unreachable.
> Also, a bit further down:
>
...
>
> This seems like a weird order of operations: why didn't you finish
> dealing with the channelHash entry before dealing with
> listenChannelsHash?
Right. Fixed.
New version attached.
/Joel
Attachment
pgsql-hackers by date: