Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers

From Chao Li
Subject Re: Buffer locking is special (hints, checksums, AIO writes)
Date
Msg-id 732580B4-962F-4EBD-8811-F4667D79747D@gmail.com
Whole thread Raw
In response to Re: Buffer locking is special (hints, checksums, AIO writes)  (Chao Li <li.evan.chao@gmail.com>)
Responses Re: Buffer locking is special (hints, checksums, AIO writes)
List pgsql-hackers

> On Jan 15, 2026, at 08:04, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On Jan 15, 2026, at 07:37, Andres Freund <andres@anarazel.de> wrote:
>>
>> Hi,
>>
>> On 2026-01-15 07:20:27 +0800, Chao Li wrote:
>>>> On Jan 15, 2026, at 00:30, Andres Freund <andres@anarazel.de> wrote:
>>>> On 2026-01-14 11:41:19 +0800, Chao Li wrote:
>>>>> Basically, code changes in 0003 is straightforward, just a couple of small comments:
>>>>>
>>>>> 1
>>>>> ```
>>>>> - * refcounts in buf_internals.h.  This limitation could be lifted by using a
>>>>> - * 64bit state; but it's unlikely to be worthwhile as 2^18-1 backends exceed
>>>>> - * currently realistic configurations. Even if that limitation were removed,
>>>>> - * we still could not a) exceed 2^23-1 because inval.c stores the ProcNumber
>>>>> - * as a 3-byte signed integer, b) INT_MAX/4 because some places compute
>>>>> - * 4*MaxBackends without any overflow check.  We check that the configured
>>>>> - * number of backends does not exceed MAX_BACKENDS in InitializeMaxBackends().
>>>>> + * refcounts in buf_internals.h.  This limitation could be lifted, but it's
>>>>> ```
>>>>>
>>>>> Before this patch, there was room for lifting the limitation. With this
>>>>> patch, state is 64bit already, but the significant 32bit will be used for
>>>>> buffer locking as stated in buf_internals.h, in other words, there is no
>>>>> room for lifting the limitation now. If that’s true, then I think we can
>>>>> remove the statements about lifting limitation.
>>>>
>>>> I'm not following - there's plenty space for more bits if we need that:
>>>>
>>>> * State of the buffer itself (in order):
>>>> * - 18 bits refcount
>>>> * - 4 bits usage count
>>>> * - 12 bits of flags
>>>> * - 18 bits share-lock count
>>>> * - 1 bit share-exclusive locked
>>>> * - 1 bit exclusive locked
>>>>
>>>> That's 54 bits in total. Which part is in the lower and which in the upper
>>>> 32bit isn't relevant for anything afaict?
>>>
>>> Because I saw the comment in buf_internals.h:
>>> ```
>>> * NB: A future commit will use a significant portion of the remaining bits to
>>> * implement buffer locking as part of the state variable.
>>> ```
>>> That seems to indicate all the significant 32 bits will be used for buffer locking.
>>
>> A significant portion != all. As the above excerpt from the comment shows, the
>> locking uses 20 bits. We could increase max backends by 5 bits without running
>> out of bits (we'd need space both in the refcount bitspace as well as the
>> share-lock bitspace).
>
> Make sense. I think I misread the comment.
>
>>
>>
>>> Also, there is an assert that concretes the impression:
>>> ```
>>> StaticAssertDecl(BUF_REFCOUNT_BITS + BUF_USAGECOUNT_BITS + BUF_FLAG_BITS == 32,
>>>      "parts of buffer state space need to equal 32");
>>> ```
>>
>> You can see that being relaxed in the subsequent commit, when we start to use
>> more bits.
>>
>
> Sure. I plan to review 0003-0005 today. I believe I will get better understanding.

I have finished reviewing 0003-0005. Basically, 0003 and 0004 just delete some unused functions. I only searched over
thesource tree to ensure no usages of them. I spent time on 0005. The code logic are solid already, I didn't find any
correctnessissue and only got some small comments: 

1 - 0004 - commit message
```
subsequent commits fixing a typo an a parameter name.
```

Typo: a typo an a parameter name => a typo in a parameter name

2 - 0005 - bufmgr.c
```
-     * Pin it, share-lock it, write it.  (FlushBuffer will do nothing if the
-     * buffer is clean by the time we've locked it.)
+     * Pin it, share-exclusive-lock it, write it.  (FlushBuffer will do
+     * nothing if the buffer is clean by the time we've locked it.)
      */
     PinBuffer_Locked(bufHdr);
```

I think we don’t need to mention lock type in this comment, because the logic belongs to FlushUnlockedBuffer(). Also,
FlushBufferis misleading here, because we call FlushUnlockedBuffer() here, and FlushUnlockedBuffer() in turn calls
FlushBuffer().

So, I think we can simplify the comment as something like “Pin it and flush the buffer"

3 - 0005 - bufmgr.c
```
+inline void
+MarkBufferDirtyHint(Buffer buffer, bool buffer_std)
```

It’s quite uncommon to extern an inline function. I think usually if we want to make an inline function accessible from
external,we define it “static inline” in a header file. So, I guess “inline” is a typo here. 

4 - 0005 - bufmgr.c
```
/*
* MarkBufferDirtyHint
*
* Mark a buffer dirty for non-critical changes.
*
* This is essentially the same as MarkBufferDirty, except:
*
* 1. The caller does not write WAL; so if checksums are enabled, we may need
* to write an XLOG_FPI_FOR_HINT WAL record to protect against torn pages.
* 2. The caller might have only a share-exclusive-lock instead of an
* exclusive-lock on the buffer's content lock.
```

For point 2, do we need to mention “For shared buffers”. Because this function also handles local buffer that doesn’t
requirea lock. 

5 - 0005 - bufmgr.c
```
+/*
+ * Helper for BufferBeginSetHintBits() and BufferSetHintBits16().

+static inline bool
+SharedBufferBeginSetHintBits(Buffer buffer, BufferDesc *buf_hdr, uint64 *lockstate)
```

In the previous function comment:
```
- * MarkBufferDirtyHint
+ * Shared-buffer only helper for MarkBufferDirtyHint() and
+ * BufferSetHintBits16().
```

It mentions “Shared-buffer only helper”. I think SharedBufferBeginSetHintBits is also only for shared buffer, maybe
alsoadd “Shared-buffer only” before “helper” in the comment. 

6 - 0005 - bufmgr.c
```
+    }
+
+}
```

Nit: In function SharedBufferBeginSetHintBits, the last empty line is not needed.

7 - 0005 - bufmgr.c
```
+ * This checks if the current lock mode already suffices to allow hint bits
+ * being set and, if not, whether the current lock can be upgraded.
+ */
+static inline bool
+SharedBufferBeginSetHintBits(Buffer buffer, BufferDesc *buf_hdr, uint64 *lockstate)
```

Nit: "if not, whether the current lock can be upgraded” might be read as “lock can be upgraded, so the caller still
needto take some action to upgrade the lock”, but the function has upgraded the lock when returning true. So, how about
explicitlystate something like: "if not, it attempts to atomically upgrade it to share-exclusive. Returns true if hint
bitsmay be set (with or without an upgrade), false otherwise." 

8 - 0005 - fsmpage.c
```
 * needs to be updated. exclusive_lock_held should be set to true if the
* caller is already holding an exclusive lock, to avoid extra work.
```

The function comment of fsm_search_avail() needs to be updated. exclusive_lock_held should be set to true if the caller
isalready holding a **share-exclusive or** exclusive lock. 

Maybe the parameter name “exclusive_lock_held” can be enhanced as well.

9 - 0005 - fsmpage.c
```
+        if (!exclusive_lock_held)
+            BufferFinishSetHintBits(buf, false, true);
```

Nit: just curious why set the third parameter as “true”? When the second (mark_dirty) is false, the third parameter is
notused at all. 

10 - 0005 - freespace.c
```
-     * Reset the next slot pointer. This encourages the use of low-numbered
-     * pages, increasing the chances that a later vacuum can truncate the
-     * relation. We don't bother with marking the page dirty if it wasn't
-     * already, since this is just a hint.
+     * Try to reset the next slot pointer. This encourages the use of
+     * low-numbered pages, increasing the chances that a later vacuum can
+     * truncate the relation. We don't bother with marking the page dirty if
+     * it wasn't already, since this is just a hint.
      */
     LockBuffer(buf, BUFFER_LOCK_SHARE);
-    ((FSMPage) PageGetContents(page))->fp_next_slot = 0;
+    if (BufferBeginSetHintBits(buf))
+    {
+        ((FSMPage) PageGetContents(page))->fp_next_slot = 0;
+        BufferFinishSetHintBits(buf, false, false);
+    }
```

Before this patch, we unconditionally set fp_next_slot, now the setting might be skipped. You have add “Try to” in the
commentthat has explained the possibility of skipping setting fp_next_slot. Would it be better to add a brief statement
forwhat will result in when skipping setting fp_next_slot? Something like “Skipping the update only affects reuse, not
correctness”.

Lately, I saw you have done this in nbtinsert.c:
```
* mark the index entry killed. It's ok if we're not
* allowed to, this isn't required for correctness.
```
which, I think, confirmed my comment.

11 - 0005 Maybe not a problem. In nbtree.h:
```
/*
* We need to be able to tell the difference between read and write
* requests for pages, in order to do locking correctly.
*/
#define BT_READ BUFFER_LOCK_SHARE
#define BT_WRITE BUFFER_LOCK_EXCLUSIVE
```

Can the new lock type BUFFER_LOCK_SHARE_EXCLUSIVE be used by nbt? Maybe implicitly upgrading from BT_READ to
BUFFER_LOCK_SHARE_EXCLUSIVEis good enough? 

12 - 0005 - heapam_visibility.c

After this commit, tuple hint bits may remain unset if we can’t obtain share-exclusive permission. That’s fine because
hintbits are advisory and optional, but this is a behavior change. Would it make sense to mention this explicitly and
brieflyin the SetHintBitsExt() comment? 

13 - 0005 - nbtutils.c
```
+                /*
+                 * If we're not able to set hint bits, there's no point
+                 * continuing.
+                 */
+                if (!killedsomething &&
+                    !BufferBeginSetHintBits(buf))
+                    goto unlock_page;
```
I like this code, because it ensures to only call BufferBeginSetHintBits once. But lately, I saw the same logic in
gistget.c:
```
+        if (!killedsomething)
+        {
+            /*
+             * Use hint bit infrastructure to be allowed to modify the page
+             * without holding an exclusive lock.
+             */
+            if (!BufferBeginSetHintBits(buffer))
+                goto unlock;
+        }
```
I just feel the gist version is easier to read, so maybe change the nbt one to be the same as the gist one. But I think
thenbt one’s comment is better. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: lchch1990@sina.cn
Date:
Subject: Re: Can we change pg_rewind used without wal_log_hints and data_checksums
Next
From: Tatsuro Yamada
Date:
Subject: Re: [PATCH] psql: add \dcs to list all constraints