On Wed, Jan 14, 2026 at 2:49 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
>
> I went through the patch set again today, and paid special attention to 0001 and 0008 that I seemed to not review
> before. Here are comments I got today:
Thanks! v13 attached.
> --- a/src/include/storage/buf_internals.h
> +++ b/src/include/storage/buf_internals.h
> @@ -15,6 +15,7 @@
> #ifndef BUFMGR_INTERNALS_H
> #define BUFMGR_INTERNALS_H
>
> +#include "access/xlogdefs.h"
>
> I tried to build without adding this include, and the build still passed. I think that's because there is an include path:
> storage/bufmgr.h -> storage/bufpage.h -> access/xlogdefs.h.
>
> So, maybe we can remove this include.
Generally, at least for new code, we try to avoid transitive includes.
> + * The buffer must be pinned and content locked and the buffer header spinlock
> + * must not be held.
> + *
> * Returns true if buffer manager should ask for a new victim, and false
> * if this buffer should be written and re-used.
> */
> bool
> StrategyRejectBuffer(BufferAccessStrategy strategy, BufferDesc *buf, bool from_ring)
> {
> + XLogRecPtr lsn;
> +
> + if (!strategy)
> + return false;
> ```
>
> As the new comment stated, the buffer must be pinned, maybe we can add an assert to ensure that:
> ```
> Assert(BufferIsPinned(buffer));
> ```
>
> Similarly, maybe we can also assert that the content lock is held and the header spinlock is not:
> ```
> Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(buf)));
> Assert(!(pg_atomic_read_u32(&buf->state) & BM_LOCKED));
> ```
I've added something like this.
>
> 3 - 0001 - bufmgr.c - BufferNeedsWALFlush()
> ```
> + buffer = BufferDescriptorGetBuffer(bufdesc);
> + page = BufferGetPage(buffer);
> +
> + Assert(BufferIsValid(buffer));
> ```
>
> I think the Assert should be moved to before "page = BufferGetPage(buffer);".
I just deleted the assert -- BufferGetPage() already does it.
> +bool
> +BufferNeedsWALFlush(BufferDesc *bufdesc, bool exclusive, XLogRecPtr *lsn)
> ```
>
> I think the "exclusive" parameter is a bit subtle. The parameter is not explicitly explained in the header comment,
> though there is a paragraph that explains the different behaviors when the caller holds and does not hold the content
> lock. Maybe we can rename it to a more direct name: hold_exclusive_content_lock, or a shorter one, hold_content_lock.
I've done this.
I've also gone and fixed the various typos in code and commit messages
you mentioned.
> +/*
> + * Prepare the buffer with bufdesc for writing. Returns true if the buffer
> + * actually needs writing and false otherwise. lsn returns the buffer's LSN if
> + * the table is logged.
> + */
> +static bool
> +PrepareFlushBuffer(BufferDesc *bufdesc, XLogRecPtr *lsn)
> +{
> uint32 buf_state;
>
> /*
> @@ -4425,42 +4445,16 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
> * someone else flushed the buffer before we could, so we need not do
> * anything.
> */
> - if (!StartBufferIO(buf, false, false))
> - return;
>
> The header comment says "lsn returns the buffer's LSN if the table is logged", which looks inaccurate, because if
> StartBufferIO() returns false, the function returns early without setting *lsn.
Yes, good point. I should initialize it to InvalidXLogRecPtr before
calling StartBufferIO() and document it appropriately. I've done that.
While looking at this, I realized that PrepareFlushBuffer() becomes
useless once I make normal (non-strategy) buffer flushes do write
combining. This inspired a refactor which I think simplifies the code.
Unfortunately, it means this patch set introduces PrepareFlushBuffer()
in 0001 and then deletes it in 0008. I'm not really sure what to do
about that, but maybe it will come to me later.
> + BlockNumber max_batch_size = 3; /* we only look for two successors */
> ```
>
> Using type BlockNumber for a batch size seems odd. I would suggest changing it to uint32.
Right, at the least, I wasn't consistent with which one to use. I've changed it.
> 11 - 0008 - WriteBatchInit()
> ```
> + LockBufHdr(batch_start);
> + batch->max_lsn = BufferGetLSN(batch_start);
> + UnlockBufHdr(batch_start);
> ```
>
> Should we check for an unlogged buffer before assigning max_lsn? In previous commits, we have done that in many places.
Good point, when fixing this, I realized that I was getting the page
LSN more times than I needed to for the victim buffer. I restructured
it in a way that I think is more clear.
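In case it helps review, the shape I ended up with for the victim buffer is roughly this (a fragment, not the actual patch; it relies on LockBufHdr() returning the buffer state word and on the rule that BufferGetLSN() on a shared buffer requires the header spinlock):

```
buf_state = LockBufHdr(bufdesc);
if (buf_state & BM_PERMANENT)
	batch->max_lsn = BufferGetLSN(bufdesc);
else
	batch->max_lsn = InvalidXLogRecPtr;	/* unlogged: no WAL to flush */
UnlockBufHdr(bufdesc);
```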
On a separate note, I've made some other changes -- like getting rid
of the BufferBlockRequirements struct I added, because I realized that
was just a BufferTag.
And I found a few places where I hadn't documented my expectations
around the buffer header spinlock. Because I'm checking fields in the
buffer header for a heuristic, I'm bending the rules about concurrency
control. Each time I look at the patch, I find a new mistake in how
I'm handling access to the buffer header.
Speaking of which, periodically I see a test failure related to a
block having invalid contents (in 027_stream_regress). It's not
reproducible, so I haven't investigated much yet. There's at least one
bug in this patch set that needs shaking out, but I think that will
have to wait for another day. I'll probably want to add some tests.
- Melanie