Re: [RFC] Lock-free XLog Reservation from WAL - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: [RFC] Lock-free XLog Reservation from WAL
Date
Msg-id CAEze2Wj+xro-X1PBAPFQR4eHuyqeWN=A7awuOsdLBzGTbjwW4A@mail.gmail.com
In response to Re: [RFC] Lock-free XLog Reservation from WAL  (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses Re: [RFC] Lock-free XLog Reservation from WAL
List pgsql-hackers
On Fri, 10 Jan 2025 at 13:42, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>
> BTW, your version could make alike trick for guaranteed atomicity:
> - change XLogRecord's `XLogRecPtr xl_prev` to `uint32 xl_prev_offset`
> and store offset to prev record's start.

-1, I don't think that is possible without degrading what our current
WAL system protects against.

For intra-record torn write protection we have the checksum, but that
same protection doesn't cover the multiple WAL records on each page.
That is what the xl_prev pointer is used for - detecting that this
part of the page doesn't contain the correct data (e.g. the data of a
previous version of this recycled segment).
If we replaced xl_prev with just an offset into the segment, then this
protection would be much less effective, as the previous version of
the segment realistically stored the same segment offsets at the same
positions in the file.
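
To make that concrete, here is a rough sketch (not PostgreSQL source) of
the two checks. DEMO_SEG_SIZE, demo_lsn() and both *_matches() helpers
are names made up for illustration; the real validation in the WAL
reader does more than a single comparison.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;	/* 64-bit WAL position, as in PostgreSQL */

/* Assumed segment size for this sketch (16MB). */
#define DEMO_SEG_SIZE ((XLogRecPtr) 16 * 1024 * 1024)

/* Hypothetical helper: LSN of byte `offset` in segment number `segno`. */
static XLogRecPtr
demo_lsn(uint64_t segno, uint32_t offset)
{
	return segno * DEMO_SEG_SIZE + offset;
}

/* Current scheme: the record stores the full LSN of its predecessor. */
static bool
prev_lsn_matches(XLogRecPtr stored_xl_prev, XLogRecPtr expected_prev)
{
	return stored_xl_prev == expected_prev;
}

/* Proposed scheme: the record stores only the predecessor's segment offset. */
static bool
prev_offset_matches(uint32_t stored_prev_offset, XLogRecPtr expected_prev)
{
	return stored_prev_offset == (uint32_t) (expected_prev % DEMO_SEG_SIZE);
}
```

Take a file that used to be segment 5 and was recycled as segment 9. A
stale record in it still carries xl_prev = demo_lsn(5, 4096), while the
reader expects demo_lsn(9, 4096), so prev_lsn_matches() rejects it. With
offsets only, the stale record carries 4096, the reader expects 4096,
and prev_offset_matches() accepts the old data.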

To protect against torn writes while still only using record segment
offsets, you'd have to zero and then fsync any segment before reusing
it, which would severely reduce the benefits we get from recycling
segments.
Note that we can't expect the page header to help here, as write tears
can happen at nearly any offset into the page - not just at 8kB
intervals - and so the page header is not always representative of the
origin of all bytes on the page - only of the first 24 bytes (if even
that).
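
A small simulation of that case, as a sketch under loud assumptions: the
tear happens on a 512-byte boundary (DEMO_SECTOR_SIZE), and the page
address sits in the first 8 bytes of the page, which is a simplification
and not the real page header layout. All names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE   8192
#define DEMO_SECTOR_SIZE 512	/* assumed tear granularity */

/*
 * Simulate a torn write: only the first `sectors_written` sectors of
 * new_page reach the disk, the rest of disk_page keeps its old bytes.
 */
static void
demo_torn_write(uint8_t *disk_page, const uint8_t *new_page,
				int sectors_written)
{
	memcpy(disk_page, new_page,
		   (size_t) sectors_written * DEMO_SECTOR_SIZE);
}

/*
 * Simplified header check: compare the page address stored in the first
 * 8 bytes of the page (assumed layout) against the expected address.
 */
static bool
demo_header_matches(const uint8_t *page, uint64_t expected_pageaddr)
{
	uint64_t	addr;

	memcpy(&addr, page, sizeof(addr));
	return addr == expected_pageaddr;
}
```

If a page of a recycled segment is overwritten and only the first sector
makes it to disk, demo_header_matches() succeeds for the new page
address even though every byte from offset 512 onward still belongs to
the old segment contents. A per-record check is what has to catch that.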

Kind regards,

Matthias van de Meent
