Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Non-reproducible AIO failure
Date
Msg-id kzasuj5yb7o2fbii43pmxwdvbk5z7j2kbwwigfmdgv4vve2tkl@wvtxubexck2r
Whole thread Raw
In response to Re: Non-reproducible AIO failure  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Non-reproducible AIO failure
Re: Non-reproducible AIO failure
List pgsql-hackers
Hi,

On 2025-06-06 15:21:13 -0400, Tom Lane wrote:
> Konstantin Knizhnik <knizhnik@garret.ru> writes:
> > There is really essential difference in code generated by clang 15
> > (working) and 16 (not working).
>
> It's a mistake to think that this is a compiler bug.  The C standard
> explicitly allows compilers to use word-wide operations to access
> bit-field struct members.  Such accesses may fetch or overwrite other
> bit-fields in the same word, using a non-atomic read/modify/write
> sequence.  I don't have a recent C standard at hand, but I found this
> quote on the web [1]:
>
> The C Standard, 3.17, paragraph 3 [ISO/IEC 9899:2024], states
>
>     Note 2 to entry: A bit-field and an adjacent non-bit-field member
>     are in separate memory locations. The same applies to two
>     bit-fields, if one is declared inside a nested structure
>     declaration and the other is not, or if the two are separated by a
>     zero-length bit-field declaration, or if they are separated by a
>     non-bit-field member declaration. It is not safe to concurrently
>     update two non-atomic bit-fields in the same structure if all
>     members declared between them are also (nonzero-length)
>     bit-fields, no matter what the sizes of those intervening
>     bit-fields happen to be.
>
> So it's our code that is busted.  No doubt, what is happening is
> that process A is fetching two fields, modifying one of them,
> and storing the word back (with the observed value of the other
> field) concurrently with some other process trying to update
> the other field.  So the other process's update is lost.

There shouldn't be any concurrent accesses here, so I don't really see how the
above would explain the problem (the IO can only ever be modified by one
backend, initially the "owning backend", then, when submitted, by the IO
worker, and then again by the backend).


Alexander's latest post has a good example (due to the added logging)

At the start of reclaim (looks sane):
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2,
ioh->generation:29202
 

At the end of reclaim (looks sane), note ioh->op = 0
!!!pgaio_io_reclaim [91056]| ioh: 0x1066d15d0, ioh->op: 0, ioh->generation: 29203

Just a bit later (broken):
!!!AsyncReadBuffers [91056] (1)| blocknum: 73, ioh: 0x1066d15d0, ioh->op: 170, ioh->state: 1, ioh->result: 0,
ioh->num_callbacks:0, ioh->generation: 29203
 

Note that ioh->op went back to 170, despite the code that sets ioh->op not yet
having been called. However the increment of the increment of ioh->generation
survived.  Between the second "!!!pgaio_io_reclaim" and "!!!AsyncReadBuffers"
there definitely aren't any other processes accessing things.



> If we want to preserve the compactness of this struct, we have
> to use primitive int types not bit-fields.

I think we have to do that, if for no other reason than how bad the generated
code is. But I would still like to understand what actually causes the problem
here - I'm far from sure it's actually the bitfield.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Non-reproducible AIO failure
Next
From: "Daniel Verite"
Date:
Subject: Re: CREATE DATABASE command for non-libc providers