On 06/06/2025 2:31 am, Tom Lane wrote:
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
>> I have a very wild guess that's probably wrong in a weird way, but
>> here goes anyway:
>> Did anyone test if interleaving the enum-typed bitfield fields of
>> PgAioHandle with the uint8 fields might solve the issue?
> Ugh. I think you probably nailed it.
>
> IMO all those struct fields better be declared uint8.
>
> regards, tom lane
I also think the problem may be in the compiler. Bitfields of different
enum types look really exotic, so it would be no surprise if the
optimizer did something strange here.
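To make the layout question concrete, here is a minimal sketch of the two
shapes being discussed. All type, field and function names are hypothetical
stand-ins, not the actual `PgAioHandle` definition: enum-typed 8-bit
bitfields interleaved with plain `uint8` members, versus Tom's suggestion of
declaring every such field as `uint8` and casting at the point of use. Note
that C only guarantees bitfields of `_Bool`, `signed int` and `unsigned int`;
enum-typed bitfields are implementation-defined (GCC and clang accept them).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the AIO state and op enums. */
typedef enum { HS_IDLE, HS_HANDED_OUT, HS_DEFINED } HandleState;
typedef enum { OP_INVALID, OP_READV, OP_WRITEV } IoOp;

/* Enum-typed 8-bit bitfields interleaved with uint8 members
 * (implementation-defined in C; accepted by GCC and clang). */
struct HandleEnumBits
{
    HandleState state:8;
    uint8_t     flags;
    IoOp        op:8;
    uint8_t     num_callbacks;
};

/* Tom's suggestion: plain uint8 members, cast to/from the enum at use. */
struct HandleUint8
{
    uint8_t state;
    uint8_t flags;
    uint8_t op;
    uint8_t num_callbacks;
};

/* Four one-byte members need no padding on common ABIs. */
static_assert(sizeof(struct HandleUint8) == 4, "no padding expected");

/* Returns 1 if storing op leaves every neighbouring member unchanged. */
static int
enum_bits_store_is_isolated(void)
{
    struct HandleEnumBits h = {HS_HANDED_OUT, 0x5a, OP_INVALID, 3};

    h.op = OP_READV;
    return h.state == HS_HANDED_OUT && h.flags == 0x5a &&
        h.op == OP_READV && h.num_callbacks == 3;
}

/* Same check for the all-uint8 layout. */
static int
uint8_store_is_isolated(void)
{
    struct HandleUint8 h = {HS_HANDED_OUT, 0x5a, OP_INVALID, 3};

    h.op = (uint8_t) OP_READV;
    return (HandleState) h.state == HS_HANDED_OUT && h.flags == 0x5a &&
        (IoOp) h.op == OP_READV && h.num_callbacks == 3;
}
```

Whatever storage unit the compiler picks for the enum bitfields, a store to
`op` must leave `state`, `flags` and `num_callbacks` untouched; wrong mask or
merge code around such a store is the kind of miscompilation that could
produce the symptom. The `uint8` variant avoids bitfield read-modify-write
entirely, since each member is an ordinary byte store.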
I failed to reproduce the problem with an old version of clang (15.0).
Also, as far as I understand, nobody has been able to reproduce the
problem with optimizations disabled (-O0).
That certainly doesn't prove there is a bug in the optimizer; disabling
optimization may just change the timing.
Still, it is not quite clear to me how `PGAIO_OP_READV` managed to get
written.
There is just one place in the code where it is assigned:
```
pgaio_io_start_readv(PgAioHandle *ioh,
int fd, int iovcnt, uint64 offset)
{
...
pgaio_io_stage(ioh, PGAIO_OP_READV);
}
```
and `pgaio_io_stage` should update both `state` and `op`:
```
ioh->op = op;
ioh->result = 0;
pgaio_io_update_state(ioh, PGAIO_HS_DEFINED);
```
But as we see from the trace, the state is still PGAIO_HS_HANDED_OUT, so
it was not updated.
Even if some optimizer bug constructs an incorrect mask for the bitfield
assignment, it is still not clear where it managed to get this
PGAIO_OP_READV from.
And we can be sure that it really is PGAIO_OP_READV and not just
arbitrary garbage, because Alexander changed its value to 0xaa and we see
in the logs that this value really is stored.
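Moving the enum value to something distinctive is what makes the log
conclusive. A minimal sketch, assuming the experiment redefined
PGAIO_OP_READV as 0xaa; `IoOp`, `Observed` and `classify_op()` are
hypothetical names. A byte of 0xaa is very unlikely to appear by accident,
whereas a cleared field reads as 0 and a stray value falls into the garbage
bucket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical op enum with the READV member moved to 0xaa. */
typedef enum
{
    OP_INVALID = 0,
    OP_READV = 0xaa
} IoOp;

/* The distinctive value must still fit in an 8-bit field. */
static_assert(OP_READV <= UINT8_MAX, "op must fit in 8 bits");

typedef enum { OBSERVED_CLEARED, OBSERVED_READV, OBSERVED_GARBAGE } Observed;

/* Classify the raw byte found in a handle's op field. */
static Observed
classify_op(uint8_t raw)
{
    switch (raw)
    {
        case OP_INVALID:
            return OBSERVED_CLEARED;
        case OP_READV:
            return OBSERVED_READV;
        default:
            return OBSERVED_GARBAGE;
    }
}
```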
If there were a race condition in `pgaio_io_update_state` (which issues a
memory barrier before updating the state), then, for example, inserting a
sleep between the assignment of `op` and the state update should increase
the probability of the error. But it doesn't. Also, as far as I
understand, `op` is updated and read by the same backend, so it should
not be a synchronization issue.
So most likely it is a bug in the optimizer that generates incorrect code.
Could Alexander, or somebody else who was able to reproduce the problem,
share the assembly code of the `pgaio_io_reclaim` function?
I am not sure that the bug is in this function, but it is the prime
suspect. Only `pgaio_io_start_readv` can set PGAIO_OP_READV, but we are
almost sure that it was not called.
So it looks like `op` was not cleared, despite what we see in the logs.
But if the code generated for `pgaio_io_reclaim` were incorrect, it should
always misbehave and never clear `op`, yet in most cases it works...
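One way to narrow this down alongside the assembly, sketched under
assumptions: `Handle`, `reclaim()` and `hand_out()` are hypothetical and not
the real `pgaio_io_reclaim()` code path. If reclaim really clears `op`, a
check when the handle is handed out again must always pass; a failure there
would show that a stale `op` survived reclaim rather than being written
later. The check only fires in builds with assertions enabled (without
`NDEBUG`).

```c
#include <assert.h>
#include <stdint.h>

typedef enum { HS_IDLE, HS_HANDED_OUT, HS_DEFINED, HS_COMPLETED } HandleState;
typedef enum { OP_INVALID, OP_READV } IoOp;

/* Hypothetical handle; enums stored as uint8 for simplicity. */
typedef struct
{
    uint8_t state;
    uint8_t op;
    int     result;
} Handle;

/* Models what pgaio_io_reclaim() is expected to do: reset the per-IO
 * fields and return the handle to the idle state. */
static void
reclaim(Handle *h)
{
    h->op = (uint8_t) OP_INVALID;
    h->result = 0;
    h->state = (uint8_t) HS_IDLE;
}

/* Hypothetical hand-out with a debugging check: a reused handle must not
 * carry an op left over from its previous IO. */
static void
hand_out(Handle *h)
{
    assert(h->state == HS_IDLE);
    assert(h->op == OP_INVALID);  /* fails at reuse time if op leaked */
    h->state = (uint8_t) HS_HANDED_OUT;
}
```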