Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Non-reproducible AIO failure
Msg-id 1fea555c-0345-46dc-8da5-5e667cad436a@garret.ru
In response to Re: Non-reproducible AIO failure  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
I tried to catch the moment when the memory is changed, using mprotect.
I aligned PgAioHandle on a page boundary (16kB on macOS) and disabled
writes in `pgaio_io_reclaim`:
```
static void
pgaio_io_reclaim(PgAioHandle *ioh)
{
    RESUME_INTERRUPTS();
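    /* write-protect the handle's page(s) once the handle has been reclaimed */
    int rc = mprotect(ioh, sizeof(*ioh), PROT_READ);
    Assert(rc == 0);
    fprintf(stderr, "!!!pgaio_io_reclaim [%d]| ioh: %p, ioh->op: %d, ioh->generation: %llu\n",
            getpid(), (void *) ioh, ioh->op, (unsigned long long) ioh->generation);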
}

```

and re-enabled writes in `pgaio_io_before_start` and `pgaio_io_acquire_nb`:

```

static void
pgaio_io_before_start(PgAioHandle *ioh)
{
    /* make the handle writable again before it is used */
    int rc = mprotect(ioh, sizeof(*ioh), PROT_READ|PROT_WRITE);
    Assert(rc == 0);
    ...
}

```

and

```
PgAioHandle *
pgaio_io_acquire_nb(struct ResourceOwnerData *resowner, PgAioReturn *ret)
{
      ...

         ioh = dclist_container(PgAioHandle, node, ion);

         Assert(ioh->state == PGAIO_HS_IDLE);
         Assert(ioh->owner_procno == MyProcNumber);

         rc = mprotect(ioh, sizeof(*ioh), PROT_READ|PROT_WRITE);
         Assert(rc == 0);
}

```
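For readers who want to try the technique outside PostgreSQL, here is a minimal standalone sketch of the same idea: write-protect a page-aligned region with mprotect and check that a store to it traps. It is not PostgreSQL code; `alloc_page`, `write_faults` and `trap_handler` are hypothetical names invented for this illustration. It handles both SIGSEGV and SIGBUS because platforms differ in which signal they raise for a protection fault.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf trap_env;

/* Jump back to the probe instead of re-executing the faulting store. */
static void
trap_handler(int signo)
{
    (void) signo;
    siglongjmp(trap_env, 1);
}

/* Map one anonymous page; mmap returns page-aligned memory. */
static char *
alloc_page(void)
{
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    void  *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);

    assert(p != MAP_FAILED);
    return (char *) p;
}

/* Store one byte to addr; return 1 if the store faulted, 0 otherwise. */
static int
write_faults(volatile char *addr)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int faulted = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = trap_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask = 1 so the signal is unblocked again after siglongjmp */
    if (sigsetjmp(trap_env, 1) == 0)
        *addr = 1;
    else
        faulted = 1;

    /* restore whatever handlers were installed before the probe */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

Calling `write_faults(page)` on a page from `alloc_page()` should report no fault while the page is writable, a fault after `mprotect(page, pagesz, PROT_READ)`, and no fault again once `PROT_READ | PROT_WRITE` is restored.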


The error is reproduced after 133 iterations:
```
!!!pgaio_io_reclaim [20376]| ioh: 0x1019bc000, ioh->op: 0, ioh->generation: 19346
!!!AsyncReadBuffers [20376] (1)| blocknum: 21, ioh: 0x1019bc000, ioh->op: 1, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 0, ioh->generation: 19346
2025-06-12 01:05:31.865 EEST [20376:918] pg_regress/psql LOG: !!!pgaio_io_before_start| ioh: 0x1019bc000, ioh->op: 1, ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2, ioh->generation: 19346
```

But no write-protection violation happened.
I do not know how to interpret this. Are the changes made by the kernel?
Or was `pgaio_io_acquire_nb` called between `pgaio_io_reclaim` and
`pgaio_io_before_start`?

I am now going to add tracing to `pgaio_io_acquire_nb`.
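
One caveat when interpreting the missing violation, offered as an assumption rather than a diagnosis: mprotect only changes the calling process's view of a mapping. If the handle lives in shared memory, a different process can still store to it without any fault in the protecting process. The sketch below illustrates that with an anonymous shared page; the function name is hypothetical, and this does not show that another process actually touched the handle in the failing run.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The child write-protects its own view of a shared page, then the parent
 * stores 42 to the same page. Returns the value the child observes
 * afterwards, or -1 on setup failure.
 */
static int
value_seen_through_readonly_mapping(void)
{
    size_t  pagesz = (size_t) sysconf(_SC_PAGESIZE);
    int     to_child[2], to_parent[2];
    int     status;
    char    token = 'x';
    pid_t   pid;
    volatile unsigned char *shared;

    shared = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANON, -1, 0);
    if (shared == MAP_FAILED || pipe(to_child) != 0 || pipe(to_parent) != 0)
        return -1;
    shared[0] = 0;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        /* Child: protect its own mapping, signal the parent, wait. */
        if (mprotect((void *) shared, pagesz, PROT_READ) != 0)
            _exit(255);
        if (write(to_parent[1], &token, 1) != 1)
            _exit(255);
        if (read(to_child[0], &token, 1) != 1)
            _exit(255);
        _exit(shared[0]);       /* report what the child now sees */
    }

    /* Parent: its mapping of the same page is still writable. */
    if (read(to_parent[0], &token, 1) != 1)
        return -1;
    shared[0] = 42;
    if (write(to_child[1], &token, 1) != 1)
        return -1;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;

    close(to_child[0]);
    close(to_child[1]);
    close(to_parent[0]);
    close(to_parent[1]);
    munmap((void *) shared, pagesz);
    return WEXITSTATUS(status);
}
```

The child sees the parent's store even though its own mapping is read-only, and it receives no signal, since it never attempted a write itself.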




