Re: Non-reproducible AIO failure - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Non-reproducible AIO failure
Date
Msg-id 94f9bf81-e8ff-434a-aa9f-ec6591f527d2@gmail.com
Whole thread Raw
In response to Re: Non-reproducible AIO failure  (Alexander Lakhin <exclusion@gmail.com>)
List pgsql-hackers
31.05.2025 06:00, Alexander Lakhin wrote:
> Hello Thomas,
>
> It looks like I managed to restore all the conditions needed to reproduce
> that Assert more or less reliably (within a couple of hours), so I can
> continue experiments.
>
> I've added the following debugging:
> ...

With additional logging (the patch is attached), I can see the following:
!!!pgaio_io_reclaim [63817]| ioh: 0x1046b5660, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2
!!!AsyncReadBuffers [63817] (1)| blocknum: 15, ioh: 0x1046b5660, ioh->op: 0, ioh->state: 1, ioh->result: 0, 
ioh->num_callbacks: 0
!!!pgaio_io_start_readv [63817]| ioh: 0x1046b5660, ioh->op: 1
!!!pgaio_io_reclaim [63817]| ioh: 0x1046b5660, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2
!!!AsyncReadBuffers [63817] (1)| blocknum: 16, ioh: 0x1046b5660, ioh->op: 0, ioh->state: 1, ioh->result: 0, 
ioh->num_callbacks: 0
!!!pgaio_io_start_readv [63817]| ioh: 0x1046b5660, ioh->op: 1
!!!pgaio_io_reclaim [63817]| ioh: 0x1046b5660, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2
!!!AsyncReadBuffers [63817] (1)| blocknum: 17, ioh: 0x1046b5660, ioh->op: 0, ioh->state: 1, ioh->result: 0, 
ioh->num_callbacks: 0
!!!pgaio_io_start_readv [63817]| ioh: 0x1046b5660, ioh->op: 1
!!!pgaio_io_reclaim [63817]| ioh: 0x1046b5660, ioh->op: 1, ioh->state: 6, ioh->result: 8192, ioh->num_callbacks: 2
!!!AsyncReadBuffers [63817] (1)| blocknum: 18, ioh: 0x1046b5660, ioh->op: 0, ioh->state: 1, ioh->result: 0, 
ioh->num_callbacks: 0
2025-06-02 00:45:11.391 EDT [63817:14] pg_regress/brin LOG: !!!pgaio_io_before_start| ioh: 0x1046b5660, ioh->op: 1, 
ioh->state: 1, ioh->result: 0, ioh->num_callbacks: 2

TRAP: failed Assert("ioh->op == PGAIO_OP_INVALID"), File: "aio_io.c", Line: 164, PID: 63817
0   postgres                            0x0000000102fdb60c ExceptionalCondition + 236
1   postgres                            0x0000000102ce233c pgaio_io_before_start + 504
2   postgres                            0x0000000102ce20a8 pgaio_io_start_readv + 36
3   postgres                            0x0000000102d01200 FileStartReadV + 252
4   postgres                            0x0000000102d544e0 mdstartreadv + 668
5   postgres                            0x0000000102d57ac8 smgrstartreadv + 116
6   postgres                            0x0000000102cea474 AsyncReadBuffers + 1996
7   postgres                            0x0000000102ce9430 StartReadBuffersImpl + 1196
8   postgres                            0x0000000102ce8f74 StartReadBuffers + 64
9   postgres                            0x0000000102ce657c read_stream_start_pending_read + 1204
10  postgres                            0x0000000102ce5c60 read_stream_look_ahead + 812
11  postgres                            0x0000000102ce57f8 read_stream_next_buffer + 2356
12  postgres                            0x00000001026da1b8 heap_fetch_next_buffer + 284
13  postgres                            0x00000001026caf54 heapgettup_pagemode + 192
14  postgres                            0x00000001026cb524 heap_getnextslot + 84
15  postgres                            0x0000000102a4d984 table_scan_getnextslot + 340
16  postgres                            0x0000000102a4d790 SeqNext + 148
17  postgres                            0x0000000102a4de64 ExecScanFetch + 772
18  postgres                            0x0000000102a4da38 ExecScanExtended + 164
19  postgres                            0x0000000102a4d374 ExecSeqScanWithQualProject + 248
20  postgres                            0x0000000102a13248 ExecProcNode + 68
21  postgres                            0x0000000102a12e70 fetch_input_tuple + 180
22  postgres                            0x0000000102a12bb4 agg_retrieve_direct + 1664
23  postgres                            0x0000000102a0f914 ExecAgg + 240
...
2025-06-02 00:45:11.409 EDT [57192:4] LOG:  client backend (PID 63817) was terminated by signal 6: Abort trap: 6
2025-06-02 00:45:11.409 EDT [57192:5] DETAIL:  Failed process was running: DO $x$
         DECLARE
                 r record;
                 r2 record;
                 cond text;
                 idx_ctids tid[];
                 ss_ctids tid[];
                 count int;
                 plan_ok bool;
                 plan_line text;
         BEGIN
...

Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Attachment

pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: Virtual generated columns
Next
From: John Naylor
Date:
Subject: Re: Batch TIDs lookup in ambulkdelete