Re: Should io_method=worker remain the default? - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Should io_method=worker remain the default?
Msg-id d2018eee32e211bdfc505862e9ae24b55cec5af0.camel@j-davis.com
In response to Re: Should io_method=worker remain the default?  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Should io_method=worker remain the default?
List pgsql-hackers
On Mon, 2025-09-08 at 14:39 +1200, Thomas Munro wrote:
> Some raw thoughts on this topic, and how we got here:  This type of
> extreme workload, namely not doing any physical I/O, just copying the
> same data from the kernel page cache to the buffer pool over and over
> again,

Isn't that one of the major selling points of AIO? It does "real
readahead" from kernel buffers into PG buffers ahead of time, so
that the backend doesn't have to do the memcpy and checksum
calculation. The benefit will be even larger when AIO eventually
enables effective Direct IO, so that shared_buffers can be a larger
share of system memory, and we don't need to move back-and-forth
between kernel buffers and PG buffers (and recalculate the checksum).

The only problem right now is that it doesn't (yet) scale well at
higher concurrency because:

(a) we don't adapt well to saturated workers; and
(b) there's lock contention on the queue as more workers are added

and I believe both of those problems can be solved in 19.
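For reference, these are the knobs under discussion. A minimal
postgresql.conf sketch; the io_workers value is purely illustrative,
not a recommendation:

```
# postgresql.conf -- illustrative values only
io_method = 'worker'    # the v18 default this thread is debating
io_workers = 8          # raise from the default (3) if workers saturate
```

Both settings require a server restart to take effect.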

>  is also the workload where io_method=worker can beat
> io_method=io_uring (and other native I/O methods I've prototyped),
> assuming io_workers is increased to a level that keeps up.

Right, when the workers are not saturated, they *increase* the
parallelism because there are more processes doing the work (unless you
run into lock contention).


> didn't matter much before the
> checksums-by-default change went in just a little ahead of basic AIO
> support.

Yeah, the trade-offs are quite different when checksums are on versus
off.

>
> Interesting that it shows up so clearly for Andres but not for you.

When I increase the io_workers count, it does seem to be limited by
lock contention (at least I'm seeing evidence of it with
-DLWLOCK_STATS). I suppose my case is just below some threshold.

>
> BTW There are already a couple of levels of signal suppression: if
> workers are not idle then we don't set any latches, and even if we
> did, SetLatch() only sends signals when the recipient is actually
> waiting, which shouldn't happen when the pool is completely busy.

Oh, good to know.

> +    nextWakeupWorker = (nextWakeupWorker + 1) % io_workers;
>
> FWIW, I experimented extensively with wakeup distribution schemes

My patch was really just to try to test the two hypotheses; I wasn't
proposing it. But I was curious whether a simpler scheme might be just
as good, and it looks like you already considered and rejected it.

>
> I would value your feedback and this type of analysis on the thread
> about automatic tuning for v19.

OK, I will continue the tuning discussion there.

Regarding $SUBJECT: it looks like others are just fine with worker mode
as the default in 18. I have added discussion links to the "no change"
entry in the Open Items list.

I think we'll probably see some of the effects (worker saturation or
lock contention) from my test case appear in real workloads, but
affected users can change to sync mode until we sort these things out
in 19.
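For affected users, the fallback is a one-line setting (a sketch; the
pre-18 behavior it restores is synchronous reads):

```
# postgresql.conf -- workaround until the v19 improvements land
io_method = 'sync'
```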

Regards,
    Jeff Davis



