Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Should we update the random_page_cost default value?
Date
Msg-id a3ac3e07-0150-4319-a69b-aa367ddf67a5@vondra.me
In response to Re: Should we update the random_page_cost default value?  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers

On 10/8/25 18:17, Tomas Vondra wrote:
> 
> 
> On 10/8/25 17:14, Ants Aasma wrote:
>> On Mon, 6 Oct 2025 at 12:24, Tomas Vondra <tomas@vondra.me> wrote:
>>> On 10/6/25 07:29, Tom Lane wrote:
>>>> Another angle is that I expect that the ongoing AIO work will largely
>>>> destroy the existing model altogether, at least if you think in terms
>>>> of the model as trying to predict query execution time.  But if what
>>>> we're trying to model is net resource demands, with an eye to
>>>> minimizing the total system load not execution time of any one query,
>>>> maybe we can continue to work with something close to what we've
>>>> traditionally done.
>>>>
>>>> No answers here, just more questions ...
>>>>
>>>
>>> I had the same thought, when working on the (index) prefetching. Which
>>> of course now relies on AIO. Without concurrency, there wasn't much
>>> difference between optimizing for resources and time, but AIO changes
>>> that. In fact, parallel query has a similar effect, because it also
>>> spreads the work to multiple concurrent processes.
>>>
>>> Parallel query simply divides the cost between workers, as if each used a
>>> fraction of resources. And the cost of the parallel plan is lower than
>>> summing up the per-worker costs. Maybe AIO should do something similar?
>>> That is, estimate the I/O concurrency and lower the cost a bit?
>>
>> OS and/or disk read-ahead is muddying the water here. Most modern
>> storage can saturate their bandwidth capability with enough concurrent
>> or large enough requests. The read-ahead is effectively increasing
>> request concurrency behind PostgreSQL's back while the random scan is
>> running with concurrency 1. It would be very interesting to see what
>> debug_io_direct does, and also fio numbers for direct io.
>>
> 
> I think I've done some of the runs with direct I/O (e.g. the Azure runs
> were doing that), and the conclusions were mostly the same. I did a
> couple runs on the other machines, but I don't have results that I could
> present. I'll try to get some, maybe it'll be different.
> 

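As a refresher on the per-worker cost division mentioned above: it roughly
follows the get_parallel_divisor() heuristic in costsize.c, where the
leader's useful contribution shrinks as workers are added. A minimal Python
sketch of that heuristic (the 0.3-per-worker discount is the planner's, the
wrapper is mine):

```python
def parallel_divisor(workers: int, leader_participates: bool = True) -> float:
    """Approximation of the planner's get_parallel_divisor() heuristic.
    Row counts (and hence much of the cost) are divided by this value."""
    divisor = float(workers)
    if leader_participates:
        # The leader does less and less useful work as workers are added.
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor
```

So a 2-worker plan divides by 2.4 rather than 2, which is why the parallel
plan's cost comes out lower than summing up the per-worker costs.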
Here are results with debug_io_direct (and shared_buffers=256MB in
both cases), from the single NVMe device.

                seqscan (s)      index scan (s)  random_page_cost
 -----------------------------------------------------------------
  buffered I/O          115               40404              16.6
  direct I/O            108               53053              23.4
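
For reference, the implied random_page_cost is presumably derived as the
ratio of per-page service times between the random (index) and sequential
scans. A minimal sketch of that calculation; the page counts in the example
are hypothetical, since the message does not include them:

```python
def implied_random_page_cost(seq_s: float, seq_pages: int,
                             rand_s: float, rand_pages: int) -> float:
    """Ratio of random per-page time to sequential per-page time."""
    return (rand_s / rand_pages) / (seq_s / seq_pages)

# Hypothetical example: 100 s to read 1000 pages sequentially vs
# 50 s to read 10 pages randomly implies a ratio of 50.
```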

I believe the difference is mostly due to page cache - with 182GB data
on 64GB RAM, that's about 30% cache hit ratio, give or take. And the
buffered runs are about 25% faster - not exactly 30%, but close. Also,
funnily enough, the seqscans are faster with direct I/O (so without
kernel read-ahead).
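
The arithmetic behind those two estimates, using the numbers above:

```python
# Sanity check of the cache-hit reasoning, using figures from the message.
ram_gb, data_gb = 64, 182
hit_ratio = ram_gb / data_gb            # ~0.35, i.e. "about 30%, give or take"

buffered_s, direct_s = 40404, 53053     # index scan times from the table
speedup = 1.0 - buffered_s / direct_s   # ~0.24, i.e. "about 25% faster"
```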

It's just one run, of course. But the results seem reasonable.

regards

-- 
Tomas Vondra



