Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
Msg-id aEioDuePEBRnfJYk@nathan
In response to Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward  (Dimitrios Apostolou <jimis@gmx.net>)
Responses Re: [PING] [PATCH v2] parallel pg_restore: avoid disk seeks when jumping short distance forward
List pgsql-hackers
On Mon, Jun 09, 2025 at 10:09:57PM +0200, Dimitrios Apostolou wrote:
> Fix by avoiding forward seeks for jumps of less than 1MB; do sequential
> reads instead.
> 
> Performance gain can be significant, depending on the size of the dump
> and the I/O subsystem. On my local NVMe drive, read speeds for that
> phase of pg_restore increased from 150MB/s to 3GB/s.
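The change described in the quoted patch can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: the helper name skip_forward() and the 4KB scratch buffer are hypothetical, and only the 1MB cutoff comes from the patch description. Short forward jumps are satisfied by reading and discarding bytes, and longer ones still go through fseeko().

```c
#define _POSIX_C_SOURCE 200809L /* for fseeko() and off_t */

#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Cutoff taken from the patch description; the macro name is hypothetical. */
#define SKIP_READ_THRESHOLD ((off_t) 1024 * 1024)

/*
 * Hypothetical helper: move fp forward by 'distance' bytes.  Jumps shorter
 * than SKIP_READ_THRESHOLD read and discard the data, so stdio keeps
 * consuming its buffer sequentially; longer jumps fall back to fseeko().
 * Returns 0 on success, -1 on error or premature EOF.
 */
static int
skip_forward(FILE *fp, off_t distance)
{
    char        buf[4096];

    assert(distance >= 0);      /* only forward jumps are handled */

    if (distance >= SKIP_READ_THRESHOLD)
        return fseeko(fp, distance, SEEK_CUR) == 0 ? 0 : -1;

    while (distance > 0)
    {
        size_t      chunk = distance < (off_t) sizeof(buf) ?
            (size_t) distance : sizeof(buf);

        if (fread(buf, 1, chunk, fp) != chunk)
            return -1;
        distance -= (off_t) chunk;
    }
    return 0;
}
```

With this shape the cutoff is the only tuning knob: lowering it sends more jumps down the fseeko() path and reads fewer bytes that are thrown away.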

I was curious about what exactly was leading to the performance gains you
are seeing.  This page has an explanation:

    https://www.mjr19.org.uk/IT/fseek.html

I also wrote a couple of test programs to show the difference between
fseeko-ing and fread-ing through a file in steps of n bytes, for various
values of n.  On a Linux machine, I see this:

     log2(n) | fseeko  | fread
    ---------+---------+-------
           1 | 109.288 | 5.528
           2 |  54.881 | 2.848
           3 |   27.65 | 1.504
           4 |  13.953 | 0.834
           5 |     7.1 |  0.49
           6 |   3.665 | 0.322
           7 |   1.944 | 0.244
           8 |   1.085 | 0.201
           9 |   0.658 | 0.185
          10 |   0.443 | 0.175
          11 |   0.253 | 0.171
          12 |   0.102 | 0.162
          13 |   0.075 |  0.13
          14 |   0.061 | 0.114
          15 |   0.054 |   0.1
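The test programs themselves aren't included in the message, so the following is only a guess at their general shape: walk the first size bytes of a file in n-byte hops, once with fseeko(..., SEEK_CUR) and once with fread(), and time each walk. The function names and the use of clock_gettime(CLOCK_MONOTONIC) for timing are assumptions.

```c
#define _POSIX_C_SOURCE 200809L /* fseeko(), clock_gettime() */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

/* Hop through the first 'size' bytes of fp, 'step' bytes at a time, via fseeko(). */
static long
walk_with_fseeko(FILE *fp, off_t size, size_t step)
{
    long        hops = 0;

    assert(step > 0);
    rewind(fp);
    for (off_t pos = 0; pos + (off_t) step <= size; pos += (off_t) step)
    {
        if (fseeko(fp, (off_t) step, SEEK_CUR) != 0)
            return -1;
        hops++;
    }
    return hops;
}

/* Same walk, but consume each hop by fread()-ing it into a scratch buffer. */
static long
walk_with_fread(FILE *fp, off_t size, size_t step)
{
    char       *buf;
    long        hops = 0;

    assert(step > 0);
    buf = malloc(step);
    if (buf == NULL)
        return -1;
    rewind(fp);
    for (off_t pos = 0; pos + (off_t) step <= size; pos += (off_t) step)
    {
        if (fread(buf, 1, step, fp) != step)
        {
            hops = -1;
            break;
        }
        hops++;
    }
    free(buf);
    return hops;
}

/* Wall-clock seconds taken by one walk (timing method is an assumption). */
static double
time_walk(long (*walk) (FILE *, off_t, size_t), FILE *fp, off_t size,
          size_t step)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    (void) walk(fp, size, step);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double) (end.tv_sec - start.tv_sec) +
        (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling time_walk() with each walker for n = 1 << k, k = 1..15, on a large enough file yields rows in the same shape as the table above; the absolute numbers will depend on the machine, the file size, and the page cache state.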

So, fseeko() starts winning at around n = 4096 bytes (log2(n) = 12).  On
macOS, the differences aren't quite as dramatic, but 4096 bytes is the
break-even point there, too.  I imagine there's a buffer of around that
size somewhere...
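One plausible candidate for that buffer is stdio's own FILE buffer. As far as I can tell, glibc sizes a stream's default buffer from the st_blksize reported by fstat() when that value is positive and smaller than BUFSIZ (8192 on glibc), which would give 4096 bytes on common Linux filesystems. The sketch below only reports those values; the rule it encodes is an assumption about glibc internals rather than documented behavior, and macOS's C library may do something different.

```c
#define _XOPEN_SOURCE 700 /* fileno(), st_blksize */

#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Preferred I/O block size of the file behind fp, per fstat(); -1 on error. */
static long
preferred_block_size(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long) st.st_blksize;
}

/*
 * Guess at the size of fp's default stdio buffer.  ASSUMPTION: mirrors what
 * glibc appears to do (use st_blksize when it is positive and below BUFSIZ,
 * otherwise BUFSIZ); other C libraries, including macOS's, may differ.
 */
static long
guess_default_stdio_buffer(FILE *fp)
{
    long        blksize = preferred_block_size(fp);

    if (blksize > 0 && blksize < BUFSIZ)
        return blksize;
    return BUFSIZ;
}
```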

This doesn't fully explain the results you are seeing, but it does seem to
validate the idea.  I'm curious whether you see further improvement with
even lower thresholds (e.g., 8KB, 16KB, 32KB).
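To reason about which of those thresholds might help, one option is to count, for a sample of forward gaps between the data blocks a worker skips over, how many jumps each cutoff still sends to fseeko() and how many bytes it reads only to discard. The struct and function below are hypothetical bookkeeping, not part of pg_restore, and realistic gap values would have to come from the dump being restored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Outcome of applying one skip threshold to a list of forward gaps. */
struct skip_cost
{
    size_t      nseeks;         /* gaps still handled with fseeko() */
    size_t      nreads;         /* gaps handled by reading and discarding */
    uint64_t    bytes_read;     /* total bytes read only to be thrown away */
};

/*
 * Classify each gap (bytes to jump forward) against 'threshold': gaps
 * shorter than the threshold are read through, the rest are seeked over.
 */
static struct skip_cost
tally_skips(const uint64_t *gaps, size_t ngaps, uint64_t threshold)
{
    struct skip_cost cost = {0, 0, 0};

    assert(gaps != NULL || ngaps == 0);
    for (size_t i = 0; i < ngaps; i++)
    {
        if (gaps[i] < threshold)
        {
            cost.nreads++;
            cost.bytes_read += gaps[i];
        }
        else
            cost.nseeks++;
    }
    return cost;
}
```

Running tally_skips() for 8KB, 16KB, 32KB, and 1MB over the same gaps shows the trade-off directly: a lower cutoff wastes fewer bytes on discarded reads but leaves more jumps on the fseeko() path.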

-- 
nathan


