Re: pg_restore scan - Mailing list pgsql-general

From Adrian Klaver
Subject Re: pg_restore scan
Date
Msg-id df4fe0e6-05e2-43c5-970f-1463f67c071a@aklaver.com
In response to Re: pg_restore scan  (R Wahyudi <rwahyudi@gmail.com>)
Responses Re: pg_restore scan
List pgsql-general
On 9/18/25 05:58, R Wahyudi wrote:
> Hi All,
> 
> Thanks for the quick and accurate response! I have never been so happy
> to see IOwait on my system!

Because?

What did you find?

> 
> I might be blind, as I can't find information about 'offset' in the pg_dump
> documentation.
> Where can I find more info about this?

It is not in the user documentation.

From the thread Ron referred to, there is an explanation here:

https://www.postgresql.org/message-id/366773.1756749256%40sss.pgh.pa.us

I believe the actual code, for the -Fc format, is in pg_backup_custom.c 
here:

https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_backup_custom.c#L723

Per comment at line 755:

"
If possible, re-write the TOC in order to update the data offset
information.  This is not essential, as pg_restore can cope in most
cases without it; but it can make pg_restore significantly faster
in some situations (especially parallel restore).  We can skip this
step if we're not dumping any data; there are no offsets to update
in that case.
"
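
To make the mechanism concrete, here is a toy sketch in Python. It is not the real custom archive format: the layout and the names write_archive, read_block, PipeLike and UNKNOWN are made up for illustration. It only shows the general idea the comment describes: write the TOC with placeholder offsets, write the data, then go back and fill in the offsets if (and only if) the output can seek. A reader that finds no offsets has to walk the data blocks from the start.

```python
import io
import struct

# Toy layout (an assumption for this sketch, NOT pg_dump's format):
#   [count: u32] [offset: i64] * count     <- TOC
#   ([length: u32] [payload]) * count      <- data blocks
UNKNOWN = -1  # hypothetical marker meaning "offset was never recorded"


def write_archive(out, blocks):
    """Write blocks to out; record TOC offsets only if out is seekable."""
    out.write(struct.pack("<I", len(blocks)))
    out.write(struct.pack("<q", UNKNOWN) * len(blocks))  # placeholder TOC
    pos = 4 + 8 * len(blocks)
    offsets = []
    for payload in blocks:
        offsets.append(pos)
        out.write(struct.pack("<I", len(payload)) + payload)
        pos += 4 + len(payload)
    if out.seekable():
        # Go back and overwrite the placeholders with the real offsets.
        out.seek(4)
        out.write(struct.pack(f"<{len(offsets)}q", *offsets))
        out.seek(0, io.SEEK_END)


def read_block(archive, index):
    """Return (payload, bytes_skipped) for block number index."""
    (count,) = struct.unpack_from("<I", archive, 0)
    offsets = struct.unpack_from(f"<{count}q", archive, 4)
    data_start = 4 + 8 * count
    if offsets[index] != UNKNOWN:
        pos = offsets[index]  # offset known: jump straight to the block
        skipped = 0
    else:
        pos = data_start  # no offset: walk every earlier block
        for _ in range(index):
            (length,) = struct.unpack_from("<I", archive, pos)
            pos += 4 + length
        skipped = pos - data_start
    (length,) = struct.unpack_from("<I", archive, pos)
    return archive[pos + 4 : pos + 4 + length], skipped


class PipeLike(io.RawIOBase):
    """Write-only, non-seekable sink standing in for a pipe."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)
```

Writing the same blocks to an io.BytesIO (seekable) and to a PipeLike (not seekable) gives two archives of the same size. read_block on the first jumps directly to the requested block, while on the second it has to step over every earlier block first. That stepping-over is the scanning cost discussed in this thread, although the real reader's work per block is more involved than in this sketch.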

> 
> Regards,
> Rianto
> 
> On Wed, 17 Sept 2025 at 13:48, Ron Johnson <ronljohnsonjr@gmail.com> wrote:
> 
> 
>     PG 17 has integrated zstd compression, while --format=directory lets
>     you do multi-threaded dumps.  That's much faster than a single-
>     threaded pg_dump into a multi-threaded compression program.
> 
>     (If for _Reasons_ you require a single-file backup, then tar the
>     directory of compressed files using the --remove-files option.)
> 
>     On Tue, Sep 16, 2025 at 10:50 PM R Wahyudi <rwahyudi@gmail.com> wrote:
> 
>         Sorry for not including the full command. Yes, it's piping to a
>         compression command:
>           | lbzip2 -n <threadsforbzipgoeshere> --best > <filenamegoeshere>
> 
> 
>         I think we found the issue! I'll do further testing and see how
>         it goes!
> 
> 
> 
> 
> 
>         On Wed, 17 Sept 2025 at 11:02, Ron Johnson
>         <ronljohnsonjr@gmail.com> wrote:
> 
>             So, piping or redirecting to a file?  If so, then that's the
>             problem.
> 
>             pg_dump directly to a file puts file offsets in the TOC.
> 
>             This is how I do custom dumps:
>             cd $BackupDir
>             pg_dump -Fc --compress=zstd:long -v -d${db} -f ${db}.dump \
>               2> ${db}.log
> 
>             On Tue, Sep 16, 2025 at 8:54 PM R Wahyudi
>             <rwahyudi@gmail.com> wrote:
> 
>                 pg_dump was done using the following command:
>                 pg_dump -Fc -Z 0 -h <host> -U <user> -w -d <database>
> 
>                 On Wed, 17 Sept 2025 at 08:36, Adrian Klaver
>                 <adrian.klaver@aklaver.com> wrote:
> 
>                     On 9/16/25 15:25, R Wahyudi wrote:
>                      >
>                      > I'm trying to troubleshoot the slowness issue with
>                      > pg_restore and stumbled across a recent post about
>                      > pg_restore scanning the whole file:
>                      >
>                      >  > "scanning happens in a very inefficient way, with
>                      >  > many seek calls and small block reads. Try strace
>                      >  > to see them. This initial phase can take hours in
>                      >  > a huge dump file, before even starting any actual
>                      >  > restoration."
>                      > see: https://www.postgresql.org/message-id/E48B611D-7D61-4575-A820-B2C3EC2E0551%40gmx.net
> 
>                     This was for pg_dump output that was streamed to a
>                     Borg archive and as a result had no object offsets
>                     in the TOC.
> 
>                     How are you doing your pg_dump?
> 
> 
> 
>                     -- 
>                     Adrian Klaver
>                     adrian.klaver@aklaver.com
> 
> 
> 
>             -- 
>             Death to <Redacted>, and butter sauce.
>             Don't boil me, I'm still alive.
>             <Redacted> lobster!
> 
> 
> 
>     -- 
>     Death to <Redacted>, and butter sauce.
>     Don't boil me, I'm still alive.
>     <Redacted> lobster!
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com


