Thread: BUG #18266: Restore process request way too many archive log files via pgbackrest

BUG #18266: Restore process request way too many archive log files via pgbackrest

From
PG Bug reporting form
Date:
02 January 2024, 13:38:31
The following bug has been logged on the website:

Bug reference:      18266
Logged by:          upul sanj
Email address:      usanjewa0@gmail.com
PostgreSQL version: 15.0
Operating system:   Openshift
Description:

We are using Crunchydata Postgre Operator (PGO) on an openshift cluster to
run the postgres DB and pgbackrest for backup and restore (with S3 for
backup repos).

PG Version = 15

At a very high level, our requirement is to restore the DB to a specific DB
backup (either full/differential/incremental) with no expectation on PITR.
We are using the restore option in pgbackrest restore --type=immediate which
works well as it does not replay further WAL files beyond the immediate
consistent state.

However, during the restore process, it is visible that pgbackrest is
restoring way more archived log files than it actually requires for the
recovery process (this consumes more space in the PG instances). When this
has been raised in the pgbackrest support forum, they confirmed that, it is
not the pgbackrest actually deciding which archived log files to restore, it
is the Postgresql, therefore to raise this to PG.

Below is an example instance of a restore procesure executed.

The options used in the pgbackrest restore stanza for a restore done on
2023-10-23 to restore one of teh DBs to a backup taken on 2023-10-17
(20231017-120204F) is as below.

pgbackrest restore --set=20231017-120204F --type=immediate --stanza=db
--pg1-path=/pgdata/pg15 --repo=3 --delta
--link-map=pg_wal=/pgdata/pg15_wal

Below is the log extract from the restores.

The restore log state that it starts the redo at redo starts at 3/69000028
and consistent recovery state reached at 3/69000138 --> This should
corresponds to the file "000000020000000300000069"

However, it restored many additional log files from archive until the file
0000000200000005000000E5.
The file "0000000200000005000000E5" corresponds to backups from the date
2023-10-18 07:40

2023-10-23 13:29:48.448 GMT [21] LOG:  database system was interrupted; last
known up at 2023-10-17 12:02:05 GMT
2023-10-23 13:29:48.479 GMT [21] LOG:  starting point-in-time recovery to
earliest consistent point
2023-10-23 13:29:48.593 GMT [21] LOG:  restored log file "00000002.history"
from archive
2023-10-23 13:29:48.723 GMT [21] LOG:  restored log file
"000000020000000300000069" from archive
2023-10-23 13:29:48.847 GMT [21] LOG:  redo starts at 3/69000028
2023-10-23 13:29:48.930 GMT [21] LOG:  restored log file
"00000002000000030000006A" from archive
2023-10-23 13:29:49.138 GMT [21] LOG:  restored log file
"00000002000000030000006B" from archive
2023-10-23 13:29:49.320 GMT [21] LOG:  restored log file
"00000002000000030000006C" from archive
2023-10-23 13:29:49.509 GMT [21] LOG:  restored log file
"00000002000000030000006D" from archive
2023-10-23 13:29:49.695 GMT [21] LOG:  restored log file
"00000002000000030000006E" from archive
.
.
.
.
.
2023-10-23 13:32:21.819 GMT [21] LOG:  restored log file
"0000000200000005000000E4" from archive
2023-10-23 13:32:22.005 GMT [21] LOG:  restored log file
"0000000200000005000000E5" from archive
2023-10-23 13:32:22.101 GMT [21] LOG:  redo in progress, elapsed time:
153.25 s, current LSN: 3/69000060
2023-10-23 13:32:22.104 GMT [21] LOG:  consistent recovery state reached at
3/69000138
2023-10-23 13:32:22.104 GMT [21] LOG:  recovery stopping after reaching
consistency
2023-10-23 13:32:22.104 GMT [21] LOG:  pausing at the end of recovery
2023-10-23 13:32:22.104 GMT [21] HINT:  Execute pg_wal_replay_resume() to
promote.
2023-10-23 13:32:22.104 GMT [18] LOG:  database system is ready to accept
read-only connections
2023-10-23 13:32:22.146 GMT [21] LOG:  redo done at 3/69000138 system usage:
CPU: user: 0.05 s, system: 0.99 s, elapsed: 153.29 s
2023-10-23 13:32:22.229 GMT [21] LOG:  restored log file
"000000020000000300000069" from archive
2023-10-23 13:32:22.349 GMT [21] LOG:  restored log file "00000003.history"
from archive
2023-10-23 13:32:22.372 GMT [21] LOG:  restored log file "00000004.history"
from archive
2023-10-23 13:32:22.394 GMT [21] LOG:  restored log file "00000005.history"
from archive
2023-10-23 13:32:22.418 GMT [21] LOG:  restored log file "00000006.history"
from archive
2023-10-23 13:32:22.440 GMT [21] LOG:  restored log file "00000007.history"
from archive
2023-10-23 13:32:22.463 GMT [21] LOG:  restored log file "00000008.history"
from archive
2023-10-23 13:32:22.485 GMT [21] LOG:  restored log file "00000009.history"
from archive
2023-10-23 13:32:22.505 GMT [21] LOG:  restored log file "0000000A.history"
from archive
2023-10-23 13:32:22.527 GMT [21] LOG:  restored log file "0000000B.history"
from archive
2023-10-23 13:32:22.549 GMT [21] LOG:  restored log file "0000000C.history"
from archive
2023-10-23 13:32:22.573 GMT [21] LOG:  restored log file "0000000D.history"
from archive
2023-10-23 13:32:22.600 GMT [21] LOG:  restored log file "0000000E.history"
from archive
2023-10-23 13:32:22.624 GMT [21] LOG:  restored log file "0000000F.history"
from archive
2023-10-23 13:32:22.649 GMT [21] LOG:  restored log file "00000010.history"
from archive
2023-10-23 13:32:22.675 GMT [21] LOG:  selected new timeline ID: 17
2023-10-23 13:32:22.841 GMT [21] LOG:  restored log file "00000002.history"
from archive
2023-10-23 13:32:22.852 GMT [21] LOG:  archive recovery complete
2023-10-23 13:32:22.860 GMT [19] LOG:  checkpoint starting: end-of-recovery
immediate wait
2023-10-23 13:32:22.906 GMT [19] LOG:  checkpoint complete: wrote 3 buffers
(0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.013 s,
sync=0.006 s, total=0.049 s; sync files=2, longest=0.003 s, average=0.003 s;
distance=0 kB, estimate=0 kB
2023-10-23 13:32:28.433 GMT [18] LOG:  database system is ready to accept
connections

Could you please help me to understand why the process restore many
additioanl files than it requires ?