Thread: BUG #18266: Restore process request way too many archive log files via pgbackrest
BUG #18266: Restore process request way too many archive log files via pgbackrest
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18266 Logged by: upul sanj Email address: usanjewa0@gmail.com PostgreSQL version: 15.0 Operating system: Openshift Description: We are using Crunchydata Postgre Operator (PGO) on an openshift cluster to run the postgres DB and pgbackrest for backup and restore (with S3 for backup repos). PG Version = 15 At a very high level, our requirement is to restore the DB to a specific DB backup (either full/differential/incremental) with no expectation on PITR. We are using the restore option in pgbackrest restore --type=immediate which works well as it does not replay further WAL files beyond the immediate consistent state. However, during the restore process, it is visible that pgbackrest is restoring way more archived log files than it actually requires for the recovery process (this consumes more space in the PG instances). When this has been raised in the pgbackrest support forum, they confirmed that, it is not the pgbackrest actually deciding which archived log files to restore, it is the Postgresql, therefore to raise this to PG. Below is an example instance of a restore procesure executed. The options used in the pgbackrest restore stanza for a restore done on 2023-10-23 to restore one of teh DBs to a backup taken on 2023-10-17 (20231017-120204F) is as below. pgbackrest restore --set=20231017-120204F --type=immediate --stanza=db --pg1-path=/pgdata/pg15 --repo=3 --delta --link-map=pg_wal=/pgdata/pg15_wal Below is the log extract from the restores. The restore log state that it starts the redo at redo starts at 3/69000028 and consistent recovery state reached at 3/69000138 --> This should corresponds to the file "000000020000000300000069" However, it restored many additional log files from archive until the file 0000000200000005000000E5. The file "0000000200000005000000E5" corresponds to backups from the date 2023-10-18 07:40 2023-10-23 13:29:48.448 GMT [21] LOG: database system was interrupted; last known up at 2023-10-17 12:02:05 GMT 2023-10-23 13:29:48.479 GMT [21] LOG: starting point-in-time recovery to earliest consistent point 2023-10-23 13:29:48.593 GMT [21] LOG: restored log file "00000002.history" from archive 2023-10-23 13:29:48.723 GMT [21] LOG: restored log file "000000020000000300000069" from archive 2023-10-23 13:29:48.847 GMT [21] LOG: redo starts at 3/69000028 2023-10-23 13:29:48.930 GMT [21] LOG: restored log file "00000002000000030000006A" from archive 2023-10-23 13:29:49.138 GMT [21] LOG: restored log file "00000002000000030000006B" from archive 2023-10-23 13:29:49.320 GMT [21] LOG: restored log file "00000002000000030000006C" from archive 2023-10-23 13:29:49.509 GMT [21] LOG: restored log file "00000002000000030000006D" from archive 2023-10-23 13:29:49.695 GMT [21] LOG: restored log file "00000002000000030000006E" from archive . . . . . 2023-10-23 13:32:21.819 GMT [21] LOG: restored log file "0000000200000005000000E4" from archive 2023-10-23 13:32:22.005 GMT [21] LOG: restored log file "0000000200000005000000E5" from archive 2023-10-23 13:32:22.101 GMT [21] LOG: redo in progress, elapsed time: 153.25 s, current LSN: 3/69000060 2023-10-23 13:32:22.104 GMT [21] LOG: consistent recovery state reached at 3/69000138 2023-10-23 13:32:22.104 GMT [21] LOG: recovery stopping after reaching consistency 2023-10-23 13:32:22.104 GMT [21] LOG: pausing at the end of recovery 2023-10-23 13:32:22.104 GMT [21] HINT: Execute pg_wal_replay_resume() to promote. 2023-10-23 13:32:22.104 GMT [18] LOG: database system is ready to accept read-only connections 2023-10-23 13:32:22.146 GMT [21] LOG: redo done at 3/69000138 system usage: CPU: user: 0.05 s, system: 0.99 s, elapsed: 153.29 s 2023-10-23 13:32:22.229 GMT [21] LOG: restored log file "000000020000000300000069" from archive 2023-10-23 13:32:22.349 GMT [21] LOG: restored log file "00000003.history" from archive 2023-10-23 13:32:22.372 GMT [21] LOG: restored log file "00000004.history" from archive 2023-10-23 13:32:22.394 GMT [21] LOG: restored log file "00000005.history" from archive 2023-10-23 13:32:22.418 GMT [21] LOG: restored log file "00000006.history" from archive 2023-10-23 13:32:22.440 GMT [21] LOG: restored log file "00000007.history" from archive 2023-10-23 13:32:22.463 GMT [21] LOG: restored log file "00000008.history" from archive 2023-10-23 13:32:22.485 GMT [21] LOG: restored log file "00000009.history" from archive 2023-10-23 13:32:22.505 GMT [21] LOG: restored log file "0000000A.history" from archive 2023-10-23 13:32:22.527 GMT [21] LOG: restored log file "0000000B.history" from archive 2023-10-23 13:32:22.549 GMT [21] LOG: restored log file "0000000C.history" from archive 2023-10-23 13:32:22.573 GMT [21] LOG: restored log file "0000000D.history" from archive 2023-10-23 13:32:22.600 GMT [21] LOG: restored log file "0000000E.history" from archive 2023-10-23 13:32:22.624 GMT [21] LOG: restored log file "0000000F.history" from archive 2023-10-23 13:32:22.649 GMT [21] LOG: restored log file "00000010.history" from archive 2023-10-23 13:32:22.675 GMT [21] LOG: selected new timeline ID: 17 2023-10-23 13:32:22.841 GMT [21] LOG: restored log file "00000002.history" from archive 2023-10-23 13:32:22.852 GMT [21] LOG: archive recovery complete 2023-10-23 13:32:22.860 GMT [19] LOG: checkpoint starting: end-of-recovery immediate wait 2023-10-23 13:32:22.906 GMT [19] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.013 s, sync=0.006 s, total=0.049 s; sync files=2, longest=0.003 s, average=0.003 s; distance=0 kB, estimate=0 kB 2023-10-23 13:32:28.433 GMT [18] LOG: database system is ready to accept connections Could you please help me to understand why the process restore many additioanl files than it requires ?