Thread: checkpoint_timeout parameter & WAL archive delay, pgbackrest fails
Hi folks,
The pgbackrest backup on one of my RepoServers fails intermittently, with an error saying the WAL file cannot be archived before the 60000 ms timeout.
The pgbackrest check command for the stanza sometimes succeeds and sometimes fails.
I don't know why PostgreSQL cannot copy WAL files from pg_wal to /data/myarchive_dir in real time. I consistently see a delay of around 10 minutes before a WAL file in pg_wal appears in /data/my_archive_dir.
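The delay described above could be measured rather than eyeballed. The following is a minimal sketch, assuming the archive copy's modification time is close to the moment it was archived and that the script runs as a user who can read pg_wal; the function name archive_lag_seconds and the demo directories are hypothetical, not part of PostgreSQL or pgbackrest.

```python
import os
import tempfile
from pathlib import Path

HEX_DIGITS = set("0123456789ABCDEF")

def archive_lag_seconds(wal_dir, archive_dir):
    """Return {segment name: archive mtime - pg_wal mtime} in seconds
    for every WAL segment that exists in both directories."""
    lags = {}
    for wal_file in Path(wal_dir).iterdir():
        name = wal_file.name
        # WAL segment file names are 24 uppercase hex characters;
        # this skips entries such as archive_status or .history files.
        if not wal_file.is_file() or len(name) != 24 or not set(name) <= HEX_DIGITS:
            continue
        archived = Path(archive_dir) / name
        if archived.is_file():
            lags[name] = archived.stat().st_mtime - wal_file.stat().st_mtime
    return lags

# Demo on throwaway directories (stand-ins for pg_wal and the archive dir).
with tempfile.TemporaryDirectory() as wal, tempfile.TemporaryDirectory() as arch:
    segment = "000000010000026300000002"
    for directory, mtime in ((wal, 1000), (arch, 1600)):
        path = Path(directory) / segment
        path.write_bytes(b"")
        os.utime(path, (mtime, mtime))  # force known timestamps
    demo_lags = archive_lag_seconds(wal, arch)

print(demo_lags)
```

On a real server the two arguments would be the pg_wal directory and the archive directory; a steady lag near 600 seconds would confirm the 10 minute delay.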
On investigation I found that our DB admin has set checkpoint_timeout = 10 m in postgresql.conf.
I think this causes the WAL archiving delay, which in turn makes pgbackrest fail while backing up the DB to a remote RepoServer.
What is the ideal value for checkpoint_timeout to overcome this issue? I don't want the pgbackrest backup to fail because of this parameter. (Is it possible to set a very small value for checkpoint_timeout? What is the minimum value, or can I set it to 0?)
archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f'
In the PostgreSQL logs I see this:
HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-02 12:15:17 IST LOG: archive command failed with exit code 82
2025-05-02 12:15:17 IST DETAIL: The failed archive command was: pgbackrest --stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp pg_wal/000000010000026300000002 /data/archive/000000010000026300000002
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002] --archive-async --compress-type=zst --exec-id=2848559-384cf49c --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=6 --repo1-host=10.50.12.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo
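Because the archive_command chains two commands with &&, the exit code 82 in the log above means the cp to /data/archive was never attempted for that segment. The sketch below reproduces only the shell behaviour, assuming a POSIX sh is available: "exit 82" stands in for the failing pgbackrest archive-push and "echo copied" stands in for the cp. Neither is the real command.

```python
import subprocess

# 'sh -c "exit 82"' plays the role of the failing archive-push;
# 'echo copied' plays the role of the cp that follows the &&.
result = subprocess.run(
    "sh -c 'exit 82' && echo copied",
    shell=True,
    capture_output=True,
    text=True,
)

print(result.returncode)    # 82: the status of the first command is passed through
print(repr(result.stdout))  # '': the second command never ran
```

PostgreSQL treats any non-zero exit status from archive_command as a failure, keeps the segment in pg_wal and retries it later, so a failing first command also holds back the local copy.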
top output on DB cluster:
top - 12:37:00 up 66 days, 17:24, 2 users, load average: 4.04, 4.72, 4.56
Tasks: 902 total, 4 running, 897 sleeping, 0 stopped, 1 zombie
%Cpu(s): 7.4 us, 1.7 sy, 0.0 ni, 89.9 id, 0.4 wa, 0.2 hi, 0.4 si, 0.0 st
MiB Mem : 31837.6 total, 706.1 free, 15243.0 used, 24741.0 buff/cache
MiB Swap: 8060.0 total, 6634.0 free, 1426.0 used. 16608.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2839363 postgre+ 20 0 8965608 7.2g 7.1g S 70.2 23.0 2:02.61 postgres
2864108 postgre+ 20 0 8967848 7.1g 7.1g S 64.9 22.8 0:30.04 postgres
2865547 postgre+ 20 0 8965432 7.1g 7.1g S 39.1 22.8 0:32.30 postgres
2865752 postgre+ 20 0 8964352 6.9g 6.9g S 16.6 22.3 0:32.94 postgres
Model name: Intel(R) Xeon(R) Gold 6430
BIOS Model name: Intel(R) Xeon(R) Gold 6430
CPU family: 6
Model: 143
Thread(s) per core: 1
Core(s) per socket: 16
These are 16 vCPUs; the OS is RHEL 9 and PostgreSQL is version 16.
Any hints on how to get pgbackrest to take backups reliably are much appreciated.
Thanks,
Krishane