Thread: wal seams to be corrupted
Hi admins,
The number of WAL files on my PostgreSQL server is rising, because it seems that one WAL file is corrupted. PostgreSQL is otherwise running normally. I see this in the PostgreSQL log file:
2024-07-19 07:44:12 CEST [2205]: [32288-1] user=,db=,app=,client= DETAIL: The failed archive command was: test ! -f /var/lib/pgsql/ArchiveDir/000000010000044E0000009D && cp pg_wal/000000010000044E0000009D /var/lib/pgsql/ArchiveDir/000000010000044E0000009D
Usually it helped if I deleted the WAL file in the ArchiveDir directory, but not this time. The WAL file is copied again from pg_wal to the ArchiveDir directory and the error message continues.
What can I do to solve this problem? Is pg_resetwal a solution for this problem? If it is, how do I use it?
Best regards!
Domen Šetar
Computer Systems Support
IZUM – Institute of Information Science | Prešernova ulica 17 | 2000 Maribor | Slovenia
T: +386 2 25 20 339 | M: +386 41 676 342 | www.izum.si | domen.setar@izum.si
I didn't claim that the WAL file is corrupted. I just said that the archiver fails to copy it successfully, so I presume that something must be wrong with the WAL file.
And I need to fix it, because this problem stops WAL files from moving from pg_wal to the archive directory.
Best regards!
Domen Šetar
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Friday, July 19, 2024 8:16 AM
To: Domen Šetar <domen.setar@izum.si>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: wal seams to be corrupted
Without knowing why the archive command failed, it is impossible to say. But archiving doesn't impact the server producing the WAL, so messing with it isn't a useful approach. Writing a better archive command is where you should expend your efforts.
If the WAL file is corrupt, which you've not shown, but the server is running, doing a full checkpoint and then a physical backup that doesn't require the problematic WAL would let you not care about it, since you would not need it for recovery.
David J.
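David's point about writing a better archive command can be made concrete. The original one-liner has a weak spot: if "cp" dies halfway (for example because the archive disk is full), it leaves a truncated file under the final name, and from then on "test ! -f" fails on every retry, which would match the earlier experience that deleting the file from ArchiveDir usually helped. Below is a minimal POSIX sh sketch of a wrapper that copies to a temporary name and renames only after the copy succeeded. The script name archive-wal.sh, the function archive_wal and the ARCHIVE_DIR variable are hypothetical, not something posted in the thread. The sketch treats an identical pre-existing copy as success, writes the reason for any failure to stderr (which normally ends up in the server log), and does not fsync the copied data, which a production setup should also take care of.

```shell
#!/bin/sh
# Hypothetical archive_command wrapper (sketch). PostgreSQL would call it as:
#   archive_command = '/usr/local/bin/archive-wal.sh %p %f'
# ARCHIVE_DIR defaults to the directory from the thread; override it for testing.
ARCHIVE_DIR=${ARCHIVE_DIR:-/var/lib/pgsql/ArchiveDir}

archive_wal() {
  aw_src=$1                     # %p: path of the segment, relative to the data directory
  aw_dest="$ARCHIVE_DIR/$2"     # %f: file name of the segment only
  aw_tmp="$aw_dest.tmp"

  if [ -f "$aw_dest" ]; then
    # Retrying an already archived segment is harmless if the copy is identical.
    if cmp -s "$aw_src" "$aw_dest"; then
      return 0
    fi
    echo "archive-wal: $aw_dest already exists with different contents" >&2
    return 1
  fi

  # Copy under a temporary name first, so a failed cp (e.g. disk full) never
  # leaves a truncated file under the final name that would block every retry.
  if ! cp "$aw_src" "$aw_tmp"; then
    echo "archive-wal: cp $aw_src $aw_tmp failed" >&2
    rm -f "$aw_tmp"
    return 1
  fi

  # Rename within the same directory, so the final name appears in one step.
  if ! mv "$aw_tmp" "$aw_dest"; then
    echo "archive-wal: mv $aw_tmp $aw_dest failed" >&2
    rm -f "$aw_tmp"
    return 1
  fi
  return 0
}

# Only act when called with the two arguments PostgreSQL passes (%p %f).
if [ "$#" -eq 2 ]; then
  archive_wal "$1" "$2"
  exit $?
fi
```

It would be wired in with something like archive_command = '/usr/local/bin/archive-wal.sh %p %f' followed by a configuration reload. A file that already exists with different contents, such as a truncated copy left by the old command, is still reported as an error rather than overwritten; that one has to be inspected and removed by hand.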
Thanks for the answer.
What about stopping the PostgreSQL server, which is the primary in replication, and promoting another server?
Best regards!
Domen Šetar
From: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Sent: Friday, July 19, 2024 8:10 AM
To: Domen Šetar <domen.setar@izum.si>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: wal seams to be corrupted
Hi Domen
Yes, you should use pg_resetwal. It clears the write-ahead log (WAL) and optionally resets some other control information stored in the pg_control file. This is sometimes needed if these files have become corrupted. It should be used only as a last resort, when the server will not start due to such corruption.
You can find more help at the following link
Regards
Kashif Zeeshan
Hi,
I think the best solution might be to stop PostgreSQL on the problem server (which is the replication master), promote the secondary, replicate the data from the promoted secondary back to the problem server, and make it the replication master again. That way I'll get rid of the problematic WAL file.
Best regards!
Domen Šetar
From: Domen Šetar
Sent: Friday, July 19, 2024 7:58 AM
To: pgsql-admin@lists.postgresql.org
Subject: wal seams to be corrupted
Thank you, Kashif.
I'll try to find the cause of the problem. If I fail, I'll do it with the replica.
Best regards!
Domen Šetar
From: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Sent: Friday, July 19, 2024 8:42 AM
To: Domen Šetar <domen.setar@izum.si>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: wal seams to be corrupted
Hi
Stopping the primary, promoting the standby and rebuilding the old primary from it is the standard way, but it will require a lot of time on your end, and downtime as well. I think it's better to find the cause of the failure first; it's possible that you can fix the issue with less time and effort. The solution you suggested is the safest way, though.
On Fri, 2024-07-19 at 05:57 +0000, Domen Šetar wrote:
> The number of WAL files on my PostgreSQL server is rising, because it seems
> that one WAL file is corrupted. PostgreSQL is running normally. I see this in the PostgreSQL
> log file:
>
> 2024-07-19 07:44:12 CEST [2205]: [32288-1] user=,db=,app=,client= DETAIL: The failed archive command was: test ! -f /var/lib/pgsql/ArchiveDir/000000010000044E0000009D && cp pg_wal/000000010000044E0000009D /var/lib/pgsql/ArchiveDir/000000010000044E0000009D
>
> Usually it helped if I deleted the WAL file in the ArchiveDir directory. But not this time. The WAL file is
> copied again from pg_wal to the ArchiveDir directory and the error message continues.
> What can I do to solve this problem? Is pg_resetwal a solution for this problem? If it is, how do I use it?
Don't listen to any advice to run "pg_resetwal".
Only consider switching to the standby if your primary crashes because the disk is full.
You need to determine the cause of the problem.
1. All error messages from "archive_command" end up in the log file. Search for those; they may help you determine the cause.
2. Is there a file /var/lib/pgsql/ArchiveDir/000000010000044E0000009D? If yes, delete it, and the problem should be solved.
3. If there is no such file, it must be the "cp" command that is failing. In that case, you should definitely see an error message about that in the log file.
   Likely causes:
   - the permissions are not right (try running the "cp" command manually as user "postgres")
   - the target directory does not exist
   - the target directory is full
Yours,
Laurenz Albe
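Laurenz's checks 2 and 3 can be scripted. The following is a minimal POSIX sh sketch, not something posted in the thread: the function diagnose_archive and the script name diagnose-archive.sh are hypothetical, the free-space test assumes the default 16 MB WAL segment size (it can be changed with initdb --wal-segsize), and the script only reports likely causes. It does not read the server log, which remains the authoritative place for the actual "cp" error. Run it as the "postgres" user so that the permission check reflects what archive_command sees.

```shell
#!/bin/sh
# Hypothetical helper that walks through the checklist above for one WAL segment.
# Usage: diagnose-archive.sh ARCHIVE_DIR SEGMENT_NAME   (run as the postgres user)
diagnose_archive() {
  da_dir=$1
  da_seg=$2
  da_problems=0

  if [ ! -d "$da_dir" ]; then
    echo "target directory $da_dir does not exist"
    return 1    # the remaining checks need the directory
  fi

  # An existing file makes 'test ! -f ...' fail, so archive_command keeps failing.
  if [ -e "$da_dir/$da_seg" ]; then
    echo "$da_dir/$da_seg already exists (possibly a partial copy)"
    da_problems=$((da_problems + 1))
  fi

  # Permission check for the user running this script.
  if [ ! -w "$da_dir" ]; then
    echo "user $(id -un) cannot write to $da_dir"
    da_problems=$((da_problems + 1))
  fi

  # df -P prints one data line; column 4 is the available space (1024-byte blocks with -k).
  da_avail_kb=$(df -Pk "$da_dir" | awk 'NR == 2 { print $4 }')
  if [ "$da_avail_kb" -lt 16384 ]; then    # assumes 16 MB WAL segments
    echo "only ${da_avail_kb} kB free in $da_dir, less than one WAL segment"
    da_problems=$((da_problems + 1))
  fi

  if [ "$da_problems" -eq 0 ]; then
    echo "no obvious problem found; look for the cp error in the server log"
    return 0
  fi
  return 1
}

if [ "$#" -eq 2 ]; then
  diagnose_archive "$1" "$2"
  exit $?
fi
```

For the segment from the log, that would be something like: sudo -u postgres sh diagnose-archive.sh /var/lib/pgsql/ArchiveDir 000000010000044E0000009D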
Thank you, admins, for helping me.
The problem was a stupid one and I'm a little bit ashamed.
The archive disk was full and I didn't notice it.
I made some space on it and everything is OK now.
Best regards!
Domen Šetar
Hi,
Glad you found the problem. I had it in mind but didn't dare to say it 😉
Maybe you want to install monitoring software like check_mk or Icinga. We use both, plus uptime_kuma, to get notified when services are trending toward failure or have failed.
BTW: check_mk together with PostgreSQL is also a very nice tool.
Best regards,
Anton
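Along the lines of Anton's suggestion, a check that would have flagged this problem early is the number of WAL segments still waiting to be archived: PostgreSQL marks each such segment with a <segment>.ready file in pg_wal/archive_status, and that count grows while archive_command keeps failing. Below is a hypothetical POSIX sh plugin following the usual Nagios plugin convention (exit codes 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which Icinga can run directly. The name check_wal_backlog, its arguments and the thresholds are assumptions for illustration, and it has to run as a user that can read the data directory.

```shell
#!/bin/sh
# Hypothetical Nagios-style check for the WAL archiving backlog (sketch).
# Usage: check_wal_backlog.sh PG_WAL_DIR WARN CRIT
check_wal_backlog() {
  cb_status_dir="$1/archive_status"
  cb_warn=$2
  cb_crit=$3

  if [ ! -d "$cb_status_dir" ]; then
    echo "UNKNOWN - $cb_status_dir not found"
    return 3
  fi

  # Each segment still waiting for archive_command has a <segment>.ready marker file.
  # tr keeps only the digits, since some wc implementations pad the count with spaces.
  cb_ready=$(find "$cb_status_dir" -type f -name '*.ready' | wc -l | tr -dc '0-9')

  if [ "$cb_ready" -ge "$cb_crit" ]; then
    echo "CRITICAL - $cb_ready WAL segment(s) waiting to be archived | ready=$cb_ready;$cb_warn;$cb_crit"
    return 2
  fi
  if [ "$cb_ready" -ge "$cb_warn" ]; then
    echo "WARNING - $cb_ready WAL segment(s) waiting to be archived | ready=$cb_ready;$cb_warn;$cb_crit"
    return 1
  fi
  echo "OK - $cb_ready WAL segment(s) waiting to be archived | ready=$cb_ready;$cb_warn;$cb_crit"
  return 0
}

if [ "$#" -eq 3 ]; then
  check_wal_backlog "$1" "$2" "$3"
  exit $?
fi
```

It would be run as, for example, check_wal_backlog.sh "$PGDATA/pg_wal" 10 50. Inside the database, the pg_stat_archiver view (failed_count, last_failed_wal, last_failed_time) exposes similar information and could be queried by a check instead.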
From: Domen Šetar <domen.setar@izum.si>
Sent: Friday, July 19, 2024 09:08
To: pgsql-admin@lists.postgresql.org
Subject: RE: wal seams to be corrupted
Yes. Sometimes we don't see the obvious.
I just noticed that I don't have disk checks for this host in Icinga, so I'm adding some now. 😉
Best regards!
Domen Šetar
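A disk check of the kind mentioned here could look like the sketch below: another hypothetical Nagios-style POSIX sh plugin (name, arguments and thresholds are illustrative assumptions) that reads the used-capacity column of "df -P" for the filesystem holding a directory. Pointed at /var/lib/pgsql/ArchiveDir it would have reported the full archive disk; a second instance pointed at the pg_wal directory would cover the primary's own disk, where unarchived WAL keeps piling up. The standard check_disk plugin from the Monitoring Plugins project covers the same ground with more options, so a hand-written script like this mainly makes sense where that plugin is not available.

```shell
#!/bin/sh
# Hypothetical Nagios-style disk usage check (sketch).
# Usage: check-archive-disk.sh DIRECTORY WARN_PCT CRIT_PCT
check_archive_disk() {
  cd_dir=$1
  cd_warn=$2
  cd_crit=$3

  if [ ! -d "$cd_dir" ]; then
    echo "UNKNOWN - $cd_dir does not exist"
    return 3
  fi

  # POSIX df -P prints one data line; column 5 is the capacity in use, e.g. "87%".
  cd_used=$(df -P "$cd_dir" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  case "$cd_used" in
    ''|*[!0-9]*)
      echo "UNKNOWN - could not read usage of $cd_dir"
      return 3 ;;
  esac

  if [ "$cd_used" -ge "$cd_crit" ]; then
    echo "CRITICAL - filesystem of $cd_dir is ${cd_used}% used | used=${cd_used}%;$cd_warn;$cd_crit"
    return 2
  fi
  if [ "$cd_used" -ge "$cd_warn" ]; then
    echo "WARNING - filesystem of $cd_dir is ${cd_used}% used | used=${cd_used}%;$cd_warn;$cd_crit"
    return 1
  fi
  echo "OK - filesystem of $cd_dir is ${cd_used}% used | used=${cd_used}%;$cd_warn;$cd_crit"
  return 0
}

if [ "$#" -eq 3 ]; then
  check_archive_disk "$1" "$2" "$3"
  exit $?
fi
```

For example: check-archive-disk.sh /var/lib/pgsql/ArchiveDir 80 90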
From: Dischner, Anton <Anton.Dischner@med.uni-muenchen.de>
Sent: Friday, July 19, 2024 9:16 AM
To: Domen Šetar <domen.setar@izum.si>
Cc: pgsql-admin@lists.postgresql.org
Subject: AW: wal seams to be corrupted