Thread: Add checkpoint/redo LSNs to recovery errors.
Hackers, This patch adds checkpoint/redo LSNs to recovery error messages where they may be useful for debugging. When backup_label is present the LSNs are already output in a log message, but it still seems like a good idea to repeat them. When backup_label is not present, the checkpoint LSN is not logged unless backup recovery is in progress or the checkpoint is found. In the case where a backup is restored but the backup_label is missing, the checkpoint LSN is not logged so it is useful for debugging to have it in the error message. Regards, -David
Attachment
On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote: > This patch adds checkpoint/redo LSNs to recovery error messages where they > may be useful for debugging. Thanks for following up on that! > When backup_label is not present, the checkpoint LSN is not logged unless > backup recovery is in progress or the checkpoint is found. In the case where > a backup is restored but the backup_label is missing, the checkpoint LSN is > not logged so it is useful for debugging to have it in the error message. ereport(PANIC, - (errmsg("could not locate a valid checkpoint record"))); + (errmsg("could not locate a valid checkpoint record at %X/%X", + LSN_FORMAT_ARGS(CheckPointLoc)))); } I've seen this one in the field occasionally, so that's really a welcome addition IMO. I've scanned a bit xlogrecovery.c, and I have spotted a couple of that could gain more information. ereport(PANIC, (errmsg("invalid redo record in shutdown checkpoint"))); [...] ereport(PANIC, (errmsg("invalid redo in checkpoint record"))); These two could mention CheckPointLoc and checkPoint.redo. ereport(PANIC, (errmsg("invalid next transaction ID"))); Perhaps some XID information could be added here? ereport(FATAL, (errmsg("WAL ends before consistent recovery point"))); [...] ereport(FATAL, (errmsg("WAL ends before end of online backup"), These two are in xlog.c, and don't mention backupStartPoint for the first one. Perhaps there's a point in adding some information about EndOfLog and LocalMinRecoveryPoint as well? -- Michael
Attachment
On 2/29/24 16:42, Michael Paquier wrote: > On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote: >> This patch adds checkpoint/redo LSNs to recovery error messages where they >> may be useful for debugging. > > I've scanned a bit xlogrecovery.c, and I have spotted a couple of that > could gain more information. > > ereport(PANIC, > (errmsg("invalid redo record in shutdown checkpoint"))); > [...] > ereport(PANIC, > (errmsg("invalid redo in checkpoint record"))); > These two could mention CheckPointLoc and checkPoint.redo. > > ereport(PANIC, > (errmsg("invalid next transaction ID"))); > Perhaps some XID information could be added here? > > ereport(FATAL, > (errmsg("WAL ends before consistent recovery point"))); > [...] > ereport(FATAL, > (errmsg("WAL ends before end of online backup"), > > These two are in xlog.c, and don't mention backupStartPoint for the > first one. Perhaps there's a point in adding some information about > EndOfLog and LocalMinRecoveryPoint as well? For now I'd like to just focus on these three messages that are related to a missing backup_label or a misconfiguration of restore_command when backup_label is present. No doubt there are many other recovery log messages that could be improved, but I'd rather not bog down the patch. Regards, -David
On Sun, Mar 10, 2024 at 04:58:19PM +1300, David Steele wrote: > No doubt there are many other recovery log messages that could be improved, > but I'd rather not bog down the patch. Fair argument, and these additions are useful when taken independently. I've applied your suggestions for now. -- Michael