Thread: Missing pg_control crashes postmaster
Hey Hackers, If a postmaster is running and the pg_control file is removed postgres will PANIC. Steps to recreate: 1.) start a new cluster 2.) rm $DATADIR/pg_control 3.) psql => CHECKPOINT; PANIC: could not open control file "global/pg_control": No such file or directory After the PANIC there is no pg_control. Recovery would be difficult without a replica or a backup. Instead of crashing we can just write a new pg_control file since all the data is in memory at the time. There does not really seem to be a need for this behavior as all the information postgres needs is in memory at this point. I propose with a patch to just recreate pg_control on updates if it does not exist. -- Brian Faherty
Attachment
There does not really seem to be a need for this behavior as all the
information postgres needs is in memory at this point. I propose with
a patch to just recreate pg_control on updates if it does not exist.
Or at minimum create said file with a different name in PGDATA so an admin can rename it should they wish to accept the in memory version as being a valid replacement for whatever ended up happening to the original.
Even if it can be safely rebuilt having pg_control removed out from under a running server seems like something that shouldn't happen and the server is in its rights to panic if it does.
David J.
On July 23, 2018 12:31:13 PM PDT, Brian Faherty <anothergenericuser@gmail.com> wrote: >Hey Hackers, > >If a postmaster is running and the pg_control file is removed postgres >will PANIC. > >Steps to recreate: > >1.) start a new cluster >2.) rm $DATADIR/pg_control >3.) psql => CHECKPOINT; > >PANIC: could not open control file "global/pg_control": No such file >or directory > >After the PANIC there is no pg_control. Recovery would be difficult >without a replica or a backup. Instead of crashing we can just write a >new pg_control file since all the data is in memory at the time. > >There does not really seem to be a need for this behavior as all the >information postgres needs is in memory at this point. I propose with >a patch to just recreate pg_control on updates if it does not exist. What's the issue this would solve? Given that there's moments, until the control file is rewritten, where you would be toasteither way, I don't buy this gives much added safety. Nor have you explained which realistic scenarios lead to the filemissing, without much broader problems being present. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Brian Faherty <anothergenericuser@gmail.com> writes: > If a postmaster is running and the pg_control file is removed postgres > will PANIC. That's very intentional. Don't do it. > There does not really seem to be a need for this behavior as all the > information postgres needs is in memory at this point. I propose with > a patch to just recreate pg_control on updates if it does not exist. I would vote to reject any such patch; it's too likely to cause more problems than it solves. Generally, if critical files like that one have disappeared, trying to write new data isn't going to be enough to fix it and could well result in more corruption. As an example, imagine that you do "rm -rf $PGDATA; initdb" without remembering to shut down the old postmaster first. Currently, the old postmaster will panic/quit fairly promptly and no harm done. The more aggressive it is at trying to "recover" from the situation, the more likely it is to corrupt the new installation. (Note that you would have to break a few other things in order to make this particular scenario actually hazardous. My point is just that there *are* reasons not to try to recover automatically.) regards, tom lane
On Mon, Jul 23, 2018 at 07:00:30PM -0400, Tom Lane wrote: > I would vote to reject any such patch; it's too likely to cause more > problems than it solves. Generally, if critical files like that one > have disappeared, trying to write new data isn't going to be enough > to fix it and could well result in more corruption. +1. -- Michael
Attachment
On 24 July 2018 at 03:31, Brian Faherty <anothergenericuser@gmail.com> wrote:
Hey Hackers,
If a postmaster is running and the pg_control file is removed postgres
will PANIC.
How does that happen?
"Don't go deleting stuff in pgdata" is pretty fundamental.
On 7/23/18 7:00 PM, Tom Lane wrote: > Brian Faherty <anothergenericuser@gmail.com> writes: > >> There does not really seem to be a need for this behavior as all the >> information postgres needs is in memory at this point. I propose with >> a patch to just recreate pg_control on updates if it does not exist. > > I would vote to reject any such patch; it's too likely to cause more > problems than it solves. Generally, if critical files like that one > have disappeared, trying to write new data isn't going to be enough > to fix it and could well result in more corruption. > > As an example, imagine that you do "rm -rf $PGDATA; initdb" without > remembering to shut down the old postmaster first. Currently, the > old postmaster will panic/quit fairly promptly and no harm done. > The more aggressive it is at trying to "recover" from the situation, > the more likely it is to corrupt the new installation. It seems much more likely that a missing/modified postmaster.pid will cause postgres to panic than it is for a missing pg_control to do so. Older versions of postgres don't panic until the next checkpoint and newer versions won't panic at all on an idle system since we fixed redundant checkpoints in 9.6 (6ef2eba3). An idle postgres 11 cluster seems happy enough to run without a pg_control file indefinitely (or at least 10 minutes, which is past the default checkpoint time). As soon as I write data or perform a checkpoint it does panic, of course. Conversely, removing/modifying postmaster.pid causes postgres to panic very quickly on the versions I tested, 9.4 and 11. It seems to me that doing the postmaster.pid test at checkpoint time (if we don't already) would be enough to protect pg_control against unintentionally replaced clusters. Or perhaps writing to an alternate file as David J suggests would do the trick. It seems like an easy win if we can find a safe way to do it, though I admit that this is only a benefit in corner cases. Regards, -- -David david@pgmasters.net
On July 25, 2018 7:18:30 AM PDT, David Steele <david@pgmasters.net> wrote: >On 7/23/18 7:00 PM, Tom Lane wrote: >> Brian Faherty <anothergenericuser@gmail.com> writes: > > >>> There does not really seem to be a need for this behavior as all the >>> information postgres needs is in memory at this point. I propose >with >>> a patch to just recreate pg_control on updates if it does not exist. >> >> I would vote to reject any such patch; it's too likely to cause more >> problems than it solves. Generally, if critical files like that one >> have disappeared, trying to write new data isn't going to be enough >> to fix it and could well result in more corruption. >> >> As an example, imagine that you do "rm -rf $PGDATA; initdb" without >> remembering to shut down the old postmaster first. Currently, the >> old postmaster will panic/quit fairly promptly and no harm done. >> The more aggressive it is at trying to "recover" from the situation, >> the more likely it is to corrupt the new installation. > >It seems much more likely that a missing/modified postmaster.pid will >cause postgres to panic than it is for a missing pg_control to do so. > >Older versions of postgres don't panic until the next checkpoint and >newer versions won't panic at all on an idle system since we fixed >redundant checkpoints in 9.6 (6ef2eba3). An idle postgres 11 cluster >seems happy enough to run without a pg_control file indefinitely (or at > >least 10 minutes, which is past the default checkpoint time). As soon >as I write data or perform a checkpoint it does panic, of course. > >Conversely, removing/modifying postmaster.pid causes postgres to panic >very quickly on the versions I tested, 9.4 and 11. > >It seems to me that doing the postmaster.pid test at checkpoint time >(if >we don't already) would be enough to protect pg_control against >unintentionally replaced clusters. > >Or perhaps writing to an alternate file as David J suggests would do >the >trick. > >It seems like an easy win if we can find a safe way to do it, though I >admit that this is only a benefit in corner cases. What would we win here? Which scenario that's not contrived would be less bad due to the proposed change. This seems complexityfor it's own sake. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 7/25/18 10:37 AM, Andres Freund wrote: > On July 25, 2018 7:18:30 AM PDT, David Steele <david@pgmasters.net> wrote: >> >> It seems like an easy win if we can find a safe way to do it, though I >> admit that this is only a benefit in corner cases. > > What would we win here? Which scenario that's not contrived would be less bad due to the proposed change. This seems complexityfor it's own sake. I think it's worth preserving pg_control even in the case where there is other damage to the cluster. The alternative in this case (if no backup exists) is to run pg_resetwal which means data since the last checkpoint will not be written out causing even more data loss. I have run clusters with checkpoint_timeout = 60m so data loss in this case is a real concern. I favor the contrived scenario that helps preserve the current cluster instead of a hypothetical newly init'd one. I also don't think that users deleting files out of a cluster is all that contrived. Adding O_CREATE to open() doesn't seem too complex to me. I'm not really in favor of the renaming idea, but I'm not against it either if it gets me a copy of the pg_control file. Regards, -- -David david@pgmasters.net
Hi, On 2018-07-25 10:52:08 -0400, David Steele wrote: > On 7/25/18 10:37 AM, Andres Freund wrote: > > On July 25, 2018 7:18:30 AM PDT, David Steele <david@pgmasters.net> wrote: > > > > > > It seems like an easy win if we can find a safe way to do it, though I > > > admit that this is only a benefit in corner cases. > > > > What would we win here? Which scenario that's not contrived would be less bad due to the proposed change. This seemscomplexity for it's own sake. > > I think it's worth preserving pg_control even in the case where there is > other damage to the cluster. The alternative in this case (if no backup > exists) is to run pg_resetwal which means data since the last checkpoint > will not be written out causing even more data loss. I have run clusters > with checkpoint_timeout = 60m so data loss in this case is a real concern. Wait, what? How is "data loss in this case is a real concern." - no even a remotely realistic scenario has been described where this matters so far. > I favor the contrived scenario that helps preserve the current cluster > instead of a hypothetical newly init'd one. I also don't think that users > deleting files out of a cluster is all that contrived. But trying to limp on in that case, and that being helpful, is. > Adding O_CREATE to open() doesn't seem too complex to me. I'm not really in > favor of the renaming idea, but I'm not against it either if it gets me a > copy of the pg_control file. The problem is that that'll just hide the issue for a bit longer, while continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead to a lot of followup issues, like checkpoints removing old WAL that'd have been useful for data recovery. Greetings, Andres Freund
On 7/25/18 11:09 AM, Andres Freund wrote: > On 2018-07-25 10:52:08 -0400, David Steele wrote: > >> I favor the contrived scenario that helps preserve the current cluster >> instead of a hypothetical newly init'd one. I also don't think that users >> deleting files out of a cluster is all that contrived. > > But trying to limp on in that case, and that being helpful, is. OK, I can't argue with that. It would be wrong to continue operating without knowing what the damage is. >> Adding O_CREATE to open() doesn't seem too complex to me. I'm not really in >> favor of the renaming idea, but I'm not against it either if it gets me a >> copy of the pg_control file. > > The problem is that that'll just hide the issue for a bit longer, while > continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead > to a lot of followup issues, like checkpoints removing old WAL that'd > have been useful for data recovery. So if a panic is the best thing to do, it might still be good to write out a copy of pg_control to another file and let the user know that it's there. More information seems better than less to me. Regards, -- -David david@pgmasters.net
David Steele <david@pgmasters.net> writes: > On 7/25/18 10:37 AM, Andres Freund wrote: >> What would we win here? Which scenario that's not contrived would be less bad due to the proposed change. This seemscomplexity for it's own sake. > I favor the contrived scenario that helps preserve the current cluster > instead of a hypothetical newly init'd one. I also don't think that > users deleting files out of a cluster is all that contrived. What's not contrived about it? Particularly the case of *only* deleting pg_control and not any other critical file? I don't recall having heard any such stories in the last twenty years. Also, even if we added the complexity needed to write data into say pg_control.bak, what then? You think that that would be any less prone to indiscriminate rm'ing? Where and how would we document this, and what's the odds that someone dumb enough to remove pg_control would ever have read that part of the documentation? I'm with Andres: this is a solution in search of a problem. regards, tom lane
Hi, On 2018-07-25 11:19:49 -0400, David Steele wrote: > On 7/25/18 11:09 AM, Andres Freund wrote: > > On 2018-07-25 10:52:08 -0400, David Steele wrote: > > The problem is that that'll just hide the issue for a bit longer, while > > continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead > > to a lot of followup issues, like checkpoints removing old WAL that'd > > have been useful for data recovery. > > So if a panic is the best thing to do, it might still be good to write out a > copy of pg_control to another file and let the user know that it's there. > More information seems better than less to me. Sure, if that were the only concern. But code complexity and testability also is. This means we need to maintain code for an edge case that nobody has come up with a reason for. And at least during development, we need to come up with test cases. Or we continually run the test case as part of the regression tests for something without any sort of practical relevance. If one wanted to improve recoverability in scenarios like this, there'd be actually useful things like adding the option to extract control files, FPIs, clog contents from the WAL with pg_waldump. Greetings, Andres Freund
David Steele <david@pgmasters.net> writes: > On 7/25/18 11:09 AM, Andres Freund wrote: >> The problem is that that'll just hide the issue for a bit longer, while >> continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead >> to a lot of followup issues, like checkpoints removing old WAL that'd >> have been useful for data recovery. > So if a panic is the best thing to do, it might still be good to write > out a copy of pg_control to another file and let the user know that it's > there. More information seems better than less to me. I'm still dubious that this is fixing any real-world problem that is more pressing than the problems it would create. If you're asked to resuscitate a dead cluster, do you trust pg_control.bak if you find it? Maybe it's horribly out of date (consider likelihood that someone removed pg_control more than once, having got away with that the first time). If there's both that and pg_control, which do you trust? regards, tom lane
On 7/25/18 11:26 AM, Andres Freund wrote: > On 2018-07-25 11:19:49 -0400, David Steele wrote: > > If one wanted to improve recoverability in scenarios like this, there'd > be actually useful things like adding the option to extract control > files, FPIs, clog contents from the WAL with pg_waldump. I think you'd need to be a bit careful about what to trust in the WAL but presumably the last checkpoint LSN would be safe enough. At the end of the day good backups are the best protection against an issue like this. Regards, -- -David david@pgmasters.net
On 7/25/18 11:33 AM, Tom Lane wrote: > David Steele <david@pgmasters.net> writes: >> On 7/25/18 11:09 AM, Andres Freund wrote: >>> The problem is that that'll just hide the issue for a bit longer, while >>> continuing (due to the O_CREAT we'll not PANIC anymore). Which can lead >>> to a lot of followup issues, like checkpoints removing old WAL that'd >>> have been useful for data recovery. > >> So if a panic is the best thing to do, it might still be good to write >> out a copy of pg_control to another file and let the user know that it's >> there. More information seems better than less to me. > > I'm still dubious that this is fixing any real-world problem that is > more pressing than the problems it would create. If you're asked to > resuscitate a dead cluster, do you trust pg_control.bak if you find > it? Maybe it's horribly out of date (consider likelihood that someone > removed pg_control more than once, having got away with that the first > time). If there's both that and pg_control, which do you trust? It would need to be a manual operation. I don't think automating it would be a good idea for the reasons that Andres has enumerated. Perhaps making pg_resetwal a bit smarter in these scenarios would be the way to go. It's already the tool of last resort so this kind of manipulation might be a better fit there. Regards, -- -David david@pgmasters.net