Thread: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue
Hello,

I have a 16-inch MacBook Pro with an M3 Max in the base configuration (48 GB RAM, 1 TB HD), running the latest macOS Sonoma, 14.3. I use the latest Postgres 14, installed via brew in the standard configuration.

From time to time (every second or third week) I reboot the Mac. For several weeks now (at least the last 4 reboots) I have lost database data when rebooting: the database falls back to a state from several days before the reboot. This affects both structure and data added in that period. I cover this with backups. It looks as if the data is kept in memory rather than written to the database. Apart from this, the Mac and Postgres run fine.

Regards,
Arnd Baranowski
Arnd Baranowski <baranowski@oculeus.com> writes:
> I have a 16-inch MacBook Pro with an M3 Max in the base configuration (48 GB RAM, 1 TB HD), running the latest macOS Sonoma, 14.3. I use the latest Postgres 14, installed via brew in the standard configuration. From time to time (every second or third week) I reboot the Mac. For several weeks now (at least the last 4 reboots) I have lost database data when rebooting: the database falls back to a state from several days before the reboot. This affects both structure and data added in that period. I cover this with backups. It looks as if the data is kept in memory rather than written to the database. Apart from this, the Mac and Postgres run fine.

Hmm, what have you got the fsync and wal_sync_method GUCs set to?
What was the last macOS version that was stable for you?

regards, tom lane
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue
From: Arnd Baranowski
Correction: fsync is "on" and wal_sync_method is set to "open_datasync".

> On 05.02.2024 at 21:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Hmm, what have you got the fsync and wal_sync_method GUCs set to?
> What was the last macOS version that was stable for you?
>
> regards, tom lane
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue
From: Arnd Baranowski
The last stable version seemed to be 14.1, but I am not sure, and it might have been a coincidence. Recently Brew forced me to upgrade Postgres, which moved from 14.7 to 14.10. This was about the same time my problem started.

> On 05.02.2024 at 21:44, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Hmm, what have you got the fsync and wal_sync_method GUCs set to?
> What was the last macOS version that was stable for you?
>
> regards, tom lane
Arnd Baranowski <baranowski@oculeus.com> writes:
> Correction fsync is "On" and the wal_sync_method is set to "open_datasync"

That's what they should be.

I tried to reproduce this by selecting "Restart..." immediately after creating/populating a table on my own MacBook running Sonoma 14.3. After the reboot, the table was there with the expected contents. Now, this test doesn't actually prove a heck of a lot about PG's crash recovery, because I see in the postmaster log

2024-02-05 21:00:30.322 EST [1148] LOG: database system was shut down at 2024-02-05 20:58:46 EST
2024-02-05 21:00:30.327 EST [1144] LOG: database system is ready to accept connections

which indicates that Postgres had time to perform a clean shutdown before the system rebooted. (That is the expected scenario for an OS reboot, assuming that the kernel delivers us SIGTERM as it's required to do by POSIX and then gives us enough time to nail the windows shut, which it's not required to do.)

The facts as you've presented them indicate that (1) checkpoints weren't working, (2) we didn't get SIGTERM at system shutdown, *and* (3) WAL wasn't written out to disk as it's supposed to be. It's a bit hard to credit that so many things are broken and nobody has noticed. I'm inclined to wonder if something is wrong with your disk drive.

It would be interesting to know what appears in the first few lines of your postmaster log after a data-losing restart. Also, try running with log_checkpoints = on for a while, and see if there are log entries claiming successful checkpoint completion.

A different line of thought is that maybe the corruption is happening because you have two postmasters started in the same data directory. We have interlocks that are supposed to defend against that, but it'd be a lot easier to credit that those aren't working than that all the rest of this stuff broke.

regards, tom lane
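The log check suggested here can be roughed out in a few lines of Python. This is only a sketch under assumptions, not a PostgreSQL tool: the function name `summarize_log` is made up for illustration, and it relies on matching three message fragments that the server is known to log ("database system was shut down at" after a clean stop, "database system was interrupted" when crash recovery starts, and "checkpoint complete" when log_checkpoints is on).

```python
# Message fragments matched against each postmaster log line.
CLEAN_SHUTDOWN = "database system was shut down at"
CRASH_START = "database system was interrupted"
CHECKPOINT_DONE = "checkpoint complete"


def summarize_log(lines):
    """Count clean starts, crash-recovery starts, and completed checkpoints."""
    summary = {"clean_starts": 0, "crash_starts": 0, "checkpoints": 0}
    for line in lines:
        if CLEAN_SHUTDOWN in line:
            summary["clean_starts"] += 1
        elif CRASH_START in line:
            summary["crash_starts"] += 1
        elif CHECKPOINT_DONE in line:
            summary["checkpoints"] += 1
    return summary


# The two startup lines quoted in the message above.
sample = [
    "2024-02-05 21:00:30.322 EST [1148] LOG: database system was shut down at 2024-02-05 20:58:46 EST",
    "2024-02-05 21:00:30.327 EST [1144] LOG: database system is ready to accept connections",
]
print(summarize_log(sample))  # {'clean_starts': 1, 'crash_starts': 0, 'checkpoints': 0}
```

Run over a real log after a data-losing restart, a count of zero completed checkpoints or an unexpected crash start would point at the first and second items on the list above, while clean starts with regular checkpoints would make a problem outside Postgres more likely.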
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue
From: Arnd Baranowski
Hi Tom,

Thanks for the feedback and insights. I will follow your advice, observe, and report if I find something that could explain this behavior.

Regards,
Arnd

> On 06.02.2024 at 03:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It would be interesting to know what appears in the first few lines
> of your postmaster log after a data-losing restart. Also, try
> running with log_checkpoints = on for a while, and see if there are
> log entries claiming successful checkpoint completion.
>
> A different line of thought is that maybe the corruption is happening
> because you have two postmasters started in the same data directory.
> We have interlocks that are supposed to defend against that, but it'd
> be a lot easier to credit that those aren't working than that all the
> rest of this stuff broke.
>
> regards, tom lane
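One way to start looking into the two-postmasters idea is to read the lock file a running server keeps in its data directory. The sketch below assumes the usual layout, in which the first line of postmaster.pid holds the server's process ID; the helper names and the data directory path are hypothetical and not taken from this thread. It does not prove or rule out a second server by itself, but the PID it reports can be compared against the postgres processes actually running.

```python
import os
from pathlib import Path


def read_postmaster_pid(data_dir):
    """Return the PID recorded on the first line of postmaster.pid, or None."""
    pid_file = Path(data_dir) / "postmaster.pid"
    try:
        first_line = pid_file.read_text().splitlines()[0]
        return int(first_line.strip())
    except (OSError, IndexError, ValueError):
        # Missing or unreadable file, empty file, or a non-numeric first line.
        return None


def pid_is_running(pid):
    """Best-effort liveness check; signal 0 probes a process without affecting it."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Hypothetical location; substitute the data directory your server actually uses.
    data_dir = "/path/to/postgres/data"
    pid = read_postmaster_pid(data_dir)
    if pid is None:
        print("no readable postmaster.pid in", data_dir)
    else:
        print("postmaster.pid names PID", pid, "running:", pid_is_running(pid))
```

If more postgres server processes are running than this file accounts for, or if a second installation turns out to use a different data directory, that would be worth reporting back to the list.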
Re: PostgreSQL & latest Mac OS Sonoma, a possible bug / configuration issue
From: Arnd Baranowski
Hi Tom,

I completely deleted my Mac installation of Postgres and Brew, reinstalled everything from scratch, and moved to PostgreSQL 16. The issue is gone. It looks like a broken PostgreSQL 14 installation caused the problem.

Regards,
Arnd