Thread: PSA: Systemd will kill PostgreSQL
Hackers, This just came across my twitter feed: https://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory) when a user "fully" logs out. JD
On 10/07/2016 19:56, Joshua D. Drake wrote: > Hackers, > > This just came across my twitter feed: > > https://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html > > tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory) > when a user "fully" logs out. > AFAIK it's only the case if the user is not a system user, and postgres user should be (at least with community packages). See https://github.com/systemd/systemd/issues/2039 -- Julien Rouhaud http://dalibo.com - http://dalibo.org
On 11 July 2016 at 01:56, Joshua D. Drake <linuxhiker@gmail.com> wrote:
> Hackers,
> This just came across my twitter feed:
> https://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html
> tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory) when a user "fully" logs out.
The underlying change sounds like a fix, not a problem. It ensures that when a user logs out, various dangling processes are cleaned up. Given the amount of work PostgreSQL has to do to try to make sure it's really gone, having systemd be able to just clobber everything is pretty nice. So long as there's control over it.
However, it will break existing deployments that use "non-system" users to run PostgreSQL. I had a look and didn't find any useful definition of what systemd considers a "system user". Perhaps by uid threshold in login.defs? But then what happens for people who're managing users via a directory, who need to avoid conflicting with host-local UIDs, but also need some of those users to have systemd "system user" like behaviour?
It's also not clear if there's any API apps can use to exempt themselves from this, or any wrapper command to spawn processes that aren't clobbered. With appropriate user privileges to permit it, at least.
I've asked for clarification on the bug, so I'd better don my fire-proof suit.
--On 11 July 2016 13:25:51 +0800 Craig Ringer <craig@2ndquadrant.com> wrote:

> Perhaps by uid threshold in login.defs?

systemd's configure.ac has this:

AC_ARG_WITH(system-uid-max,
        AS_HELP_STRING([--with-system-uid-max=UID]
                [Maximum UID for system users]),
        [SYSTEM_UID_MAX="$withval"],
        [SYSTEM_UID_MAX="`awk 'BEGIN { uid=999 } /^\s*SYS_UID_MAX\s+/ { uid=$2 } END { print uid }' /etc/login.defs 2>/dev/null || echo 999`"])

so yes, it's the definition from there.

> But then what happens for people
> who're managing users via a directory, who need to avoid conflicting with
> host-local UIDs, but also need some of those users to have systemd
> "system user" like behaviour?

We had this in the past in some setups and this would add another reason for unexpected headaches...

--
Thanks
Bernd
On 11 July 2016 at 17:49, Bernd Helmle <mailings@oopsware.de> wrote:
> --On 11 July 2016 13:25:51 +0800 Craig Ringer <craig@2ndquadrant.com> wrote:
>> Perhaps by uid threshold in login.defs?
> systemd's configure.ac has this:
> AC_ARG_WITH(system-uid-max,
>         AS_HELP_STRING([--with-system-uid-max=UID]
>                 [Maximum UID for system users]),
>         [SYSTEM_UID_MAX="$withval"],
>         [SYSTEM_UID_MAX="`awk 'BEGIN { uid=999 } /^\s*SYS_UID_MAX\s+/ { uid=$2 } END { print uid }' /etc/login.defs 2>/dev/null || echo 999`"])
> so yes, it's the definition from there.
At COMPILE TIME?
W.T.F?
In the thread about this, someone even says that's a bad idea. The systemd folks aren't really big on listening, though...
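For readers who want to see what threshold the awk one-liner quoted above would pick up on a given host, here is a minimal Python sketch of the same lookup. It is only an approximation: as noted in the thread, systemd evaluates this at build time, so the value baked into an installed binary may differ from what the running host's /etc/login.defs says. The function names are made up for illustration, and the "uid <= threshold" comparison is an assumption about how the threshold is applied.

```python
import re

DEFAULT_SYS_UID_MAX = 999  # same fallback as the awk snippet in configure.ac


def sys_uid_max(login_defs_text: str) -> int:
    """Return the last SYS_UID_MAX value in the text, or 999 if none is set."""
    uid = DEFAULT_SYS_UID_MAX
    for line in login_defs_text.splitlines():
        # Mirrors /^\s*SYS_UID_MAX\s+/ { uid=$2 }, but only accepts digits.
        match = re.match(r"^\s*SYS_UID_MAX\s+(\d+)", line)
        if match:
            uid = int(match.group(1))
    return uid


def is_system_user(uid: int, threshold: int) -> bool:
    # Assumption: a "system user" is any uid at or below the threshold.
    return uid <= threshold


if __name__ == "__main__":
    # Hypothetical login.defs contents; the commented-out line is ignored.
    sample = "#SYS_UID_MAX 123\nUID_MIN 1000\nSYS_UID_MAX 499\n"
    threshold = sys_uid_max(sample)
    print(threshold, is_system_user(26, threshold), is_system_user(1000, threshold))
```

To check a real host, pass the contents of /etc/login.defs instead of the sample string, keeping in mind the build-time caveat above.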
On 07/10/2016 10:56 AM, Joshua D. Drake wrote: > Hackers, > > This just came across my twitter feed: > > https://lists.freedesktop.org/archives/systemd-devel/2014-April/018373.html > > tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory) > when a user "fully" logs out. That looks like it was under discussion in April, though. Do we have confirmation it was never fixed? I'm not seeing systemd killing Postgres under Fedora24. -- -- Josh Berkus Red Hat OSAS (any opinions are my own)
Josh Berkus <josh@agliodbs.com> writes: > On 07/10/2016 10:56 AM, Joshua D. Drake wrote: >> tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory) >> when a user "fully" logs out. > That looks like it was under discussion in April, though. Do we have > confirmation it was never fixed? I'm not seeing systemd killing > Postgres under Fedora24. Last I heard, there's an exclusion for "system" accounts, so an installation that's using the Fedora-provided pgsql account isn't going to have a problem. It's homebrew installs running under ordinary-user accounts that are at risk. But they might have changed the policy some more since then. regards, tom lane
On 08/15/2016 02:43 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> On 07/10/2016 10:56 AM, Joshua D. Drake wrote: >>> tl;dr; Systemd 212 defaults to remove all IPC (including SYSV memory) >>> when a user "fully" logs out. > >> That looks like it was under discussion in April, though. Do we have >> confirmation it was never fixed? I'm not seeing systemd killing >> Postgres under Fedora24. > > Last I heard, there's an exclusion for "system" accounts, so an > installation that's using the Fedora-provided pgsql account isn't > going to have a problem. It's homebrew installs running under > ordinary-user accounts that are at risk. Presumably people just need to add the system account tag to the unit file, no? -- -- Josh Berkus Red Hat OSAS (any opinions are my own)
Josh Berkus <josh@agliodbs.com> writes: > On 08/15/2016 02:43 PM, Tom Lane wrote: >> Last I heard, there's an exclusion for "system" accounts, so an >> installation that's using the Fedora-provided pgsql account isn't >> going to have a problem. It's homebrew installs running under >> ordinary-user accounts that are at risk. > Presumably people just need to add the system account tag to the unit > file, no? Well, yeah, it's easy to fix once you know you need to do so. The complaint is basically that out-of-the-box, it's broken, and it's not very clear what was gained by breaking it. regards, tom lane
On 08/15/2016 05:18 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> On 08/15/2016 02:43 PM, Tom Lane wrote: >>> Last I heard, there's an exclusion for "system" accounts, so an >>> installation that's using the Fedora-provided pgsql account isn't >>> going to have a problem. It's homebrew installs running under >>> ordinary-user accounts that are at risk. > >> Presumably people just need to add the system account tag to the unit >> file, no? > > Well, yeah, it's easy to fix once you know you need to do so. The > complaint is basically that out-of-the-box, it's broken, and it's > not very clear what was gained by breaking it. You're welcome to argue with Lennart about that. I'm not personally supporting the feature, I just don't think it's that hard to work around. -- -- Josh Berkus Red Hat OSAS (any opinions are my own)
Josh Berkus <josh@agliodbs.com> writes: > On 08/15/2016 05:18 PM, Tom Lane wrote: >> Well, yeah, it's easy to fix once you know you need to do so. The >> complaint is basically that out-of-the-box, it's broken, and it's >> not very clear what was gained by breaking it. > You're welcome to argue with Lennart about that. Hah! I can think of more pleasant ways of wasting my time. regards, tom lane
On 16 August 2016 at 08:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 08/15/2016 05:18 PM, Tom Lane wrote:
>>> Well, yeah, it's easy to fix once you know you need to do so. The
>>> complaint is basically that out-of-the-box, it's broken, and it's
>>> not very clear what was gained by breaking it.
>> You're welcome to argue with Lennart about that.
> Hah! I can think of more pleasant ways of wasting my time.
I tried to ask on that bug for some more clarity on what exactly a "system account" was, where this behaviour was documented, whether it should really determine the system account uid threshold at COMPILE TIME by reading login.defs from configure (!), etc.
I just got a "take it to the mailing list" sort of dismissal. I'd rather stick my hand in a meat grinder than post to the systemd mailing list, especially given the way the prior discussion on the topic went based on the archives, so I left it at that.
--
On Tue, Aug 16, 2016 at 12:41 AM, Josh Berkus <josh@agliodbs.com> wrote: > Presumably people just need to add the system account tag to the unit > file, no? That's a system level change though. How would a normal user manage this? -- greg
On 8/16/16 8:53 AM, Greg Stark wrote: > That's a system level change though. How would a normal user manage this? Arguably, if you are a normal user, you probably shouldn't be using systemd to start system services under your own account. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 8/16/16 8:53 AM, Greg Stark wrote:
>> That's a system level change though. How would a normal user manage this?
> Arguably, if you are a normal user, you probably shouldn't be using
> systemd to start system services under your own account.

I'm not totally sure, but I think that the complaints were not about systemd-driven services. (In such a case, it's almost certainly possible to fix it by adjusting your systemd unit definition file, anyway.) Rather, the problem arises when J. Ordinary User does

    nohup postmaster &

and then logs out. That's certainly not much of a recipe for production services but people have been known to do it for testing --- in fact, that's pretty much what I do every day with test postmasters. I suppose whenever I migrate to a recent-systemd-based distro I'm going to have to turn off this miserable excuse for a feature. I sure hope there's a way to do so.

regards, tom lane
On Aug 16, 2016 4:43 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> > On 8/16/16 8:53 AM, Greg Stark wrote:
> >> That's a system level change though. How would a normal user manage this?
> > Arguably, if you are a normal user, you probably shouldn't be using
> > systemd to start system services under your own account.
> I'm not totally sure, but I think that the complaints were not about
> systemd-driven services. (In such a case, it's almost certainly possible
> to fix it by adjusting your systemd unit definition file, anyway.)
> Rather, the problem arises when J. Ordinary User does
>
> nohup postmaster &
>
> and then logs out. That's certainly not much of a recipe for production
> services but people have been known to do it for testing --- in fact,
> that's pretty much what I do every day with test postmasters. I suppose
> whenever I migrate to a recent-systemd-based distro I'm going to have to
> turn off this miserable excuse for a feature. I sure hope there's a way
> to do so.
I think this is a partially different issue though. They already broke the nohup approach earlier with a different change, didn't they?
/Magnus
Magnus Hagander <magnus@hagander.net> writes: > On Aug 16, 2016 4:43 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> Rather, the problem arises when J. Ordinary User does >> nohup postmaster & >> and then logs out. > I think this is a partially different issue though. They already broke the > nohup approach earlier with a different change, didn't they? Dunno, it was still working the last time I used Fedora for anything much. Admittedly, that was about three years ago. But the issue would still arise if you prefer "pg_ctl start". regards, tom lane
On Aug 16, 2016 5:11 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Aug 16, 2016 4:43 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >> Rather, the problem arises when J. Ordinary User does
> >> nohup postmaster &
> >> and then logs out.
> > I think this is a partially different issue though. They already broke the
> > nohup approach earlier with a different change, didn't they?
> Dunno, it was still working the last time I used Fedora for anything much.
> Admittedly, that was about three years ago. But the issue would still
> arise if you prefer "pg_ctl start".
There are two independent changes AFAIK. One is that whenever a user that logged in interactively logs out all their processes are killed, regardless of nohup. The other one is the one about shared memory mentioned here. They will both independently kill postgres sessions launched manually. Or with pg_ctl.
Both are fairly recent changes, certainly less than three years.
/Magnus
Magnus Hagander <magnus@hagander.net> writes: > On Aug 16, 2016 5:11 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: >> Dunno, it was still working the last time I used Fedora for anything much. >> Admittedly, that was about three years ago. But the issue would still >> arise if you prefer "pg_ctl start". > There are two independent changes AFAIK. One is that whenever a user that > logged in interactively logs out all their processes are killed, regardless > of nohup. The other one is the one about shared memory mentioned here. They > will both independently kill postgres sessions launched manually. Or with > pg_ctl. Not sure I believe that --- the cases that have been reported to us involved postgres processes that were still alive but had had their SysV semaphore sets deleted out from under them. Likely the SysV shmem segments too, but that wouldn't cause any observable effects for the running cluster. (It *would* risk breaking the interlock against starting a new postmaster, I fear.) It might be that both behaviors exist now but more people know about how to turn off the killing-processes one. regards, tom lane
On Tue, Aug 16, 2016 at 5:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Aug 16, 2016 5:11 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> >> Dunno, it was still working the last time I used Fedora for anything much.
> >> Admittedly, that was about three years ago. But the issue would still
> >> arise if you prefer "pg_ctl start".
> > There are two independent changes AFAIK. One is that whenever a user that
> > logged in interactively logs out all their processes are killed, regardless
> > of nohup. The other one is the one about shared memory mentioned here. They
> > will both independently kill postgres sessions launched manually. Or with
> > pg_ctl.
> Not sure I believe that --- the cases that have been reported to us
> involved postgres processes that were still alive but had had their
> SysV semaphore sets deleted out from under them. Likely the SysV
> shmem segments too, but that wouldn't cause any observable effects
> for the running cluster. (It *would* risk breaking the interlock
> against starting a new postmaster, I fear.)
> It might be that both behaviors exist now but more people know about
> how to turn off the killing-processes one.
Yes, I think it's the second. See for example https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=825394. You can configure KillUserProcesses=no in logind.conf to get rid of it (that bug discusses the debian default behaviour).
On 8/16/16 11:24 AM, Tom Lane wrote:
> Not sure I believe that --- the cases that have been reported to us
> involved postgres processes that were still alive but had had their
> SysV semaphore sets deleted out from under them. Likely the SysV
> shmem segments too, but that wouldn't cause any observable effects
> for the running cluster. (It *would* risk breaking the interlock
> against starting a new postmaster, I fear.)
>
> It might be that both behaviors exist now but more people know about
> how to turn off the killing-processes one.

They are two separate things. Both are controlled by settings in logind.conf.

RemoveIPC= controls whether System V IPC objects are removed when a user logs out. System users are exempt. This was turned on by default in systemd version 212 (2014-03-25). RHEL7 ships 219. Debian stable ships 215. Apparently, the systemd package in RHEL7 is built with it defaulting to off. The package in Debian defaults to on, but I can't actually reproduce the issue. A brief look through the code and some reading between the lines of the documentation shows that it only cleans up shared memory segments that are no longer attached to, but there is no such check for semaphores. So there are some issues here to be worked out.

KillUserProcesses= controls whether all processes of a user should be killed when the user logs out. This was turned on by default in systemd version 230 (2016-05-21). This is not yet shipped widely (Fedora Branched/25, Debian testing, stable-backports). There are various ways to adjust that, including the KillOnlyUsers=, KillExcludeUsers=, loginctl enable-linger, systemd-run. These are all explained on the logind.conf man page. (Being a "system user" has no influence here.)

This will clearly result in some wide-spread annoyance among users and some wide-spread rejoicing among system administrators, but other than that I don't see a potential harm specific to PostgreSQL here.
-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
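Since the compiled-in defaults for both settings differ between distributions, it can help to check whether a host sets them explicitly. The following is a rough Python sketch, not a complete tool: it assumes the settings live in the [Login] section of logind.conf, it ignores any drop-in configuration directories, and the function name is made up for illustration. A value of None means the file is silent and the distribution's build-time default applies.

```python
import configparser

SETTINGS = ("RemoveIPC", "KillUserProcesses")


def logind_settings(conf_text: str) -> dict:
    """Return the explicitly configured values, or None where unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    login = parser["Login"] if parser.has_section("Login") else {}
    return {name: login.get(name) for name in SETTINGS}


if __name__ == "__main__":
    # Hypothetical file contents; lines starting with '#' are comments.
    sample = "[Login]\n#RemoveIPC=yes\nKillUserProcesses=no\n"
    print(logind_settings(sample))
```

On a real system the text would typically come from /etc/systemd/logind.conf. Setting RemoveIPC=no and KillUserProcesses=no there is the workaround discussed in this thread for manually started postmasters.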
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > A brief look through the code and some reading between the lines of the > documentation shows that it only cleans up shared memory segments that > are no longer attached to, but there is no such check for semaphores. Oh, interesting. It had occurred to me that we might be able to dodge this issue if we started to recommend using unnamed POSIX semaphores instead of SysV. (Obviously we'd want to check performance, but it's at least a plausible alternative.) I had not wanted to go there if it meant that we could have silent loss of SysV shmem with no other symptoms, because as I said upthread, I'm concerned about that breaking the multiple-postmaster interlock. However, if the cleanup kills only semaphores and not attached-to shmem, then that objection goes away and this becomes something we should seriously consider. regards, tom lane
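The cleanup rule as described in this exchange can be written down as a small decision function. This is a model of the behaviour as reported in the thread (system users exempt, shared memory removed only when nothing is attached, semaphores removed without such a check), not a transcription of systemd's code. The class, the function, and the default threshold of 999 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpcObject:
    kind: str          # "shm" or "sem"
    owner_uid: int
    attached: int = 0  # attached process count; only meaningful for "shm"


def removed_on_logout(obj: IpcObject, logout_uid: int, sys_uid_max: int = 999) -> bool:
    """Would this object be removed when logout_uid fully logs out?"""
    if obj.owner_uid != logout_uid:
        return False
    if logout_uid <= sys_uid_max:
        return False  # system users are exempt
    if obj.kind == "shm":
        return obj.attached == 0  # attached segments are left alone
    return True  # semaphores: no attachment check was found


if __name__ == "__main__":
    # A postmaster run by an ordinary user (uid 1000) with live backends.
    print(removed_on_logout(IpcObject("shm", 1000, attached=5), 1000))
    print(removed_on_logout(IpcObject("sem", 1000), 1000))
```

Under this model a running cluster keeps its attached shared memory but loses its semaphore sets, which matches the symptom reported earlier in the thread.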
On 8/16/16 1:05 PM, Tom Lane wrote:
> Oh, interesting. It had occurred to me that we might be able to dodge
> this issue if we started to recommend using unnamed POSIX semaphores
> instead of SysV. (Obviously we'd want to check performance, but it's
> at least a plausible alternative.) I had not wanted to go there if
> it meant that we could have silent loss of SysV shmem with no other
> symptoms, because as I said upthread, I'm concerned about that breaking
> the multiple-postmaster interlock. However, if the cleanup kills only
> semaphores and not attached-to shmem, then that objection goes away and
> this becomes something we should seriously consider.

I was digging around this issue the other day again. We have switched to unnamed POSIX semaphores by default now, which will help. But for dynamic shared memory (DSM) we use POSIX shared memory by default, which is cleaned up without regard to attachment. So there is still a potential for failures here, possibly rarer or more obscure, given the usage of DSM. (If someone is keeping score, it appears the "safest" combination is SysV shared memory + POSIX semaphores.)

I have started a wiki page to collect this information: https://wiki.postgresql.org/wiki/Systemd

To be continued, I suppose ...

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Oct 21, 2016 at 8:29 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 8/16/16 1:05 PM, Tom Lane wrote:
>> Oh, interesting. It had occurred to me that we might be able to dodge
>> this issue if we started to recommend using unnamed POSIX semaphores
>> instead of SysV. (Obviously we'd want to check performance, but it's
>> at least a plausible alternative.) I had not wanted to go there if
>> it meant that we could have silent loss of SysV shmem with no other
>> symptoms, because as I said upthread, I'm concerned about that breaking
>> the multiple-postmaster interlock. However, if the cleanup kills only
>> semaphores and not attached-to shmem, then that objection goes away and
>> this becomes something we should seriously consider.
>
> I was digging around this issue the other day again. We have switched
> to unnamed POSIX semaphores by default now, which will help. But for
> dynamic shared memory (DSM) we use POSIX shared memory by default, which
> is cleaned up without regard to attachment. So there is still a
> potential for failures here, possibly more rare or obscure, given the
> usage of DSM.

The reason I did it that way is because System V shared memory is often subject to very low limits on how much can be allocated, which can also produce failures. It would be easy to switch the default implementation from POSIX to System V, but I suspect that would be a loser overall -- in other words, I suspect that if we switched the default, more people would get hosed by not being able to create those segments in the first place than are currently getting hosed by having them removed prematurely.

Also, POSIX shared memory segments at least on Linux are implemented as files. If you remove a file, people who have it open can normally continue to access it. So it might work OK as long as the file isn't removed until after everybody involved in a parallel query has already attached.
That's a dangerous thing to bet on, though, because the DSM facility also supports server-lifetime DSMs. We're not using those capabilities right now in core, but we might start - e.g. Magnus was suggesting that we could use DSMs plus Thomas Munro's DSA and DHT patches to replace the stats collector or the temp files used by pg_stat_statements. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
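The point about removed files staying usable to processes that already have them open can be demonstrated with an ordinary temporary file standing in for a POSIX shared memory segment. This is a small Python sketch for POSIX systems only. It says nothing about how PostgreSQL manages DSM segments, and the function name is made up for illustration.

```python
import mmap
import os
import tempfile


def write_after_unlink():
    """Map a file, remove its name, then keep using the mapping."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)
        with mmap.mmap(fd, mmap.PAGESIZE) as segment:
            os.unlink(path)                  # the name disappears here
            gone = not os.path.exists(path)  # latecomers can no longer open it
            segment[:5] = b"hello"           # the existing mapping still works
            data = bytes(segment[:5])
        return data, gone
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(write_after_unlink())
```

The same property is why the timing matters: a process that has not yet attached when the name is removed has nothing left to open, which is the risk for server-lifetime segments.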