Fwd: background worker shudown (was Re: [HACKERS] Why does logicalreplication launcher exit with exit code 1?) - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Fwd: background worker shudown (was Re: [HACKERS] Why does logicalreplication launcher exit with exit code 1?) |
Date | |
Msg-id | CA+TgmobwExL4kNj_eXJxPah_tVQ31N0cYDbUN0FFm6uaY_+X=w@mail.gmail.com Whole thread Raw |
In response to | [HACKERS] Why does logical replication launcher exit with exit code 1? (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: background worker shudown (was Re: [HACKERS] Why does logicalreplication launcher exit with exit code 1?)
|
List | pgsql-hackers |
On Thu, Oct 4, 2018 at 7:37 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > I still think the current situation is non-ideal. I don't have a > strong view on whether some or all system-wide processes should say > hello and goodbye explicitly in the log, but I do think we need a way > to make that not look like an error condition, and ideally without > special cases for known built-in processes. > > I looked into this a bit today, while debugging an extension that runs > a background worker. Here are some assorted approaches to shutdown: > > 0. The default SIGTERM handler for bgworkers is bgworker_die(), which > generates a FATAL ereport "terminating background worker \"%s\" due to > administrator command", directly in the signal handler (hmm, see also > recent discussions about the legality of similar code in quickdie()). Yeah, I think that code is bad news, for the same reasons discussed with regard to quickdie(). > 1. Some processes install their own custom SIGTERM handler that sets > a flag, and use that to break out of their main loop and exit quietly. > Example: autovacuum.c, or the open-source pg_cron extension, and > probably many other things that just want a nice quiet shutdown. > > 2. Some processes install the standard SIGTERM handler die(), and > then use CHECK_FOR_INTERRUPTS() to break out of their main loop. By > default this looks like "FATAL: terminating connection due to > administrator command". This approach is effectively required if the > main loop will reach other code that has a CHECK_FOR_INTERRUPTS() wait > loop. For example, a "launcher"-type process that calls > WaitForBackgroundWorkerStartup() could hang forever if it used > approach 1 (ask me how I know). My experience with background workers has been that approach #1 does not really work. I mean, if you aren't doing anything complicated it might be OK, if for example you are executing SQL queries, and you try to do #1, then your SQL queries don't respond to interrupts. And that sucks. So I have generally adopted approach #2. To put that another way, nearly everything can reach CHECK_FOR_INTERRUPTS(), so saying that this is required whenever that in the case isn't much different from saying that it is required, full stop. > 3. Some system processes generally use approach 2, but have a special > case in ProcessInterrupts() to suppress or alter the usual FATAL > message or behaviour. This includes the logical replication launcher. Maybe we could replace this by a general-purpose hook. So instead of the various tests for process types that are there now, we would just have if (procdie_hook != NULL) (*procdie_hook)(); And that hook can do whatever you like (but probably including dying). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: