Thread: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16785 Logged by: John Thompson Email address: jthompson@maxmind.com PostgreSQL version: 12.5 Operating system: Ubuntu Linux Description: We use pg_prewarm via shared_preload_libraries and we've noticed that if we start Pg then stop it again before pg_prewarm finishes its initial prewarming, Pg gets stuck in shutdown until we kill the 'autoprewarm master' process with 'kill -9'. Removing pg_prewarm from shared_preload_libraries allows Pg to stop without issue. Also, if we start Pg, wait for 'autoprewarm successfully prewarmed XXXXXX of YYYYYY previously-loaded blocks' to appear in the logs, then stop, it shuts down as expected. This is happening on several different servers, but the one I've been testing on has 855569 lines in /var/lib/postgresql/12/main/autoprewarm.blocks, 32GB of RAM, and shared_buffers is 8028MB. We're using the 12.5 packages from apt.postgresql.org. Normally we use systemd to start and stop services, but here's an example with just pg_ctl: $ sudo -u postgres /usr/lib/postgresql/12/bin/pg_ctl start -D /var/lib/postgresql/12/main -l /var/log/postgresql/postgresql-12-main.log -o '-c config_file="/etc/postgresql/12/main/postgresql.conf"' waiting for server to start.... done server started $ sudo -u postgres /usr/lib/postgresql/12/bin/pg_ctl stop -s -w -D /var/lib/postgresql/12/main -m fast pg_ctl: server does not shut down $ ps auxwww | grep ^postgres postgres 448 0.7 0.7 8667168 240252 ? Ss 21:10 0:01 /usr/lib/postgresql/12/bin/postgres -D /var/lib/postgresql/12/main -c config_file=/etc/postgresql/12/main/postgresql.conf postgres 455 0.1 0.0 182036 4252 ? Ss 21:10 0:00 postgres: logger postgres 458 0.0 0.0 8666132 4088 ? Ss 21:10 0:00 postgres: checkpointer postgres 462 0.0 0.0 185232 5928 ? Ss 21:10 0:00 postgres: stats collector postgres 463 0.3 0.0 8682976 22740 ? Ss 21:10 0:00 postgres: autoprewarm master postgres 507 0.0 0.0 8667996 12616 ? Ss 21:10 0:00 postgres: walsender repmgr 10.150.0.16(33718) streaming 34A65/4243A5C8 postgres 508 0.0 0.0 8667996 12612 ? Ss 21:10 0:00 postgres: walsender repmgr 10.150.0.15(52982) streaming 34A65/4243A5C8 $ sudo kill 463 -9 $ ps auxwww | grep ^postgres $ Please let me know if any other information would be useful, I'm happy to help. Thanks for any help you can offer.
Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > We use pg_prewarm via shared_preload_libraries and we've noticed that if we > start Pg then stop it again before pg_prewarm finishes its initial > prewarming, Pg gets stuck in shutdown until we kill the 'autoprewarm master' > process with 'kill -9'. Yeah. I assume you've got multiple large databases? When I try this here, the current autoprewarm worker process shuts down promptly, but then the master process tries to launch another worker for the next DB, and then just sits there because the postmaster is now ignoring worker start requests. So this is partly autoprewarm's fault, because it's taking no thought for the possibility that a shutdown will be requested before it's managed to preload everything. But it's *mostly* the fault of the incredibly brain-dead design for background workers, which has no way to tell worker requestors that their request is never going to be granted. I've ranted about that before, with little effect on the people responsible, so I guess at some point I'm going to have to go fix it myself. In the meantime, attached is a proposed patch that improves autoprewarm's behavior. It will only get stuck if you're unlucky enough to issue "pg_ctl stop" just as it's trying to launch another worker, which should be a pretty tight window. I noticed while testing this that when autoprewarm did get out of the situation promptly, it'd rewrite autoprewarm.blocks with only as many blocks as it'd managed to prewarm before being canceled. This doesn't seem like a good thing, because then the next start might have next door to no prewarm behavior. So the patch also hacks it to skip the final autoprewarm.blocks dump if we were told to terminate before the initial prewarming pass is done. This is against HEAD but it seems to apply cleanly to v12. If you're in a position to rebuild pg_prewarm with the patch, please see if it resolves the problem for you. regards, tom lane diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c index bfcea9bde3..8d6c81998c 100644 --- a/contrib/pg_prewarm/autoprewarm.c +++ b/contrib/pg_prewarm/autoprewarm.c @@ -153,6 +153,7 @@ void autoprewarm_main(Datum main_arg) { bool first_time = true; + bool final_dump_allowed = true; TimestampTz last_dump_time = 0; /* Establish signal handlers; once that's done, unblock signals. */ @@ -193,10 +194,15 @@ autoprewarm_main(Datum main_arg) * There's not much point in performing a dump immediately after we finish * preloading; so, if we do end up preloading, consider the last dump time * to be equal to the current time. + * + * If apw_load_buffers() is terminated early by a shutdown request, + * prevent dumping out our state below the loop, because we'd effectively + * just truncate the saved state to however much we'd managed to preload. */ if (first_time) { apw_load_buffers(); + final_dump_allowed = !ShutdownRequestPending; last_dump_time = GetCurrentTimestamp(); } @@ -254,7 +260,8 @@ autoprewarm_main(Datum main_arg) * Dump one last time. We assume this is probably the result of a system * shutdown, although it's possible that we've merely been terminated. */ - apw_dump_now(true, true); + if (final_dump_allowed) + apw_dump_now(true, true); } /* @@ -387,6 +394,18 @@ apw_load_buffers(void) if (!have_free_buffer()) break; + /* + * Likewise, don't launch if we've already been told to shut down. + * + * There is a race condition here: if the postmaster has received a + * fast-shutdown signal, but we've not heard about it yet, then the + * postmaster will ignore our worker start request and we'll wait + * forever. However, that's a bug in the general background-worker + * logic, not the fault of this module. + */ + if (ShutdownRequestPending) + break; + /* * Start a per-database worker to load blocks for this database; this * function will return once the per-database worker exits. @@ -404,10 +423,11 @@ apw_load_buffers(void) apw_state->pid_using_dumpfile = InvalidPid; LWLockRelease(&apw_state->lock); - /* Report our success. */ - ereport(LOG, - (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks", - apw_state->prewarmed_blocks, num_elements))); + /* Report our success, if we were able to finish. */ + if (!ShutdownRequestPending) + ereport(LOG, + (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks", + apw_state->prewarmed_blocks, num_elements))); } /*
Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
From
Tom Lane
Date:
I wrote: > This is against HEAD but it seems to apply cleanly to v12. > If you're in a position to rebuild pg_prewarm with the patch, > please see if it resolves the problem for you. Eh, sorry about that, patch(1) was willing to apply it but the result didn't compile :-(. If you want to try local testing, the version that actually works in v12 is at https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=f581e5383648913385d3e7789ad9323a3ebfb1d0 regards, tom lane
Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
From
John Thompson
Date:
Hi Tom,
Take care
On Tue, Dec 22, 2020 at 12:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> This is against HEAD but it seems to apply cleanly to v12.
> If you're in a position to rebuild pg_prewarm with the patch,
> please see if it resolves the problem for you.
Eh, sorry about that, patch(1) was willing to apply it but the
result didn't compile :-(. If you want to try local testing,
the version that actually works in v12 is at
https://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=f581e5383648913385d3e7789ad9323a3ebfb1d0
regards, tom lane
Re: BUG #16785: Postgresql shutdown during initial pg_prewarm prewarming causes Pg to get stuck
From
Tom Lane
Date:
John Thompson <jthompson@maxmind.com> writes: > Thanks so much for taking a look at this. As you mentioned the patch didn't > apply cleanly to 12, but I ended up making the same changes as are in the > link you posted, and it fixes the problem in my testing. I appreciate your > help. OK, thanks for confirming! regards, tom lane