Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | CAA4eK1JjRRwdT8=d86mrp-vX2YwWNXv5TUid7sb4fe-EmUXDWA@mail.gmail.com Whole thread Raw |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Perform streaming logical transactions by background workers and parallel apply
|
List | pgsql-hackers |
On Mon, May 8, 2023 at 11:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, May 8, 2023 at 12:52 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Monday, May 8, 2023 11:08 AM Masahiko Sawada <sawada.mshk@gmail.com> > > > > Hi, > > > > > > > > On Tue, May 2, 2023 at 12:22 PM Amit Kapila <amit.kapila16@gmail.com> > > > wrote: > > > > > > > > On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada > > > <sawada.mshk@gmail.com> wrote: > > > > > > > > > > While investigating this issue, I've reviewed the code around > > > > > callbacks and worker termination etc and I found a problem. > > > > > > > > > > A parallel apply worker calls the before_shmem_exit callbacks in the > > > > > following order: > > > > > > > > > > 1. ShutdownPostgres() > > > > > 2. logicalrep_worker_onexit() > > > > > 3. pa_shutdown() > > > > > > > > > > Since the worker is detached during logicalrep_worker_onexit(), > > > > > MyLogicalReplication->leader_pid is an invalid when we call > > > > > pa_shutdown(): > > > > > > > > > > static void > > > > > pa_shutdown(int code, Datum arg) > > > > > { > > > > > Assert(MyLogicalRepWorker->leader_pid != InvalidPid); > > > > > SendProcSignal(MyLogicalRepWorker->leader_pid, > > > > > PROCSIG_PARALLEL_APPLY_MESSAGE, > > > > > InvalidBackendId); > > > > > > > > > > Also, if the parallel apply worker fails shm_toc_lookup() during the > > > > > initialization, it raises an error (because of noError = false) but > > > > > ends up a SEGV as MyLogicalRepWorker is still NULL. > > > > > > > > > > I think that we should not use MyLogicalRepWorker->leader_pid in > > > > > pa_shutdown() but instead store the leader's pid to a static variable > > > > > before registering pa_shutdown() callback. > > > > > > > > > > > > > Why not simply move the registration of pa_shutdown() to someplace > > > > after logicalrep_worker_attach()? > > > > > > If we do that, the worker won't call dsm_detach() if it raises an > > > ERROR in logicalrep_worker_attach(), is that okay? It seems that it's > > > no practically problem since we call dsm_backend_shutdown() in > > > shmem_exit(), but if so why do we need to call it in pa_shutdown()? > > > > I think the dsm_detach in pa_shutdown was intended to fire on_dsm_detach > > callbacks to give callback a chance to report stat before the stat system is > > shutdown, following what we do in ParallelWorkerShutdown() (e.g. > > sharedfileset.c callbacks cause fd.c to do ReportTemporaryFileUsage(), so we > > need to fire that earlier). > > > > But for parallel apply, we currently only have one on_dsm_detach > > callback(shm_mq_detach_callback) which doesn't report extra stats. So the > > dsm_detach in pa_shutdown is only used to make it a bit future-proof in case > > we add some other on_dsm_detach callbacks in the future which need to report > > stats. > > Make sense . Given that it's possible that we add other callbacks that > report stats in the future, I think it's better not to move the > position to register pa_shutdown() callback. > Hmm, what kind of stats do we expect to be collected before we register pa_shutdown? I think if required we can register such a callback after pa_shutdown. I feel without reordering the callbacks, the fix would be a bit complicated as explained in my previous email, so I don't think it is worth complicating this code unless really required. -- With Regards, Amit Kapila.
pgsql-hackers by date: