Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION |
Date | |
Msg-id | CAHGQGwHT0DrzCH22M4ag6oN-YSmm=RWPbQW6g3bbfnjLm-GaMw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Cannot shutdown subscriber after DROP SUBSCRIPTION
|
List | pgsql-hackers |
On Sun, Feb 5, 2017 at 5:11 AM, Petr Jelinek <petr.jelinek@2ndquadrant.com> wrote: > On 03/02/17 19:38, Fujii Masao wrote: >> On Sat, Feb 4, 2017 at 12:49 AM, Fujii Masao <masao.fujii@gmail.com> wrote: >>> On Fri, Feb 3, 2017 at 10:56 AM, Kyotaro HORIGUCHI >>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>> At Fri, 3 Feb 2017 01:02:47 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwHqQVHmQ7wM=eLNnp1_oy-GVSSAcaJXWjE4nc2twSqXRg@mail.gmail.com> >>>>> On Thu, Feb 2, 2017 at 2:36 PM, Michael Paquier >>>>> <michael.paquier@gmail.com> wrote: >>>>>> On Thu, Feb 2, 2017 at 2:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>>>>> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: >>>>>>>> Then, the reason for the TRY-CATCH cluase is that I found that >>>>>>>> some functions called from there can throw exceptions. >>>>>>> >>>>>>> Yes, but all LWLocks should be released by normal error recovery. >>>>>>> It should not be necessary for this code to clean that up by hand. >>>>>>> If it were necessary, there would be TRY-CATCH around every single >>>>>>> LWLockAcquire in the backend, and we'd have an unreadable and >>>>>>> unmaintainable system. Please don't add a TRY-CATCH unless it's >>>>>>> *necessary* -- and you haven't explained why this one is. >>>>> >>>>> Yes. >>>> >>>> Thank you for the suggestion. I minunderstood that. >>>> >>>>>> Putting hands into the code and at the problem, I can see that >>>>>> dropping a subscription on a node makes it unresponsive in case of a >>>>>> stop. And that's just because calls to LWLockRelease are missing as in >>>>>> the patch attached. A try/catch problem should not be necessary. >>>>> >>>>> Thanks for the patch! >>>>> >>>>> With the patch, LogicalRepLauncherLock is released at the end of >>>>> DropSubscription(). But ISTM that the lock should be released just after >>>>> logicalrep_worker_stop() and there is no need to protect the removal of >>>>> replication slot with the lock. >>>> >>>> That's true. logicalrep_worker_stop returns after confirmig that >>>> worker->proc is cleard, so no false relaunch cannot be caused. >>>> After all, logicalrep_worker_stop is surrounded by >>>> LWLockAcquire/Relase pair. So it can be moved into the funciton >>>> and make the lock secrion to be more narrower. >> >> If we do this, Assert(LWLockHeldByMe(LogicalRepLauncherLock)) should be >> removed and the comment for logicalrep_worker_stop() should be updated. >> >> Your approach may cause the deadlock. The launcher takes LogicalRepWorkerLock >> while holding LogicalRepLauncherLock. OTOH, with your approach, >> logicalrep_worker_stop() takes LogicalRepLauncherLock while holding >> LogicalRepWorkerLock. >> >> Therefore I pushed the simple patch which adds LWLockRelease() just after >> logicalrep_worker_stop(). >> >> Another problem that I found while reading the code is that the launcher can >> start up the worker with the subscription that DROP SUBSCRIPTION just removed. >> That is, DROP SUBSCRIPTION removes the target entry from pg_subscription, >> but the launcher can see it and start new worker until the transaction for >> DROP has been committed. >> > > That was the reason why DropSubscription didn't release the lock in the > first place. It was supposed to be released at the end of the > transaction though. OK, I understood why you used the lock in that way. But using LWLock for that purpose is not valid. >> To fix this issue, I think that DROP SUBSCRIPTION should take >> AccessExclusiveLock on pg_subscription, instead of RowExclusiveLock, >> so that the launcher cannot see the entry to be being removed. >> > > The whole point of having LogicalRepLauncherLock is to avoid having to > do this, so if we do this we could probably get rid of it. Yes, let's remove LogicalRepLauncherLock and lock pg_subscription with AccessExclusive mode at the beginning of DROP SUBSCRIPTION. Attached patch does this. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: