Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | CAD21AoBDLiFHThzfzvrnViTKnsm-pM5YvfTys_96-jBSXpWYqw@mail.gmail.com |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: Perform streaming logical transactions by background workers and parallel apply |
List | pgsql-hackers |
On Wed, Oct 12, 2022 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 11, 2022 at 5:52 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Oct 7, 2022 at 2:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > About your point that having different partition structures for
> > > publisher and subscriber, I don't know how common it will be once we
> > > have DDL replication. Also, the default value of
> > > publish_via_partition_root is false which doesn't seem to indicate
> > > that this is a quite common case.
> >
> > So how can we consider these concurrent issues that could happen only
> > when streaming = 'parallel'? Can we restrict some use cases to avoid
> > the problem or can we have a safeguard against these conflicts?
> >
>
> Yeah, right now the strategy is to disallow parallel apply for such
> cases as you can see in *0003* patch.

Tightening the restrictions could work in some cases, but there might
still be corner cases, and it could reduce usability. I'm not really
sure that we can ensure such a deadlock won't happen with the current
restrictions. I think we need some safeguard just in case. For
example, if the leader apply worker is waiting for a lock acquired by
its parallel worker, it cancels the parallel worker's transaction,
commits its transaction, and restarts logical replication. Or the
leader can log the deadlock to let the user know.

>
> > We
> > could find a new problematic scenario in the future and if it happens,
> > logical replication gets stuck, it cannot be resolved only by apply
> > workers themselves.
> >
>
> I think users can change streaming option to on/off and internally the
> parallel apply worker can detect and restart to allow replication to
> proceed. Having said that, I think that would be a bug in the code and
> we should try to fix it. We may need to disable parallel apply in the
> problematic case.
>
> The other ideas that occurred to me in this regard are (a) provide a
> reloption (say parallel_apply) at table level and we can use that to
> bypass various checks like different Unique Key between
> publisher/subscriber, constraints/expressions having mutable
> functions, Foreign Key (when enabled on subscriber), operations on
> Partitioned Table. We can't detect whether those are safe or not
> (primarily because of a different structure in publisher and
> subscriber) so we prohibit parallel apply but if users use this
> option, we can allow it even in those cases.

The parallel apply worker is assigned per transaction, right? If so,
how can we know in advance which tables are modified in the
transaction? And what if two tables whose reloptions are true and false
are modified in the same transaction?

> (b) While enabling the
> parallel option in the subscription, we can try to match all the
> table(s) information of the publisher/subscriber. It will be tricky to
> make this work because say even if match some trigger function name,
> we won't be able to match the function body. The other thing is when
> at a later point the table definition is changed on the subscriber, we
> need to again validate the information between publisher and
> subscriber which I think would be difficult as we would be already in
> between processing some message and getting information from the
> publisher at that stage won't be possible.

Indeed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
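
The safeguard suggested in this message (the leader notices that the lock it is waiting for is held by one of its own parallel workers, then either cancels that worker and restarts replication, or only logs the situation) can be sketched roughly as follows. This is an illustrative Python model only: `Leader`, `Action`, `safeguard_action`, and the lock-name strings are hypothetical names chosen for the sketch, and nothing here corresponds to PostgreSQL's actual apply-worker code or to the patch under discussion.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    KEEP_WAITING = "keep_waiting"              # the lock is held by someone else
    CANCEL_AND_RESTART = "cancel_and_restart"  # cancel the worker, commit, restart
    LOG_ONLY = "log_only"                      # only report the deadlock to the user


@dataclass
class Leader:
    # Hypothetical model of a leader apply worker and its parallel workers.
    pid: int
    parallel_worker_pids: set[int] = field(default_factory=set)


def safeguard_action(leader, waiting_on_lock, lock_holders, cancel=True):
    """Decide what the leader should do while it is blocked on a lock.

    lock_holders maps a lock name to the set of pids currently holding it.
    Returns (action, pids of the leader's own workers that block it).
    """
    if waiting_on_lock is None:
        return Action.KEEP_WAITING, set()
    holders = lock_holders.get(waiting_on_lock, set())
    blocking = holders & leader.parallel_worker_pids
    if not blocking:
        # An unrelated backend holds the lock: an ordinary lock wait.
        return Action.KEEP_WAITING, set()
    action = Action.CANCEL_AND_RESTART if cancel else Action.LOG_ONLY
    return action, blocking


leader = Leader(pid=100, parallel_worker_pids={101, 102})
locks = {"rel:16384": {101}, "rel:16390": {200}}
print(safeguard_action(leader, "rel:16384", locks))
```

The `cancel` flag stands in for the two alternatives raised in the message: actively cancelling the parallel worker's transaction, or just logging the deadlock so the user can react.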