Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | CAA4eK1+S5CX=OzD01bgoEuxQqF45zyYV-dJGh3K2f4j6DyDiOA@mail.gmail.com Whole thread Raw |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Perform streaming logical transactions by background workers and parallel apply
|
List | pgsql-hackers |
On Tue, Dec 6, 2022 at 5:27 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are my review comments for patch v55-0002 > ... > > 3. pa_spooled_messages > > Previously I suggested this function name should be changed but that > was rejected (see [1] #6a) > > > 6a. > > IMO a better name for this function would be > > pa_apply_spooled_messages(); > Not sure about this. > > ~ > > FYI the reason for the previous suggestion is because there is no verb > in the current function name, so the reader is left thinking > pa_spooled_messages "what"? > > It means the caller has to have extra comments like: > /* Check if changes have been serialized to a file. */ > pa_spooled_messages(); > > OTOH, if the function was called something better -- e.g. > pa_check_for_spooled_messages() or similar -- then it would be > self-explanatory. > I think pa_check_for_spooled_messages() could be misleading because we do apply the changes in that function, so probably a comment as suggested by you is a better option. > ~ > > 4. > > /* > + * Replay the spooled messages in the parallel apply worker if the leader apply > + * worker has finished serializing changes to the file. > + */ > +static void > +pa_spooled_messages(void) > > I'm not 100% sure of the logic, so IMO maybe the comment should say a > bit more about how this works: > > Specifically, let's say there was some timeout and the LA needed to > write the spool file, then let's say the PA timed out and found itself > inside this function. Now, let's say the LA is still busy writing the > file -- so what happens next? > > Does this function simply return, then the main PA loop waits again, > then the times out again, then PA finds itself back inside this > function again... and that keeps happening over and over until > eventually the spool file is found FS_READY? Some explanatory comments > might help. > No, PA will simply wait for LA to finish. See the code handling for FS_BUSY state. We might want to slightly improve part of the current comment to: "If the leader apply worker is busy serializing the partial changes then acquire the stream lock now and wait for the leader worker to finish serializing the changes". > > 16. apply_spooled_messages > > + stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false); > > Something still seems a bit odd about this to me (previously also > mentioned in review [1] #29) but I cannot quite put my finger on it... > > AFAIK the 'stream_fd' is the global the LA is using to remember the > single stream spool file; It corresponds to the LogicalRepWorker's > 'stream_fileset'. So using that same global on the PA side somehow > seemed strange to me. The fileset at PA comes from a different place > (MyParallelShared->fileset). > I think 'stream_fd' is specific to apply module which can be used by apply, tablesync, or parallel worker. Unfortunately, now, the code in worker.c is a mix of worker and apply module. At some point, we should separate apply logic to a separate file. -- With Regards, Amit Kapila.
pgsql-hackers by date: