Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From: Masahiko Sawada
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date:
Msg-id: CAD21AoBLPDPCE4jj16ZRE9J8AoxLtvo3yasToaP5kRp6++=yyg@mail.gmail.com
In response to: RE: Perform streaming logical transactions by background workers and parallel apply ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses: RE: Perform streaming logical transactions by background workers and parallel apply
           Re: Perform streaming logical transactions by background workers and parallel apply
List: pgsql-hackers
On Tue, Sep 27, 2022 at 9:26 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Saturday, September 24, 2022 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Sep 22, 2022 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > >
> > > On Thu, Sep 22, 2022 at 8:59 AM wangw.fnst@fujitsu.com
> > > <wangw.fnst@fujitsu.com> wrote:
> > > >
> > >
> > > Few comments on v33-0001
> > > =======================
> > >
> >
> > Some more comments on v33-0001
> > =============================
> > 1.
> > +	/* Information from the corresponding LogicalRepWorker slot. */
> > +	uint16		logicalrep_worker_generation;
> > +
> > +	int			logicalrep_worker_slot_no;
> > +} ParallelApplyWorkerShared;
> >
> > Both these variables are read/changed by leader/parallel workers without
> > using any lock (mutex). It seems there is currently no problem because of
> > the way the patch uses in_parallel_apply_xact, but I think it won't be a
> > good idea to rely on that. I suggest using a mutex to operate on these
> > variables, and also checking that slot_no is in a valid range after
> > reading it in parallel_apply_free_worker, otherwise erroring out using
> > elog.
>
> Changed.
>
> > 2.
> > static void
> > apply_handle_stream_stop(StringInfo s)
> > {
> > -	if (!in_streamed_transaction)
> > +	ParallelApplyWorkerInfo *winfo = NULL;
> > +	TransApplyAction apply_action;
> > +
> > +	if (!am_parallel_apply_worker() &&
> > +		(!in_streamed_transaction && !stream_apply_worker))
> >  		ereport(ERROR,
> >  				(errcode(ERRCODE_PROTOCOL_VIOLATION),
> >  				 errmsg_internal("STREAM STOP message without STREAM START")));
> >
> > This check won't be able to detect missing stream start messages for
> > parallel apply workers apart from the first pair of start/stop. I thought
> > of adding an in_remote_transaction check along with
> > am_parallel_apply_worker() to detect the same, but that also won't work
> > because the parallel worker doesn't reset it at the stop message.
> > Another possibility is to introduce yet another variable for this, but
> > that doesn't seem worth it. I would like to keep this check simple.
> > Can you think of any better way?
>
> I feel we can reuse in_streamed_transaction in the parallel apply worker to
> simplify the check there. I tried to set this flag in the parallel apply
> worker when the stream starts and reset it at stream stop, so that we can
> directly check this flag for duplicate stream start messages and other
> related things.
>
> > 3. I think we can skip sending start/stop messages from the leader to the
> > parallel worker because, unlike the apply worker, it will process only one
> > transaction at a time. However, it is not clear whether that is worth the
> > effort because it is sent only after logical_decoding_work_mem is
> > exceeded. For now, I have added a comment for this in the attached patch,
> > but let me know if I am missing something or if I am wrong.
>
> I think the suggested comments look good.
>
> > 4.
> > postgres=# select pid, leader_pid, application_name, backend_type from
> > pg_stat_activity;
> >   pid  | leader_pid | application_name |         backend_type
> > -------+------------+------------------+------------------------------
> >  27624 |            |                  | logical replication launcher
> >  17336 |            | psql             | client backend
> >  26312 |            |                  | logical replication worker
> >  26376 |            | psql             | client backend
> >  14004 |            |                  | logical replication worker
> >
> > Here, the second worker entry is for the parallel worker. Isn't it better
> > if we distinguish this by keeping the type as a logical replication
> > parallel worker? I think for this you need to change bgw_type in
> > logicalrep_worker_launch().
>
> Changed.
>
> > 5. Can we name parallel_apply_subxact_info_add() as
> > parallel_apply_start_subtrans()?
> >
> > Apart from the above, I have added/edited a few comments and made a few
> > other cosmetic changes in the attached.

While looking at the v35 patch, I realized that there are some cases where
logical replication gets stuck depending on the partitioned table
structure. For instance, consider the following tables, publication, and
subscription:

* On publisher
create table p (c int) partition by list (c);
create table c1 partition of p for values in (1);
create table c2 (c int);
create publication test_pub for table p, c1, c2 with
(publish_via_partition_root = 'true');

* On subscriber
create table p (c int) partition by list (c);
create table c1 partition of p for values in (2);
create table c2 partition of p for values in (1);
create subscription test_sub connection 'port=5551 dbname=postgres'
publication test_pub with (streaming = 'parallel', copy_data = 'false');

Note that while both the publisher and the subscriber have tables with the
same names, the partition structure is different, so rows go to a different
table on the subscriber (e.g., a row with c = 1 goes to table c2 on the
subscriber).

If two concurrent transactions are executed as follows, the apply worker
(i.e., the leader apply worker) waits for a lock on c2 held by its parallel
apply worker:

* TX-1
BEGIN;
INSERT INTO p SELECT 1 FROM generate_series(1, 10000); --- changes are streamed

* TX-2
BEGIN;
TRUNCATE c2; --- wait for a lock on c2

* TX-1
INSERT INTO p SELECT 1 FROM generate_series(1, 10000);
COMMIT;

This might not be a common case in practice, but it could mean that there
is a restriction on how partitioned tables should be structured on the
publisher and the subscriber when using streaming = 'parallel'. When this
happens, since logical replication cannot move forward, users need to
disable the parallel-apply mode or increase logical_decoding_work_mem. We
could describe this limitation in the docs, but it would be hard for users
to detect such a problematic table structure in advance.
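At least once replication is stuck, the wait itself is visible from the
monitoring views. Something like the following (just a sketch, not part of
the patch), run on the subscriber, should show the leader apply worker
blocked on the lock, with pg_blocking_pids() pointing at the parallel apply
worker holding it:

SELECT pid, leader_pid, backend_type, wait_event_type, wait_event,
       pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE backend_type LIKE 'logical replication%';

Still, that only tells users that they have already hit the problem; it
doesn't help them find the problematic table structure beforehand.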
BTW, when the leader apply worker waits for a lock on c2 in the above
example, the parallel apply worker is in a busy-loop, which should be
fixed.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
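P.S. For completeness, the workarounds mentioned above would look like the
following (test_sub is the subscription from the example, and the memory
value is arbitrary). To fall back to non-parallel streaming on the
subscriber:

ALTER SUBSCRIPTION test_sub SET (streaming = 'on');

Or, on the publisher, raise the limit so that such transactions are no
longer streamed mid-flight (it has to exceed the decoded transaction size):

ALTER SYSTEM SET logical_decoding_work_mem = '256MB';
SELECT pg_reload_conf();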