RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 - Mailing list pgsql-bugs

From Hayato Kuroda (Fujitsu)
Subject RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Msg-id OSCPR01MB149662920804EAA70CE1E286FF56FA@OSCPR01MB14966.jpnprd01.prod.outlook.com
In response to Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
List pgsql-bugs
Dear Amit,

> > ---
> > I'd like to make it clear again in which cases we need to execute
> > txn->invalidations as well as txn->invalidations_distributed (as in
> > ReorderBufferProcessTXN()) and in which cases we need to execute only
> > txn->invalidations (as in ReorderBufferForget() and
> > ReorderBufferAbort()). I think it might be worth adding some comments
> > about the overall strategy somewhere.
> >
> > ---
> > BTW, for back branches, a simple fix without ABI breakage would be to
> > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > txn->invalidations. That is, we accumulate inval messages both coming
> > from the current transaction and distributed by other transactions, but
> > once the size reaches the threshold, we invalidate all caches. Is it
> > worth considering for back branches?
> >
> 
> It should work and is worth considering. The main concern would be
> that the overflow will be hit sooner than we expect in the field,
> judging by the recent reports. So, such a change has the potential to
> degrade performance. I feel that the number of people impacted by the
> performance regression would be larger than the number of people
> impacted by such an ABI change (adding the new members at the end of
> ReorderBufferTXN). However, if we want to be safe w.r.t. extensions
> that may rely on sizeof(ReorderBufferTXN), then your proposal makes
> sense.

While considering the approach, I found a point I'm doubtful about. Consider
the workload below:

0. S1: CREATE TABLE d(data text not null);
1. S1: BEGIN;
2. S1: INSERT INTO d VALUES ('d1');
3.                         S2: BEGIN;
4.                         S2: INSERT INTO d VALUES ('d2');
5. S1: ALTER PUBLICATION pb ADD TABLE d;
6. S1: ... many more DDLs, so that txn->invalidations overflows
7. S1: COMMIT;
8.                         S2: INSERT INTO d VALUES ('d3');
9.                         S2: COMMIT;
10.                        S2: INSERT INTO d VALUES ('d4');

In this case, the inval message generated by step 5 is discarded at step 6,
so no invalidation messages are distributed by
SnapBuildDistributeSnapshotAndInval() when S1 commits. While decoding S2, the
relcache entry cannot be invalidated, so tuples d3 and d4 won't be
replicated. Do you think this can happen?

Note that this won't happen with the v11 patch: it does not discard
txn->invalidations on overflow, so the needed inval messages can still be
distributed.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

