Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 - Mailing list pgsql-bugs
From | Amit Kapila
Subject | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date |
Msg-id | CAA4eK1KonVMndZ+a4mCGCbgGDfOqKDiJvYV5EHXyjnF8oSn7BQ@mail.gmail.com
In response to | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
List | pgsql-bugs
On Fri, Jun 6, 2025 at 12:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jun 5, 2025 at 4:07 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Amit,
> >
> > > > ---
> > > > I'd like to make it clear again which case we need to execute
> > > > txn->invalidations as well as txn->invalidations_distributed (like in
> > > > ReorderBufferProcessTXN()) and which case we need to execute only
> > > > txn->invalidations (like in ReorderBufferForget() and
> > > > ReorderBufferAbort()). I think it might be worth putting some comments
> > > > about the overall strategy somewhere.
> > > >
> > > > ---
> > > > BTW for back branches, a simple fix without ABI breakage would be to
> > > > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > > > txn->invalidations. That is, we accumulate inval messages both coming
> > > > from the current transaction and distributed by other transactions, but
> > > > once the size reaches the threshold we invalidate all caches. Is it
> > > > worth considering for back branches?
> > > >
> > > It should work and is worth considering. The main concern would be
> > > that it will hit sooner than we expect in the field, seeing the recent
> > > reports. So, such a change has the potential to degrade the
> > > performance. I feel that the number of people impacted due to
> > > performance would be more than the number of people impacted due to
> > > such an ABI change (adding the new members at the end of
> > > ReorderBufferTXN). However, if we think we want to go safe w.r.t
> > > extensions that can rely on the sizeof ReorderBufferTXN, then your
> > > proposal makes sense.
> >
> > While considering the approach, I found a doubtful point. Consider the below
> > workload:
> >
> > 0. S1: CREATE TABLE d(data text not null);
> > 1. S1: BEGIN;
> > 2. S1: INSERT INTO d VALUES ('d1');
> > 3. S2: BEGIN;
> > 4. S2: INSERT INTO d VALUES ('d2');
> > 5. S1: ALTER PUBLICATION pb ADD TABLE d;
> > 6. S1: ... lots of DDLs so overflow happens
> > 7. S1: COMMIT;
> > 8. S2: INSERT INTO d VALUES ('d3');
> > 9. S2: COMMIT;
> > 10. S2: INSERT INTO d VALUES ('d4');
> >
> > In this case, the inval message generated by step 5 is discarded at step 6. No
> > invalidation messages are distributed in SnapBuildDistributeSnapshotAndInval().
> > While decoding S2, the relcache cannot be discarded, and tuples d3 and d4 won't be
> > replicated. Do you think this can happen?
>
> I think that once S1's inval messages got overflowed, we should
> mark other transactions as overflowed instead of distributing inval
> messages.
>

Yeah, this should work, but are you still advocating that we go with
this approach (marking txn->invalidations also as overflowed) for the
back branches? In the previous email, you seemed to agree about the
performance impact due to DDLs, so it is not clear which approach you
prefer.

--
With Regards,
Amit Kapila.
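[Editor's note] For readers following the back-branch idea discussed above (accumulate invalidation messages up to a threshold, then fall back to invalidating all caches), a minimal sketch of the shape it could take is shown below. It is not the committed fix: the RBTXN_INVAL_OVERFLOWED bit value, the MAX_ACCUMULATED_INVAL_MSGS threshold, and the sketch_* function names are illustrative assumptions; only InvalidateSystemCaches(), LocalExecuteInvalidationMessage(), and the ReorderBufferTXN fields are existing PostgreSQL code.

/*
 * Sketch only, assuming a new RBTXN_INVAL_OVERFLOWED flag as proposed above.
 * The flag value and threshold are illustrative and do not correspond to a
 * committed patch.
 */
#include "postgres.h"

#include "replication/reorderbuffer.h"
#include "storage/sinval.h"
#include "utils/inval.h"

/* Hypothetical new flag bit; a real patch must avoid clashing with the
 * existing RBTXN_* bits in reorderbuffer.h. */
#define RBTXN_INVAL_OVERFLOWED			0x1000

/* Illustrative cap on accumulated invalidation messages per transaction. */
#define MAX_ACCUMULATED_INVAL_MSGS		1024

static void
sketch_add_invalidations(ReorderBufferTXN *txn, Size nmsgs,
						 SharedInvalidationMessage *msgs)
{
	/* Once overflowed, stop accumulating; all caches will be blown away. */
	if (txn->txn_flags & RBTXN_INVAL_OVERFLOWED)
		return;

	if (txn->ninvalidations + nmsgs > MAX_ACCUMULATED_INVAL_MSGS)
	{
		/* Too many messages: forget the details, remember the overflow. */
		if (txn->invalidations)
			pfree(txn->invalidations);
		txn->invalidations = NULL;
		txn->ninvalidations = 0;
		txn->txn_flags |= RBTXN_INVAL_OVERFLOWED;
		return;
	}

	/*
	 * Normal path: append the new messages to txn->invalidations.  A real
	 * patch would allocate these in rb->context, as the existing
	 * ReorderBufferAddInvalidations() does.
	 */
	if (txn->invalidations == NULL)
		txn->invalidations = (SharedInvalidationMessage *)
			palloc(sizeof(SharedInvalidationMessage) * nmsgs);
	else
		txn->invalidations = (SharedInvalidationMessage *)
			repalloc(txn->invalidations,
					 sizeof(SharedInvalidationMessage) *
					 (txn->ninvalidations + nmsgs));

	memcpy(txn->invalidations + txn->ninvalidations, msgs,
		   sizeof(SharedInvalidationMessage) * nmsgs);
	txn->ninvalidations += nmsgs;
}

static void
sketch_execute_invalidations(ReorderBufferTXN *txn)
{
	if (txn->txn_flags & RBTXN_INVAL_OVERFLOWED)
	{
		/* Coarse but safe: discard every cache instead of replaying the
		 * individual (lost) messages. */
		InvalidateSystemCaches();
		return;
	}

	for (uint32 i = 0; i < txn->ninvalidations; i++)
		LocalExecuteInvalidationMessage(&txn->invalidations[i]);
}

The same overflow marking would also have to be applied to concurrent transactions (as in the scenario with S1 and S2 above), since once the messages are discarded there is nothing left to distribute via SnapBuildDistributeSnapshotAndInval().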