On Wed, Oct 8, 2025 at 11:39 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Tue, 7 Oct 2025 at 23:40, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Oct 6, 2025 at 6:55 PM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Mon, Oct 6, 2025, at 5:00 PM, Masahiko Sawada wrote:
> > > > I agree with your analysis. It seems there is no convenient way to
> > > > move RelationSyncCache inside PGOutputData. So let's proceed with the
> > > > proposed approach.
> > > >
> > >
> > > +1. Shouldn't we add a comment mentioning that it is a possibility?
> >
> > I think the primary reason why we cannot simply move RelationSyncCache
> > to PGOutputData is there is no way to unregister the invalidation
> > callback, which is already mentioned in some places:
> >
> > /*
> > * We can get here if the plugin was used in SQL interface as the
> > * RelationSyncCache is destroyed when the decoding finishes, but there is
> > * no way to unregister the relcache invalidation callback.
> > */
> > if (RelationSyncCache == NULL)
> >
> > >
> > > > I've done some minor changes to your v2 patch and updated the commit
> > > > message. IIUC this patch needs to be backpatched to v15. Please review
> > > > the attached patch.
> > > >
> > >
> > > I verified that the bug exists since v15 as reported. Despite of the test case
> > > provided by Vignesh (which I attached a modified version to be used in v15 or
> > > later), I also added another test case that has a similar problem with
> > > generated columns. This 2nd test case only works for v18 (where the feature was
> > > introduced). This patch also fixes this case.
> >
> > Thank you for the tests!
>
> I also have verified that the test works till the PG15 branch.
>
> > > I'm curious about other cases related to RelationSyncCache. Is there any other
> > > cases that this patch doesn't fix?
> >
> > I think that with the patch we can ensure to clean up
> > RelationSyncCache alongside with other memory contexts used in
> > pgoutput, fixing problems stemming from RelationSyncCache referencing
> > already-freed-memory in the pgoutput memory contexts.
> >
> > IIUC there is another memory leak issue in pgoutput as I reported on
> > this thread[1]. It should be fixed in a separate patch.
>
> +1 to fix it as a separate patch.
>
> > > This patch looks good to me. Do we really need a new function with the same
> > > content as pgoutput_shutdown?
> >
> > Probably we can call pgoutput_shutdown() in rel_sync_cache_reset_cb().
>
> Changed it to pgoutput_shutdown as it has the same functionality
>
> > > I don't like mcallback. It seems 'm' stands for
> > > 'memory' but if we want to use crypt names I would suggest 'mcb' (_m_emory
> > > _c_all_b_ack) -- that is also a crypt name but a shorter one. Same pattern is
> > > already used in another place with MemoryContextCallback.
> >
> > I think it's better to use 'mcallback' since back branches are already
> > using it (see commit bbe68c13a for example).
>
> Yes, that is better
>
> > While considering
> > backpatching this patch, I noticed that the memory context reset
> > callback function should be registered to ctx->context but not
> > ctx->cachectx.
>
> Modified
>
> The attached v4 version patch has the changes for the same.
Thank you for updating the patch! I was working on creating patches
for backbranches too, so let me share these patches.
I've made some cosmetic changes to align back branch codes, and
attached the patches for master to v15.
One thing we might want to consider is for v14 and v13. They don't
have this bug since the entry_ctx was introduced in v15, but it's
still true for them that RelationSyncCache is not cleaned up in error
cases if pgoutput is used via SQL API. For example, if
RelationSyncCache hash table gets corrupted for some reason, logical
decoding could repeat an error until logical decoding completes
successfully and its shutdown callback is called. Also, it might be a
good idea in general to ensure cleaning up the hash table after use.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com