Re: Invalid pointer access in logical decoding after error - Mailing list pgsql-hackers

From vignesh C
Subject Re: Invalid pointer access in logical decoding after error
Date
Msg-id CALDaNm0W=JSfgd=Wkk0S39iKTH+9ok2-EN1mOmoyqrL1s9683w@mail.gmail.com
Whole thread Raw
In response to Re: Invalid pointer access in logical decoding after error  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Invalid pointer access in logical decoding after error
List pgsql-hackers
On Thu, 9 Oct 2025 at 00:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Oct 8, 2025 at 11:39 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 7 Oct 2025 at 23:40, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Oct 6, 2025 at 6:55 PM Euler Taveira <euler@eulerto.com> wrote:
> > > >
> > > > On Mon, Oct 6, 2025, at 5:00 PM, Masahiko Sawada wrote:
> > > > > I agree with your analysis. It seems there is no convenient way to
> > > > > move RelationSyncCache inside PGOutputData. So let's proceed with the
> > > > > proposed approach.
> > > > >
> > > >
> > > > +1. Shouldn't we add a comment mentioning that it is a possibility?
> > >
> > > I think the primary reason why we cannot simply move RelationSyncCache
> > > to PGOutputData is there is no way to unregister the invalidation
> > > callback, which is already mentioned in some places:
> > >
> > >       /*
> > >        * We can get here if the plugin was used in SQL interface as the
> > >        * RelationSyncCache is destroyed when the decoding finishes, but there is
> > >        * no way to unregister the relcache invalidation callback.
> > >        */
> > >       if (RelationSyncCache == NULL)
> > >
> > > >
> > > > > I've done some minor changes to your v2 patch and updated the commit
> > > > > message. IIUC this patch needs to be backpatched to v15. Please review
> > > > > the attached patch.
> > > > >
> > > >
> > > > I verified that the bug exists since v15 as reported. Despite of the test case
> > > > provided by Vignesh (which I attached a modified version to be used in v15 or
> > > > later), I also added another test case that has a similar problem with
> > > > generated columns. This 2nd test case only works for v18 (where the feature was
> > > > introduced). This patch also fixes this case.
> > >
> > > Thank you for the tests!
> >
> > I also have verified that the test works till the PG15 branch.
> >
> > > > I'm curious about other cases related to RelationSyncCache. Is there any other
> > > > cases that this patch doesn't fix?
> > >
> > > I think that with the patch we can ensure to clean up
> > > RelationSyncCache alongside with other memory contexts used in
> > > pgoutput, fixing problems stemming from RelationSyncCache referencing
> > > already-freed-memory in the pgoutput memory contexts.
> > >
> > > IIUC there is another memory leak issue in pgoutput as I reported on
> > > this thread[1]. It should be fixed in a separate patch.
> >
> > +1 to fix it as a separate patch.
> >
> > > > This patch looks good to me. Do we really need a new function with the same
> > > > content as pgoutput_shutdown?
> > >
> > > Probably we can call pgoutput_shutdown() in rel_sync_cache_reset_cb().
> >
> > Changed it to pgoutput_shutdown as it has the same functionality
> >
> > > > I don't like mcallback. It seems 'm' stands for
> > > > 'memory' but if we want to use crypt names I would suggest 'mcb' (_m_emory
> > > > _c_all_b_ack) -- that is also a crypt name but a shorter one. Same pattern is
> > > > already used in another place with MemoryContextCallback.
> > >
> > > I think it's better to use 'mcallback' since back branches are already
> > > using it (see commit bbe68c13a for example).
> >
> > Yes, that is better
> >
> > > While considering
> > > backpatching this patch, I noticed that the memory context reset
> > > callback function should be registered to ctx->context but not
> > > ctx->cachectx.
> >
> > Modified
> >
> > The attached v4 version patch has the changes for the same.
>
> Thank you for updating the patch! I was working on creating patches
> for backbranches too, so let me share these patches.
>
> I've made some cosmetic changes to align back branch codes, and
> attached the patches for master to v15.
>
> One thing we might want to consider is for v14 and v13. They don't
> have this bug since the entry_ctx was introduced in v15, but it's
> still true for them that RelationSyncCache is not cleaned up in error
> cases if pgoutput is used via SQL API. For example, if
> RelationSyncCache hash table gets corrupted for some reason, logical
> decoding could repeat an error until logical decoding completes
> successfully and its shutdown callback is called. Also, it might be a
> good idea in general to ensure cleaning up the hash table after use.

Agreed, let's backpatch to PG13. Should we also add a test case in the
master branch, given that this issue has been around for a while?

Regards,
Vignesh



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [PATCH] Remove unused #include's in src/backend/commands/*
Next
From: Álvaro Herrera
Date:
Subject: Re: [PATCH] Remove unused #include's in src/backend/commands/*