Re: Invalid pointer access in logical decoding after error - Mailing list pgsql-hackers
From | vignesh C |
---|---|
Subject | Re: Invalid pointer access in logical decoding after error |
Date | |
Msg-id | CALDaNm0W=JSfgd=Wkk0S39iKTH+9ok2-EN1mOmoyqrL1s9683w@mail.gmail.com Whole thread Raw |
In response to | Re: Invalid pointer access in logical decoding after error (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Invalid pointer access in logical decoding after error
|
List | pgsql-hackers |
On Thu, 9 Oct 2025 at 00:16, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Wed, Oct 8, 2025 at 11:39 AM vignesh C <vignesh21@gmail.com> wrote: > > > > On Tue, 7 Oct 2025 at 23:40, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > On Mon, Oct 6, 2025 at 6:55 PM Euler Taveira <euler@eulerto.com> wrote: > > > > > > > > On Mon, Oct 6, 2025, at 5:00 PM, Masahiko Sawada wrote: > > > > > I agree with your analysis. It seems there is no convenient way to > > > > > move RelationSyncCache inside PGOutputData. So let's proceed with the > > > > > proposed approach. > > > > > > > > > > > > > +1. Shouldn't we add a comment mentioning that it is a possibility? > > > > > > I think the primary reason why we cannot simply move RelationSyncCache > > > to PGOutputData is there is no way to unregister the invalidation > > > callback, which is already mentioned in some places: > > > > > > /* > > > * We can get here if the plugin was used in SQL interface as the > > > * RelationSyncCache is destroyed when the decoding finishes, but there is > > > * no way to unregister the relcache invalidation callback. > > > */ > > > if (RelationSyncCache == NULL) > > > > > > > > > > > > I've done some minor changes to your v2 patch and updated the commit > > > > > message. IIUC this patch needs to be backpatched to v15. Please review > > > > > the attached patch. > > > > > > > > > > > > > I verified that the bug exists since v15 as reported. Despite of the test case > > > > provided by Vignesh (which I attached a modified version to be used in v15 or > > > > later), I also added another test case that has a similar problem with > > > > generated columns. This 2nd test case only works for v18 (where the feature was > > > > introduced). This patch also fixes this case. > > > > > > Thank you for the tests! > > > > I also have verified that the test works till the PG15 branch. > > > > > > I'm curious about other cases related to RelationSyncCache. Is there any other > > > > cases that this patch doesn't fix? > > > > > > I think that with the patch we can ensure to clean up > > > RelationSyncCache alongside with other memory contexts used in > > > pgoutput, fixing problems stemming from RelationSyncCache referencing > > > already-freed-memory in the pgoutput memory contexts. > > > > > > IIUC there is another memory leak issue in pgoutput as I reported on > > > this thread[1]. It should be fixed in a separate patch. > > > > +1 to fix it as a separate patch. > > > > > > This patch looks good to me. Do we really need a new function with the same > > > > content as pgoutput_shutdown? > > > > > > Probably we can call pgoutput_shutdown() in rel_sync_cache_reset_cb(). > > > > Changed it to pgoutput_shutdown as it has the same functionality > > > > > > I don't like mcallback. It seems 'm' stands for > > > > 'memory' but if we want to use crypt names I would suggest 'mcb' (_m_emory > > > > _c_all_b_ack) -- that is also a crypt name but a shorter one. Same pattern is > > > > already used in another place with MemoryContextCallback. > > > > > > I think it's better to use 'mcallback' since back branches are already > > > using it (see commit bbe68c13a for example). > > > > Yes, that is better > > > > > While considering > > > backpatching this patch, I noticed that the memory context reset > > > callback function should be registered to ctx->context but not > > > ctx->cachectx. > > > > Modified > > > > The attached v4 version patch has the changes for the same. > > Thank you for updating the patch! I was working on creating patches > for backbranches too, so let me share these patches. > > I've made some cosmetic changes to align back branch codes, and > attached the patches for master to v15. > > One thing we might want to consider is for v14 and v13. They don't > have this bug since the entry_ctx was introduced in v15, but it's > still true for them that RelationSyncCache is not cleaned up in error > cases if pgoutput is used via SQL API. For example, if > RelationSyncCache hash table gets corrupted for some reason, logical > decoding could repeat an error until logical decoding completes > successfully and its shutdown callback is called. Also, it might be a > good idea in general to ensure cleaning up the hash table after use. Agreed, let's backpatch to PG13. Should we also add a test case in the master branch, given that this issue has been around for a while? Regards, Vignesh
pgsql-hackers by date: