Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity |
Date | |
Msg-id | CAEepm=1vSunMb-sZ1wVYUuUYEj=Uu9qiUBUcnp=cdo7D6cZLbg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity (Andreas Seltenreich <seltenreich@gmx.de>) |
Responses |
Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity
Re: [HACKERS] [sqlsmith] Crash reading pg_stat_activity |
List | pgsql-hackers |
On Wed, Dec 28, 2016 at 11:38 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote: > Thomas Munro writes: > >> On Wed, Dec 28, 2016 at 10:40 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: >>> On Wed, Dec 28, 2016 at 10:01 AM, Andreas Seltenreich <seltenreich@gmx.de> wrote: >>>> testing master as of fe591f8bf6 produced a crash reading >>>> pg_stat_activity (backtrace below). Digging around with with gdb >>>> revealed that pgstat_get_wait_event() returned an invalid pointer for a >>>> classId PG_WAIT_LWLOCK. >>>> >>>> I think the culprit is dsa.c passing a pointer to memory that goes away >>>> on dsa_free() as a name to LWLockRegisterTranche. > [..] >>> Maybe we should replace it with another value when the DSA area is >>> detached, using a constant string. Something like > > I'm wondering: Is it safe to pass a pointer into a DSA at all? If I > understand the comments correctly, they are not necessarily mapped (at > the same address) in an unrelated backend looking into pg_stat_activity, > and in this case a dsa_free() is not actually needed to trigger a crash. It is safe, as long as the segment remains mapped. Each backend that attaches calls LWLockRegisterTranche giving it the address of the name in its virtual address space. The problem is that after the segment is unmapped, that address is now garbage, which is clearly a bug. That's why I started talking about how to 'unregister' it (or register a replacement constant name). We have some code that always runs before the control segment containing the name is detached, so that'd be the perfect place to do that. >> That line of thinking suggests another potential solution: go and >> register the name in RegisterLWLockTranches, and stop registering it >> in dsa.c. For other potential uses of DSA including extensions that >> call LWLockNewTrancheId() we'd have to decide whether to make the name >> an optional argument, or require those users to register it >> themselves. > > Maybe instead of copying the name, just put the passed pointer itself > into the area? Extensions using LWLockNewTrancheId need to use > shared_preload_libraries anyway, so static strings would be mapped in > all backends. Yeah that would be another way. I had this idea that only the process that creates a DSA area should name it, and then processes attaching would see the existing tranche ID and name, so could use a narrower interface. We could instead do as you say and make processes that attach provide a pointer to the name too, and make it the caller's problem to ensure that the pointers remain valid long enough; or go one step further and make them register/unregister it themselves. But I'm starting to think that the best way might be to do BOTH of the things I said in my previous message: make dsa.c register on create/attach and also unregister before detaching iff the name was supplied at creation time for the benefit of extension writers, but make it not do anything at all about tranche name registration/unregistration if NULL was passed in at create time. Then register this particular tranche (LWTRANCHE_PARALLEL_QUERY_DSA) in every process in RegisterLWLockTranches. That way, you'd get a useful name in pg_stat_activity for other backends that are running parallel queries if they are ever waiting for these locks (unlikely but interesting to know abotu if it happens). -- Thomas Munro http://www.enterprisedb.com
pgsql-hackers by date: