Re: error context for vacuum to include block number - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: error context for vacuum to include block number |
Date | |
Msg-id | CAA4eK1LKYnDV8bcx59TdR5-s2Cfk75g_wfAjRyy-cL8S+WvHXw@mail.gmail.com Whole thread Raw |
In response to | Re: error context for vacuum to include block number (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: error context for vacuum to include block number
|
List | pgsql-hackers |
On Fri, Mar 27, 2020 at 11:46 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Thu, Mar 26, 2020 at 11:44:24PM -0500, Justin Pryzby wrote: > > On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote: > > > On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > Hm, I was just wondering what happens if an error happens *during* > > > > > update_vacuum_error_cbarg(). It seems like if we set > > > > > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, then an > > > > > error would cause a crash. > > > > > > > > > > > Can't that be avoided if you check if cbarg->indname is non-null in > > > vacuum_error_callback as we are already doing for > > > VACUUM_ERRCB_PHASE_TRUNCATE? > > > > > > > > And if we pfree and set indname before phase, it'd > > > > > be a problem when going from an index phase to non-index phase. > > > > > > How is it possible that we move to the non-index phase without > > > clearing indname as we always revert back the old phase information? > > > > The crash scenario I'm trying to avoid would be like statement_timeout or other > > asynchronous event occurring between two non-atomic operations. > > > > I said that there's an issue no matter what order we set indname/phase; > > If we wrote: > > |cbarg->indname = indname; > > |cbarg->phase = phase; > > ..and hit a timeout (or similar) between setting indname=NULL but before > > setting phase=VACUUM_INDEX, then we can crash due to null pointer. > > > > But if we write: > > |cbarg->phase = phase; > > |if (cbarg->indname) {pfree(cbarg->indname);} > > |cbarg->indname = indname ? pstrdup(indname) : NULL; > > ..then we can still crash if we timeout between freeing cbarg->indname and > > setting it to null, due to acccessing a pfreed allocation. > > If "phase" is updated before "indname", I'm able to induce a synthetic crash > like this: > > +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname==NULL) > +{ > +kill(getpid(), SIGINT); > +pg_sleep(1); // that's needed since signals are delivered asynchronously > +} > > And another crash if we do this after pfree but before setting indname. > > +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname!=NULL) > +{ > +kill(getpid(), SIGINT); > +pg_sleep(1); > +} > > I'm not sure if those are possible outside of "induced" errors. Maybe the > function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar? > Yes, this is exactly the point. I think unless you have CHECK_FOR_INTERRUPTS in that function, the problems you are trying to think won't happen. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: