Home > mailing lists

Re: error context for vacuum to include block number - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: error context for vacuum to include block number
Date	March 27, 2020 06:20:30
Msg-id	CAA4eK1LKYnDV8bcx59TdR5-s2Cfk75g_wfAjRyy-cL8S+WvHXw@mail.gmail.com Whole thread Raw
In response to	Re: error context for vacuum to include block number (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: error context for vacuum to include block number
List	pgsql-hackers

Tree view

On Fri, Mar 27, 2020 at 11:46 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Thu, Mar 26, 2020 at 11:44:24PM -0500, Justin Pryzby wrote:
> > On Fri, Mar 27, 2020 at 09:49:29AM +0530, Amit Kapila wrote:
> > > On Fri, Mar 27, 2020 at 3:47 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > > > Hm, I was just wondering what happens if an error happens *during*
> > > > > update_vacuum_error_cbarg().  It seems like if we set
> > > > > errcbarg->phase=VACUUM_INDEX before setting errcbarg->indname=indname, then an
> > > > > error would cause a crash.
> > > > >
> > >
> > > Can't that be avoided if you check if cbarg->indname is non-null in
> > > vacuum_error_callback as we are already doing for
> > > VACUUM_ERRCB_PHASE_TRUNCATE?
> > >
> > > > >  And if we pfree and set indname before phase, it'd
> > > > > be a problem when going from an index phase to non-index phase.
> > >
> > > How is it possible that we move to the non-index phase without
> > > clearing indname as we always revert back the old phase information?
> >
> > The crash scenario I'm trying to avoid would be like statement_timeout or other
> > asynchronous event occurring between two non-atomic operations.
> >
> > I said that there's an issue no matter what order we set indname/phase;
> > If we wrote:
> > |cbarg->indname = indname;
> > |cbarg->phase = phase;
> > ..and hit a timeout (or similar) between setting indname=NULL but before
> > setting phase=VACUUM_INDEX, then we can crash due to null pointer.
> >
> > But if we write:
> > |cbarg->phase = phase;
> > |if (cbarg->indname) {pfree(cbarg->indname);}
> > |cbarg->indname = indname ? pstrdup(indname) : NULL;
> > ..then we can still crash if we timeout between freeing cbarg->indname and
> > setting it to null, due to acccessing a pfreed allocation.
>
> If "phase" is updated before "indname", I'm able to induce a synthetic crash
> like this:
>
> +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname==NULL)
> +{
> +kill(getpid(), SIGINT);
> +pg_sleep(1); // that's needed since signals are delivered asynchronously
> +}
>
> And another crash if we do this after pfree but before setting indname.
>
> +if (errinfo->phase==VACUUM_ERRCB_PHASE_VACUUM_INDEX && errinfo->indname!=NULL)
> +{
> +kill(getpid(), SIGINT);
> +pg_sleep(1);
> +}
>
> I'm not sure if those are possible outside of "induced" errors.  Maybe the
> function is essentially atomic due to no CHECK_FOR_INTERRUPTS or similar?
>

Yes, this is exactly the point.  I think unless you have
CHECK_FOR_INTERRUPTS in that function, the problems you are trying to
think won't happen.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Justin Pryzby
Date: 27 March 2020, 06:16:01
Subject: Re: error context for vacuum to include block number

From: Andres Freund
Date: 27 March 2020, 06:29:27
Subject: Re: backup manifests

Re: error context for vacuum to include block number - Mailing list pgsql-hackers

Previous

Next