Re: Postgres, fsync, and OSs (specifically linux) - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Postgres, fsync, and OSs (specifically linux)
Msg-id CAEepm=17CeKmRXensshd7mux1jUCz8KXHDHjihDjYRbf-HUfBA@mail.gmail.com
In response to Re: Postgres, fsync, and OSs (specifically linux)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On Fri, Nov 9, 2018 at 9:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 8, 2018 at 3:04 PM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > My reasoning for choosing bms_join() is that it cannot fail, assuming
> > the heap is not corrupted.  It simply ORs the two bit-strings into
> > whichever is the longer input string, and frees the shorter input
> > string.  (In an earlier version I used bms_union(), this function's
> > non-destructive sibling, but then realised that it could fail to
> > allocate() causing us to lose track of a 1 bit).
>
> Oh, OK.  I was assuming it was allocating.
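
As an illustration of the point in the quoted exchange, here is a minimal sketch of a destructive join that ORs the shorter bit-string into the longer one and frees the shorter, so the join itself never allocates. This is not PostgreSQL's Bitmapset code: BitSet, bitset_make(), bitset_set(), bitset_test() and bitset_join() are hypothetical names invented for this example, and the fixed 64-bit word layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified bit-string; NOT PostgreSQL's Bitmapset. */
typedef struct BitSet
{
    size_t      nwords;     /* number of 64-bit words in words[] */
    uint64_t    words[];    /* flexible array member holding the bits */
} BitSet;

/* Allocating constructor: the only place in this sketch that can fail. */
static BitSet *
bitset_make(size_t nwords)
{
    BitSet *s = calloc(1, sizeof(BitSet) + nwords * sizeof(uint64_t));

    if (s != NULL)
        s->nwords = nwords;
    return s;
}

/* Set one bit; the caller must have sized the set to hold it. */
static void
bitset_set(BitSet *s, size_t bit)
{
    assert(bit / 64 < s->nwords);
    s->words[bit / 64] |= UINT64_C(1) << (bit % 64);
}

/* Return 1 if the bit is set, 0 otherwise (including out-of-range bits). */
static int
bitset_test(const BitSet *s, size_t bit)
{
    return bit / 64 < s->nwords && ((s->words[bit / 64] >> (bit % 64)) & 1);
}

/*
 * Destructive join: OR the shorter input into the longer one, free the
 * shorter one, and return the longer one.  No memory is allocated, so no
 * 1 bit can be lost to an out-of-memory failure.  Both input pointers
 * must be treated as invalid after the call; use only the return value.
 */
static BitSet *
bitset_join(BitSet *a, BitSet *b)
{
    BitSet *longer;
    BitSet *shorter;

    if (a == NULL)
        return b;
    if (b == NULL)
        return a;

    longer = (a->nwords >= b->nwords) ? a : b;
    shorter = (longer == a) ? b : a;

    for (size_t i = 0; i < shorter->nwords; i++)
        longer->words[i] |= shorter->words[i];

    free(shorter);
    return longer;
}
```

A non-destructive union would instead have to allocate a third bit-string for the result, and that allocation is the step that can fail.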

I did some more testing using the throw-away fault injection patch 0003.
I found one extra problem: fsync_fname() needed data_sync_elevel()
treatment, because it is used in, e.g., CheckPointCLOG().

With data_sync_retry = on, if you update a row, touch
/tmp/FileSync_EIO, and then try to checkpoint, the checkpoint fails and
the cluster keeps running.  Future checkpoint attempts report the same
error about the same file, showing that patch 0001 works (we didn't
forget about the dirty file).  Then rm /tmp/FileSync_EIO, and the next
checkpoint should succeed.

With data_sync_retry = off (the default), the same test produces a
PANIC, showing that patch 0002 works.
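
The behaviour described in the last two paragraphs can be sketched as a small helper that promotes the requested error level to PANIC unless retrying is enabled. This is a sketch based only on the description in this message, not a copy of patch 0002: the SK_* level values and the sketch_* names are hypothetical stand-ins chosen for the example.

```c
#include <stdbool.h>

/* Hypothetical severity levels; the numeric values are arbitrary. */
enum
{
    SK_WARNING = 1,
    SK_ERROR = 2,
    SK_PANIC = 3
};

/* Stand-in for the data_sync_retry setting; off by default, as in the text. */
static bool sketch_data_sync_retry = false;

/*
 * Choose the level at which to report a failed data sync.  With retry
 * enabled the caller's level is kept, so the failure is reported and the
 * cluster keeps running.  With retry disabled every failure becomes a PANIC.
 */
static int
sketch_data_sync_elevel(int elevel)
{
    return sketch_data_sync_retry ? elevel : SK_PANIC;
}
```

A caller would wrap the level it would otherwise use, for example reporting at sketch_data_sync_elevel(SK_ERROR) after a failed fsync().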

The results are similar if you touch /tmp/pg_sync_EIO instead.  That
shows that cases like fsync_fname("pg_xact") also cause a PANIC when
data_sync_retry = off.  With data_sync_retry = on, however, that
injection point hides the bug that 0001 fixes, which is why I wanted to
test both fault injection points.
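
For readers without the attachment, a trigger-file fault injection point of the kind used in these tests might look roughly like the following. This is a guess at the general shape, not the contents of patch 0003: fsync_with_fault_injection() and its trigger_path parameter are hypothetical names, and only access() and fsync() are real POSIX calls.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical fault injection wrapper around fsync().  If the trigger
 * file exists, pretend the kernel reported an I/O error; otherwise do the
 * real fsync().  Returns 0 on success, or -1 with errno set.
 */
static int
fsync_with_fault_injection(int fd, const char *trigger_path)
{
    if (access(trigger_path, F_OK) == 0)
    {
        errno = EIO;            /* simulated failure */
        return -1;
    }
    return fsync(fd);           /* normal path */
}
```

With a wrapper like this, touching the trigger file switches the failure on and removing it switches the failure off again, with no need for a restart.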

I think these patches are looking good now.  If I don't spot any other
problems or hear any objections, I will commit them tomorrow-ish.

-- 
Thomas Munro
http://www.enterprisedb.com
