Home > mailing lists

Re: [HACKERS] WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

From	Noah Misch
Subject	Re: [HACKERS] WAL logging problem in 9.4.3?
Date	November 18, 2019 04:54:34
Msg-id	20191118045434.GA1173436@rfd.leadboat.com Whole thread Raw
In response to	Re: [HACKERS] WAL logging problem in 9.4.3? (Noah Misch <noah@leadboat.com>)
Responses	Re: [HACKERS] WAL logging problem in 9.4.3? Re: [HACKERS] WAL logging problem in 9.4.3? Re: [HACKERS] WAL logging problem in 9.4.3?
List	pgsql-hackers

Tree view

On Tue, Nov 05, 2019 at 02:53:35PM -0800, Noah Misch wrote:
> I started pre-commit editing on 2019-10-28, and comment+README updates have
> been the largest part of that.  I'll check my edits against the things you
> list here, and I'll share on-list before committing.  I've now marked the CF
> entry Ready for Committer.

Having dedicated many days to that, I am attaching v24nm.  I know of two
remaining defects:

=== Defect 1: gistGetFakeLSN()

When I modified pg_regress.c to use wal_level=minimal for all suites,
src/test/isolation/specs/predicate-gist.spec failed the assertion in
gistGetFakeLSN().  One could reproduce the problem just by running this
sequence in psql:

          begin;
          create table gist_point_tbl(id int4, p point);
          create index gist_pointidx on gist_point_tbl using gist(p);
          insert into gist_point_tbl (id, p)
          select g, point(g*10, g*10) from generate_series(1, 1000) g;

I've included a wrong-in-general hack to make the test pass.  I see two main
options for fixing this:

(a) Introduce an empty WAL record that reserves an LSN and has no other
effect.  Make GiST use that for permanent relations that are skipping WAL.
Further optimizations are possible.  For example, we could use a backend-local
counter (like the one gistGetFakeLSN() uses for temp relations) until the
counter is greater a recent real LSN.  That optimization is probably too
clever, though it would make the new WAL record almost never appear.

(b) Exempt GiST from most WAL skipping.  GiST index build could still skip
WAL, but it would do its own smgrimmedsync() in addition to the one done at
commit.  Regular GiST mutations would test RELPERSISTENCE_PERMANENT instead of
RelationNeedsWal(), and we'd need some hack for index_copy_data() and possibly
other AM-independent code that skips WAL.

Overall, I like the cleanliness of (a).  The main argument for (b) is that it
ensures we have all the features to opt-out of WAL skipping, which could be
useful for out-of-tree index access methods.  (I think we currently have the
features for a tableam to do so, but not for an indexam to do so.)  Overall, I
lean toward (a).  Any other ideas or preferences?

=== Defect 2: repetitive work when syncing many relations

For deleting relfilenodes, smgrDoPendingDeletes() collects a list for
smgrdounlinkall() to pass to DropRelFileNodesAllBuffers(), which is
sophisticated about optimizing the shared buffers scan.  Commit 279628a
introduced that, in 2013.  I think smgrDoPendingSyncs() should do likewise, to
further reduce the chance of causing performance regressions.  (One could,
however, work around the problem by raising wal_skip_threshold.)  Kyotaro, if
you agree, could you modify v24nm to implement that?

Notable changes in v24nm:

- Wrote section "Skipping WAL for New RelFileNode" in
  src/backend/access/transam/README to be the main source concerning the new
  coding rules.

- Updated numerous comments and doc sections.

- Eliminated the pendingSyncs list in favor of a "sync" field in
  pendingDeletes.  I mostly did this to eliminate the possibility of the lists
  getting out of sync.  This removed considerable parallel code for managing a
  second list at end-of-xact.  We now call smgrDoPendingSyncs() only when
  committing or preparing a top-level transaction.

- Whenever code sets an rd_*Subid field of a Relation, it must call
  EOXactListAdd().  swap_relation_files() was not doing so, so the field
  remained set during the next transaction.  I introduced
  RelationAssumeNewRelfilenode() to handle both tasks, and I located the call
  so it also affects the mapped relation case.

- In ALTER TABLE cases where TryReuseIndex() avoided an index rebuild,
  rd_createSubid remained set.  (That happened before this patch, but it has
  been harmless.)  I fixed this in heap_create().

- Made smgrDoPendingSyncs() stop exempting FSM_FORKNUM.  A sync is necessary
  when checksums are enabled.  Observe the precedent that
  RelationCopyStorage() has not been exempting FSM_FORKNUM.

- Pass log_newpage_range() a "false" for page_std, for the same reason
  RelationCopyStorage() does.

- log_newpage_range() ignored its forkNum and page_std arguments, so we logged
  the wrong data for non-main forks.  Before this patch, callers always passed
  MAIN_FORKNUM and "true", hence the lack of complaints.

- Restored table_finish_bulk_insert(), though heapam no longer provides a
  callback.  The API is still well-defined, and other table AMs might have use
  for it.  Removing it feels like a separate proposal.

- Removed TABLE_INSERT_SKIP_WAL.  Any out-of-tree code using it should revisit
  itself in light of this patch.

- Fixed smgrDoPendingSyncs() to reinitialize total_blocks for each relation;
  it was overcounting.

- Made us skip WAL after SET TABLESPACE, like we do after CLUSTER.

- Moved the wal_skip_threshold docs from "Resource Consumption" -> "Disk" to
  "Write Ahead Log" -> "Settings", between similar settings
  wal_writer_flush_after and commit_delay.  The other place I considered was
  "Resource Consumption" -> "Asynchronous Behavior", due to the similarity of
  backend_flush_after.

- Gave each test a unique name.  Changed test table names to be descriptive,
  e.g. test7 became trunc_trig.

- Squashed all patches into one.  Split patches are good when one could
  reasonably choose to push a subset, but that didn't apply here.  I wouldn't
  push a GUC implementation without its documentation.  Since the tests fail
  without the main bug fix, I wouldn't push tests separately.

By the way, based on the comment at zheap_prepare_insert(), I expect zheap
will exempt itself from skipping WAL.  It may stop calling RelationNeedsWAL()
and instead test for RELPERSISTENCE_PERMANENT.

nm

Attachment

skip-wal-v24nm.patch

pgsql-hackers by date:

From: vignesh C
Date: 18 November 2019, 03:43:14
Subject: Re: dropdb --force

From: Pavel Stehule
Date: 18 November 2019, 05:02:44
Subject: Re: dropdb --force

Re: [HACKERS] WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

Attachment

Previous

Next