Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting
Date
Msg-id CABPTF7WuFr6Z7zPMoqgk4BCLs8uA1ihCSDKEq1wbxJJB4Qy+Sg@mail.gmail.com
In response to Re: Improve read_local_xlog_page_guts by replacing polling with latch-based waiting  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

The following is the split patch set. There are certain limitations to
this simplification effort, particularly in patch 2. The
read_local_xlog_page_guts callback demands more from the facility than
the WAIT FOR patch does: it must wait for WAL flush events, though it
does not require timeout handling. In that sense, parts of patch 3 can
be viewed as a superset of the WAIT FOR patch, since they install
wake-up hooks in more locations. Unlike the WAIT FOR patch, which only
needs wake-ups triggered by replay, read_local_xlog_page_guts must also
be woken by WAL flushes.
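
To make the shape of the change concrete, here is a minimal sketch of
the latch-based wait that would replace the polling loop, assuming a
hypothetical addWaitedLSN()/deleteWaitedLSN() registration pair wired
into the flush and replay wake-up hooks (the existing WaitLatch() API
is used as-is):

/*
 * Sketch only: the wake-up hooks in the WAL-flush and replay paths are
 * assumed to set MyLatch for registered backends once their waited-for
 * LSN is reached.  WAIT_EVENT_WAL_WAIT is a placeholder wait event.
 */
addWaitedLSN(loc);

while (loc > read_upto)
{
    CHECK_FOR_INTERRUPTS();

    /* Sleep with no timeout; a flush/replay wake-up sets the latch. */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
                     -1L,
                     WAIT_EVENT_WAL_WAIT);
    ResetLatch(MyLatch);

    /* Recheck how far WAL is available after each wake-up. */
    if (!RecoveryInProgress())
        read_upto = GetFlushRecPtr(NULL);
    else
        read_upto = GetXLogReplayRecPtr(NULL);
}

deleteWaitedLSN();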

Workload characteristics play a key role here. A sorted dlist performs
well when insertions and removals occur in order, achieving O(1)
insertion in the best case. In synchronous replication, insertions are
generally monotonic with commit LSNs, though not strictly ordered
because of timing variations and contention. As long as most
insertions stay ordered, a dlist is efficient; but as the queue grows
and out-of-order insertions become more frequent, insertion cost
degrades toward O(n).
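
For illustration, a sketch of that tail-first ordered insertion, in
the spirit of SyncRepQueueInsert() in syncrep.c (WaitEntry is a
hypothetical element type):

#include "postgres.h"

#include "access/xlogdefs.h"
#include "lib/ilist.h"

typedef struct WaitEntry
{
    dlist_node  node;
    XLogRecPtr  lsn;
} WaitEntry;

/*
 * Insert keeping the queue sorted by LSN.  Walking from the tail makes
 * monotonically increasing insertions O(1); the further out of order
 * an LSN arrives, the closer the scan gets to O(n).
 */
static void
queue_insert_ordered(dlist_head *queue, WaitEntry *entry)
{
    dlist_iter  iter;

    dlist_reverse_foreach(iter, queue)
    {
        WaitEntry  *cur = dlist_container(WaitEntry, node, iter.cur);

        if (cur->lsn <= entry->lsn)
        {
            dlist_insert_after(iter.cur, &entry->node);
            return;
        }
    }

    /* Smallest LSN seen so far: becomes the new head. */
    dlist_push_head(queue, &entry->node);
}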

By contrast, a pairing heap maintains stable O(1) insertion for both
ordered and disordered inputs, with amortized O(log n) removals. Since
LSNs in the WAIT FOR command are likely to arrive in a non-sequential
fashion, the pairing heap introduced in v6 provides more predictable
performance under such workloads.
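
For comparison, a sketch of the heap-based variant using PostgreSQL's
pairingheap (lib/pairingheap.h), roughly the shape v6 takes; the
WaitLSNEntry type and its latch field are assumptions for
illustration:

#include "postgres.h"

#include "access/xlogdefs.h"
#include "lib/pairingheap.h"
#include "storage/latch.h"

typedef struct WaitLSNEntry
{
    pairingheap_node heapnode;
    XLogRecPtr  lsn;
    Latch      *latch;          /* backend latch to set on wake-up */
} WaitLSNEntry;

/*
 * pairingheap is a max-heap with respect to its comparator, so invert
 * the comparison to keep the smallest LSN at the root.
 */
static int
waitlsn_cmp(const pairingheap_node *a, const pairingheap_node *b,
            void *arg)
{
    const WaitLSNEntry *ea = pairingheap_const_container(WaitLSNEntry,
                                                         heapnode, a);
    const WaitLSNEntry *eb = pairingheap_const_container(WaitLSNEntry,
                                                         heapnode, b);

    if (ea->lsn < eb->lsn)
        return 1;
    if (ea->lsn > eb->lsn)
        return -1;
    return 0;
}

/* Wake every waiter whose target LSN has been flushed or replayed. */
static void
wakeup_waiters(pairingheap *waiters, XLogRecPtr reached_lsn)
{
    while (!pairingheap_is_empty(waiters))
    {
        WaitLSNEntry *min = pairingheap_container(WaitLSNEntry, heapnode,
                                                  pairingheap_first(waiters));

        if (min->lsn > reached_lsn)
            break;              /* root not reached; nobody behind it is */

        pairingheap_remove_first(waiters);      /* amortized O(log n) */
        SetLatch(min->latch);
    }
}

Insertion is pairingheap_add(waiters, &entry->heapnode), O(1) no
matter how disordered the arrivals are; the heap itself is created
once with pairingheap_allocate(waitlsn_cmp, NULL).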

At this stage (v7), no consolidation between syncrep and xlogwait has
been attempted. This is mainly because the dlist and the pairing heap
each perform well under a different workload; neither is likely to be
universally optimal. Introducing the facility with a pairing heap
first seems reasonable, as it leaves room for future refactoring: we
could later replace syncrep's dlist with a heap, or adopt a modular
design, depending on observed workload characteristics.

Best,
Xuneng
