Thread: SERIALIZABLE on standby servers
Hi hackers,

Here is an experimental WIP patch to allow SERIALIZABLE READ ONLY DEFERRABLE transactions on standby servers without serialisation anomalies, based loosely on an old email from Kevin Grittner[1]. I'm not sure how far this is from what he had in mind or whether I've misunderstood something fundamental here, but I hope this can at least serve as a starting point and we can try to get something into Postgres 10.

The patch works by teaching the standby how to do something similar to what SERIALIZABLE READ ONLY DEFERRABLE does on the primary server, with some help from the primary server in the form of extra information in the WAL.

The basic idea is: the standby should wait until a point in the transaction stream where it can take a snapshot and either (1) there were no concurrent read/write SERIALIZABLE transactions running on the primary, or (2) the last concurrent read/write SERIALIZABLE transaction at snapshot time has now ended without creating a dangerous cycle with our transaction.

In case (1), the primary sometimes adds an extra xl_xact_snapshot_safety struct to commit messages which says 'a snapshot taken after this commit and before any future SSI commits is safe, because there are no concurrent read/write SSI transactions at this moment'.

In case (2), the xl_xact_snapshot_safety struct embedded in a commit record instead says 'a snapshot taken after this commit and before any future SSI commits is of unknown safety, because there are concurrent transactions; I'll tell you when it has been determined; please remember this token'. The token (which happens to be a CSN but that is not important) will appear in a future independent snapshot safety message which says whether a snapshot taken at that time is safe or unsafe.

Note that xl_xact_snapshot_safety is embedded in the commit messages (for SSI transactions only), but also appears as its own WAL record to report the final state of a token from an earlier commit. So if you do a lot of non-overlapping writable SSI transactions, you'll get just a few extra bytes in each commit record, but overlapping transactions will generate a stream of extra snapshot safety messages, one for each commit involved.

In order to generate follow-up snapshot safety messages, the patch creates 'hypothetical' transactions on the primary whenever a writable SSI transaction commits, so that it can figure out whether such a transaction would conflict. These phantom transactions are proxies for any transaction that may be created at the same point in the transaction stream (with respect to SSI commits) on any standby, and survive in memory just until they are found to be safe or unsafe.

Example of use:

T1 on primary: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
T1 on primary: INSERT INTO foo VALUES ('x');
T2 on primary: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
T2 on primary: INSERT INTO foo VALUES ('x');
T2 on primary: COMMIT;
T3 on standby: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
T3 on standby: SELECT * FROM foo;
T3 on standby: <...waits...>
T1 on primary: COMMIT;
T3 on standby: <...continues...>

Not tested much and certainly has bugs and many details to sort out, but first... is this sound, or could it be made so? Is there a better way?

[1] https://www.postgresql.org/message-id/4D3735E30200002500039869%40gw.wicourts.gov

--
Thomas Munro
http://www.enterprisedb.com
Attachment
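For illustration, here is a rough C sketch of the WAL payload described in the message above. The real definitions are in the attached patch; apart from xl_xact_snapshot_safety itself and the SNAPSHOT_SAFE / SNAPSHOT_SAFETY_UNKNOWN states referred to later in the thread, the type names, field names and layout are assumptions (SNAPSHOT_UNSAFE is an assumed name for the negative resolution).

    /* Hypothetical sketch -- not the patch's actual definitions. */
    #include <stdint.h>

    typedef uint64_t SnapshotSafetyToken;   /* a CSN in the patch, but opaque here */

    typedef enum SnapshotSafety
    {
        SNAPSHOT_SAFE,              /* no concurrent read/write SSI transactions */
        SNAPSHOT_SAFETY_UNKNOWN,    /* resolution comes later, keyed by token */
        SNAPSHOT_UNSAFE             /* assumed name: later resolution saying "unsafe" */
    } SnapshotSafety;

    /*
     * Embedded in the commit records of SSI transactions (cases 1 and 2 above),
     * and also emitted later as a standalone record to resolve the token of an
     * earlier SNAPSHOT_SAFETY_UNKNOWN commit to safe or unsafe.
     */
    typedef struct xl_xact_snapshot_safety
    {
        SnapshotSafety      safety;
        SnapshotSafetyToken token;
    } xl_xact_snapshot_safety;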
On Tue, Nov 8, 2016 at 5:56 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Here is an experimental WIP patch to allow SERIALIZABLE READ ONLY
> DEFERRABLE transactions on standby servers without serialisation
> anomalies, based loosely on an old email from Kevin Grittner[1]. I'm
> not sure how far this is from what he had in mind or whether I've
> misunderstood something fundamental here, but I hope this can at least
> serve as a starting point and we can try to get something into
> Postgres 10.

While out walking I realised what was wrong with that. It's going to take me a while to find the time to get back to this, so I figured I should share this realisation in case anyone else is interested in the topic.

The problem is that it determines snapshot safety in PreCommit_CheckForSerializationFailure, and then races other backends to XactLogCommitRecord. It could determine that a hypothetical snapshot taken after this commit is safe, but then other activity resulting in a hypothetical snapshot of unknown safety could happen and be logged before we record our determination in the log.

One solution could be to serialise XactLogCommitRecord for SSI transactions using SerializableXactHashLock, and determine hypothetical snapshot safety at the same time, so that commit replay order matches safety determination order. But it would suck to add another point of lock contention to SSI commits.

Another solution could be to have recovery on the standby detect tokens (CSNs incremented by PreCommit_CheckForSerializationFailure) arriving out of order, but I don't know what exactly it should do about that when it is detected: you shouldn't respect an out-of-order claim of safety, but then what should you wait for? Perhaps if the last replayed commit record before that was marked SNAPSHOT_SAFE then it's OK to leave it that way, and if it was marked SNAPSHOT_SAFETY_UNKNOWN then you have to wait for that one to be resolved by a follow-up snapshot safety message and then rinse and repeat (take a new snapshot etc). I think that might work, but it seems strange to allow random races on the primary to create extra delays on the standby. Perhaps there is some much simpler way to do all this that I'm missing.

Another detail is that standbys that start up from a checkpoint and don't see any SSI transactions commit don't yet have any snapshot safety information, but defaulting to assuming that this point is safe doesn't seem right, so I suspect it needs to be in checkpoints.

Attached is a tidied up version which doesn't try to address the above problems yet. When time permits I'll come back to this.

--
Thomas Munro
http://www.enterprisedb.com
Attachment
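The tentative replay-side rule from the message above ("don't respect an out-of-order claim of safety; if the state in effect is SNAPSHOT_SAFETY_UNKNOWN, keep waiting for that token's resolution") could be modelled roughly as follows. This is a simplified sketch that reuses the types from the earlier sketch, keeps a single global state, and uses made-up names; it is not code from the patch.

    /* Safety state as seen by recovery on the standby (simplified model). */
    static SnapshotSafety       standby_safety = SNAPSHOT_SAFETY_UNKNOWN;
    static SnapshotSafetyToken  highest_token_seen = 0;

    static void
    ApplySnapshotSafetyRecord(SnapshotSafetyToken token, SnapshotSafety safety)
    {
        /*
         * An out-of-order claim of safety must not be trusted: leave the
         * current state alone and wait for the in-order resolution instead.
         */
        if (token < highest_token_seen && safety == SNAPSHOT_SAFE)
            return;

        if (token > highest_token_seen)
            highest_token_seen = token;

        /* DEFERRABLE backends waiting for a safe point would be woken here. */
        standby_safety = safety;
    }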
On Wed, Nov 16, 2016 at 9:26 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Tue, Nov 8, 2016 at 5:56 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> [..] Another solution
> could be to have recovery on the standby detect tokens (CSNs
> incremented by PreCommit_CheckForSerializationFailure) arriving out of
> order, but I don't know what exactly it should do about that when it
> is detected: you shouldn't respect an out-of-order claim of safety,
> but then what should you wait for? Perhaps if the last replayed
> commit record before that was marked SNAPSHOT_SAFE then it's OK to
> leave it that way, and if it was marked SNAPSHOT_SAFETY_UNKNOWN then
> you have to wait for that one to be resolved by a follow-up snapshot
> safety message and then rinse and repeat (take a new snapshot etc). I
> think that might work, but it seems strange to allow random races on
> the primary to create extra delays on the standby. Perhaps there is
> some much simpler way to do all this that I'm missing.
>
> Another detail is that standbys that start up from a checkpoint and
> don't see any SSI transactions commit don't yet have any snapshot
> safety information, but defaulting to assuming that this point is safe
> doesn't seem right, so I suspect it needs to be in checkpoints.
>
> Attached is a tidied up version which doesn't try to address the above
> problems yet. When time permits I'll come back to this.

I haven't looked at this again yet but a nearby thread reminded me of another problem with this which I wanted to restate explicitly here in the context of this patch. Even without replication in the picture, there is a race to reach ProcArrayEndTransaction() after RecordTransactionCommit() runs, which means that the DO history (normal primary server) and REDO history (recovery) don't always agree on the order that transactions become visible. With this patch, this kind of diverging DO and REDO could allow undetectable read only serialization anomalies. I think that ProcArrayEndTransaction() and RecordTransactionCommit() need to be made atomic in the simple case so that DO and REDO agree.

Synchronous replication can make that more likely and it seems like some other approach is probably needed to delay visibility of not-yet-durable transactions while keeping the order that transactions become visible the same on all nodes.

Aside from the problems I mentioned in my earlier message (race between snapshot safety decision and logging order, and lack of checkpointing of snapshot safety information), it seems like the two DO vs REDO problems (race to ProcArrayEndTransaction, and deliberately delayed visibility in syncrep) also need to be addressed before SERIALIZABLE DEFERRABLE on standbys could make a watertight guarantee.

--
Thomas Munro
http://www.enterprisedb.com
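A concrete interleaving helps to see the DO vs REDO divergence described above. The following is a standalone model, not PostgreSQL code; only the two function names mentioned in the message are real.

    /*
     * Nothing forces ProcArrayEndTransaction() calls to happen in the same
     * order as the commit records written by RecordTransactionCommit(), so
     * this interleaving is possible on the primary:
     *
     *   backend A: RecordTransactionCommit()   -- A's commit record is first in WAL
     *   backend B: RecordTransactionCommit()   -- B's commit record is second
     *   backend B: ProcArrayEndTransaction()   -- B becomes visible first (DO order)
     *   backend A: ProcArrayEndTransaction()   -- A becomes visible second
     */
    #include <stdio.h>

    int
    main(void)
    {
        const char *do_order[] = {"B", "A"};    /* visibility order on the primary */
        const char *redo_order[] = {"A", "B"};  /* commit record order in the WAL */

        /* A snapshot taken between the middle two steps above: */
        printf("primary sees committed: {%s}\n", do_order[0]);
        /* A standby snapshot taken after replaying only the first commit record: */
        printf("standby sees committed: {%s}\n", redo_order[0]);
        return 0;
    }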
On 7 November 2016 at 23:56, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> The patch works by teaching the standby how to do something similar
> to what SERIALIZABLE READ ONLY DEFERRABLE does on the primary server,
> with some help from the primary server in the form of extra
> information in the WAL.

This is going in a direction I'm interested in, though I had some independent thoughts.

> The basic idea is: the standby should wait until a point in the
> transaction stream where it can take a snapshot and either (1) there
> were no concurrent read/write SERIALIZABLE transactions running on the
> primary, or (2) the last concurrent read/write SERIALIZABLE
> transaction at snapshot time has now ended without creating a
> dangerous cycle with our transaction.
>
> In case (1), the primary sometimes adds an extra
> xl_xact_snapshot_safety struct to commit messages which says 'a
> snapshot taken after this commit and before any future SSI commits is
> safe, because there are no concurrent read/write SSI transactions at
> this moment'.

That has the problem that we need to decorate every commit. I'd be looking for a way to make the default case the safe path, so we add little new data to WAL.

ISTM possible to insert a wait prior to the commit record when we detect that our commit would cause a pivot failure, so we actively try to avoid unsafe orderings. The purpose of that is to (1) ensure that the WAL ordering is a Serializable ordering in most cases. The wait would have a timeout for the cases where we can't determine safety. But it also has purpose (2): to reduce the number of serialization failures on the primary. Once we know that the WAL sequence is serializable we can use that information for both physical and logical replication.

> In case (2), the xl_xact_snapshot_safety struct embedded in a commit
> record instead says 'a snapshot taken after this commit and before any
> future SSI commits is of unknown safety, because there are concurrent
> transactions; I'll tell you when it has been determined; please
> remember this token'. The token (which happens to be a CSN but that
> is not important) will appear in a future independent snapshot safety
> message which says whether a snapshot taken at that time is safe or
> unsafe.
>
> Note that xl_xact_snapshot_safety is embedded in the commit messages
> (for SSI transactions only), but also appears as its own WAL record to
> report the final state of a token from an earlier commit. So if you
> do a lot of non-overlapping writable SSI transactions, you'll get just
> a few extra bytes in each commit record, but overlapping transactions
> will generate a stream of extra snapshot safety messages, one for each
> commit involved.

I would want to add the Serializable ordering info, not just "it is safe".

I think we'd need to add a timeout to the deferrable snapshot request, because it could be hours.

> In order to generate follow-up snapshot safety messages, the patch
> creates 'hypothetical' transactions on the primary whenever a
> writable SSI transaction commits, so that it can figure out whether
> such a transaction would conflict. These phantom transactions are
> proxies for any transaction that may be created at the same point in
> the transaction stream (with respect to SSI commits) on any standby,
> and survive in memory just until they are found to be safe or unsafe.

That needs a lot more explanation to determine the soundness of that theory.
README++

I think we need some isolationtester tests that generate various orderings, and then standby tests to check that it works.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 19 January 2017 at 16:16, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Wed, Nov 16, 2016 at 9:26 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Tue, Nov 8, 2016 at 5:56 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>> [..] Another solution
>> could be to have recovery on the standby detect tokens (CSNs
>> incremented by PreCommit_CheckForSerializationFailure) arriving out of
>> order, but I don't know what exactly it should do about that when it
>> is detected: you shouldn't respect an out-of-order claim of safety,
>> but then what should you wait for? Perhaps if the last replayed
>> commit record before that was marked SNAPSHOT_SAFE then it's OK to
>> leave it that way, and if it was marked SNAPSHOT_SAFETY_UNKNOWN then
>> you have to wait for that one to be resolved by a follow-up snapshot
>> safety message and then rinse and repeat (take a new snapshot etc). I
>> think that might work, but it seems strange to allow random races on
>> the primary to create extra delays on the standby. Perhaps there is
>> some much simpler way to do all this that I'm missing.
>>
>> Another detail is that standbys that start up from a checkpoint and
>> don't see any SSI transactions commit don't yet have any snapshot
>> safety information, but defaulting to assuming that this point is safe
>> doesn't seem right, so I suspect it needs to be in checkpoints.
>>
>> Attached is a tidied up version which doesn't try to address the above
>> problems yet. When time permits I'll come back to this.
>
> I haven't looked at this again yet but a nearby thread reminded me of
> another problem with this which I wanted to restate explicitly here in
> the context of this patch. Even without replication in the picture,
> there is a race to reach ProcArrayEndTransaction() after
> RecordTransactionCommit() runs, which means that the DO history
> (normal primary server) and REDO history (recovery) don't always agree
> on the order that transactions become visible. With this patch, this
> kind of diverging DO and REDO could allow undetectable read only
> serialization anomalies. I think that ProcArrayEndTransaction() and
> RecordTransactionCommit() need to be made atomic in the simple case so
> that DO and REDO agree.

Not atomic, we just need to make ProcArrayEndTransaction() apply changes in the order of commits.

I think that is more easily possible by reusing the SyncRepWaitForLSN() code, since that already orders things by LSN.

So make all committers wait and then get WALwriter to wake people after ProcArrayEndTransaction() has been applied.

> Synchronous replication can make that more
> likely and it seems like some other approach is probably needed to
> delay visibility of not-yet-durable transactions while keeping the
> order that transactions become visible the same on all nodes.
> Aside from the problems I mentioned in my earlier message (race
> between snapshot safety decision and logging order, and lack of
> checkpointing of snapshot safety information), it seems like the two
> DO vs REDO problems (race to ProcArrayEndTransaction, and deliberately
> delayed visibility in syncrep) also need to be addressed before
> SERIALIZABLE DEFERRABLE on standbys could make a watertight
> guarantee.

While the difference in ordering is there, it would be useful to show how that allows serializable anomalies iff the WAL ordering is already known to be serializable.
Mixing robustness modes with access to the same data is avoidable by design, though we could have a parameter (or parameters) to prevent that if desirable, but only to prevent documented problems.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 16, 2017 at 5:52 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I haven't looked at this again yet but a nearby thread reminded me of
>> another problem with this which I wanted to restate explicitly here in
>> the context of this patch. Even without replication in the picture,
>> there is a race to reach ProcArrayEndTransaction() after
>> RecordTransactionCommit() runs, which means that the DO history
>> (normal primary server) and REDO history (recovery) don't always agree
>> on the order that transactions become visible. With this patch, this
>> kind of diverging DO and REDO could allow undetectable read only
>> serialization anomalies. I think that ProcArrayEndTransaction() and
>> RecordTransactionCommit() need to be made atomic in the simple case so
>> that DO and REDO agree.
>
> Not atomic, we just need to make ProcArrayEndTransaction() apply
> changes in the order of commits.
>
> I think that is more easily possible by reusing the
> SyncRepWaitForLSN() code, since that already orders things by LSN.
>
> So make all committers wait and then get WALwriter to wake people
> after ProcArrayEndTransaction() has been applied.

That doesn't solve any problem we have. Making committers wait after ProcArrayEndTransaction() wouldn't keep the effects of those transactions from being visible to other transactions in the system. And that's precisely the issue: on the primary, transactions become visible to other transactions in the order in which they perform ProcArrayEndTransaction(), but on the standby, they become visible in the order in which they RecordTransactionCommit(). That difference is a source of serialization anomalies.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Kevin, all,

/me pokes ancient thread

I haven't done any more work on the problems mentioned above, but I ran into Kevin at PostgresOpen in San Francisco and he said he might have some time to look at this problem. So, here is a long overdue rebase of the WIP patch. It shows a first order approximation of DEFERRABLE working on a standby (for example, see the sequence from the first message in the thread[1]).

I'll add it to the next Commitfest so I know when to rebase it.

[1] https://www.postgresql.org/message-id/flat/CAEepm%3D2b9TV%2BvJ4UeSBixDrW7VUiTjxPwWq8K3QwFSWx0pTXHQ%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
Attachment
On Sat, Sep 22, 2018 at 12:28 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> I'll add it to the next
> Commitfest so I know when to rebase it.

And cfbot immediately showed that this assertion in OldSerXidSetActiveSerXmin() could fail in the isolation tests:

    Assert(!TransactionIdIsValid(oldSerXidControl->tailXid) ||
           TransactionIdFollows(xid, oldSerXidControl->tailXid));

Not sure how that ever worked or if I screwed something up while rebasing, but the quick (and possibly wrong?) solution I found was to exclude hypothetical SERIALIZABLEXACTs when scanning for the new oldest xid:

@@ -3181,6 +3322,7 @@ SetNewSxactGlobalXmin(void)
     for (sxact = FirstPredXact(); sxact != NULL; sxact = NextPredXact(sxact))
     {
         if (!SxactIsRolledBack(sxact)
+            && !SxactIsHypothetical(sxact)
             && !SxactIsCommitted(sxact)
             && sxact != OldCommittedSxact)

Here's a version like that in the meantime so that the CI passes. When I have more time for this project, the real solution might be to give them their own xid (right now they "borrow" one; see PreCommit_CheckForSerializationFailure()... now that I think about it, that must be wrong).

--
Thomas Munro
http://www.enterprisedb.com
Attachment
On Fri, Sep 21, 2018 at 7:29 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> I'll add it to the next Commitfest so I know when to rebase it.

I signed up as reviewer in that CF.

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/
> On Fri, Sep 21, 2018 at 2:29 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> Hi Kevin, all,
>
> /me pokes ancient thread

That amazing feeling of being like Indiana Jones, wondering whether it's worth it to touch another ancient artifact.

On Tue, Sep 25, 2018 at 4:51 PM Kevin Grittner <kgrittn@gmail.com> wrote:
>
> On Fri, Sep 21, 2018 at 7:29 AM Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>
> > I'll add it to the next Commitfest so I know when to rebase it.
>
> I signed up as reviewer in that CF.

Great! Could you go ahead and provide a review?
On Fri, Nov 30, 2018 at 3:01 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Tue, Sep 25, 2018 at 4:51 PM Kevin Grittner <kgrittn@gmail.com> wrote:
> > On Fri, Sep 21, 2018 at 7:29 AM Thomas Munro
> > <thomas.munro@enterprisedb.com> wrote:
> > > I'll add it to the next Commitfest so I know when to rebase it.
> >
> > I signed up as reviewer in that CF.
>
> Great! Could you go ahead and provide a review?

Just to be clear, although this patch is registered in the commitfest and currently applies and tests pass, it is prototype/WIP code with significant problems that remain to be resolved. I sort of wish there were a way to indicate that in the CF but since there isn't, I'm saying that here. What I hope to get from Kevin, Simon or other reviewers is some feedback on the general approach and problems discussed upthread (and other problems and better ideas I might have missed). So it's not seriously proposed for commit in this CF. (Unlike the nearby "SERIALIZABLE with parallel query" thread, which is about ready to land by all accounts, pending a round of perf testing if a decent test can be selected.)

--
Thomas Munro
http://www.enterprisedb.com
On Fri, Dec 28, 2018 at 02:21:44PM +1300, Thomas Munro wrote:
> Just to be clear, although this patch is registered in the commitfest
> and currently applies and tests pass, it is prototype/WIP code with
> significant problems that remain to be resolved. I sort of wish there
> were a way to indicate that in the CF but since there isn't, I'm
> saying that here. What I hope to get from Kevin, Simon or other
> reviewers is some feedback on the general approach and problems
> discussed upthread (and other problems and better ideas I might have
> missed). So it's not seriously proposed for commit in this CF.

No feedback has actually come, so I have moved it to next CF.

--
Michael
Attachment
On Mon, Jun 15, 2020 at 5:00 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Dec 28, 2018 at 02:21:44PM +1300, Thomas Munro wrote:
> > Just to be clear, although this patch is registered in the commitfest
> > and currently applies and tests pass, it is prototype/WIP code with
> > significant problems that remain to be resolved. I sort of wish there
> > were a way to indicate that in the CF but since there isn't, I'm
> > saying that here. What I hope to get from Kevin, Simon or other
> > reviewers is some feedback on the general approach and problems
> > discussed upthread (and other problems and better ideas I might have
> > missed). So it's not seriously proposed for commit in this CF.
>
> No feedback has actually come, so I have moved it to next CF.

Having been nerd-sniped by SSI again, I spent some time this weekend rebasing this old patch, making a few improvements, and reformulating the problems to be solved as I see them. It's very roughly based on Kevin Grittner and Dan Ports' description of how you could give SERIALIZABLE a useful meaning on hot standbys. The short version of the theory is that you can make it work like SERIALIZABLE READ ONLY DEFERRABLE by adding a bit of extra information into the WAL stream (see the sketch after this message).

Problems:

1. As a prerequisite, we'd need to teach primary servers to make transactions visible in the same order that they log commits. Otherwise, we permit nonsense like seeing TX1 but not TX2 on the primary, and TX2 but not TX1 on the replica. You can probably argue that our read replicas don't satisfy the lower isolation levels, let alone serializable.

2. Similarly, it's probably not OK that PreCommit_CheckForSerializationFailure() determines MySerializableXact->snapshotSafetyAfterThisCommit. That may not happen in exactly the same order as commits are logged. Or maybe there is some argument for why that is OK, based on what we're doing with prepareSeqNo, or maybe we can do something with that to detect disorder.

3. The patch doesn't yet attempt to checkpoint the snapshot safety state. That's needed to start up in a sane state, without having to wait for WAL activity.

4. XactLogSnapshotSafetyRecord() flushes the WAL an extra time after a commit is flushed, which I put in for testing; that's silly... somehow it needs to be better integrated so we don't generate two sync I/Os in a row.

5. You probably want a way to turn off the extra WAL records and SERIALIZABLEXACT consumption if you're using SERIALIZABLE on a primary but not on the standby. Or maybe there is some way to make it come on automatically.

I think I have cleared up the matter of xmin tracking for "hypothetical" SERIALIZABLEXACTs mentioned earlier. It's not needed, so it should be set to InvalidTransactionId, and I added a comment to explain why.

I also wrote a TAP test to exercise this thing. It is the same schedule as src/test/isolation/specs/read-only-anomaly-3.spec, except that transaction 3 runs on a streaming replica.

One thing to point out is that this patch only aims to make it so that streaming replicas can't observe a state that would have caused a transaction to abort if it had been observed on the primary. The TAP test still has to insert its own wait-for-LSN loop to make sure step "s1c" is replayed before "s3r" runs. We could use synchronous_commit=remote_apply, and that'd probably work just as well for this particular test, but I'm not sure how to square that with fixing problem #1 above.

The Perl hackery I used to do overlapping transactions in a TAP test is pretty crufty.
I guess we'd ideally have the isolationtester support per-session connection strings, and somehow get some Perl code to orchestrate the cluster setup but then run the real isolationtester. Or something like that.
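To summarize the standby-side behaviour sketched in the message above, here is a rough model of how a backend running a SERIALIZABLE READ ONLY DEFERRABLE transaction on a standby might wait, analogous to what GetSafeSnapshot() does on a primary. Everything here except GetSafeSnapshot() and the safety states named earlier in the thread is a made-up name; this is not code from the patch.

    /* Loop until we hold a snapshot taken at a point whose replayed safety
     * state is known to be safe (compare GetSafeSnapshot() on the primary). */
    for (;;)
    {
        SnapshotSafety safety;

        /* Take a snapshot and note the safety state most recently replayed
         * at that point in the WAL (hypothetical helper). */
        safety = TakeDeferrableSnapshotOnStandby();

        /* If safety is still undetermined, sleep until recovery replays the
         * follow-up record that resolves this point's token. */
        while (safety == SNAPSHOT_SAFETY_UNKNOWN)
            safety = WaitForReplayedSafetyResolution();     /* hypothetical */

        if (safety == SNAPSHOT_SAFE)
            break;              /* safe: run the read-only queries with it */

        /* Resolved unsafe: throw the snapshot away and try a later point. */
        DiscardDeferrableSnapshot();                        /* hypothetical */
    }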