If you call this procedure on a stand-alone server, you get:
postgres=# call pg_wal_replay_wait('1234/0');
ERROR: recovery is not in progress
DETAIL: Recovery ended before replaying target LSN 1234/0; last replay
LSN 0/0.
The DETAIL seems a bit misleading. Recovery never ended, because it
never started in the first place. Last replay LSN is indeed 0/0, but
that's not helpful.
If a standby server has been promoted and you pass an LSN that's earlier
than the last replay LSN, it returns successfully. That makes sense I
guess; if you connect to a standby and wait for it to replay a commit
that you made in the primary, and the standby gets promoted, that seems
correct. But it's a little inconsistent: If the standby crashes
immediately after promotion, and you call pg_wal_replay_wait() after
recovery, it returns success. However, if you shut down the promoted
server and restart it, then last replay LSN is 0/0, and the call will
fail because no recovery happened.
What is the use case for the 'no_error' argument? Why would you ever
want to pass no_error=true ? The separate pg_wal_replay_wait_status()
function feels weird to me. Also it surely shouldn't be marked IMMUTABLE
nor parallel safe.
This would benefit from more documentation, explaining how you would use
this in practice. I believe the use case is that you want "read your
writes" consistency between a primary and a standby. You commit a
transaction in the primary, and you then want to run read-only queries
in a standby, and you want to make sure that you see your own commit,
but you're ok with slightly delayed results otherwise. It would be good
to add a chapter somewhere in the docs to show how to do that in
practice with these functions.
--
Heikki Linnakangas
Neon (https://neon.tech)