Re: Support for N synchronous standby servers - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Support for N synchronous standby servers |
Date | |
Msg-id | CAHGQGwEJeKFTnF+TG_KA5B+Z+U6U9r+SC3bXgDLt9GiNoBda8A@mail.gmail.com |
In response to | Re: Support for N synchronous standby servers (Michael Paquier <michael.paquier@gmail.com>) |
Responses | Re: Support for N synchronous standby servers |
List | pgsql-hackers |
On Wed, Aug 13, 2014 at 4:10 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Aug 13, 2014 at 2:10 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> I sent the SIGSTOP signal to the walreceiver process in one of sync standbys,
>> and then ran write transactions again. In this case, they must not be completed
>> because their WAL cannot be replicated to the standby that its walreceiver
>> was stopped. But they were successfully completed.
>
> At the end of SyncRepReleaseWaiters, SYNC_REP_WAIT_WRITE and
> SYNC_REP_WAIT_FLUSH in walsndctl were able to update with only one wal
> sender in sync, making backends wake up even if other standbys did not
> catch up. But we need to scan all the synchronous wal senders and find
> the minimum write and flush positions and update walsndctl with those
> values. Well that's a code path I forgot to cover.
>
> Attached is an updated patch fixing the problem you reported.

+    At any one time there will be at a number of active synchronous standbys
+    defined by <varname>synchronous_standby_num</>; transactions waiting

It's better to use <xref linkend="guc-synchronous-standby-num"> instead.

+    for commit will be allowed to proceed after those standby servers
+    confirms receipt of their data. The synchronous standbys will be

Typo: confirms -> confirm

+       <para>
+        Specifies the number of standbys that support
+        <firstterm>synchronous replication</>, as described in
+        <xref linkend="synchronous-replication">, and listed as the first
+        elements of <xref linkend="guc-synchronous-standby-names">.
+       </para>
+       <para>
+        Default value is 1.
+       </para>

synchronous_standby_num is defined with PGC_SIGHUP, so the following
should be added to the documentation:

    This parameter can only be set in the postgresql.conf file or on the
    server command line.

The name of the parameter "synchronous_standby_num" suggests to me that
the transaction must wait for its WAL to be replicated to s_s_num standbys.
But that's not true in your patch. If s_s_names is empty, replication works
asynchronously whatever the value of s_s_num is. I'm afraid that's confusing.

The description of s_s_num is not sufficient. I'm afraid that users can
easily get the wrong impression that they can use a quorum commit feature
by setting s_s_names and s_s_num, that is, that the transaction waits for
its WAL to be replicated to any s_s_num standbys listed in s_s_names.

When s_s_num is set to a larger value than max_wal_senders, should we
warn about that?

+        for (i = 0; i < num_sync; i++)
+        {
+            volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]];
+
+            if (min_write_pos > walsndloc->write)
+                min_write_pos = walsndloc->write;
+            if (min_flush_pos > walsndloc->flush)
+                min_flush_pos = walsndloc->flush;
+        }

I don't think it's safe to read those shared values without holding the
spinlock.

Regards,

-- 
Fujii Masao
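To illustrate the last point, here is a minimal, self-contained sketch of the loop with each slot's positions copied out under a lock before they are compared. It is not the patch's code. DemoWalSnd, demo_lock, demo_unlock and demo_min_sync_positions are hypothetical names, and a C11 atomic_flag stands in for the per-slot spinlock. In the PostgreSQL tree the equivalent would presumably be SpinLockAcquire/SpinLockRelease on the WalSnd slot's mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's WAL position type (assumption for this sketch). */
typedef uint64_t XLogRecPtr;

/* Simplified, hypothetical stand-in for one WalSnd slot in shared memory. */
typedef struct
{
    atomic_flag mutex;      /* stand-in for the slot's spinlock */
    XLogRecPtr  write;      /* last position written by the standby */
    XLogRecPtr  flush;      /* last position flushed by the standby */
} DemoWalSnd;

static void
demo_lock(DemoWalSnd *w)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&w->mutex, memory_order_acquire))
        ;
}

static void
demo_unlock(DemoWalSnd *w)
{
    atomic_flag_clear_explicit(&w->mutex, memory_order_release);
}

/*
 * Compute the minimum write and flush positions over the slots listed in
 * sync_standbys[0..num_sync-1]. Both fields of a slot are copied while its
 * lock is held, so the pair is read consistently; the comparisons are done
 * on the local copies after the lock is released.
 */
static void
demo_min_sync_positions(DemoWalSnd *walsnds, const int *sync_standbys,
                        int num_sync,
                        XLogRecPtr *min_write_pos, XLogRecPtr *min_flush_pos)
{
    assert(num_sync > 0);

    *min_write_pos = UINT64_MAX;
    *min_flush_pos = UINT64_MAX;

    for (int i = 0; i < num_sync; i++)
    {
        DemoWalSnd *w = &walsnds[sync_standbys[i]];
        XLogRecPtr  write_pos;
        XLogRecPtr  flush_pos;

        demo_lock(w);
        write_pos = w->write;
        flush_pos = w->flush;
        demo_unlock(w);

        if (*min_write_pos > write_pos)
            *min_write_pos = write_pos;
        if (*min_flush_pos > flush_pos)
            *min_flush_pos = flush_pos;
    }
}
```

Copying the two fields into locals keeps the critical section to a couple of loads, which is the usual pattern for spinlock-protected data. Whether one lock per slot is enough, or the whole scan needs a coarser lock, is a design question this sketch does not answer.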