Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Support for N synchronous standby servers - take 2 |
Date | |
Msg-id | CAHGQGwG2Ze0YD=U35bZFQxLFU1cA_=+5v864mLHuvhKER8MkpQ@mail.gmail.com Whole thread Raw |
In response to | Re: Support for N synchronous standby servers - take 2 (Thomas Munro <thomas.munro@enterprisedb.com>) |
Responses |
Re: Support for N synchronous standby servers - take 2
Re: Support for N synchronous standby servers - take 2 Re: Support for N synchronous standby servers - take 2 Re: Support for N synchronous standby servers - take 2 |
List | pgsql-hackers |
On Sat, Apr 2, 2016 at 10:20 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI >>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>> I personally don't think it needs such a survive measure. It is >>>>> very small syntax and the parser reads very short text. If the >>>>> parser failes in such mode, something more serious should have >>>>> occurred. >>>>> >>>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g@mail.gmail.com> >>>>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI >>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>>> > Hello, >>>>>> > >>>>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q@mail.gmail.com> >>>>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI >>>>>> >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >>>>>> > As mentioned in my comment, SQL parser converts yy_fatal_error >>>>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by >>>>>> > #define'ing fprintf). So it is doable if you mind exit(). >>>>>> >>>>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is >>>>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal >>>>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser. >>>>> >>>>> If The ERROR may be LOG or DEBUG2 either, if we think the parser >>>>> fatal erros are recoverable. guc-file.l is doing so. >>>>> >>>>>> ISTM that, when an internal flex fatal error occurs, it's >>>>>> better to elog(FATAL) and terminate the problematic >>>>>> process. This might lead to the server crash (e.g., if >>>>>> postmaster emits a FATAL error, it and its all child processes >>>>>> will exit soon). But probably we can live with this because the >>>>>> fatal error basically rarely happens. >>>>> >>>>> I agree to this >>>>> >>>>>> OTOH, if we make the process keep running even after it gets an internal >>>>>> fatal error (like Sawada's patch or your idea do), this might cause more >>>>>> serious problem. Please imagine the case where one walsender gets the fatal >>>>>> error (e.g., because of OOM), abandon new setting value of >>>>>> synchronous_standby_names, and keep running with the previous setting value. >>>>>> OTOH, the other walsender processes successfully parse the setting and >>>>>> keep running with new setting. In this case, the inconsistency of the setting >>>>>> which each walsender is based on happens. This completely will mess up the >>>>>> synchronous replication. >>>>> >>>>> On the other hand, guc-file.l seems ignoring parser errors under >>>>> normal operation, even though it may cause similar inconsistency, >>>>> if any.. >>>>> >>>>> | LOG: received SIGHUP, reloading configuration files >>>>> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1 >>>>> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied >>>>> >>>>>> Therefore, I think that it's better to make the problematic process exit >>>>>> with FATAL error rather than ignore the error and keep it running. >>>>> >>>>> +1. Restarting walsender would be far less harmful than keeping >>>>> it running in doubtful state. >>>>> >>>>> Sould I wait for the next version or have a look on the latest? >>>>> >>>> >>>> Attached latest patch incorporate some review comments so far, and is >>>> rebased against current HEAD. >>>> >>> >>> Sorry I attached wrong patch. >>> Attached patch is correct patch. Thanks for updating the patch! I applied the following changes to the patch. Attached is the revised version of the patch. - Changed syncrep_flex_fatal() so that it just calls ereport(FATAL), based on the recent discussion with Horiguchi-san. - Improved the documentation. - Fixed some bugs. - Removed the changes for recovery testing framework. I'd like to commit those changes later separately from the main patch of multiple sync rep. Barring any objections, I'll commit this patch. >> One thing I noticed is that there are LOG messages telling me when a >> standby becomes a synchronous standby, but nothing to tell me if a >> standby stops being a standby (ie because a higher priority one has >> taken its place in the quorum). Would that be interesting? +1 >> Also, I spotted some tiny mistakes: >> >> + <indexterm zone="high-availability"> >> + <primary>Dedicated language for multiple synchornous replication</primary> >> + </indexterm> >> >> s/synchornous/synchronous/ Confirmed that there is no typo "synchornous" in the latest patch. >> + /* >> + * If we are managing the sync standby, though we weren't >> + * prior to this, then announce we are now the sync standby. >> + */ >> >> s/ the / a / (two occurrences) Fixed. >> + ereport(LOG, >> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u", >> + application_name, MyWalSnd->sync_standby_priority))); >> >> s/ the / a / I have no objection to this change itself. But we have used this message in 9.5 or before, so if we apply this change, probably we need back-patching. >> >> offered by a transaction commit. This level of protection is referred >> - to as 2-safe replication in computer science theory. >> + to as 2-safe replication in computer science theory, and group-1-safe >> + (group-safe and 1-safe) when <varname>synchronous_commit</> is set to >> + more than <literal>remote_write</>. >> >> Why "more than"? I think those two words should be changed to "at >> least", or removed. Removed. >> + <para> >> + This syntax allows us to define a synchronous group that will wait for at >> + least N standbys of them, and a comma-separated list of group >> members that are surrounded by >> + parantheses. The special value <literal>*</> for server name >> matches any standby. >> + By surrounding list of group members using parantheses, >> synchronous standbys are chosen from >> + that group using priority method. >> + </para> >> >> s/parantheses/parentheses/ (two occurrences) Confirmed that this typo doesn't exist in the latest patch. >> >> + <sect2 id="dedicated-language-for-multi-sync-replication-priority"> >> + <title>Prioirty Method</title> >> >> s/Prioirty Method/Priority Method/ Confirmed that this typo doesn't exist in the latest patch. > A couple more comments: > > /* > - * If we aren't managing the highest priority standby then just leave. > + * If the number of sync standbys is less than requested or we aren't > + * managing the sync standby then just leave. > */ > - if (syncWalSnd != MyWalSnd) > + if (!got_oldest || !am_sync) > > s/ the sync / a sync / Fixed. > + /* > + * Consider all pending standbys as sync if the number of them plus > + * already-found sync ones is lower than the configuration requests. > + */ > + if (list_length(result) + list_length(pending) <= SyncRepConfig->num_sync) > + return list_concat(result, pending); > > The cells from 'pending' will be attached to 'result', and 'result' > will be freed by the caller. But won't the List header object from > 'pending' be leaked? Yes if 'result' is not NIL. I added pfree(pending) for that case. > + result = lappend_int(result, i); > + if (list_length(result) == SyncRepConfig->num_sync) > + { > + list_free(pending); > + return result; /* Exit if got enough sync standbys */ > + } > > If we didn't take the early return in the list-not-long-enough case > mentioned above, we should *always* exit via this return statement, > right? Since we know that the pending list had enough elements to > reach num_sync. I think that is worth a comment, and also a "not > reached" comment at the bottom of the function, if it is true. Good catch! I added the comments. Also added Assert(false) at the bottom of the function. > As a future improvement, I wonder if we could avoid recomputing the > current set of sync standbys in every walsender every time we call > SyncRepReleaseWaiters, perhaps by maintaining that set incrementally > in shmem when walsender states change etc. +1 > I don't have any other comments, other than to say: thank you to all > the people who have contributed to this feature so far and I really > really hope it goes into 9.6! +1000 Regards, -- Fujii Masao
Attachment
pgsql-hackers by date: