Home > mailing lists

Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

From	Fujii Masao
Subject	Re: Support for N synchronous standby servers - take 2
Date	April 4, 2016 08:28:20
Msg-id	CAHGQGwG2Ze0YD=U35bZFQxLFU1cA_=+5v864mLHuvhKER8MkpQ@mail.gmail.com Whole thread Raw
In response to	Re: Support for N synchronous standby servers - take 2 (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: Support for N synchronous standby servers - take 2 Re: Support for N synchronous standby servers - take 2 Re: Support for N synchronous standby servers - take 2 Re: Support for N synchronous standby servers - take 2
List	pgsql-hackers

Tree view

On Sat, Apr 2, 2016 at 10:20 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>> I personally don't think it needs such a survive measure. It is
>>>>> very small syntax and the parser reads very short text. If the
>>>>> parser failes in such mode, something more serious should have
>>>>> occurred.
>>>>>
>>>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao.fujii@gmail.com> wrote in
<CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g@mail.gmail.com>
>>>>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>>>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>>> > Hello,
>>>>>> >
>>>>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q@mail.gmail.com>
>>>>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>>>>> >> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>>>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>>>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>>>>
>>>>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>>>>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>>>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>>>>>
>>>>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>>>>> fatal erros are recoverable. guc-file.l is doing so.
>>>>>
>>>>>> ISTM that, when an internal flex fatal error occurs, it's
>>>>>> better to elog(FATAL) and terminate the problematic
>>>>>> process. This might lead to the server crash (e.g., if
>>>>>> postmaster emits a FATAL error, it and its all child processes
>>>>>> will exit soon). But probably we can live with this because the
>>>>>> fatal error basically rarely happens.
>>>>>
>>>>> I agree to this
>>>>>
>>>>>> OTOH, if we make the process keep running even after it gets an internal
>>>>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>>>>> serious problem. Please imagine the case where one walsender gets the fatal
>>>>>> error (e.g., because of OOM), abandon new setting value of
>>>>>> synchronous_standby_names, and keep running with the previous setting value.
>>>>>> OTOH, the other walsender processes successfully parse the setting and
>>>>>> keep running with new setting. In this case, the inconsistency of the setting
>>>>>> which each walsender is based on happens. This completely will mess up the
>>>>>> synchronous replication.
>>>>>
>>>>> On the other hand, guc-file.l seems ignoring parser errors under
>>>>> normal operation, even though it may cause similar inconsistency,
>>>>> if any..
>>>>>
>>>>> | LOG:  received SIGHUP, reloading configuration files
>>>>> | LOG:  input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
>>>>> | LOG:  configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were
applied
>>>>>
>>>>>> Therefore, I think that it's better to make the problematic process exit
>>>>>> with FATAL error rather than ignore the error and keep it running.
>>>>>
>>>>> +1. Restarting walsender would be far less harmful than keeping
>>>>> it running in doubtful state.
>>>>>
>>>>> Sould I wait for the next version or have a look on the latest?
>>>>>
>>>>
>>>> Attached latest patch incorporate some review comments so far, and is
>>>> rebased against current HEAD.
>>>>
>>>
>>> Sorry I attached wrong patch.
>>> Attached patch is correct patch.

Thanks for updating the patch!

I applied the following changes to the patch.
Attached is the revised version of the patch.

- Changed syncrep_flex_fatal() so that it just calls ereport(FATAL), based on
  the recent discussion with Horiguchi-san.
- Improved the documentation.
- Fixed some bugs.
- Removed the changes for recovery testing framework. I'd like to commit
   those changes later separately from the main patch of multiple sync rep.

Barring any objections, I'll commit this patch.

>> One thing I noticed is that there are LOG messages telling me when a
>> standby becomes a synchronous standby, but nothing to tell me if a
>> standby stops being a standby (ie because a higher priority one has
>> taken its place in the quorum).  Would that be interesting?

+1

>> Also, I spotted some tiny mistakes:
>>
>> +  <indexterm zone="high-availability">
>> +   <primary>Dedicated language for multiple synchornous replication</primary>
>> +  </indexterm>
>>
>> s/synchornous/synchronous/

Confirmed that there is no typo "synchornous" in the latest patch.

>> + /*
>> + * If we are managing the sync standby, though we weren't
>> + * prior to this, then announce we are now the sync standby.
>> + */
>>
>> s/ the / a / (two occurrences)

Fixed.

>> + ereport(LOG,
>> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
>> + application_name, MyWalSnd->sync_standby_priority)));
>>
>> s/ the / a /

I have no objection to this change itself. But we have used this message
in 9.5 or before, so if we apply this change, probably we need
back-patching.

>>
>>      offered by a transaction commit. This level of protection is referred
>> -    to as 2-safe replication in computer science theory.
>> +    to as 2-safe replication in computer science theory, and group-1-safe
>> +    (group-safe and 1-safe) when <varname>synchronous_commit</> is set to
>> +    more than <literal>remote_write</>.
>>
>> Why "more than"?  I think those two words should be changed to "at
>> least", or removed.

Removed.

>> +   <para>
>> +    This syntax allows us to define a synchronous group that will wait for at
>> +    least N standbys of them, and a comma-separated list of group
>> members that are surrounded by
>> +    parantheses.  The special value <literal>*</> for server name
>> matches any standby.
>> +    By surrounding list of group members using parantheses,
>> synchronous standbys are chosen from
>> +    that group using priority method.
>> +   </para>
>>
>> s/parantheses/parentheses/ (two occurrences)

Confirmed that this typo doesn't exist in the latest patch.

>>
>> +  <sect2 id="dedicated-language-for-multi-sync-replication-priority">
>> +   <title>Prioirty Method</title>
>>
>> s/Prioirty Method/Priority Method/

Confirmed that this typo doesn't exist in the latest patch.

> A couple more comments:
>
>   /*
> - * If we aren't managing the highest priority standby then just leave.
> + * If the number of sync standbys is less than requested or we aren't
> + * managing the sync standby then just leave.
>   */
> - if (syncWalSnd != MyWalSnd)
> + if (!got_oldest || !am_sync)
>
> s/ the sync / a sync /

Fixed.

> + /*
> + * Consider all pending standbys as sync if the number of them plus
> + * already-found sync ones is lower than the configuration requests.
> + */
> + if (list_length(result) + list_length(pending) <= SyncRepConfig->num_sync)
> + return list_concat(result, pending);
>
> The cells from 'pending' will be attached to 'result', and 'result'
> will be freed by the caller.  But won't the List header object from
> 'pending' be leaked?

Yes if 'result' is not NIL. I added pfree(pending) for that case.

> + result = lappend_int(result, i);
> + if (list_length(result) == SyncRepConfig->num_sync)
> + {
> + list_free(pending);
> + return result; /* Exit if got enough sync standbys */
> + }
>
> If we didn't take the early return in the list-not-long-enough case
> mentioned above, we should *always* exit via this return statement,
> right?  Since we know that the pending list had enough elements to
> reach num_sync.  I think that is worth a comment, and also a "not
> reached" comment at the bottom of the function, if it is true.

Good catch! I added the comments. Also added Assert(false) at
the bottom of the function.

> As a future improvement, I wonder if we could avoid recomputing the
> current set of sync standbys in every walsender every time we call
> SyncRepReleaseWaiters, perhaps by maintaining that set incrementally
> in shmem when walsender states change etc.

+1

> I don't have any other comments, other than to say: thank you to all
> the people who have contributed to this feature so far and I really
> really hope it goes into 9.6!

+1000

Regards,

--
Fujii Masao

Attachment

multi_sync_replication_v22.patch

pgsql-hackers by date:

From: Simon Riggs
Date: 04 April 2016, 08:25:52
Subject: Re: PATCH: use foreign keys to improve join estimates v1

From: Amit Langote
Date: 04 April 2016, 08:45:22
Subject: Re: postgres_fdw : altering foreign table not invalidating prepare statement execution plan.

Re: Support for N synchronous standby servers - take 2 - Mailing list pgsql-hackers

Attachment

Previous

Next