Re: [HACKERS] Quorum commit for multiple synchronous replication. - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [HACKERS] Quorum commit for multiple synchronous replication. |
Date | |
Msg-id | CAD21AoD8nibM68vt1XBPFYd5ZU61m5D_Cp1rQqsEbH1kETR75A@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Quorum commit for multiple synchronous replication. (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [HACKERS] Quorum commit for multiple synchronous replication.
|
List | pgsql-hackers |
On Sat, Dec 10, 2016 at 5:17 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Thu, Dec 8, 2016 at 3:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Thu, Dec 8, 2016 at 4:39 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> On Thu, Dec 8, 2016 at 9:07 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>>> You could do that, but first I would code up the simplest, cleanest >>>> algorithm you can think of and see if it even shows up in a 'perf' >>>> profile. Microbenchmarking is probably overkill here unless a problem >>>> is visible on macrobenchmarks. >>> >>> This is what I would go for! The current code is doing a simple thing: >>> select the Nth element using qsort() after scanning each WAL sender's >>> values. And I think that Sawada-san got it right. Even running on my >>> laptop a pgbench run with 10 sync standbys using a data set that fits >>> into memory, SyncRepGetOldestSyncRecPtr gets at most 0.04% of overhead >>> using perf top on a non-assert, non-debug build. Hash tables and >>> allocations get a far larger share. Using the patch, >>> SyncRepGetSyncRecPtr is at the same level with a quorum set of 10 >>> nodes. Let's kick the ball for now. An extra patch could make things >>> better later on if that's worth it. >> >> Yeah, since the both K and N could be not large these algorithm takes >> almost the same time. And current patch does simple thing. When we >> need over 100 or 1000 replication node the optimization could be >> required. >> Attached latest v9 patch. >> > > Few comments: Thank you for reviewing. > + * In 10.0 we support two synchronization methods, priority and > + * quorum. The number of synchronous standbys that transactions > + * must wait for replies from and synchronization method are specified > + * in synchronous_standby_names. This parameter also specifies a list > + * of standby names, which determines the priority of each standby for > + * being chosen as a synchronous standby. In priority method, the standbys > + * whose names appear earlier in the list are given higher priority > + * and will be considered as synchronous. Other standby servers appearing > + * later in this list represent potential synchronous standbys. If any of > + * the current synchronous standbys disconnects for whatever reason, > + * it will be replaced immediately with the next-highest-priority standby. > + * In quorum method, the all standbys appearing in the list are > + * considered as a candidate for quorum commit. > > In the above description, is priority method represented by FIRST and > quorum method by ANY in the synchronous_standby_names syntax? If so, > it might be better to write about it explicitly. Added description. > > 2. > --- a/src/backend/utils/sort/tuplesort.c > +++ b/src/backend/utils/sort/tuplesort.c > @@ -2637,16 +2637,8 @@ mergeruns(Tuplesortstate *state) > } > > /* > - * Allocate a new 'memtuples' array, for the heap. It will hold one tuple > - * from each input tape. > - */ > - state->memtupsize = numInputTapes; > - state->memtuples = (SortTuple *) palloc(numInputTapes * sizeof(SortTuple)); > - USEMEM(state, GetMemoryChunkSpace(state->memtuples)); > - > - /* > - * Use all the remaining memory we have available for read buffers among > - * the input tapes. > + * Use all the spare memory we have available for read buffers among the > + * input tapes. > > This doesn't belong to this patch. Oops, fixed. > 3. > + * Return the list of sync standbys using according to synchronous method, > > In above sentence, "using according" seems to either incomplete or wrong usage. Fixed. > 4. > + else > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + "invalid synchronization method is specified \"%d\"", > + SyncRepConfig->sync_method)); > > Here, the error message doesn't seem to aligned and you might want to > use errmsg for the same. > Fixed. > 5. > + * In priority method, we need the oldest these positions among sync > + * standbys. In quorum method, we need the newest these positions > + * specified by SyncRepConfig->num_sync. > > /oldest these/oldest of these > /newest these positions specified/newest of these positions as specified Fixed. > Instead of newest, can we consider to use latest? Yeah, I changed it so. > 6. > + int sync_method; /* synchronization method */ > /* member_names contains nmembers consecutive nul-terminated C strings */ > char member_names[FLEXIBLE_ARRAY_MEMBER]; > } SyncRepConfigData; > > Can't we use 1 or 2 bytes to store sync_method information? I changed it to uint8. Attached latest v10 patch incorporated the review comments so far. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: