Re: Proposal: "Causal reads" mode for load balancing reads without stale data - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Proposal: "Causal reads" mode for load balancing reads without stale data |
Date | |
Msg-id | CAEepm=_qNup8xFKNV_mdbX_z99S8pzWLZVXkN2hmzqx_3dkQ@mail.gmail.com |
In response to | Re: Proposal: "Causal reads" mode for load balancing reads without stale data (Michael Paquier <michael.paquier@gmail.com>) |
List | pgsql-hackers |
On Thu, Mar 3, 2016 at 7:34 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Mar 1, 2016 at 11:53 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> On Tue, Mar 1, 2016 at 2:46 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>>
>>> Hi,
>>>
>>> On 2016/02/29 18:05, Thomas Munro wrote:
>>>> On Mon, Feb 29, 2016 at 9:05 PM, Amit Langote wrote:
>>>>> +        servers.  A transaction that is run with <varname>causal_reads</> set
>>>>> +        to <literal>on</> is guaranteed either to see the effects of all
>>>>> +        completed transactions run on the primary with the setting on, or to
>>>>> +        receive an error "standby is not available for causal reads".
>>>>>
>>>>> "A transaction that is run" means "A transaction that is run on a
>>>>> standby", right?
>>>>
>>>> Well, it could be any server, standby or primary.  Of course standbys
>>>> are the interesting case, since it was already true that if you run
>>>> two sequential transactions on the primary, the second can see the
>>>> effects of the first, but I like the idea of a general rule that
>>>> applies anywhere, allowing you not to care which server it is.
>>>
>>> Actually, I meant in the context of that sentence only.
>>
>> OK, here's a new version that includes that change, fixes a conflict
>> with recent commit 10b48522 and removes an accidental duplicate copy
>> of the README file.
>
> Finally I got my eyes on this patch.
>
> To put it short, this patch introduces two different concepts:
> - Addition of a new value, remote_apply, for synchronous_commit.  This
> is where things overlap a bit with the N-sync patch, because by
> combining remote_apply with the N-sync patch it becomes possible to
> ensure that an LSN is applied to multiple targets instead of just one.
> In this case, as the master waits for the LSN to be applied on the
> sync standby, there is no need for the application to have error
> handling for read transactions on the standby, as the change is
> already visible there by the time the commit returns on the master.
> However, the cost is that if the standby node lags behind, it adds
> some extra wait on the master side as well.
> - Causal reads, which lets the master drop standbys that are lagging
> too far behind, and lets callers on a standby know that it is lagging
> too far behind and cannot satisfy causal reads.  In this case the
> application needs error handling for the case of a standby lagging
> too far behind.
>
> That's one of my concerns about this patch now: it is trying to do too
> much.  I think that there is definitely a need for both things:
> applications may be fine to pay the lagging price when remote_apply is
> used and skip the extra error handling layer, or they cannot accept a
> potentially large lag and are ready to add an extra error handling
> layer to manage read failures on standbys.  So I would recommend a
> split to begin with:
> 1) Addition of remote_apply in synchronous_commit, which would be
> quite a simple patch, and I am convinced that there are benefits in
> having that.  Compared to the previous patch sent, a standby message is
> sent back to the master once the COMMIT has been applied, accelerating
> things a bit.
> 2) Patch for causal reads, with all its extra parametrization logic
> and stuff to select standby candidates.

Thanks.  Yes, this makes a lot of sense.  I have done some work on
splitting this up and will post the result soon, along with my
responses to your other feedback.
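To spell out the difference from the application's point of view, here is a
rough sketch using the GUC names and error text from the patch as posted
(the accounts table is just a placeholder, and the details may of course
change once things are split up):

    -- 1) synchronous_commit = remote_apply: the primary blocks at commit
    --    until the synchronous standby has applied the WAL, so a later
    --    read on that standby needs no extra error handling.
    SET synchronous_commit = remote_apply;     -- on the primary
    INSERT INTO accounts VALUES (1, 100);      -- returns once applied on the sync standby

    -- 2) causal_reads = on: the check happens at read time on the standby;
    --    if the standby cannot currently guarantee causal consistency the
    --    query fails with "standby is not available for causal reads" and
    --    the application retries on another node.
    SET causal_reads = on;                     -- on the standby
    SELECT balance FROM accounts WHERE id = 1;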
> Here is as well a more detailed review...
>
> Regression tests are failing: rules.sql is generating diffs because
> pg_stat_replication is changed.
>
> CausalReadsWaitForLSN() should be called for 2PC I think, for PREPARE,
> COMMIT PREPARED and ROLLBACK PREPARED.  WAL replay for 2PC should also
> call XLogRequestWalReceiverReply() when needed.
>
> The new parameters are missing from postgresql.conf.sample.
>
> +static bool
> +SyncRepCheckEarlyExit(void)
> +{
> Doing this refactoring would actually make sense as a separate patch I
> think, and that would simplify the core patch for causal reads.
>
> +For this reason we say that the causal reads guarantee only holds as
> +long as the absolute difference between the system clocks of the
> +machines is no more than max_clock_skew.  The theory is that NTP makes
> +it possible to reason about the maximum possible clock difference
> +between machines and choose a value that allows for a much larger
> +difference.  However, we do make a best effort attempt to detect
> +misconfigured systems as described above, to catch the case of servers
> +not running a correctly configured ntp daemon, or with a clock so far
> +out of whack that ntp refuses to fix it.
> Just wondering how this reacts when standby and master are in
> different timezones.  I know of two ways to measure WAL lag: one is
> what you are doing, using a timestamp and relying on the two servers
> being in sync at clock level.  The second is in bytes, as a WAL
> quantity, say max_wal_lag or similar, expressed as the number of WAL
> segments the standby may lag behind; it is less user-friendly to set
> up, but the concerns regarding clock sync across nodes go away.  To
> put it short, I find the timestamp approach easy to set up and
> understand for the user, but not really solid, as it depends heavily
> on the clock state of different hosts, while a difference between the
> flush and apply LSN positions is a quite independent concept.  So
> basically what would be used as the base comparison is not the
> timestamp of the transaction commit but the flush LSN at the moment
> the commit has been replayed.
>
> +    /*
> +     * If a causal_reads_timeout is configured, it is used instead of
> +     * wal_sender_timeout.  Ideally we'd use causal_reads_timeout / 2 +
> +     * allowance for network latency, but since walreceiver can become quite
> +     * bogged down fsyncing WAL we allow more tolerance.  (This could be
> +     * tightened up once standbys hand writing off to the WAL writer).
> +     */
> +    if (causal_reads_timeout != 0)
> +        allowed_time = causal_reads_timeout;
> +    else
> +        allowed_time = wal_sender_timeout;
> I find that surprising, for two reasons:
> 1) It seems to me that causal_reads_timeout conceptually has no
> relation to WAL sender process control.
> 2) A standby should still be able to receive WAL even if it cannot
> satisfy causal reads, to give it a chance to catch up more quickly on
> the amount it is lagging behind.
>
> -    SYNCHRONOUS_COMMIT_REMOTE_FLUSH     /* wait for local and remote flush */
> +    SYNCHRONOUS_COMMIT_REMOTE_FLUSH,    /* wait for local and remote flush */
> +    SYNCHRONOUS_COMMIT_REMOTE_APPLY,    /* wait for local flush and remote
> +                                         * apply */
> +    SYNCHRONOUS_COMMIT_CONSISTENT_APPLY /* wait for local flusha and remote
> +                                           apply with causal consistency */
> SYNCHRONOUS_COMMIT_CONSISTENT_APPLY is used nowhere, and there is a
> typo: s/flusha/flush a/.
>
> I am still digging into the patch; the available/joining/unavailable
> logic is quite interesting.
> --
> Michael
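As an aside on the byte-based approach: the gap between flush and apply
positions is already visible in pg_stat_replication on the primary.  A
rough illustrative query (using the 9.5 column names; this is only an
illustration of the concept, not something the patch adds):

    -- Approximate apply lag in bytes for each standby, seen from the primary.
    SELECT application_name,
           pg_xlog_location_diff(flush_location, replay_location) AS apply_lag_bytes
    FROM pg_stat_replication;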
--
Thomas Munro
http://www.enterprisedb.com