Re: Transactions involving multiple postgres foreign servers, take 2 - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | 5b80c9a3-2ce8-1c2b-65a3-e2b82b95331e@oss.nttdata.com |
In response to | Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: Transactions involving multiple postgres foreign servers, take 2 |
List | pgsql-hackers |
On 2021/05/21 13:45, Masahiko Sawada wrote:
> On Fri, May 21, 2021 at 12:45 PM Masahiro Ikeda
> <ikedamsh@oss.nttdata.com> wrote:
>>
>> On 2021/05/21 10:39, Masahiko Sawada wrote:
>>> On Thu, May 20, 2021 at 1:26 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>>>>
>>>> On 2021/05/11 13:37, Masahiko Sawada wrote:
>>>>> I've attached the updated patches that incorporated comments from
>>>>> Zhihong and Ikeda-san.
>>>>
>>>> Thanks for updating the patches!
>>>>
>>>> I have other comments, including some trivial things.
>>>>
>>>> a. About the "foreign_transaction_resolver_timeout" parameter
>>>>
>>>> The default value of "foreign_transaction_resolver_timeout" is currently 60 secs.
>>>> Is there any reason for that? Although the following is a minor case, it may
>>>> confuse some users.
>>>>
>>>> Example case:
>>>>
>>>> 1. A client executes a transaction with 2PC while the resolver is processing
>>>>    FdwXactResolverProcessInDoubtXacts().
>>>>
>>>> 2. The resolution of the first transaction has to wait until other 2PC
>>>>    transactions are executed or the timeout expires.
>>>>
>>>> 3. If the client checks the first transaction's result, it has to wait until
>>>>    resolution is finished to get atomic visibility (although this depends on
>>>>    how atomic visibility is realized). The client may wait up to
>>>>    foreign_transaction_resolver_timeout. Users may think it's stuck.
>>>>
>>>> This situation can be observed when testing with pgbench: some
>>>> unresolved transactions remain after the benchmark.
>>>>
>>>> I assume that this default value follows those of wal_sender, archiver, and
>>>> so on. But I think this parameter is more like "commit_delay". If so,
>>>> 60 seconds seems to be a big value.
>>>
>>> IIUC, in this situation foreign transaction resolution is a
>>> bottleneck and doesn't catch up with incoming resolution requests. But
>>> how does foreign_transaction_resolver_timeout relate to this situation?
>>> foreign_transaction_resolver_timeout controls when to terminate a
>>> resolver process that doesn't have any foreign transactions to
>>> resolve. So if we set it to several milliseconds, resolver processes are
>>> terminated immediately after each resolution, imposing the cost of
>>> launching a resolver process on the next resolution.
>>
>> Thanks for your comments!
>>
>> No, this situation is not related to whether foreign transaction resolution
>> is a bottleneck or not. This issue may happen even when the workload has very
>> few foreign transactions.
>>
>> If a new foreign transaction comes in while the transaction resolver is
>> processing resolutions via FdwXactResolverProcessInDoubtXacts(), the new
>> foreign transaction waits until the next round of resolution starts. If no
>> further foreign transaction comes in, it must wait until the timeout before
>> its resolution starts. That is the situation I meant.
>
> Thanks for your explanation. I think that in this case we should set
> the latch of the resolver after preparing all foreign transactions so
> that the resolver processes those transactions without sleeping.

Yes, your idea is much better. Thanks!

>>
>> Thanks for letting me know the side effect of setting the resolution timeout
>> to several milliseconds. I agree. But why is termination needed? Is there a
>> possibility of stalling, as with walsender?
>
> The purpose of this timeout is to terminate resolvers that are idle
> for a long time. The resolver processes don't necessarily need to keep
> running all the time for every database. On the other hand, launching
> a resolver process per commit would be costly. So we keep
> resolver processes running for at least
> foreign_transaction_resolver_timeout.

Understood. I think it's reasonable.

>>>>
>>>> b. About the performance bottleneck (just sharing my simple benchmark results)
>>>>
>>>> The resolver process can easily become a performance bottleneck, although
>>>> I think some users want this feature even if the performance is not so good.
>>>>
>>>> I tested with a very simple workload on my laptop.
>>>>
>>>> The test conditions are:
>>>> * two remote foreign partitions; each transaction inserts one entry into
>>>>   each partition.
>>>> * local connections only. If network latency becomes higher, the
>>>>   performance becomes worse.
>>>> * pgbench with 8 clients.
>>>>
>>>> The test results are as follows. The performance with 2PC is only 10% of
>>>> the performance without 2PC.
>>>>
>>>> * with foreign_twophase_commit = required
>>>>   -> With a load of more than 10 TPS, the number of unresolved foreign
>>>>      transactions keeps increasing, and processing stops with the warning
>>>>      "Increase max_prepared_foreign_transactions".
>>>
>>> What was the value of max_prepared_foreign_transactions?
>>
>> I tested with 200.
>>
>> If each resolution finished very soon, I thought that would be enough because
>> 8 clients x 2 partitions = 16, though... But it's difficult to know what
>> value is stable.
>
> While resolving one distributed transaction, the resolver needs both
> one round trip and an fsync of a WAL record for each foreign transaction.
> Since the client doesn't wait for the distributed transaction to be
> resolved, the resolver process can easily become a bottleneck given there
> are 8 clients.
>
> If foreign transactions were resolved synchronously, 16 would suffice.

OK, thanks.

>>
>>> To speed up foreign transaction resolution, some ideas have been
>>> discussed. As another idea, how about launching resolvers for each
>>> foreign server? That way, we resolve foreign transactions on each
>>> foreign server in parallel. If foreign transactions are concentrated
>>> on a particular server, we can have multiple resolvers for that one
>>> foreign server.
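The bottleneck argument above can be turned into a rough back-of-the-envelope model. The sketch rests on several simplifying assumptions. Resolving each foreign transaction costs one round trip plus one WAL fsync. A resolver handles that work serially. When there are several resolvers, as in the per-server idea, the work is split evenly between them. The helper names and the latency figures in the example are hypothetical. They are not taken from the patch and are not Ikeda-san's measurements.

```python
import math


def resolver_capacity_tps(rtt_s: float, fsync_s: float,
                          participants: int, n_resolvers: int = 1) -> float:
    """Distributed transactions per second that n_resolvers can finish.

    Hypothetical cost model: each participant (foreign transaction) costs one
    round trip plus one WAL fsync, processed serially within a resolver, with
    work spread evenly across resolvers.
    """
    per_xact_s = participants * (rtt_s + fsync_s)
    return n_resolvers / per_xact_s


def seconds_until_slots_exhausted(arrival_tps: float, capacity_tps: float,
                                  participants: int, max_slots: int) -> float:
    """Time until the backlog of unresolved foreign transactions fills max_slots.

    Assumes each distributed transaction holds one slot per participant until
    resolved, so the backlog grows by participants * (arrival - capacity)
    slots per second.
    """
    surplus_tps = arrival_tps - capacity_tps
    if surplus_tps <= 0:
        return math.inf  # resolvers keep up; the backlog does not grow
    return max_slots / (participants * surplus_tps)


def synchronous_slot_bound(clients: int, participants: int) -> int:
    """If every client waited for resolution, at most one slot per participant
    per client could be in use at once."""
    return clients * participants


if __name__ == "__main__":
    # Made-up latencies: 1 ms round trip, 4 ms fsync, 2 foreign partitions.
    one = resolver_capacity_tps(0.001, 0.004, participants=2)
    two = resolver_capacity_tps(0.001, 0.004, participants=2, n_resolvers=2)
    print(round(one), round(two))  # 100 200
    print(seconds_until_slots_exhausted(150.0, one, participants=2, max_slots=200))
    print(synchronous_slot_bound(clients=8, participants=2))  # 16
```

With these invented numbers, a single resolver finishes about 100 distributed transactions per second. At an arrival rate of 150 TPS, the backlog fills 200 slots in roughly 2 seconds. Two resolvers would keep up with that load. The synchronous bound of 8 x 2 = 16 is the figure discussed in the thread.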
>>> It doesn't change the fact that all foreign
>>> transaction resolutions are processed by resolver processes.
>>
>> Awesome! There seems to be another advantage: even if a foreign server is
>> temporarily busy or stopped due to failover, other foreign servers'
>> transactions can still be resolved.
>
> Yes. We also might need to be careful about the order of foreign
> transaction resolution. I think we need to resolve foreign
> transactions in arrival order, at least within a foreign server.

I agree it's better. (Although this is just my own interest...)
Is it necessary? This idea seems to be for atomic visibility, but as you
know, 2PC can't realize that. So I wondered.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
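A toy model can illustrate the resolver behaviors discussed in this thread:

- setting the resolver's latch once all participants are prepared, so that it does not sleep;
- keeping a resolver alive for foreign_transaction_resolver_timeout after its last work;
- running one resolver per foreign server;
- preserving arrival order within a server.

Everything in this sketch is hypothetical. `ResolverPool`, `Resolver`, and their methods are not part of the patch. Time is a plain number supplied by the caller. "Resolving" a transaction only records it, whereas a real resolver must contact the foreign server and write WAL for each one.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Resolver:
    """Hypothetical per-foreign-server resolver state (not the patch's code)."""
    server: str
    last_active: float
    queue: deque = field(default_factory=deque)


class ResolverPool:
    """Toy model: one resolver per foreign server, FIFO within a server,
    immediate wakeup on submit, and idle exit after idle_timeout seconds."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout  # stands in for foreign_transaction_resolver_timeout
        self.resolvers = {}   # server name -> running Resolver
        self.down = set()     # servers temporarily unreachable (e.g. failover)
        self.resolved = []    # (server, xid) pairs in resolution order
        self.launches = 0     # resolver (re)starts, each one costly in reality

    def submit(self, server: str, xid: int, now: float) -> None:
        """Called once all participants of a distributed transaction are prepared."""
        r = self.resolvers.get(server)
        if r is None:
            r = self.resolvers[server] = Resolver(server, last_active=now)
            self.launches += 1
        r.queue.append(xid)
        self.wake(server, now)  # analogous to setting the resolver's latch

    def wake(self, server: str, now: float) -> None:
        """Drain this server's queue in arrival order unless the server is down."""
        r = self.resolvers[server]
        if server in self.down:
            return  # entries stay queued; other servers are unaffected
        while r.queue:
            self.resolved.append((server, r.queue.popleft()))
        r.last_active = now

    def reap_idle(self, now: float) -> None:
        """Terminate resolvers that have nothing queued and have been idle too long."""
        idle = [s for s, r in self.resolvers.items()
                if not r.queue and now - r.last_active >= self.idle_timeout]
        for s in idle:
            del self.resolvers[s]


if __name__ == "__main__":
    pool = ResolverPool(idle_timeout=60.0)
    pool.down.add("fs2")                 # fs2 is failing over
    for t, xid in ((0.0, 1), (1.0, 2)):
        pool.submit("fs1", xid, now=t)
        pool.submit("fs2", xid, now=t)
    print(pool.resolved)                 # [('fs1', 1), ('fs1', 2)]
    pool.down.discard("fs2")
    pool.wake("fs2", now=2.0)            # fs2's backlog drains in arrival order
    pool.reap_idle(now=61.5)             # fs1 idle for 60.5 s, fs2 for 59.5 s
    print(sorted(pool.resolvers))        # ['fs2']
    pool.submit("fs1", 3, now=70.0)      # fs1's resolver must be relaunched
    print(pool.launches)                 # 3
```

Each server has its own queue, so a server that is down delays only its own entries. This matches the advantage of per-server resolvers mentioned above. The FIFO queue enforces arrival order within a server. Whether that strict order is actually required is the question left open at the end of the message.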