Re: checkpointer continuous flushing - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: checkpointer continuous flushing
Msg-id: 20160322091852.GA3790@awork2.anarazel.de
In response to: Re: checkpointer continuous flushing (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: checkpointer continuous flushing
List: pgsql-hackers
Hi,

On 2016-03-21 18:46:58 +0100, Tomas Vondra wrote:
> I've repeated the tests, but this time logged details for 5% of the
> transactions (instead of aggregating the data for each second). I've
> also made the tests shorter - just 12 hours instead of 24, to reduce
> the time needed to complete the benchmark.
>
> Overall, this means ~300M transactions in total for the un-throttled
> case, so a sample of ~15M transactions was available when computing
> the following charts.
>
> I've used the same commits as during the previous testing, i.e.
> a298a1e0 (before patches) and 23a27b03 (with patches).
>
> One interesting difference is that while the "patched" version
> resulted in slightly better performance (8122 vs. 8000 tps), the
> "unpatched" version got considerably slower (6790 vs. 7725 tps) -
> that's a ~13% difference, so not negligible. I'm not sure what the
> cause is - the configuration was exactly the same, there's nothing in
> the log, and the machine was dedicated to the testing. The only
> explanation I have is that the unpatched code is a bit less stable
> when it comes to this type of stress testing.
>
> The results (including scripts for generating the charts) are here:
>
>     https://github.com/tvondra/flushing-benchmark-2
>
> Attached are three charts - again, these use CDFs to illustrate the
> distributions and compare them easily:
>
> 1) regular-latency.png
>
> The two curves intersect at ~4ms, where both CDFs reach ~85%. For the
> shorter transactions, the old code is slightly faster (i.e. apparently
> there's some per-transaction overhead). For higher latencies, though,
> the patched code is clearly winning - there are far fewer transactions
> over 6ms, which makes a huge difference. (Notice the x-axis is
> log-scale, so the tail on the old code is much longer than it might
> appear.)
>
> 2) throttled-latency.png
>
> In the throttled case (i.e. when the system is not 100% utilized, so
> it's more representative of actual production use), the difference is
> quite clearly in favor of the new code.
>
> 3) throttled-schedule-lag.png
>
> Mostly just an alternative view on the previous chart, showing how
> much later the transactions were scheduled. Again, the new code is
> winning.

Thanks for running these tests! I think this shows that we're in good
shape, and that the commits succeeded in what they were attempting. Very
glad to hear that.

WRT tablespaces: What I'm planning to do, unless somebody has a better
proposal, is to rent two big Amazon instances and run pgbench in
parallel over N tablespaces - once with local SSD and once with local
HDD storage.

Greetings,

Andres Freund
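[Editor's note: a minimal sketch of how latency CDFs like the ones
discussed above can be rebuilt from pgbench's per-transaction logs. It
assumes the 9.5-era format produced by -l (one line per transaction:
client_id transaction_no time script_no time_epoch time_us, plus a
trailing schedule_lag field under --rate, with latencies in
microseconds); file names and plotting choices are illustrative and not
taken from the thread or from tvondra's repository.]

#!/usr/bin/env python
# Sketch: build empirical latency CDFs from pgbench per-transaction logs.
# Assumes pgbench was run with -l; field 3 (0-based index 2) is the
# transaction latency in microseconds. Under --rate, the last field is
# the schedule lag, also in microseconds (pass field=-1 for that chart).
import sys
import numpy as np
import matplotlib.pyplot as plt

def load_latencies_ms(path, field=2):
    """Read one latency column and convert microseconds -> milliseconds."""
    vals = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            vals.append(int(parts[field]) / 1000.0)
    return np.sort(np.array(vals))

for path in sys.argv[1:]:
    lat = load_latencies_ms(path)
    # Empirical CDF: fraction of transactions at or below each latency.
    cdf = np.arange(1, len(lat) + 1) / float(len(lat))
    plt.plot(lat, cdf, label=path)

plt.xscale('log')   # long tails are invisible on a linear axis
plt.xlabel('latency [ms]')
plt.ylabel('fraction of transactions')
plt.legend()
plt.savefig('latency-cdf.png')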
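[Editor's note: and a sketch of the "pgbench in parallel over N
tablespaces" plan mentioned at the end. Everything concrete here is an
assumption, not something the mail specifies: the database names
(bench0..), the client counts, the duration, and the setup of one
pre-initialized database per tablespace (e.g. via pgbench -i
--tablespace=...). It only shows the orchestration shape.]

#!/usr/bin/env python
# Sketch: run one pgbench instance per tablespace-backed database, in
# parallel. Assumes databases bench0..bench<N-1> already exist, each
# initialized into its own tablespace.
import os
import subprocess

N_TABLESPACES = 4   # hypothetical; the mail leaves N open
DURATION = 3600     # seconds per run
CLIENTS = 16

procs = []
for i in range(N_TABLESPACES):
    # Separate working directory per instance, so the per-transaction
    # logs (pgbench_log.<pid>) written by -l don't get mixed up.
    workdir = 'run-ts%d' % i
    os.makedirs(workdir, exist_ok=True)
    cmd = ['pgbench', '-c', str(CLIENTS), '-j', str(CLIENTS),
           '-T', str(DURATION), '-l', 'bench%d' % i]
    procs.append(subprocess.Popen(cmd, cwd=workdir))

# Wait for all instances; a real harness would also check exit codes.
for p in procs:
    p.wait()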