Re: Spread checkpoint sync - Mailing list pgsql-hackers
| From | Greg Smith |
|---|---|
| Subject | Re: Spread checkpoint sync |
| Date | |
| Msg-id | 4D4259D4.5050207@2ndquadrant.com |
| In response to | Re: Spread checkpoint sync (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: Spread checkpoint sync |
| List | pgsql-hackers |
Robert Haas wrote:
> During each cluster, the system probably slows way down, and then
> recovers when the queue is emptied.  So the TPS improvement isn't at
> all a uniform speedup, but simply relief from the stall that would
> otherwise result from a full queue.

That does seem to be the case here. http://www.2ndquadrant.us/pgbench-results/index.htm now has results from a long test series, at two database scales that caused many backend fsyncs during earlier tests. Set #5 is the existing server code, #6 is with the patch applied. There are zero backend fsync calls with the patch applied, which isn't surprising given how simple the schema is in this test case. An average TPS gain of 14% appears at a scale of 500, and an 8% one at 1000; the attached CSV file summarizes the average figures for the archives.

The gains do appear to come from smoothing out the dead periods that normally occur during the sync phase of the checkpoint. For example, here are the fastest runs at scale=1000/clients=256 with and without the patch:

http://www.2ndquadrant.us/pgbench-results/436/index.html (tps=361)
http://www.2ndquadrant.us/pgbench-results/486/index.html (tps=380)

Here the reduction in the slowdown around the checkpoint end points is really obvious, and obviously an improvement. You can see the same thing to a lesser extent at the other end of the scale; here are the fastest runs at scale=500/clients=16:

http://www.2ndquadrant.us/pgbench-results/402/index.html (tps=590)
http://www.2ndquadrant.us/pgbench-results/462/index.html (tps=643)

While there are still very ugly maximum latency figures in every case, these periods just aren't as wide with the patch in place.

I'm moving on to some brief testing of the newer kernel behavior here, then returning to testing the other checkpoint spreading ideas on top of this compaction patch, presuming something like it will end up being committed first. I think it's safe to say I can throw away the changes that tried to alter the fsync absorption code in what I submitted before, as this scheme does a much better job of avoiding that problem than those earlier queue alteration ideas. I'm glad Robert grabbed the right one from the pile of ideas I threw out for what else might help here.

P.S. Yes, I know I have other review work to do as well. Starting on the rest of that tomorrow.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attached CSV:

,,"Unmodified",,"Compacted Fsync",,,
"scale","clients","tps","max_latency","tps","max_latency","TPS Gain","% Gain"
500,16,557,17963.41,631,17116.31,74,13.3%
500,32,587,25838.8,655,24311.54,68,11.6%
500,64,628,35198.39,727,38040.39,99,15.8%
500,128,621,41001.91,687,48195.77,66,10.6%
500,256,632,49610.39,747,46799.48,115,18.2%
,,,,,,,
1000,16,306,39298.95,321,40826.58,15,4.9%
1000,32,314,40120.35,345,27910.51,31,9.9%
1000,64,334,46244.86,358,45138.1,24,7.2%
1000,128,343,72501.57,372,47125.46,29,8.5%
1000,256,321,80588.63,350,83232.14,29,9.0%
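[Editor's note: for readers skimming the archives, here is a minimal, self-contained sketch of the queue-compaction idea being benchmarked above. Every name in it (FsyncRequest, compact_queue, QUEUE_SIZE) is hypothetical, and the duplicate scan is a simple O(n^2) loop for clarity; the actual patch operates on PostgreSQL's shared-memory fsync request queue and is more careful about locking and ordering. The point it illustrates: when repeated writes to the same relation segment fill the queue, compacting away duplicate entries frees slots, so backends don't have to fall back to issuing their own fsync calls.]

```c
/*
 * Hypothetical sketch of fsync request queue compaction.  Not the
 * actual patch: a toy fixed-size queue plus a duplicate-removal pass.
 */
#include <stdio.h>
#include <stdbool.h>

typedef struct FsyncRequest
{
    unsigned    relfilenode;    /* which relation file */
    unsigned    segno;          /* which 1GB segment of it */
} FsyncRequest;

#define QUEUE_SIZE 8

static FsyncRequest queue[QUEUE_SIZE];
static int  queue_len = 0;

/* Remove duplicate requests, keeping first occurrences in order. */
static int
compact_queue(void)
{
    int         removed = 0;
    int         out = 0;

    for (int i = 0; i < queue_len; i++)
    {
        bool        dup = false;

        for (int j = 0; j < out; j++)
        {
            if (queue[j].relfilenode == queue[i].relfilenode &&
                queue[j].segno == queue[i].segno)
            {
                dup = true;
                break;
            }
        }
        if (dup)
            removed++;
        else
            queue[out++] = queue[i];
    }
    queue_len = out;
    return removed;
}

int
main(void)
{
    /* Simulate a full queue: repeated writes to the same few segments. */
    FsyncRequest incoming[QUEUE_SIZE] = {
        {10, 0}, {10, 0}, {11, 2}, {10, 0},
        {11, 2}, {12, 1}, {10, 0}, {12, 1}
    };

    for (int i = 0; i < QUEUE_SIZE; i++)
        queue[queue_len++] = incoming[i];

    int         removed = compact_queue();

    printf("queue was full at %d entries; removed %d duplicates, %d remain\n",
           QUEUE_SIZE, removed, queue_len);
    return 0;
}
```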
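[Editor's note: and since the message mentions returning to the checkpoint-spreading ideas on top of this patch, here is an equally rough sketch of what "spread sync" means in practice: instead of issuing every fsync back to back at the end of a checkpoint, pause between calls so the OS writeback path never sees one giant burst. The file list and the fixed 3-second pause are invented for illustration; a real implementation would pace itself against checkpoint progress rather than sleeping a constant interval.]

```c
/*
 * Hypothetical sketch of spreading checkpoint sync calls over time,
 * rather than issuing them all at once.
 */
#include <stdio.h>
#include <unistd.h>             /* fsync, close, sleep */
#include <fcntl.h>              /* open */

int
main(void)
{
    /* Invented stand-ins for a checkpoint's list of dirty segment files. */
    const char *files[] = {"/tmp/seg.0", "/tmp/seg.1", "/tmp/seg.2"};
    const unsigned pause_secs = 3;      /* invented pacing interval */

    for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++)
    {
        int         fd = open(files[i], O_RDWR | O_CREAT, 0600);

        if (fd < 0)
        {
            perror(files[i]);
            continue;
        }
        fsync(fd);              /* force this file's dirty pages to disk */
        close(fd);
        printf("synced %s\n", files[i]);
        sleep(pause_secs);      /* spread the I/O spikes out over time */
    }
    return 0;
}
```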