Re: what to revert - Mailing list pgsql-hackers

From:           Kevin Grittner
Subject:        Re: what to revert
Msg-id:         CACjxUsMMew5_VefF09=Nz2D+6iUYvo=uhesLhZ5L+3WRf8v7Rg@mail.gmail.com
In response to: Re: what to revert (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses:      Re: what to revert; Re: what to revert
List:           pgsql-hackers
On Tue, May 10, 2016 at 11:13 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 05/10/2016 10:29 AM, Kevin Grittner wrote:
>> On Mon, May 9, 2016 at 9:01 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

>>> * It also seems to me the feature greatly amplifies the
>>> variability of the results, somehow. It's not uncommon to see
>>> results like this:
>>>
>>> master-10-new-2 235516 331976 133316 155563 133396
>>>
>>> where after the first runs (already fairly variable) the
>>> performance tanks to ~50%. This happens particularly with higher
>>> client counts, otherwise the max-min is within ~5% of the max.
>>> There are a few cases where this happens without the feature
>>> (i.e. old master, reverted or disabled), but it's usually much
>>> smaller than with it enabled (immediate, 10 or 60). See the
>>> 'summary' sheet in the ODS spreadsheet.

Just to quantify that with standard deviations:

standard deviation - revert
scale        1     16     32     64    128
100        386   1874   3661   8100  26587
3000       609   2236   4570   8974  41004
10000      257   4356   1350    891  12909

standard deviation - disabled
scale        1     16     32     64    128
100        641   1924   2983  12575   9411
3000       206   2321   5477   2380  45779
10000     2236  10376  11439   9653  10436

>>> I don't know what's the problem here - at first I thought that
>>> maybe something else was running on the machine, or that
>>> anti-wraparound autovacuum kicked in, but that seems not to be
>>> the case. There's nothing like that in the postgres log (also
>>> included in the .tgz).
>>
>> I'm inclined to suspect NUMA effects. It would be interesting to
>> try with the NUMA patch and cpuset I submitted a while back or with
>> fixes in place for the Linux scheduler bugs which were reported
>> last month. Which kernel version was this?
>
> I can try that, sure. Can you point me to the last versions of the
> patches, possibly rebased to current master if needed?

The initial thread (for explanation and discussion context) for my
attempt to do something about some NUMA problems I ran into is at:

http://www.postgresql.org/message-id/flat/1402267501.41111.YahooMailNeo@web122304.mail.ne1.yahoo.com

Note that in my tests at the time, the cpuset configuration made a
bigger difference than the patch, and both together typically only
made about a 2% difference in the NUMA test environment I was using.
I would sometimes see a difference as big as 20%, but had no idea
how to repeat that.

> The kernel is 3.19.0-031900-generic

So that kernel is recent enough to have acquired the worst of the
scheduling bugs, known to slow down one NASA high-concurrency
benchmark by 138x. To quote from the recent paper by Lozi, et al.[1]:

| The Missing Scheduling Domains bug causes all threads of the
| applications to run on a single node instead of eight. In some
| cases, the performance impact is greater than the 8x slowdown
| that one would expect, given that the threads are getting 8x less
| CPU time than they would without the bug (they run on one node
| instead of eight). lu, for example, runs 138x faster!
| Super-linear slowdowns occur in cases where threads
| frequently synchronize using locks or barriers: if threads spin
| on a lock held by a descheduled thread, they will waste even more
| CPU time, causing cascading effects on the entire application’s
| performance. Some applications do not scale ideally to 64 cores
| and are thus a bit less impacted by the bug. The minimum slowdown
| is 4x.

The bug is only encountered if cores are disabled and re-enabled,
though, and I have no idea whether that might have happened on your
machine.
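If it helps, here is a minimal Python sketch (assuming the standard
Linux sysfs layout) that reports which CPUs are currently online and
offline on that box. It can only show the current state; it can't
prove whether cores were toggled at some earlier point in the uptime:

#!/usr/bin/env python
# Rough check of CPU hotplug state, assuming the usual Linux sysfs
# layout.  This only shows the *current* state; it cannot prove
# whether cores were disabled and re-enabled earlier in the uptime.

def read_sysfs(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None

online = read_sysfs("/sys/devices/system/cpu/online")
offline = read_sysfs("/sys/devices/system/cpu/offline")

print("online CPUs : %s" % online)
print("offline CPUs: %s" % (offline if offline else "(none)"))

if offline:
    print("Some cores are currently offline; if they were toggled while "
          "the system was up, the Missing Scheduling Domains bug may apply.")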
Since you're on a vulnerable kernel version, you might want to be
aware of the issue and take care not to trigger the problem.

You are only vulnerable to the Group Imbalance bug if you use
autogroups.

You are only vulnerable to the Scheduling Group Construction bug if
you have more than one hop from any core to any memory segment
(which seems quite unlikely with 4 sockets and 4 memory nodes).

If you are vulnerable to any of the above, it might explain some of
the odd variations. Let me know and I'll see if I can find more on
workarounds or OS patches.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud,
Vivien Quéma, Alexandra Fedorova. The Linux Scheduler: a Decade of
Wasted Cores. In Proceedings of the 11th European Conference on
Computer Systems, EuroSys '16, April 2016, London, UK.
http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf
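P.S. In case it's useful, a rough Python sketch of the two checks
described above. The /proc and /sys paths are the standard Linux
locations, but the "more than one hop" test is only a heuristic based
on the reported NUMA (SLIT) distances:

#!/usr/bin/env python
# Rough exposure check for the two scheduler bugs discussed above.
# The multi-hop test is a heuristic: the local distance is 10 and a
# single hop is usually reported as about 20-21, so anything larger
# suggests more than one hop between some core and some memory node.

import glob

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None

# Group Imbalance bug: only a concern when autogroups are enabled.
autogroup = read("/proc/sys/kernel/sched_autogroup_enabled")
print("sched_autogroup_enabled = %s" % autogroup)

# Scheduling Group Construction bug: only a concern when some core is
# more than one hop away from some memory node.
distances = set()
for path in glob.glob("/sys/devices/system/node/node*/distance"):
    data = read(path)
    if data:
        distances.update(int(d) for d in data.split())

print("distinct NUMA distances: %s" % sorted(distances))
if distances and max(distances) > 21:
    print("Largest distance exceeds a typical one-hop value; the "
          "Scheduling Group Construction bug could be relevant.")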