Re: what to revert - Mailing list pgsql-hackers

From:           Kevin Grittner
Subject:        Re: what to revert
Msg-id:         CACjxUsMMew5_VefF09=Nz2D+6iUYvo=uhesLhZ5L+3WRf8v7Rg@mail.gmail.com
In response to: Re: what to revert (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses:      Re: what to revert; Re: what to revert
List:           pgsql-hackers
On Tue, May 10, 2016 at 11:13 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> On 05/10/2016 10:29 AM, Kevin Grittner wrote:
>> On Mon, May 9, 2016 at 9:01 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

>>> * It also seems to me the feature greatly amplifies the
>>> variability of the results, somehow. It's not uncommon to see
>>> results like this:
>>>
>>> master-10-new-2 235516 331976 133316 155563 133396
>>>
>>> where after the first runs (already fairly variable) the
>>> performance tanks to ~50%. This happens particularly with higher
>>> client counts, otherwise the max-min is within ~5% of the max.
>>> There are a few cases where this happens without the feature
>>> (i.e. old master, reverted or disabled), but it's usually much
>>> smaller than with it enabled (immediate, 10 or 60). See the
>>> 'summary' sheet in the ODS spreadsheet.

Just to quantify that with standard deviations:

standard deviation - revert
scale        1     16     32     64    128
100        386   1874   3661   8100  26587
3000       609   2236   4570   8974  41004
10000      257   4356   1350    891  12909

standard deviation - disabled
scale        1     16     32     64    128
100        641   1924   2983  12575   9411
3000       206   2321   5477   2380  45779
10000     2236  10376  11439   9653  10436

>>> I don't know what's the problem here - at first I thought that
>>> maybe something else was running on the machine, or that
>>> anti-wraparound autovacuum kicked in, but that seems not to be
>>> the case. There's nothing like that in the postgres log (also
>>> included in the .tgz).
>>
>> I'm inclined to suspect NUMA effects. It would be interesting to
>> try with the NUMA patch and cpuset I submitted a while back or with
>> fixes in place for the Linux scheduler bugs which were reported
>> last month. Which kernel version was this?
>
> I can try that, sure. Can you point me to the last versions of the
> patches, possibly rebased to current master if needed?

The initial thread (for explanation and discussion context) for my
attempt to do something about some NUMA problems I ran into is at:

http://www.postgresql.org/message-id/flat/1402267501.41111.YahooMailNeo@web122304.mail.ne1.yahoo.com

Note that in my tests at the time, the cpuset configuration made a
bigger difference than the patch, and both together typically only
made about a 2% difference in the NUMA test environment I was using.
I would sometimes see a difference as big as 20%, but had no idea
how to repeat that.

> The kernel is 3.19.0-031900-generic

So that kernel is recent enough to have acquired the worst of the
scheduling bugs, known to slow down one NASA high-concurrency
benchmark by 138x. To quote from the recent paper by Lozi, et al.[1]:

| The Missing Scheduling Domains bug causes all threads of the
| applications to run on a single node instead of eight. In some
| cases, the performance impact is greater than the 8x slowdown
| that one would expect, given that the threads are getting 8x less
| CPU time than they would without the bug (they run on one node
| instead of eight). lu, for example, runs 138x faster!
| Super-linear slowdowns occur in cases where threads
| frequently synchronize using locks or barriers: if threads spin
| on a lock held by a descheduled thread, they will waste even more
| CPU time, causing cascading effects on the entire application’s
| performance. Some applications do not scale ideally to 64 cores
| and are thus a bit less impacted by the bug. The minimum slowdown
| is 4x.

The bug is only encountered if cores are disabled and re-enabled,
though, and I have no idea whether that might have happened on your
machine.
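If it helps, here is a minimal Python sketch (assuming the standard
Linux sysfs layout) that reports which CPUs are currently online and
offline on that box. It can only show the current state; it can't
prove whether cores were toggled at some earlier point in the uptime:

#!/usr/bin/env python
# Rough check of CPU hotplug state, assuming the usual Linux sysfs
# layout.  This only shows the *current* state; it cannot prove
# whether cores were disabled and re-enabled earlier in the uptime.

def read_sysfs(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None

online = read_sysfs("/sys/devices/system/cpu/online")
offline = read_sysfs("/sys/devices/system/cpu/offline")

print("online CPUs : %s" % online)
print("offline CPUs: %s" % (offline if offline else "(none)"))

if offline:
    print("Some cores are currently offline; if they were toggled while "
          "the system was up, the Missing Scheduling Domains bug may apply.")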
Since you're on a vulnerable kernel version, you might want to be
aware of the issue and take care not to trigger the problem.

You are only vulnerable to the Group Imbalance bug if you use
autogroups.

You are only vulnerable to the Scheduling Group Construction bug if
you have more than one hop from any core to any memory segment
(which seems quite unlikely with 4 sockets and 4 memory nodes).

If you are vulnerable to any of the above, it might explain some of
the odd variations. Let me know and I'll see if I can find more on
workarounds or OS patches.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Jean-Pierre Lozi, Baptiste Lepers, Justin Funston, Fabien Gaud,
Vivien Quéma, Alexandra Fedorova. The Linux Scheduler: a Decade of
Wasted Cores. In Proceedings of the 11th European Conference on
Computer Systems, EuroSys '16, April 2016, London, UK.
http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf
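P.S. In case it's useful, a rough Python sketch of the two checks
described above. The /proc and /sys paths are the standard Linux
locations, but the "more than one hop" test is only a heuristic based
on the reported NUMA (SLIT) distances:

#!/usr/bin/env python
# Rough exposure check for the two scheduler bugs discussed above.
# The multi-hop test is a heuristic: the local distance is 10 and a
# single hop is usually reported as about 20-21, so anything larger
# suggests more than one hop between some core and some memory node.

import glob

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except IOError:
        return None

# Group Imbalance bug: only a concern when autogroups are enabled.
autogroup = read("/proc/sys/kernel/sched_autogroup_enabled")
print("sched_autogroup_enabled = %s" % autogroup)

# Scheduling Group Construction bug: only a concern when some core is
# more than one hop away from some memory node.
distances = set()
for path in glob.glob("/sys/devices/system/node/node*/distance"):
    data = read(path)
    if data:
        distances.update(int(d) for d in data.split())

print("distinct NUMA distances: %s" % sorted(distances))
if distances and max(distances) > 21:
    print("Largest distance exceeds a typical one-hop value; the "
          "Scheduling Group Construction bug could be relevant.")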