[PERFORM] 10x faster sort performance on Skylake CPU vs Ivy Bridge - Mailing list pgsql-performance
From | Felix Geisendörfer |
---|---|
Subject | [PERFORM] 10x faster sort performance on Skylake CPU vs Ivy Bridge |
Date | |
Msg-id | 79C36278-87E4-4F9C-9C34-FA4ECB2B4B49@felixge.de |
Responses | Re: [PERFORM] 10x faster sort performance on Skylake CPU vs Ivy Bridge |
List | pgsql-performance |
Hi,

I recently came across a performance difference between two machines that surprised me:

Postgres Version / OS on both machines: v9.6.3 / MacOS 10.12.5

Machine A: MacBook Pro Mid 2012, 2.7 GHz Intel Core i7 (Ivy Bridge), 8 MB L3 Cache, 16 GB 1600 MHz DDR3 [1]
Machine B: MacBook Pro Late 2016, 2.6 GHz Intel Core i7 (Skylake), 6 MB L3 Cache, 16 GB 2133 MHz LPDDR3 [2]

Query Performance on Machine A: [3]

CTE Scan on zulu  (cost=40673.620..40742.300 rows=3434 width=56) (actual time=6339.404..6339.462 rows=58 loops=1)
  CTE zulu
    ->  HashAggregate  (cost=40639.280..40673.620 rows=3434 width=31) (actual time=6339.400..6339.434 rows=58 loops=1)
          Group Key: mike.two, mike.golf
          ->  Unique  (cost=37656.690..40038.310 rows=34341 width=64) (actual time=5937.934..6143.161 rows=298104 loops=1)
                ->  Sort  (cost=37656.690..38450.560 rows=317549 width=64) (actual time=5937.933..6031.925 rows=316982 loops=1)
                      Sort Key: mike.two, mike.lima, mike.echo DESC, mike.quebec
                      Sort Method: quicksort  Memory: 56834kB
                      ->  Seq Scan on mike  (cost=0.000..8638.080 rows=317549 width=64) (actual time=0.019..142.831 rows=316982 loops=1)
                            Filter: (golf five NOT NULL)
                            Rows Removed by Filter: 26426

Query Performance on Machine B: [4]

CTE Scan on zulu  (cost=40621.420..40690.100 rows=3434 width=56) (actual time=853.436..853.472 rows=58 loops=1)
  CTE zulu
    ->  HashAggregate  (cost=40587.080..40621.420 rows=3434 width=31) (actual time=853.433..853.448 rows=58 loops=1)
          Group Key: mike.two, mike.golf
          ->  Unique  (cost=37608.180..39986.110 rows=34341 width=64) (actual time=634.412..761.678 rows=298104 loops=1)
                ->  Sort  (cost=37608.180..38400.830 rows=317057 width=64) (actual time=634.411..694.719 rows=316982 loops=1)
                      Sort Key: mike.two, mike.lima, mike.echo DESC, mike.quebec
                      Sort Method: quicksort  Memory: 56834kB
                      ->  Seq Scan on mike  (cost=0.000..8638.080 rows=317057 width=64) (actual time=0.047..85.534 rows=316982 loops=1)
                            Filter: (golf five NOT NULL)
                            Rows Removed by Filter: 26426

As you can see, Machine A spends 5889ms on the Sort node vs 609ms on Machine B when looking at the "Exclusive" time with explain.depesz.com [3][4]. I.e. Machine B is ~10x faster at sorting than Machine A (for this particular query).

My question is: Why?

I understand that this is a 3rd gen CPU vs a 6th gen, and that things have gotten faster despite stagnant clock speeds, but seeing a 10x difference still caught me off guard.

Does anybody have some pointers to understand where those gains are coming from? Is it the CPU, memory, or both? And in particular, why does Sort benefit so massively from the advancement here (~10x), but Seq Scan, Unique and HashAggregate don't benefit as much (~2x)?

As you can probably tell, my hardware knowledge is very superficial, so I apologize if this is a stupid question. But I'd genuinely like to improve my understanding and intuition about these things.

Cheers
Felix Geisendörfer

[1] http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-2.7-15-mid-2012-retina-display-specs.html
[2] http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-i7-2.6-15-late-2016-retina-display-touch-bar-specs.html
[3] https://explain.depesz.com/s/hmn
[4] https://explain.depesz.com/s/zVe
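
For anyone who wants to check the "Exclusive" numbers without the website, here is a minimal Python sketch of the arithmetic. It assumes a simplified model in which the plan is a single parent-to-child chain with loops=1, so a node's exclusive time is its own total "actual time" minus its child's total; explain.depesz.com may treat some cases (such as the CTE) differently. The function name and data layout are made up for this illustration, and the timings are copied from the two plans in this message.

```python
# Total (inclusive) "actual time" upper bounds in ms, copied from the two
# plans in this message and listed from the root node down to the leaf.
PLAN_A = [  # Machine A (Ivy Bridge)
    ("CTE Scan", 6339.462),
    ("HashAggregate", 6339.434),
    ("Unique", 6143.161),
    ("Sort", 6031.925),
    ("Seq Scan", 142.831),
]
PLAN_B = [  # Machine B (Skylake)
    ("CTE Scan", 853.472),
    ("HashAggregate", 853.448),
    ("Unique", 761.678),
    ("Sort", 694.719),
    ("Seq Scan", 85.534),
]


def exclusive_times(chain):
    """Return {node: exclusive ms} for a linear plan given root-to-leaf.

    Hypothetical helper: assumes one child per node and loops=1, so the
    exclusive time is the node's total minus its child's total.
    """
    result = {}
    for i, (name, total) in enumerate(chain):
        child_total = chain[i + 1][1] if i + 1 < len(chain) else 0.0
        result[name] = total - child_total
    return result


excl_a = exclusive_times(PLAN_A)
excl_b = exclusive_times(PLAN_B)

# Speedup of Machine B over Machine A, per node.
speedup = {name: excl_a[name] / excl_b[name] for name in excl_a}

for name in excl_a:
    print(f"{name:14s} A={excl_a[name]:9.3f} ms  "
          f"B={excl_b[name]:8.3f} ms  A/B={speedup[name]:5.2f}x")
```

Under this model the Sort node comes out at about 5889 ms on Machine A and 609 ms on Machine B, a ratio of roughly 9.7x, while Seq Scan, Unique and HashAggregate land between about 1.7x and 2.1x.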