Re: [HACKERS] [PATCH] Incremental sort - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: [HACKERS] [PATCH] Incremental sort
Msg-id: c590d275-a847-167c-015e-3f9c5f37314f@2ndquadrant.com
In response to: Re: [HACKERS] [PATCH] Incremental sort (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses: Re: [HACKERS] [PATCH] Incremental sort
List: pgsql-hackers
On 03/10/2018 06:05 PM, Alexander Korotkov wrote:
> On Sat, Mar 10, 2018 at 6:42 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>
>     ...
>
> After some investigation of the benchmark results, I found two sources
> of regressions of incremental sort.
>
> *Case 1: The underlying node's scan loss is bigger than the incremental
> sort win*
>
> ===== 33 [Wed Mar 7 10:14:14 CET 2018] scale:10000000 groups:10
> work_mem:64MB incremental:on max_workers:0 =====
> SELECT * FROM s_1 ORDER BY a, b
>                                     QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1588080.84..1588080.84 rows=1 width=20) (actual time=5874.527..5874.527 rows=0 loops=1)
>    ->  Incremental Sort  (cost=119371.51..1488081.45 rows=9999939 width=20) (actual time=202.842..5653.224 rows=10000000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Presorted Key: s_1.a
>          Sort Method: external merge  Disk: 29408kB
>          Sort Groups: 11
>          ->  Index Scan using s_1_a_idx on s_1  (cost=0.43..323385.52 rows=9999939 width=20) (actual time=0.051..1494.105 rows=10000000 loops=1)
>  Planning time: 0.269 ms
>  Execution time: 5877.367 ms
> (9 rows)
>
> ===== 37 [Wed Mar 7 10:15:51 CET 2018] scale:10000000 groups:10
> work_mem:64MB incremental:off max_workers:0 =====
> SELECT * FROM s_1 ORDER BY a, b
>                                     QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1656439.93..1656439.93 rows=1 width=20) (actual time=4741.716..4741.716 rows=0 loops=1)
>    ->  Sort  (cost=1531440.69..1556440.54 rows=9999939 width=20) (actual time=3522.156..4519.278 rows=10000000 loops=1)
>          Sort Key: s_1.a, s_1.b
>          Sort Method: external merge  Disk: 293648kB
>          ->  Seq Scan on s_1  (cost=0.00..163694.39 rows=9999939 width=20) (actual time=0.021..650.322 rows=10000000 loops=1)
>  Planning time: 0.249 ms
>  Execution time: 4777.088 ms
> (7 rows)
>
> In this case the optimizer decided that "Index Scan + Incremental Sort"
> would be cheaper than "Seq Scan + Sort". But it appears that the amount
> of time we lose by selecting Index Scan over Seq Scan is bigger than
> the amount of time we win by selecting Incremental Sort over Sort. I
> would note that the regular Sort consumes about 10X more disk space. I
> bet all of that space fit into the OS cache of the test machine, but
> the optimizer expected actual I/O to take place. That made the actual
> times diverge from the costing.
>

Yes, you're right, the temporary file(s) likely fit into RAM in this
test (and even if they did not, the storage system is pretty good).
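FWIW, a quick way to test that hypothesis would be to re-plan the query
after telling the planner that page accesses are nearly free, i.e. that
both the heap and the sort's temp files are effectively cached. A rough
sketch against the benchmark's s_1 table (the cost values are only
illustrative, not a recommendation):

  BEGIN;
  -- Approximate a fully cached workload: the defaults are
  -- seq_page_cost = 1.0 and random_page_cost = 4.0.
  SET LOCAL seq_page_cost = 0.1;
  SET LOCAL random_page_cost = 0.1;
  EXPLAIN ANALYZE SELECT * FROM s_1 ORDER BY a, b;
  ROLLBACK;

If the cheaper I/O assumptions make the planner switch back to "Seq
Scan + Sort" here, that would support the theory that the external
sort's disk traffic is being overcharged relative to what actually
happens on this machine.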
> *Case 2: The underlying node is not parallelized*
>
> ===== 178 [Wed Mar 7 11:18:53 CET 2018] scale:10000000 groups:100
> work_mem:8MB incremental:on max_workers:2 =====
> SELECT * FROM s_2 ORDER BY a, b, c
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1179047.88..1179047.88 rows=1 width=20) (actual time=4819.999..4819.999 rows=0 loops=1)
>    ->  Incremental Sort  (cost=89.04..1079047.34 rows=10000054 width=20) (actual time=0.203..4603.197 rows=10000000 loops=1)
>          Sort Key: s_2.a, s_2.b, s_2.c
>          Presorted Key: s_2.a, s_2.b
>          Sort Method: quicksort  Memory: 135kB
>          Sort Groups: 10201
>          ->  Index Scan using s_2_a_b_idx on s_2  (cost=0.43..406985.62 rows=10000054 width=20) (actual time=0.052..1461.177 rows=10000000 loops=1)
>  Planning time: 0.313 ms
>  Execution time: 4820.037 ms
> (9 rows)
>
> ===== 182 [Wed Mar 7 11:20:11 CET 2018] scale:10000000 groups:100
> work_mem:8MB incremental:off max_workers:2 =====
> SELECT * FROM s_2 ORDER BY a, b, c
>                                     QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=1705580.76..1705580.76 rows=1 width=20) (actual time=3985.818..3985.818 rows=0 loops=1)
>    ->  Gather Merge  (cost=649951.66..1622246.98 rows=8333378 width=20) (actual time=1782.354..3750.868 rows=10000000 loops=1)
>          Workers Planned: 2
>          Workers Launched: 2
>          ->  Sort  (cost=648951.64..659368.36 rows=4166689 width=20) (actual time=1778.362..2091.253 rows=3333333 loops=3)
>                Sort Key: s_2.a, s_2.b, s_2.c
>                Sort Method: external merge  Disk: 99136kB
>                Worker 0:  Sort Method: external merge  Disk: 96984kB
>                Worker 1:  Sort Method: external merge  Disk: 97496kB
>                ->  Parallel Seq Scan on s_2  (cost=0.00..105361.89 rows=4166689 width=20) (actual time=0.022..233.640 rows=3333333 loops=3)
>  Planning time: 0.265 ms
>  Execution time: 4007.591 ms
> (12 rows)
>
> The situation is similar to case #1, except that in the "Seq Scan +
> Sort" pair the Sort also gets parallelized. As in the previous case,
> the disk writes/reads during the external sort are overestimated,
> because they actually hit the OS cache. I would also say that the
> optimizer's decision is not necessarily wrong here, because doing this
> work in a single backend may consume fewer resources overall despite
> being slower.
>

Yes, that seems like a likely explanation too.

I agree those don't seem like an issue in the Incremental Sort patch,
but like a more generic costing problem.
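For anyone who wants to reproduce the case-2 comparison directly, a
sketch along these lines should do it. This assumes the benchmark's s_2
table, and it assumes the patch exposes an enable_incremental_sort GUC
to toggle the feature (the exact spelling may differ in this version of
the patch):

  SET work_mem = '8MB';
  SET max_parallel_workers_per_gather = 2;

  -- serial plan: Index Scan + Incremental Sort
  SET enable_incremental_sort = on;
  EXPLAIN ANALYZE SELECT * FROM s_2 ORDER BY a, b, c;

  -- parallel plan: Parallel Seq Scan + Sort under Gather Merge
  SET enable_incremental_sort = off;
  EXPLAIN ANALYZE SELECT * FROM s_2 ORDER BY a, b, c;

The trade-off is visible above: the serial plan occupies one backend for
~4.8s, while the parallel plan occupies three processes for ~4.0s, which
is why the "slower" serial plan is not necessarily the wrong choice
resource-wise.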
Thanks for looking into the benchmark results.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services