concurrent COPY performance - Mailing list pgsql-hackers
From | Stefan Kaltenbrunner |
---|---|
Subject | concurrent COPY performance |
Date | |
Msg-id | 4A37CCB6.4000702@kaltenbrunner.cc Whole thread Raw |
Responses |
Re: concurrent COPY performance
|
List | pgsql-hackers |
Hi! I have been doing some bulk loading testing recently - mostly with a focus on answering why we are "only" getting a (max of) cores/2(up to around 8 cores even less with more) speedup using parallel restore. What I found is that on some fast IO-subsystem we are CPU bottlenecked on concurrent copy which is able to utilize WAL bypass (and scale up to around cores/2) and performance without wal bypass is very bad. In the WAL logged case we are only able to get a 50% speedup using the second process already and we are never able to scale better than 3x (up to 8 cores) and performance degrades even after that point. the profile(loading the lineitem table from the DBT3 benchmark) for that case looks fairly similiar to what I have seen in the past for io-intensive concurrent workloads: Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 samples % symbol name 47546 15.4562 XLogInsert 39598 12.8725 DoCopy 33798 10.9870 CopyReadLine 14191 4.6132 DecodeNumber 12986 4.2215 heap_fill_tuple 12092 3.9308 pg_verify_mbstr_len 9553 3.1055 DecodeDate 9289 3.0197 InputFunctionCall 7972 2.5915 ParseDateTime 7324 2.3809 DecodeDateTime 7290 2.3698 pg_next_dst_boundary 7218 2.3464 heap_form_tuple 5385 1.7505 AllocSetAlloc 4779 1.5536 heap_compute_data_size 4367 1.4196 float4in 3903 1.2688 DetermineTimeZoneOffset 3603 1.1713 pg_mblen 3494 1.1358 pg_atoi 3461 1.1251 .plt 3428 1.1144 date2j 3416 1.1105 pg_mbstrlen_with_len this is for 8 connections on an 8core/16 thread box. on higher connection counts the server is >70% idle and showing even worse througput. in the WAL bypass case I can actually get a performance improvement up to 16 parallel connections however I also only get cores/2 maximum throughput here too. We are actually able to max out the CPU on the server in that case though (no idle time and iowait only in the single digit range). Profiling that workload(loading around 1400000 rows/s) I get: samples % symbol name 2462358 15.3353 DoCopy 2319677 14.4467 CopyReadLine 806269 5.0214 pg_verify_mbstr_len 782594 4.8739 DecodeNumber 712426 4.4369 heap_fill_tuple 642893 4.0039 DecodeDate 609313 3.7947 InputFunctionCall 476887 2.9700 ParseDateTime 456569 2.8435 pg_next_dst_boundary 435812 2.7142 DecodeDateTime 429733 2.6763 heap_form_tuple 361750 2.2529 heap_compute_data_size 301193 1.8758 AllocSetAlloc 268830 1.6742 float4in 238119 1.4830 DetermineTimeZoneOffset 229598 1.4299 pg_atoi 225169 1.4023 .plt 217817 1.3565 pg_mbstrlen_with_len 207041 1.2894 PageAddItem 200024 1.2457 pg_mblen 192237 1.1972 bpchar_input 181552 1.1307 date2j for those interested I have some additional information up on: http://www.kaltenbrunner.cc/blog/index.php?/archives/27-Benchmarking-8.4-Chapter-2bulk-loading.html Stefan
pgsql-hackers by date: