Thread: BUG #9635: Wal sender process is using 100% CPU
The following bug has been logged on the website:

Bug reference:      9635
Logged by:          Jamie Koceniak
Email address:      jkoceniak@mediamath.com
PostgreSQL version: 9.1.9
Operating system:   x86_64-unknown-linux-gnu (Debian 4.7.2-5) 64-bit
Description:

Periodically throughout the day, we keep seeing the wal sender process
utilize 100% of the CPU. We began noticing this after we added 2 new slave
servers, going from 2 to 4 slaves. See top results and I also included our
wal settings. Thanks!

top - 05:03:18 up 174 days, 4:51, 2 users, load average: 5.57, 4.75, 3.16
Tasks: 387 total, 8 running, 379 sleeping, 0 stopped, 0 zombie
%Cpu(s): 29.3 us, 4.7 sy, 0.0 ni, 65.3 id, 0.4 wa, 0.0 hi, 0.4 si, 0.0 st
MiB Mem:  290797 total, 218532 used, 72264 free, 311 buffers
MiB Swap:   7812 total,      1 used,  7811 free, 206978 cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
30244 postgres  20   0 8497m 5608 2820 R  100  0.0   1:44.72 postgres: wal sen
14447 postgres  20   0 8497m 5596 2816 R  100  0.0   3:11.27 postgres: wal sen
16075 postgres  20   0 8497m 5600 2820 R  100  0.0   3:32.32 postgres: wal sen
 8177 postgres  20   0 8497m 5360 2820 S   36  0.0   0:03.35 postgres: wal sen
 4920 postgres  20   0 9647m 9.3g 8.1g S    3  3.3   1097:40 postgres: writer
 4923 postgres  20   0 68872 2072  788 S    3  0.0 511:01.76 postgres: archive
 4921 postgres  20   0 8496m  18m  17m S    2  0.0 593:36.38 postgres: wal wri
 7853 root      20   0 23432 1836 1176 R    1  0.0   0:00.44 top
 4916 postgres  20   0 8492m 229m 228m S    0  0.1 598:44.57 /usr/lib/postgres

Current WAL settings:

             name             |   setting   |
------------------------------+-------------+
 max_wal_senders              | 10          |
 wal_block_size               | 8192        |
 wal_buffers                  | 2048        |
 wal_keep_segments            | 5000        |
 wal_level                    | hot_standby |
 wal_receiver_status_interval | 10          |
 wal_segment_size             | 2048        |
 wal_sender_delay             | 1000        |
 wal_sync_method              | fdatasync   |
 wal_writer_delay             | 200         |
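For mapping the busy walsender PIDs from that top output back to individual standbys, a query along these lines on the master can help (a sketch only; on 9.1 the process-id column of pg_stat_replication is procpid, renamed to pid in 9.2):

    SELECT procpid, client_addr, state, sent_location, replay_location
    FROM pg_stat_replication;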
On 03/19/2014 07:13 PM, jkoceniak@mediamath.com wrote:
> The following bug has been logged on the website:
>
> Bug reference:      9635
> Logged by:          Jamie Koceniak
> Email address:      jkoceniak@mediamath.com
> PostgreSQL version: 9.1.9
> Operating system:   x86_64-unknown-linux-gnu (Debian 4.7.2-5) 64-bit
> Description:
>
> Periodically throughout the day, we keep seeing the wal sender process
> utilize 100% of the CPU. We began noticing this after we added 2 new slave
> servers, going from 2 to 4 slaves. See top results and I also included our
> wal settings. Thanks!

Hmm. There was a bug that caused walsender to go into a busy-loop when the standby disconnected, but that was fixed in 9.1.4 already.

Could you attach to one of the processes with gdb and see what functions are being called? Or strace, or "perf top" or something.

- Heikki
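Concretely, those three checks could be run against one of the spinning walsenders roughly like this (PID 30244 is taken from the top output above; the exact flags are illustrative):

    gdb -p 30244 -batch -ex bt          # one-shot backtrace of the walsender
    strace -p 30244 -tt -o wal.trace    # syscall trace with timestamps
    perf top -p 30244                   # live profile of just that process

Installing debug symbols for the server binary first makes the gdb backtrace and the perf symbol names far more readable.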
Hi Heikki,

I performed an strace on one of the pids that was consuming 100% of the cpu.
This is just a portion of the log.
I also trimmed out some data provided in the quotes in the read and sends, not sure if you need it.
A majority of the calls in the trace log were recvfrom() and sendto().
EAGAIN (Resource temporarily unavailable) came up time and time again.

# strace log during 100% cpu utilization on the master

getppid()                                       = 5474
recvfrom(10, 0x7f5dgac1f753, 5, 0, 0, 0)        = -1 EAGAIN (Resource temporarily unavailable)
read(7, ""..., 131072)                          = 131072
sendto(10, ""..., 3178, 0, NULL, 0)             = 3178
sendto(10, ""..., 5381, 0, NULL, 0)             = 5381
getppid()                                       = 5474
recvfrom(10, 0x7f5dgac1f753, 5, 0, 0, 0)        = -1 EAGAIN (Resource temporarily unavailable)
read(7, ""..., 131072)                          = 131072
sendto(10, ""..., 3338, 0, NULL, 0)             = 3338
getppid()                                       = 5474
recvfrom(10, " ", 5, 0, NULL, NULL)             = 5
recvfrom(10, "", 32, 0, NULL, NULL)             = 32
recvfrom(10, "", 5, 0, NULL, NULL)              = 5
recvfrom(10, ""..., 48, 0, NULL, NULL)          = 48
fcntl(10, F_GETFL)                              = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(10, F_SETFL, O_RDWR)                      = 0
fcntl(10, F_SETFL, O_RDONLY|O_NONBLOCK)         = 0
recvfrom(10, 0x7f5dgac1f753, 5, 0, 0, 0)        = -1 EAGAIN (Resource temporarily unavailable)
select(11, [3 10], [10], NULL, {1, 0})          = ? ERESTARTNOHAND (To be restarted)
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
write(6, "\0", 1)                               = 1
rt_sigreturn(0x6)                               = -1 EINTR (Interrupted system call)
read(3, "\0", 16)                               = 1
getppid()                                       = 5474
recvfrom(10, 5, 0, 0, 0)                        = -1 EAGAIN (Resource temporarily unavailable)
sendto(10, ""..., 2715, 0, NULL, 0)             = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x7ffl961b0780, 16)                     = -1 EAGAIN (Resource temporarily unavailable)
select(11, [3 10], [10], NULL, {1, 0})          = ? ERESTARTNOHAND (To be restarted)
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
write(6, "\0", 1)                               = 1
rt_sigreturn(0x6)                               = -1 EINTR (Interrupted system call)
read(3, "\0", 16)                               = 1
getppid()                                       = 5474

I currently don't have perf installed.
I can get that installed and send results if you need it.

Thanks!
On 05/05/2014 10:37 PM, Jamie Koceniak wrote:
> Hi Heikki,
>
> I performed an strace on one of the pids that was consuming 100% of the cpu.
> This is just a portion of the log.
> I also trimmed out some data provided in the quotes in the read and sends, not sure if you need it.
> A majority of the calls in the trace log were recvfrom() and sendto().
> EAGAIN (Resource temporarily unavailable) came up time and time again.
>
> # strace log during 100% cpu utilization on the master

Hmm, that trace looks normal to me. It doesn't look like it's busy-looping, as I would expect if it's using 100% of the CPU. Can you post the full log, or at least a larger chunk? I don't need the data in the quotes from read() and sends() that you removed, but I would like to see the trace from a longer period of time to see what the pattern is.

- Heikki
Hi Heikki,

Attached is a trace route for one of the pids that was running at 100% cpu.
Note: when the wal_senders hit 100% cpu, our slave servers tend to lag the master for up to around 3-4 minutes until the wal_sender cpu utilization drops. This is very problematic for us since we are trying to use the slaves for reads in our application.

Here is perf top results:
 39.49%  libz.so.1.2.7           [.] 0x000000000000301e
 32.97%  postgres                [.] 0x00000000000cf150
  6.50%  postgres                [.] nocachegetattr
  2.63%  postgres                [.] btint4cmp
  1.64%  postgres                [.] _start
  1.40%  [unknown]               [.] 0x00007f9dfca6a34b
  1.06%  libc-2.13.so            [.] 0x000000000012934b
  0.94%  postgres                [.] slot_getattr
  0.85%  libcrypto.so.1.0.0      [.] 0x000000000009f48f
  0.69%  postgres                [.] ExecProject
  0.67%  libz.so.1.2.7           [.] adler32
  0.66%  postgres                [.] GetMemoryChunkSpace
  0.58%  [kernel]                [k] copy_user_generic_string
  0.45%  postgres                [.] heap_fill_tuple
  0.38%  postgres                [.] execTuplesMatch
  0.27%  postgres                [.] FunctionCall2Coll
  0.27%  postgres                [.] ExecProcNode
  0.25%  postgres                [.] heap_tuple_untoast_attr
  0.22%  postgres                [.] MemoryContextAlloc
  0.21%  postgres                [.] heap_form_minimal_tuple
  0.21%  postgres                [.] pfree
  0.19%  postgres                [.] heap_compute_data_size

Here is some data using perf record and perf report:

+  33.93%        193019       postgres  postgres              [.] 0x0000000000369e27
+  30.06%        173825       postgres  libz.so.1.2.7         [.] 0x00000000000036c0
+  10.21%         60618       postgres  [unknown]             [.] 0x00007f9dfc2fd529
+   4.81%         27275       postgres  postgres              [.] nocachegetattr
+   2.55%         14443       postgres  postgres              [.] btint4cmp
+   1.22%          6916       postgres  postgres              [.] ExecProject
+   1.20%          6837       postgres  postgres              [.] _start
+   1.10%          6495       postgres  libc-2.13.so          [.] 0x000000000012b8e9
+   0.84%          4760       postgres  postgres              [.] heap_fill_tuple
+   0.74%          4265       postgres  [kernel.kallsyms]     [k] copy_user_generic_string
+   0.71%          4001       postgres  postgres              [.] GetMemoryChunkSpace
+   0.66%          3881       postgres  libcrypto.so.1.0.0    [.] 0x0000000000081ec7
+   0.63%          3673       postgres  libz.so.1.2.7         [.] adler32
+   0.43%          2448       postgres  postgres              [.] slot_getattr
+   0.40%          2277       postgres  postgres              [.] ExecProcNode
+   0.40%          2250       postgres  postgres              [.] heap_form_minimal_tuple
+   0.37%          2078       postgres  postgres              [.] heap_compute_data_size
+   0.31%          1739       postgres  postgres              [.] ExecScan
+   0.27%          1779       postgres  [kernel.kallsyms]     [k] page_fault
+   0.24%          1374       postgres  postgres              [.] LogicalTapeWrite
+   0.24%          4206        swapper  [kernel.kallsyms]     [k] intel_idle
+   0.21%          1206       postgres  postgres              [.] slot_getsomeattrs

Also, here are our wal settings on the master. Do these settings appear ok?

 wal_block_size                | 8192        |
 wal_buffers                   | 16MB        |
 wal_keep_segments             | 5000        |
 wal_level                     | hot_standby |
 wal_receiver_status_interval  | 10s         |
 wal_segment_size              | 16MB        |
 wal_sender_delay              | 1s          |
 wal_sync_method               | fdatasync   |
 wal_writer_delay              | 200ms       |

Thanks!
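As Heikki points out in the next reply, these profiles cover every process on the box. A narrower capture pinned to one of the busy walsender PIDs (the PID below is a placeholder taken from the earlier top output) might look roughly like:

    perf record -g -p 30244 -- sleep 60    # sample only that walsender for 60 seconds
    perf report --stdio                    # then inspect where its cycles actually go

A per-process profile like this makes it easier to separate walsender time from the ordinary backends that dominate the listings above.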
On 05/06/2014 07:44 PM, Jamie Koceniak wrote:
> Hi Heikki,
>
> Attached is a trace route for one of the pids that was running at 100% cpu.
> Note: when the wal_senders hit 100% cpu, our slave servers tend to lag the master for up to around 3-4 minutes until the wal_sender cpu utilization drops. This is very problematic for us since we are trying to use the slaves for reads in our application.
>
> Here is perf top results:
>  39.49%  libz.so.1.2.7           [.] 0x000000000000301e
>  32.97%  postgres                [.] 0x00000000000cf150
>   6.50%  postgres                [.] nocachegetattr
>   2.63%  postgres                [.] btint4cmp
>   1.64%  postgres                [.] _start
>   1.40%  [unknown]               [.] 0x00007f9dfca6a34b
>   1.06%  libc-2.13.so            [.] 0x000000000012934b
>   0.94%  postgres                [.] slot_getattr
>   0.85%  libcrypto.so.1.0.0      [.] 0x000000000009f48f
>   0.69%  postgres                [.] ExecProject
>   0.67%  libz.so.1.2.7           [.] adler32
>   0.66%  postgres                [.] GetMemoryChunkSpace
>   0.58%  [kernel]                [k] copy_user_generic_string
>   0.45%  postgres                [.] heap_fill_tuple
>   0.38%  postgres                [.] execTuplesMatch
>   0.27%  postgres                [.] FunctionCall2Coll
>   0.27%  postgres                [.] ExecProcNode
>   0.25%  postgres                [.] heap_tuple_untoast_attr
>   0.22%  postgres                [.] MemoryContextAlloc
>   0.21%  postgres                [.] heap_form_minimal_tuple
>   0.21%  postgres                [.] pfree
>   0.19%  postgres                [.] heap_compute_data_size

Hmm, these results seem to include all processes, not just the WAL sender.

> Here is some data using perf record and perf report:
>
> +  33.93%        193019       postgres  postgres              [.] 0x0000000000369e27
> +  30.06%        173825       postgres  libz.so.1.2.7         [.] 0x00000000000036c0
> +  10.21%         60618       postgres  [unknown]             [.] 0x00007f9dfc2fd529
> +   4.81%         27275       postgres  postgres              [.] nocachegetattr
> +   2.55%         14443       postgres  postgres              [.] btint4cmp
> +   1.22%          6916       postgres  postgres              [.] ExecProject
> +   1.20%          6837       postgres  postgres              [.] _start
> +   1.10%          6495       postgres  libc-2.13.so          [.] 0x000000000012b8e9
> +   0.84%          4760       postgres  postgres              [.] heap_fill_tuple
> +   0.74%          4265       postgres  [kernel.kallsyms]     [k] copy_user_generic_string
> +   0.71%          4001       postgres  postgres              [.] GetMemoryChunkSpace
> +   0.66%          3881       postgres  libcrypto.so.1.0.0    [.] 0x0000000000081ec7
> +   0.63%          3673       postgres  libz.so.1.2.7         [.] adler32
> +   0.43%          2448       postgres  postgres              [.] slot_getattr
> +   0.40%          2277       postgres  postgres              [.] ExecProcNode
> +   0.40%          2250       postgres  postgres              [.] heap_form_minimal_tuple
> +   0.37%          2078       postgres  postgres              [.] heap_compute_data_size
> +   0.31%          1739       postgres  postgres              [.] ExecScan
> +   0.27%          1779       postgres  [kernel.kallsyms]     [k] page_fault
> +   0.24%          1374       postgres  postgres              [.] LogicalTapeWrite
> +   0.24%          4206        swapper  [kernel.kallsyms]     [k] intel_idle
> +   0.21%          1206       postgres  postgres              [.] slot_getsomeattrs

Same here.

> Also, here are our wal settings on the master.
> Do these settings appear ok?
>
> wal_block_size                | 8192        |
> wal_buffers                   | 16MB        |
> wal_keep_segments             | 5000        |
> wal_level                     | hot_standby |
> wal_receiver_status_interval  | 10s         |
> wal_segment_size              | 16MB        |
> wal_sender_delay              | 1s          |
> wal_sync_method               | fdatasync   |
> wal_writer_delay              | 200ms       |

Looks reasonable to me.

So, the strace output repeats this, with small variations in the number of bytes written in each sendto() call:

recvfrom(10, 0x7f9dfec1f753, 5, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
read(7, ..., 131072) = 131072
sendto(10, ..., 3690, 0, NULL, 0) = 3690
sendto(10, ..., 3605, 0, NULL, 0) = 3605
sendto(10, ..., 3669, 0, NULL, 0) = 3669
sendto(10, ..., 3653, 0, NULL, 0) = 3653
sendto(10, ..., 3637, 0, NULL, 0) = 3637
sendto(10, ..., 3621, 0, NULL, 0) = 3621
sendto(10, ..., 3669, 0, NULL, 0) = 3669
sendto(10, ..., 3605, 0, NULL, 0) = 3605
sendto(10, ..., 53, 0, NULL, 0) = 53
getppid() = 5474

The read() reads 128kB of WAL from the file in pg_xlog, and it is then sent to the client. Now, it's pretty surprising that the number of bytes written with sendto() doesn't match the number of bytes read from the file. Unless you're using SSL with compression. That would also explain the high amount of CPU time spent in libz in the perf output.

Try disabling SSL compression, with the sslcompression=0 option in primary_conninfo, or disable SSL altogether if the network is trustworthy enough for that.

- Heikki
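For reference, a minimal sketch of what that change could look like in recovery.conf on each standby (host and user below are placeholders, and sslcompression is only honored where the standby's libpq supports that option):

    standby_mode = 'on'
    # keep SSL but turn off zlib compression of the replication stream:
    primary_conninfo = 'host=master.example.com user=replicator sslcompression=0'
    # or, if the network is trusted, skip SSL entirely:
    # primary_conninfo = 'host=master.example.com user=replicator sslmode=disable'

The walsender only picks up the new connection options once the standby reconnects, for example after a standby restart.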
Hi,

On 2014-03-19 17:13:56 +0000, jkoceniak@mediamath.com wrote:
> Periodically throughout the day, we keep seeing the wal sender process
> utilize 100% of the CPU. We began noticing this after we added 2 new slave
> servers, going from 2 to 4 slaves. See top results and I also included our
> wal settings. Thanks!

A typical reason for this is that the standbys are connecting over ssl. Is that the case? If so, intentionally?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I was able to recreate the issue.
Basically, we have a job that inserts 30+ million records into a table.
During the loading of this table, the wal_sender cpu spikes to 100% and appears to stay at 100% until the replication is completed on the slave.
I was able to test inserting this much data on another master/slave configuration with streaming replication and saw the same results.

Other than offloading this job, is there anything we can do to decrease the replication lag?
Also, is this expected behavior due to the volume of data that got inserted and needs to get replicated?

Thanks!
Hi,

On 2014-05-07 03:02:22 +0000, Jamie Koceniak wrote:
> The ssl option is on in postgres.conf but we don't have any hostssl lines or cert configured as an authentication method in our pg_hba.conf file.

Add sslmode=disable to the standby's primary_conninfo to be certain. The profile suggests ssl is active. 'host' allows ssl connections, it just doesn't force them to be used.

Also, zlib seems to be a contention point. If that profile is from a standby, could it be that you're concurrently generating base backups? Or compressing files in an archive_command?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
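A quick way to sanity-check both suggestions on the master, as a sketch (these are ordinary GUCs, so SHOW works from psql):

    SHOW ssl;                -- 'on' means the server accepts SSL, which clients with the default sslmode=prefer will use
    SHOW archive_command;    -- check whether the archive command itself runs gzip or similar compression

As far as I can tell, whether an individual walsender connection actually negotiated SSL is not exposed in any system view on 9.1, so the sslmode=disable test on the standby is the more conclusive check.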
Hi Andres,

The ssl option is on in postgres.conf but we don't have any hostssl lines or cert configured as an authentication method in our pg_hba.conf file.
In our pg_hba.conf we have configured each replication host as trust:

host    replication     all     xx.xxx.xxx.xx/32     trust

I replied in a previous email that we have a job that inserts 30+ million records into a table.
During the loading of this table, the wal_sender cpu spikes to 100% and appears to stay at 100% until the replication is completed on the slave.
I have been able to reproduce this. Other than offloading this job, is there anything we can do to improve the replication lag?

Thanks!
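Following Andres' point that a plain 'host' line still permits SSL, a server-side variant would be to match the replication connections with hostnossl instead of host, so SSL is never accepted for them (a sketch; the address placeholder is kept from the line above):

    # pg_hba.conf on the master; hostnossl matches only non-SSL connections
    hostnossl   replication     all     xx.xxx.xxx.xx/32     trust

A standby connecting with the default sslmode=prefer should then fall back to a plain connection after its SSL attempt is rejected, though putting sslmode=disable in primary_conninfo, as Andres suggests, is the more direct fix.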