Re: BUG #9635: Wal sender process is using 100% CPU - Mailing list pgsql-bugs
From | Jamie Koceniak |
---|---|
Subject | Re: BUG #9635: Wal sender process is using 100% CPU |
Date | |
Msg-id | 84CE67AB46ACDA41B18094177DD0853D212A5A@ORD2MBX04C.mex05.mlsrvr.com Whole thread Raw |
In response to | Re: BUG #9635: Wal sender process is using 100% CPU (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Responses |
Re: BUG #9635: Wal sender process is using 100% CPU
|
List | pgsql-bugs |
Hi Heikki, Attached is a trace route for one of the pids that was running at 100% cpu. Note: when the wal_senders hit 100% cpu, our slave servers tend to lag the master for up to around 3-4 minutes until thewal_sender cpu utilization drops. This is very problematic for us since we are trying to use the slaves for reads in ourapplication. Here is perf top results: 39.49% libz.so.1.2.7 [.] 0x000000000000301e 32.97% postgres [.] 0x00000000000cf150 6.50% postgres [.] nocachegetattr 2.63% postgres [.] btint4cmp 1.64% postgres [.] _start 1.40% [unknown] [.] 0x00007f9dfca6a34b 1.06% libc-2.13.so [.] 0x000000000012934b 0.94% postgres [.] slot_getattr 0.85% libcrypto.so.1.0.0 [.] 0x000000000009f48f 0.69% postgres [.] ExecProject 0.67% libz.so.1.2.7 [.] adler32 0.66% postgres [.] GetMemoryChunkSpace 0.58% [kernel] [k] copy_user_generic_string 0.45% postgres [.] heap_fill_tuple 0.38% postgres [.] execTuplesMatch 0.27% postgres [.] FunctionCall2Coll 0.27% postgres [.] ExecProcNode 0.25% postgres [.] heap_tuple_untoast_attr 0.22% postgres [.] MemoryContextAlloc 0.21% postgres [.] heap_form_minimal_tuple 0.21% postgres [.] pfree 0.19% postgres [.] heap_compute_data_size Here is some data using perf record and perf report: + 33.93% 193019 postgres postgres [.] 0x0000000000369e27 + 30.06% 173825 postgres libz.so.1.2.7 [.] 0x00000000000036c0 + 10.21% 60618 postgres [unknown] [.] 0x00007f9dfc2fd529 + 4.81% 27275 postgres postgres [.] nocachegetattr + 2.55% 14443 postgres postgres [.] btint4cmp + 1.22% 6916 postgres postgres [.] ExecProject + 1.20% 6837 postgres postgres [.] _start + 1.10% 6495 postgres libc-2.13.so [.] 0x000000000012b8e9 + 0.84% 4760 postgres postgres [.] heap_fill_tuple + 0.74% 4265 postgres [kernel.kallsyms] [k] copy_user_generic_string + 0.71% 4001 postgres postgres [.] GetMemoryChunkSpace + 0.66% 3881 postgres libcrypto.so.1.0.0 [.] 0x0000000000081ec7 + 0.63% 3673 postgres libz.so.1.2.7 [.] adler32 + 0.43% 2448 postgres postgres [.] slot_getattr + 0.40% 2277 postgres postgres [.] ExecProcNode + 0.40% 2250 postgres postgres [.] heap_form_minimal_tuple + 0.37% 2078 postgres postgres [.] heap_compute_data_size + 0.31% 1739 postgres postgres [.] ExecScan + 0.27% 1779 postgres [kernel.kallsyms] [k] page_fault + 0.24% 1374 postgres postgres [.] LogicalTapeWrite + 0.24% 4206 swapper [kernel.kallsyms] [k] intel_idle + 0.21% 1206 postgres postgres [.] slot_getsomeattrs Also, here are our wal settings on the master. Do these settings appear ok? wal_block_size | 8192 | wal_buffers | 16MB | wal_keep_segments | 5000 | wal_level | hot_standby | wal_receiver_status_interval | 10s | wal_segment_size | 16MB | wal_sender_delay | 1s | wal_sync_method | fdatasync | wal_writer_delay | 200ms | Thanks! -----Original Message----- From: Heikki Linnakangas [mailto:hlinnakangas@vmware.com] Sent: Tuesday, May 06, 2014 4:01 AM To: Jamie Koceniak Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #9635: Wal sender process is using 100% CPU On 05/05/2014 10:37 PM, Jamie Koceniak wrote: > Hi Heikki, > > I performed an strace on one of the pids that was consuming 100% of the cpu. > This is just a portion of the log. > I also trimmed out some data provided in the quotes in the read and sends, not sure if you need it. > A majority of the calls in the trace log were recvfrom() and sendto(). > EAGAIN (Resource temporarily unavailable) came up time and time again. > > # strace log during 100% cpu utilization on the master Hmm, that trace looks normal to me. It doesn't look like it's busy-looping, as I would expect if it's using 100% of the CPU. Can you post the full log, or at least a larger chunk? I don't need the data in the quotes from read() and sends() that youremoved, but I would like to see the trace from a longer period of time to see what the pattern is. - Heikki
Attachment
pgsql-bugs by date: