Re: rebellious pg stats collector (reopened case) - Mailing list pgsql-admin
From | Laszlo Nagy |
---|---|
Subject | Re: rebellious pg stats collector (reopened case) |
Date | |
Msg-id | 494B614C.5060808@shopzeus.com Whole thread Raw |
In response to | Re: rebellious pg stats collector (reopened case) (Alvaro Herrera <alvherre@commandprompt.com>) |
Responses |
Re: rebellious pg stats collector (reopened case)
|
List | pgsql-admin |
Alvaro Herrera wrote: > Laszlo Nagy wrote: > >> PostgreSQL 8.3.5, the system is now stable (uptime > 10 days). >> PostgreSQL stats collector uses 100% CPU forever: >> > > Could you grab a few stack traces from it and post them? Also possibly > useful, leave strace running on the pgstat process for a while and post > the output somewhere. > Very interesting results. Before I used 'truss' on the process, it was: 78816 pgsql 1 106 0 22400K 7100K CPU6 6 24.2H 99.02% postgres After I started "truss -p 78815" the first message I got was: SIGNAL 17 (SIGSTOP) and the it waited for some seconds. Then my screen was filled with messages like this: poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 984 (0x3d8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) = 984 (0x3d8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M^X\^A\0\0\M^T\f\r\0"...,1000,0x0,NULL,0x0) = 408 (0x198) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\b\0\0\0@\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 (0x40) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\b\0\0\0@\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 (0x40) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^O\v\b\0\^B"...,1000,0x0,NULL,0x0) = 216 (0xd8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 (0x1f8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^T\f\r\0\^B"...,1000,0x0,NULL,0x0) = 216 (0xd8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 (0x1f8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-X\0\0\0\M^T\f\r\0\^B"...,1000,0x0,NULL,0x0) = 216 (0xd8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\^A\0\0\0\M-x\^A\0\0\0\0\0\0\^E"...,1000,0x0,NULL,0x0) = 504 (0x1f8) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) recvfrom(7,"\b\0\0\0@\0\0\0\0\0\0\0\0\0\0\0"...,1000,0x0,NULL,0x0) = 64 (0x40) poll({7/POLLIN|POLLERR},1,2000) = 1 (0x1) I terminated the debugger with Ctrl+C. Then I started it again: #truss -p 78815 SIGNAL 17 (SIGSTOP) poll({7/POLLIN|POLLERR},1,2000) = 0 (0x0) getppid(0x1,0x1,0x7d0,0x8014606cc,0xffffffff80a579c0,0x7fffffffceb8) = 95860 (0x17674) open("global/pgstat.tmp",O_WRONLY|O_CREAT|O_TRUNC,0666) = 4 (0x4) fstat(4,{mode=-rw------- ,inode=73223196,size=0,blksize=4096}) = 0 (0x0) write(4,"\M^W\M-<\M-%\^A\^F\n\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\M-Z\^C\0\0\0\0\0\0\M-d\M^E\M-0u"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0+\0\0\0\0\0\0\0\^N\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\^C\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0T\^T\^N\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\^R\M-(\M^[\v\M-^\M-0A\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\^N\0\0\0\0\0\0\0\\n\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0T\M-A\n\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0~\M-d\M-fp\n\M-^\M-0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\^E\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"$\0\0\0\0\0\0+\^V\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0Z\M-F%"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0|\^B\0\0\0\0\0\0<\M-I\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0T\M-f\t\^V\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\M-H'\M-4\M-N\n\M-^\M-0A"...,4096) = 4096 (0x1000) write(4,"\0\0\^N\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\b\0\0\0\0\0L\^B\0\0\0\0\0\0\M^^"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0dD\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\09\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0>\M-h\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0T%;\r\0\0\0\0\0 \0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\M-I\\j\n\M-^\M-0A\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0Y\240\^A\0\0\0\0\0\^C\M-3"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\^N\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0P\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\^P\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\M-P\^R\0\0\0"...,4096) = 4096 (0x1000) write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,3119) = 3119 (0xc2f) close(4) = 0 (0x0) rename("global/pgstat.tmp","global/pgstat.stat") = 0 (0x0) process exit, rval = 0 I did NOT send the SIGSTOP signal to the process. The pgstat collector was running at 100%CPU for 24 hours, when I attached the debugger, and attaching the debugger somehow caused the process to get a SIGSTOP, but the process did not exit. When I started the debugger for the second time, it got a SIGSTOP again, and this made the collector exiting the endless loop (instead of stopping it). What is the next step? Laszlo
pgsql-admin by date: