Re: PGError: server closed the connection unexpectedly - Mailing list pgsql-general
From | Dave Steinberg |
---|---|
Subject | Re: PGError: server closed the connection unexpectedly |
Date | |
Msg-id | 43C03DD8.2010604@redterror.net Whole thread Raw |
In response to | Re: PGError: server closed the connection unexpectedly (Doug McNaught <doug@mcnaught.org>) |
Responses |
Re: PGError: server closed the connection unexpectedly
|
List | pgsql-general |
>>My biggest problem is the lack of any real error message on the server. >> I don't see anything wrong in the system logs, and there's no core >>file in the /var/postgresql directory. > > > Are you sure core files are enabled; i.e. the server is running with > 'ulimit -c unlimited' ? Yes: $ whoami _postgresql $ ulimit -c unlimited >> I did a 'vacuumdb -afz' just as >>a shot in the dark, without affect. Pretty much all I see in the logs >>is this: >> >><USER%DB x.y.z.a(51478) 487>LOG: unexpected EOF on client connection > > > This means a client is dying or closing its connection prematurely, > and would seem to be a different problem. It shouldn't ever cause the > server to crash. If the server is crashing - I have absolutely no info to suggest that, except possibly this: at a later time I did a tcpdump of one of these sessions, for lack of better ideas, and saw this near the end: <snip lots of tcpdump output> 16:32:19.523842 clam.int.geekisp.com.15245 > morningsun.int.geekisp.com.postgresql: P 17389:18036(647) ack 9323 win 16384 <nop,nop,timestamp 133396667 3702263885> (DF) 16:32:19.524289 morningsun.int.geekisp.com.postgresql > clam.int.geekisp.com.15245: P 9323:9343(20) ack 18036 win 17376 <nop,nop,timestamp 3702263885 133396667> (DF) 16:32:19.562544 clam.int.geekisp.com.15245 > morningsun.int.geekisp.com.postgresql: P 18036:18041(5) ack 9343 win 16384 <nop,nop,timestamp 133396667 3702263885> (DF) ******16:32:19.566662 morningsun.int.geekisp.com.postgresql > clam.int.geekisp.com.15245: F 9343:9343(0) ack 18041 win 17376 <nop,nop,timestamp 3702263885 133396667> (DF) 16:32:19.567043 clam.int.geekisp.com.15245 > morningsun.int.geekisp.com.postgresql: . ack 9344 win 16384 <nop,nop,timestamp 133396667 3702263885> (DF) 16:32:19.649445 clam.int.geekisp.com.15245 > morningsun.int.geekisp.com.postgresql: P 18041:18851(810) ack 9344 win 16384 <nop,nop,timestamp 133396667 3702263885> (DF) 16:32:19.649468 morningsun.int.geekisp.com.postgresql > clam.int.geekisp.com.15245: R 545360451:545360451(0) win 0 (DF) 16:32:19.649652 clam.int.geekisp.com.15245 > morningsun.int.geekisp.com.postgresql: F 18851:18851(0) ack 9344 win 16384 <nop,nop,timestamp 133396667 3702263885> (DF) 16:32:19.649665 morningsun.int.geekisp.com.postgresql > clam.int.geekisp.com.15245: R 545360451:545360451(0) win 0 (DF) The '*****' above is my own, to highlight the interesting part. Morningsun is the server, clam is the ruby client. Based on this, I think the server *might* be dying first (hard to tell), since morningsun is the one who sends the FIN packet and thereby closes the connection. (Sorry for the horrible wrapping). Again, having trouble making sure. >>Googling turned up a few reports suggesting bad hardware, or corrupted >>indexes, but I don't think that's the case here. >> >>Any starting points or ideas would be greatly appreciated. > > > Make sure the server is able to dump a core file, and perhaps crank up > the logging level. Just to verify that a core dump is possible for the postgresql user, I wrote a tiny C program that just calls abort(). Sure enough, running it gives me a core file. I cranked the debug level up to 5 (man that's a lot), and here's what I think is the relevent chunk: <geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>STATEMENT: UPDATE job_members SET "start_time" = NULL, "exit_code" = NULL, "create_time" = '2006-01-07 17:10:31', "job_id" = 30, "command" = '--- !ruby/object:GeekISP::ShellCommand args: command: /bin/ls /tmp/ exit_code: logger: !ruby/object:Logger datetime_format: level: 0 logdev: !ruby/object:Logger::LogDevice dev: !ruby/object:File {} filename: /mnt/scratch/dave/control_panel/trunk/config/../log/test.log shift_age: 0 shift_size: 1048576 progname: run_as: stderr: stdin: stdout: ', "finish_time" = NULL, "last_update_time" = NULL, "host_id" = 4, "status" = 'in_progress', "output" = NULL WHERE job_member_id = 105 <geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 10290432/1/0, nestlvl: 1, children: <> <geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>STATEMENT: UPDATE job_members SET "start_time" = NULL, "exit_code" = NULL, "create_time" = '2006-01-07 17:10:31', "job_id" = 30, "command" = '--- !ruby/object:GeekISP::ShellCommand args: command: /bin/ls /tmp/ exit_code: logger: !ruby/object:Logger datetime_format: level: 0 logdev: !ruby/object:Logger::LogDevice dev: !ruby/object:File {} filename: /mnt/scratch/dave/control_panel/trunk/config/../log/test.log shift_age: 0 shift_size: 1048576 progname: run_as: stderr: stdin: stdout: ', "finish_time" = NULL, "last_update_time" = NULL, "host_id" = 4, "status" = 'in_progress', "output" = NULL WHERE job_member_id = 105 <geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG: proc_exit(0) <geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG: shmem_exit(0) <geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>LOG: disconnection: session time: 0:00:00.84 user=geekispv2 database=geekisp-v2-test host=192.168.4.38 port=31992 <geekispv2%geekisp-v2-test 192.168.4.38(31992) 9155>DEBUG: exit(0) <% 7303>DEBUG: reaping dead processes <% 7303>DEBUG: server process (PID 9155) exited with exit code 0 That looks to me like a clean and normal exit. This is pointing more and more towards the client in the ruby case, isn't it? Regards, -- Dave Steinberg http://www.geekisp.com/ http://www.steinbergcomputing.com/
pgsql-general by date: