Thread: Segmentation Fault
Last night one of my databases broke down temporarily because of a segmentation fault.
It has only happened this one time and the database was fully recovered afterwards,
but I was wondering whether I can do anything to prevent it from happening again.

It happened while the backup was running (pg_dump & pg_dumpall).
Here are some details from the logs etc.

The system is running Ubuntu Linux and I'm using the PostgreSQL package from the dapper repository:

uname -a
Linux db 2.6.15-26-amd64-server #1 SMP Fri Jul 7 20:02:26 UTC 2006 x86_64 GNU/Linux

select version()
PostgreSQL 8.1.4 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

pgsql log
2006-08-16 00:38:22 CEST - LOG: server process (PID 4792) was terminated by signal 11
2006-08-16 00:38:22 CEST - LOG: terminating any other active server processes
2006-08-16 00:38:22 CEST - WARNING: terminating connection because of crash of another server process
2006-08-16 00:38:22 CEST - DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-08-16 00:38:22 CEST - HINT: In a moment you should be able to reconnect to the database and repeat your command.
DETAIL and HINT repeated for every connection
2006-08-16 00:38:23 CEST - LOG: all server processes terminated; reinitializing
2006-08-16 00:38:23 CEST - LOG: database system was interrupted at 2006-08-16 00:36:21 CEST
2006-08-16 00:38:23 CEST - LOG: checkpoint record is at 5/4F9FDC00
2006-08-16 00:38:23 CEST - LOG: redo record is at 5/4F9B3558; undo record is at 0/0; shutdown FALSE
2006-08-16 00:38:23 CEST - LOG: next transaction ID: 5408607; next OID: 30199
2006-08-16 00:38:23 CEST - LOG: next MultiXactId: 1; next MultiXactOffset: 0
2006-08-16 00:38:23 CEST - LOG: database system was not properly shut down; automatic recovery in progress
2006-08-16 00:38:23 CEST - FATAL: the database system is starting up
2006-08-16 00:38:23 CEST - LOG: redo starts at 5/4F9B3558
2006-08-16 00:38:23 CEST - LOG: record with zero length at 5/4FB63C18
2006-08-16 00:38:23 CEST - LOG: redo done at 5/4FB63BE8
2006-08-16 00:38:26 CEST - LOG: database system is ready
2006-08-16 00:38:26 CEST - LOG: transaction ID wrap limit is 1073864149, limited by database "db"

At 00:36:21 this was happening in the pgsql log
2006-08-16 00:36:21 CEST - LOG: duration: 14673.110 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
2006-08-16 00:36:21 CEST - LOG: duration: 8730.029 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
2006-08-16 00:36:21 CEST - LOG: duration: 5982.330 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
2006-08-16 00:36:21 CEST - LOG: duration: 10404.601 ms statement: EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]

These statements are called
in a plpgsql function and the function is called via JDBC using postgresql-8.1-407.jdbc3.jar

dmesg
[2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 rip 00002aaaab73795b rsp 00007fffff8c9228 error 4

Any suggestions?

Thanks,
Poul
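
For readers unfamiliar with the dmesg line above, here is a minimal sketch of how it can be decoded. It assumes the x86_64 kernel message layout shown in the post and the standard x86 page-fault error-code bits (bit 0: protection violation rather than page not present, bit 1: write rather than read, bit 2: user mode). The function name `decode_segfault` and the dictionary keys are made up for illustration.

```python
import re

# Matches the x86_64 kernel segfault message in the form quoted in the post.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def decode_segfault(line):
    """Split a kernel segfault line into fields (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the kernel prints the error code in hex
    return {
        "process": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "instruction_pointer": int(m.group("rip"), 16),
        # Low three bits of the x86 page-fault error code:
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }

line = ("[2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 "
        "rip 00002aaaab73795b rsp 00007fffff8c9228 error 4")
info = decode_segfault(line)
print(info)
```

Read this way, "error 4" describes a user-mode read of an address with no page mapped, and the PID matches the backend that the PostgreSQL log reports as terminated by signal 11. On its own this does not tell a software bug apart from a hardware fault.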
In response to "Poul Møller Hansen" <freebsd@pbnet.dk>:

> Last night one of my databases broke down temporarily because of a
> segmentation fault.
> It has only happened this one time and the database was fully recovered
> afterwards,
> but I was wondering whether I can do anything to prevent it from happening
> again.
>
> It happened while the backup was running (pg_dump & pg_dumpall).
> Here are some details from the logs etc.
>
> The system is running Ubuntu Linux and I'm using the PostgreSQL package
> from the dapper repository:
> uname -a
> Linux db 2.6.15-26-amd64-server #1 SMP Fri Jul 7 20:02:26 UTC 2006
> x86_64 GNU/Linux
>
> select version()
> PostgreSQL 8.1.4 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt
> (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
>
> pgsql log
> 2006-08-16 00:38:22 CEST - LOG: server process (PID 4792) was
> terminated by signal 11

Signal 11 crashes are frequently the result of hardware problems. Make sure the
system has enough cooling and consistent power. Stress test the RAM,
MMU, and other components to ensure that they will function reliably
under load.

--
Bill Moran
Collaborative Fusion Inc.
> dmesg
> [2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 rip
> 00002aaaab73795b rsp 00007fffff8c9228 error 4
>
> Any suggestions?

Do you trust that machine's RAM?
Can you try running memtest86 for some extended period of time?

(just to make sure it's not a hardware issue)

Bye, Chris.

--
Chris Mair
http://www.1006.org
>> dmesg
>> [2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 rip
>> 00002aaaab73795b rsp 00007fffff8c9228 error 4
>>
>> Any suggestions?
>
> Do you trust that machine's RAM?
> Can you try running memtest86 for some extended period of time?
>
> (just to make sure it's not a hardware issue)

Well, even though it's not cheap hardware, one can never be sure that it's OK.
It's a production server, so I guess it has to be a night job...

Thanks,
Poul
Poul Møller Hansen <freebsd@pbnet.dk> writes:
> Last night one of my databases broke down temporarily because of a
> segmentation fault.

> At 00:36:21 this was happening in the pgsql log
> 2006-08-16 00:36:21 CEST - LOG: duration: 14673.110 ms statement:
> EXECUTE <unnamed> [PREPARE: select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,
> $8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21) as result]
> ...
> These statements are called in a plpgsql function and the function is
> called via JDBC

Given that you're using duration logging and JDBC, I wonder whether you
didn't trip over this recently-identified bug:
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00815.php
Patch is here:
http://archives.postgresql.org/pgsql-committers/2006-08/msg00278.php

regards, tom lane
> Given that you're using duration logging and JDBC, I wonder whether you
> didn't trip over this recently-identified bug:
> http://archives.postgresql.org/pgsql-hackers/2006-08/msg00815.php
> Patch is here:
> http://archives.postgresql.org/pgsql-committers/2006-08/msg00278.php

Sorry, but I didn't, though I noticed this in the bug report:

  "Also I must notice that the segfault only occur if
  log_min_duration_statement is set to 0"

It's currently set to 1000, so will the patch help?

Regards,
Poul
Poul Møller Hansen <freebsd@pbnet.dk> writes:
>> Given that you're using duration logging and JDBC, I wonder whether you
>> didn't trip over this recently-identified bug:
>> http://archives.postgresql.org/pgsql-hackers/2006-08/msg00815.php
>> Patch is here:
>> http://archives.postgresql.org/pgsql-committers/2006-08/msg00278.php

> Sorry but didn't, but I noticed this:
> Also I must notice that the segfault only occur if
> log_min_duration_statement is set to 0

I don't believe that statement actually ... it might have chanced to act
that way in one or two trials for Sergey, but since the bug essentially
consists in access to already-freed-and-perhaps-reused memory, it's not
very predictable whether it will fail visibly or not. In any case the
problem could occur for any duration-logging attempt.

regards, tom lane
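
Since the problem could occur for any duration-logging attempt, it may help to check whether a given server has duration logging switched on at all. The following is only a rough sketch: it assumes the two PostgreSQL 8.1 settings `log_duration` (default off) and `log_min_duration_statement` (default -1, meaning disabled), and the helper name `duration_logging_enabled` is hypothetical. The parsing is deliberately simplified and does not handle every value syntax postgresql.conf accepts.

```python
def duration_logging_enabled(conf_text):
    """Return True if the postgresql.conf text enables duration logging.

    Simplified sketch: later assignments win, comments are stripped,
    and only plain integer/boolean spellings are recognised.
    """
    # Assumed defaults for PostgreSQL 8.1: both forms of duration logging off.
    settings = {"log_duration": "off", "log_min_duration_statement": "-1"}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # The "=" between name and value is optional in postgresql.conf.
        name, _, value = line.replace("=", " ", 1).partition(" ")
        name = name.strip().lower()
        if name in settings:
            settings[name] = value.strip().strip("'").lower()
    return (settings["log_duration"] in ("on", "true", "yes", "1")
            or int(settings["log_min_duration_statement"]) >= 0)

# The value mentioned in the thread: statements slower than 1000 ms are logged.
print(duration_logging_enabled("log_min_duration_statement = 1000"))
```

With the threshold of 1000 mentioned earlier in the thread, duration logging is still active for slow statements, which is consistent with the point that a non-zero threshold does not rule the bug out.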