Thread: Segmentation Fault

Segmentation Fault

From: Poul Møller Hansen
Date: 16 August 2006, 09:33:40

Last night one of my databases went down temporarily because of a
segmentation fault. It has only happened this once, and the database
fully recovered afterwards, but I was wondering whether I can do
anything to prevent it from happening again.

It happened while the backup was running (pg_dump & pg_dumpall).
Here are some details from the logs etc.

The system is running Ubuntu Linux and I'm using the PostgreSQL package
from the dapper repository:
uname -a
Linux db 2.6.15-26-amd64-server #1 SMP Fri Jul 7 20:02:26 UTC 2006 x86_64 GNU/Linux

select version()
PostgreSQL 8.1.4 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)

pgsql log
2006-08-16 00:38:22 CEST - LOG:  server process (PID 4792) was terminated by signal 11
2006-08-16 00:38:22 CEST - LOG:  terminating any other active server processes
2006-08-16 00:38:22 CEST - WARNING:  terminating connection because of crash of another server process
2006-08-16 00:38:22 CEST - DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2006-08-16 00:38:22 CEST - HINT:  In a moment you should be able to reconnect to the database and repeat your command.

The DETAIL and HINT lines were repeated for every connection.

2006-08-16 00:38:23 CEST - LOG:  all server processes terminated; reinitializing
2006-08-16 00:38:23 CEST - LOG:  database system was interrupted at 2006-08-16 00:36:21 CEST
2006-08-16 00:38:23 CEST - LOG:  checkpoint record is at 5/4F9FDC00
2006-08-16 00:38:23 CEST - LOG:  redo record is at 5/4F9B3558; undo record is at 0/0; shutdown FALSE
2006-08-16 00:38:23 CEST - LOG:  next transaction ID: 5408607; next OID: 30199
2006-08-16 00:38:23 CEST - LOG:  next MultiXactId: 1; next MultiXactOffset: 0
2006-08-16 00:38:23 CEST - LOG:  database system was not properly shut down; automatic recovery in progress
2006-08-16 00:38:23 CEST - FATAL:  the database system is starting up
2006-08-16 00:38:23 CEST - LOG:  redo starts at 5/4F9B3558
2006-08-16 00:38:23 CEST - LOG:  record with zero length at 5/4FB63C18
2006-08-16 00:38:23 CEST - LOG:  redo done at 5/4FB63BE8
2006-08-16 00:38:26 CEST - LOG:  database system is ready
2006-08-16 00:38:26 CEST - LOG:  transaction ID wrap limit is 1073864149, limited by database "db"

At 00:36:21, the following appeared in the pgsql log:
2006-08-16 00:36:21 CEST - LOG:  duration: 14673.110 ms  statement: EXECUTE <unnamed>  [PREPARE:  select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21)  as result]
2006-08-16 00:36:21 CEST - LOG:  duration: 8730.029 ms  statement: EXECUTE <unnamed>  [PREPARE:  select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21)  as result]
2006-08-16 00:36:21 CEST - LOG:  duration: 5982.330 ms  statement: EXECUTE <unnamed>  [PREPARE:  select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21)  as result]
2006-08-16 00:36:21 CEST - LOG:  duration: 10404.601 ms  statement: EXECUTE <unnamed>  [PREPARE:  select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21)  as result]

These statements are called in a plpgsql function and the function is
called via JDBC
using postgresql-8.1-407.jdbc3.jar

dmesg
[2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 rip 00002aaaab73795b rsp 00007fffff8c9228 error 4

Any suggestions?

Thanks,
 Poul
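The `dmesg` line in the message above can be decoded mechanically. The number after `error` is the x86 page-fault error code: bit 0 is clear for a non-present page and set for a protection violation, bit 1 is set for a write, and bit 2 is set when the fault happened in user mode. Below is a minimal Python sketch, assuming exactly the line format shown; the function names are hypothetical, and reading the trailing code as hexadecimal is an assumption that makes no difference for a single digit such as 4. For `error 4` it reports a user-mode read from an address with no accessible mapping. On its own, that does not distinguish a hardware fault from a software bug such as reading already-freed memory.

```python
from __future__ import annotations

import re

# x86 page-fault error-code bits (low bits only).
# Each entry: bit -> (meaning when set, meaning when clear, or None).
PF_BITS = {
    0: ("protection violation", "page not present"),
    1: ("write access", "read access"),
    2: ("user mode", "kernel mode"),
    3: ("reserved bit set in page table", None),
    4: ("instruction fetch", None),
}

# Matches e.g. "postmaster[4792]: segfault at 00002aaab6f0e000 rip ... rsp ... error 4"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_pf_error(code: int) -> list[str]:
    """Translate a page-fault error code into readable flags."""
    flags = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


def parse_segfault_line(line: str) -> dict:
    """Pull process name, PID, addresses and error code out of a dmesg line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line: " + line)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "address": int(m["addr"], 16),
        "rip": int(m["rip"], 16),
        "rsp": int(m["rsp"], 16),
        # Assumption: the code is printed in hex; irrelevant for values below 10.
        "error": int(m["err"], 16),
    }


if __name__ == "__main__":
    line = ("[2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 "
            "rip 00002aaaab73795b rsp 00007fffff8c9228 error 4")
    info = parse_segfault_line(line)
    print(f"{info['comm']}[{info['pid']}] fault at {info['address']:#x}: "
          + ", ".join(decode_pf_error(info["error"])))
    # prints: postmaster[4792] fault at 0x2aaab6f0e000: page not present, read access, user mode
```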

Re: Segmentation Fault

From: Bill Moran
Date: 16 August 2006, 09:40:31

In response to "Poul Møller Hansen" <freebsd@pbnet.dk>:

> Last night one of my databases went down temporarily because of a
> segmentation fault. It has only happened this once, and the database
> fully recovered afterwards, but I was wondering whether I can do
> anything to prevent it from happening again.
>
> It happened while the backup was running (pg_dump & pg_dumpall).
> Here are some details from the logs etc.
>
> The system is running Ubuntu Linux and I'm using the PostgreSQL package
> from the dapper repository:
> uname -a
> Linux db 2.6.15-26-amd64-server #1 SMP Fri Jul 7 20:02:26 UTC 2006 x86_64 GNU/Linux
>
> select version()
> PostgreSQL 8.1.4 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
>
> pgsql log
> 2006-08-16 00:38:22 CEST - LOG:  server process (PID 4792) was terminated by signal 11

Signal 11 crashes are frequently the result of hardware problems.  Make
sure the system has enough cooling and consistent power.  Stress test the
RAM, MMU, and other components to ensure that they will function reliably
under load.

--
Bill Moran
Collaborative Fusion Inc.

Re: Segmentation Fault

From: Chris Mair
Date: 16 August 2006, 09:41:12

> dmesg
> [2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 rip 00002aaaab73795b rsp 00007fffff8c9228 error 4
>
>
> Any suggestions?

Do you trust that machine's RAM?
Can you try running memtest86 for some extended period of time?

(just to make sure it's not a hardware issue)

Bye, Chris.


--

Chris Mair
http://www.1006.org

Re: Segmentation Fault

From: Poul Møller Hansen
Date: 16 August 2006, 10:37:50

>> dmesg
>> [2425253.737383] postmaster[4792]: segfault at 00002aaab6f0e000 rip 00002aaaab73795b rsp 00007fffff8c9228 error 4
>>
>>
>> Any suggestions?
>>
>
> Do you trust that machine's RAM?
> Can you try running memtest86 for some extended period of time?
>
> (just to make sure it's not a hardware issue)
>
>
Well, even though it's not cheap hardware, one can never be sure that it's OK.
It's a production server, so I guess it will have to be a night job...

Thanks,
 Poul

Re: Segmentation Fault

From: Tom Lane
Date: 16 August 2006, 11:31:49

Poul Møller Hansen <freebsd@pbnet.dk> writes:
> Last night one of my databases went down temporarily because of a
> segmentation fault.

> At 00:36:21 this was happening in the pgsql log
> 2006-08-16 00:36:21 CEST - LOG:  duration: 14673.110 ms  statement: EXECUTE <unnamed>  [PREPARE:  select * from app.insert_unitstat($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20,$21)  as result]
> ...
> These statements are called in a plpgsql function and the function is
> called via JDBC

Given that you're using duration logging and JDBC, I wonder whether you
didn't trip over this recently-identified bug:
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00815.php
Patch is here:
http://archives.postgresql.org/pgsql-committers/2006-08/msg00278.php

            regards, tom lane

Re: Segmentation Fault

From: Poul Møller Hansen
Date: 16 August 2006, 14:33:40

> Given that you're using duration logging and JDBC, I wonder whether you
> didn't trip over this recently-identified bug:
> http://archives.postgresql.org/pgsql-hackers/2006-08/msg00815.php
> Patch is here:
> http://archives.postgresql.org/pgsql-committers/2006-08/msg00278.php
>

Sorry, but I don't think so. I noticed this in that thread:

  "Also I must notice that the segfault only occur if
  log_min_duration_statement is set to 0"

My log_min_duration_statement is currently set to 1000, so will the
patch help?


Regards,
 Poul

Re: Segmentation Fault

From: Tom Lane
Date: 16 August 2006, 17:21:34

Poul Møller Hansen <freebsd@pbnet.dk> writes:
>> Given that you're using duration logging and JDBC, I wonder whether you
>> didn't trip over this recently-identified bug:
>> http://archives.postgresql.org/pgsql-hackers/2006-08/msg00815.php
>> Patch is here:
>> http://archives.postgresql.org/pgsql-committers/2006-08/msg00278.php

> Sorry, but I don't think so. I noticed this in that thread:
> Also I must notice that the segfault only occur if
> log_min_duration_statement is set to 0

I don't believe that statement actually ... it might have chanced to act
that way in one or two trials for Sergey, but since the bug essentially
consists in access to already-freed-and-perhaps-reused memory, it's not
very predictable whether it will fail visibly or not.  In any case the
problem could occur for any duration-logging attempt.

            regards, tom lane
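Tom's point that a 1000 ms setting does not keep these statements away from duration logging can be checked against the log excerpt in the first message. The Python sketch below models `log_min_duration_statement` as the 8.1 documentation describes it: -1 disables duration logging, 0 logs every statement, and a positive value logs statements that run for at least that many milliseconds. `would_log_duration` is a hypothetical helper, not PostgreSQL's code. All four statements logged at 00:36:21 ran for several seconds, so each of them went through duration logging at a threshold of 1000 just as it would at 0.

```python
# Statement durations (ms) copied from the 00:36:21 log excerpt.
LOGGED_DURATIONS_MS = [14673.110, 8730.029, 5982.330, 10404.601]


def would_log_duration(duration_ms: float, threshold_ms: int) -> bool:
    """Model of log_min_duration_statement (assumed semantics, see lead-in):
    -1 (any negative value in this model) disables duration logging;
    otherwise a statement is logged when its duration is at least the
    threshold, so 0 logs everything."""
    if threshold_ms < 0:
        return False
    return duration_ms >= threshold_ms


if __name__ == "__main__":
    for threshold in (-1, 0, 1000):
        logged = sum(would_log_duration(d, threshold) for d in LOGGED_DURATIONS_MS)
        print(f"log_min_duration_statement = {threshold}: "
              f"{logged} of {len(LOGGED_DURATIONS_MS)} statements logged")
    # prints:
    # log_min_duration_statement = -1: 0 of 4 statements logged
    # log_min_duration_statement = 0: 4 of 4 statements logged
    # log_min_duration_statement = 1000: 4 of 4 statements logged
```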