Thread: 73.5 and uw 713
Hi all,

I upgraded my system from 7.3.4 to 7.3.5 yesterday and have already experienced a crash during vacuum full.

I haven't recompiled with debug yet, but it's a SIGSEGV in function repair_frag in vacuum.c.

Does it ring a bell?

Regards
--
Olivier PRENANT             Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou       +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                   +33-6-07-63-80-64 (GSM)
FRANCE                    Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)
ohp@pyrenet.fr writes:
> I upgraded my system from 7.3.4 to 7.3.5 yesterday and have already
> experienced a crash during vacuum full.
> I haven't recompiled with debug yet, but it's a SIGSEGV in function
> repair_frag in vacuum.c.

Considering that vacuum.c hasn't changed in that branch since 7.3beta4, it's highly unlikely that this represents a regression between 7.3.4 and 7.3.5. Pre-existing bug, maybe ...

regards, tom lane
Is there any way I can help with this debugging?

On Mon, 8 Dec 2003, Tom Lane wrote:
> Date: Mon, 08 Dec 2003 14:03:42 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Cc: pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] 73.5 and uw 713
>
> ohp@pyrenet.fr writes:
> > I upgraded my system from 7.3.4 to 7.3.5 yesterday and have already
> > experienced a crash during vacuum full.
> > I haven't recompiled with debug yet, but it's a SIGSEGV in function
> > repair_frag in vacuum.c.
>
> Considering that vacuum.c hasn't changed in that branch since 7.3beta4,
> it's highly unlikely that this represents a regression between 7.3.4 and
> 7.3.5. Pre-existing bug, maybe ...
>
> regards, tom lane

--
Olivier PRENANT
ohp@pyrenet.fr writes:
> Is there any way I can help with this debugging?

Can you speculate on what might have caused the crash?

Is the crash reproducible?

When the backend crashed, it should have produced a core file (assuming your system is configured to do so). Can you get a stack trace from this core file (preferably after you've recompiled PG with debugging symbols) and post it to the list?

-Neil
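Regarding "assuming your system is configured to do so": on POSIX-style systems a process writes a core file only if its core-size resource limit is non-zero. The following is a minimal sketch of how that limit can be inspected and raised from C, assuming `getrlimit`/`setrlimit` with `RLIMIT_CORE` are available on the platform. The function names `core_dumps_enabled` and `raise_core_limit` are hypothetical and are not part of PostgreSQL.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if the soft core-size limit is non-zero,
 * 0 if it is zero (no core files), and -1 if the limit cannot be read. */
int core_dumps_enabled(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    return lim.rlim_cur != 0 ? 1 : 0;
}

/* Hypothetical helper: raises the soft core-size limit to the hard limit
 * for this process (child processes inherit it).
 * Returns 0 on success, -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit lim;
    struct rlimit check;
    int rc;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* Sanity check: the new soft limit should now equal the hard limit. */
    rc = getrlimit(RLIMIT_CORE, &check);
    assert(rc == 0 && check.rlim_cur == check.rlim_max);
    (void) rc;
    return 0;
}
```

The limit has to be in effect in the environment that starts the postmaster, since backends inherit it from there. If the hard limit itself is zero, raising the soft limit changes nothing and `core_dumps_enabled()` still returns 0.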
Hi Neil and Tom,

On Mon, 8 Dec 2003, Neil Conway wrote:
> Date: Mon, 08 Dec 2003 22:44:42 -0500
> From: Neil Conway <neilc@samurai.com>
> To: ohp@pyrenet.fr
> Cc: Tom Lane <tgl@sss.pgh.pa.us>,
>     pgsql-hackers list <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] 73.5 and uw 713
>
> ohp@pyrenet.fr writes:
> > Is there any way I can help with this debugging?
>
> Can you speculate on what might have caused the crash?

On second thought, this has been compiled with the new SCO compiler... The one Larry removed -Kno_host for. Dunno if it's related.

> Is the crash reproducible?

Yes. On certain databases it'll ALWAYS crash.

> When the backend crashed, it should have produced a core file
> (assuming your system is configured to do so). Can you get a stack
> trace from this core file (preferably after you've recompiled PG
> with debugging symbols) and post it to the list?

That's the problem: I had another crash last night at 2:30 am (I run vacuumdb -a on all databases at that time). I decided to recompile everything with -debug turned on and couldn't reproduce it any more. It reminds me of the crash I had with pg_dump last summer...

If I can manage to get a good stack trace, I'll post it.

> -Neil

--
Olivier PRENANT
All right Tom, I managed to get a trace:

Script started on Tue Dec 9 23:02:13 2003
# debug -c base/2308232/core.2509 39 /usr/local/pgsql/bin/postmaster
Warning: Core file truncated
Error: Cannot find the memory segment associated with address 0xbfffd00c in process p1
Core image of postmaster (process p1) created
Error: Top stack frame invalid, program counter out of range
Warning: Stack adjusted to start with previous frame
CORE FILE [AllocSetFree in aset.c]
11 (segv code[SEGV_MAPERR] address[0x8426000]) SIGNALED in p1
782:    int fidx = AllocSetFreeIndex(chunk->size);
debug> stack
Stack trace for p1, program postmaster
*[0] AllocSetFree(context=0x2, pointer=0x80469e4, presumed: 0x80ea706) [aset.c@782]
 [1] ?() [0xbffae010]
script done on Tue Dec 9 23:06:05 2003

Not sure it helps...

Regards

On Mon, 8 Dec 2003, Tom Lane wrote:
> Date: Mon, 08 Dec 2003 15:49:05 -0500
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: ohp@pyrenet.fr
> Subject: Re: [HACKERS] 73.5 and uw 713
>
> > Is there any way I can help with this debugging?
>
> Well, for starters, how about that debug backtrace?
>
> regards, tom lane

--
Olivier PRENANT
Hi Tom,

At last I have a much better trace for the vacuum full bug. Can someone help me with this one?

Core image of postmaster (process p1) created
CORE FILE [swapn in qsort.c]
11 (segv code[SEGV_MAPERR] address[0x8420000]) SIGNALED in p1
0xbffae03f (swapn+47:) movl (%esi),%eax
debug>
Stack trace for p1, program postmaster
*[0] swapn(0x2, 0x831b758, 0x831b770) [0xbffae03f]
 [1] qst(0x80448cc, 0x831b758, 0x831b788) [0xbffadca2]
 [2] qsort(0x831b758, 0x18, 0x2, 0x80eb9f8) [0xbffae17f]
 [3] repair_frag(vacrelstats=0x83122bc, onerel=0x82cf56c, vacuum_pages=0x8046a64, fraged_pages=0x8046a54, nindexes=1, Irel=0x83672e0) [vacuum.c@2227]
 [4] full_vacuum_rel(onerel=0x82cf56c, vacstmt=0x83104b4) [vacuum.c@955]
 [5] vacuum_rel(relid=16408, vacstmt=0x83104b4, expected_relkind=114(or 'r')) [vacuum.c@827]
 [6] vacuum(vacstmt=0x83104b4) [vacuum.c@290]
 [7] ProcessUtility(parsetree=0x83104b4, dest=Remote, completionTag="") [utility.c@gram.y@713]
 [8] pg_exec_query_string(query_string=0x831020c, dest=Remote, parse_context=0x830e204) [postgres.c@gram.y@789]
 [9] PostgresMain(argc=4, argv=0x8046d78, username="ohp") [postgres.c@gram.y@2013]
 [10] DoBackend(port=0x829e500) [postmaster.c@2310]
 [11] BackendStartup(port=0x829e500) [postmaster.c@1932]
 [12] ServerLoop(presumed: 0x1, 0x8297af8, 0x1) [postmaster.c@1009]
 [13] PostmasterMain(argc=1, argv=0x8297af8) [postmaster.c@788]
 [14] main(argc=1, argv=0x8047c44, 0x8047c4c) [main.c@210]
 [15] _start() [0x806ad1c]
debug>

--
Olivier PRENANT
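A note on reading frame [2] of the trace above: the standard `qsort` signature is `qsort(base, nmemb, size, compar)`, so `qsort(0x831b758, 0x18, 0x2, 0x80eb9f8)` is sorting 0x18 = 24 elements of 2 bytes each. The array therefore ends at 0x831b758 + 48 = 0x831b788, which matches the last argument of `qst` in frame [1], while the faulting address 0x8420000 lies far beyond that, so the library routine was reading well outside the array. The thread does not establish why. The sketch below shows only a call of the same shape; `cmp_offsets`, `sort_offsets`, and `offsets_sorted` are hypothetical names, and this is not the code from vacuum.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the comparator at 0x80eb9f8 in the trace.
 * It returns a consistent negative / zero / positive result. */
static int cmp_offsets(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *) a;
    uint16_t y = *(const uint16_t *) b;

    return (x > y) - (x < y);
}

/* Sorts n two-byte values in place, using the same argument shape as
 * frame [2]: qsort(base, nmemb, size, compar). */
void sort_offsets(uint16_t *offsets, size_t n)
{
    assert(offsets != NULL || n == 0);
    qsort(offsets, n, sizeof(uint16_t), cmp_offsets);
}

/* Returns 1 if the array is in non-decreasing order, otherwise 0. */
int offsets_sorted(const uint16_t *offsets, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (offsets[i - 1] > offsets[i])
            return 0;
    }
    return 1;
}
```

With a 24-element array, a correct `qsort` touches only the 48 bytes starting at `base`, so a fault at an address outside that range points at the sort routine, its inputs, or the way it was compiled rather than at the data being merely unsorted.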