Thread: PostgreSQL server terminated by signal 11
Hi,

My PostgreSQL server running on a Linux machine is terminated by signal 11 whenever I try to create some indexes on a table, which contains quite a lot of data. However I succeeded in creating some other indexes without having the PostgreSQL server terminated:

agora=> CREATE INDEX IDX_GSLOG_EVENTTIME
agora->   ON GSLOG_EVENT (EVENT_DATE_CREATED);
CREATE INDEX
Time: 152908.797 ms
agora=> explain analyze select max(event_date_created) from gslog_event;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Result  (cost=3.80..3.81 rows=1 width=0) (actual time=0.218..0.221 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..3.80 rows=1 width=8) (actual time=0.197..0.200 rows=1 loops=1)
           ->  Index Scan Backward using idx_gslog_eventtime on gslog_event  (cost=0.00..39338251.59 rows=10348246 width=8) (actual time=0.188..0.188 rows=1 loops=1)
                 Filter: (event_date_created IS NOT NULL)
 Total runtime: 0.324 ms
(6 rows)

Time: 41.085 ms
agora=> CREATE INDEX IDX_GSLOG_EVENT_SPREAD_PROTOCOL_NAME
agora->   ON GSLOG_EVENT (EVENT_DATE_CREATED)
agora->   WHERE EVENT_NAME::text <> 'player-login'::text
agora->   AND PLAYER_USERNAME IS NOT NULL
agora->   AND GAME_CLIENT_VERSION IS NULL;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The PostgreSQL log file doesn’t give more information about what went wrong, except that the server process has been terminated:

LOG:  server process (PID 22270) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
FATAL:  the database system is starting up
LOG:  database system was interrupted at 2006-07-27 15:29:27 GMT
LOG:  checkpoint record is at 249/179D44A8
LOG:  redo record is at 249/179D44A8; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 543712876; next OID: 344858
LOG:  next MultiXactId: 2; next MultiXactOffset: 3
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 249/179D44EC
LOG:  record with zero length at 249/179E4888
LOG:  redo done at 249/179E2DFC
LOG:  database system is ready
LOG:  transaction ID wrap limit is 2147484146, limited by database "postgres"

I checked the memory installed on the machine, running memtest86 during more than one day; no error found. I checked bad blocks on every hard drive installed in this machine, using e2fsck -c /dev/hdxx; no bad block found.

I’ve already dropped the table, inserted data, and tried to create all the indexes. The server systematically crashed when creating some specific indexes. The only idea I have for the moment would be to set up another machine with the same database environment. Other idea(s)?

Thanks

--
Daniel CAUNE
Ubisoft Online Technology
(514) 490 2040 ext. 3613

On Thu, 27 Jul 2006, Daniel Caune wrote:

> My PostgreSQL server running on a Linux machine is terminated by signal
> 11 whenever I try to create some indexes on a table, which contains
> quite a lot of data. However I succeeded in creating some other indexes
> without having the PostgreSQL server terminated:

Daniel,

I would guess this is more appropriate for the -admin list so I cc'd it. I think you are most likely running out of memory or running up against a ulimit on memory. I would first check my ulimit settings on the postgres user and see if they are a bit small.

--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

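The ulimit check suggested above could be sketched roughly as follows. This is only an illustration: it reports the limits of whichever account runs the script, so it would need to be run as the postgres user (for example through su) to say anything about the backend. The selection of limits and the helper names `memory_limits` and `fmt` are assumptions made for this sketch, not something taken from the thread.

```python
import resource

# Limits most relevant to a backend that dies while allocating memory.
# The ulimit flags in the labels are the usual bash equivalents; note that
# bash reports kilobytes while getrlimit() reports bytes.
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "stack size (ulimit -s)": resource.RLIMIT_STACK,
}


def fmt(value):
    """Render a limit in bytes, or 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


def memory_limits():
    """Return {label: (soft, hard)} for the current process."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


if __name__ == "__main__":
    for label, (soft, hard) in memory_limits().items():
        print(f"{label}: soft={fmt(soft)}, hard={fmt(hard)}")
```

A soft limit well below the machine's physical memory would be worth a closer look; "unlimited" everywhere would make the ulimit theory less likely.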
"Daniel Caune" <daniel.caune@ubisoft.com> writes: > My PostgreSQL server running on a Linux machine is terminated by signal > 11 whenever I try to create some indexes on a table, which contains > quite a lot of data. Judging from your examples it's got something to do with the partial index WHERE clause. What PG version is this exactly? If you leave out different parts of the WHERE, does it still crash? Does the crash happen immediately after you give the command, or does it run for awhile? It might be worth getting a stack trace from the failure (best way is to attach to the running backend with gdb, provoke the crash, and do "bt" --- search for "gdb" in the archives if you need details). regards, tom lane
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, July 27, 2006 16:06
> To: Daniel Caune
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > My PostgreSQL server running on a Linux machine is terminated by signal
> > 11 whenever I try to create some indexes on a table, which contains
> > quite a lot of data.
>
> Judging from your examples it's got something to do with the partial
> index WHERE clause. What PG version is this exactly? If you leave out
> different parts of the WHERE, does it still crash? Does the crash
> happen immediately after you give the command, or does it run for
> awhile? It might be worth getting a stack trace from the failure
> (best way is to attach to the running backend with gdb, provoke the
> crash, and do "bt" --- search for "gdb" in the archives if you need
> details).
>
> regards, tom lane

The postgres server version is 8.1.4.

Yes, if I leave out the WHERE clause and create a simple index, I don't encounter any problem:

CREATE INDEX IDX_GSLOG_EVENTTIME
  ON GSLOG_EVENT (EVENT_DATE_CREATED);

Anyway, Tom, I'm not sure that it is only related to the WHERE clause, as the crash occurs with a composite index too, such as:

CREATE INDEX IDX_GSLOG_EVENT_PLAYER_EVENT
  ON GSLOG_EVENT (PLAYER_USERNAME, EVENT_NAME);

The crash may happen a while after sending the command. For example, supposing I reboot the Linux machine and immediately run the command (i.e. most of the memory is unused), it takes more than five minutes before the crash occurs. At that time the memory usage is the following (top every second):

Mem:   2075860k total,  1787600k used,   288260k free,     6300k buffers
Swap:   369452k total,        0k used,   369452k free,  1748032k cached

When I reconnect to the respawned postgres server, it takes approximately the same time before it crashes, however many times I repeat this.

I did some other tests trying to detect any common denominator that may make the postgres server crash. Here are some results:

select max(length(game_client_version)) from gslog_event;
=> [CRASH]

select max(length(game_client_version)) from gslog_event
  where game_client_version is not null;
=> [OK, max = 28]

select count(*) from gslog_event
  where length(game_client_version) >= 0;
=> [OK, count = 4463726]

select count(*) from gslog_event
  where upper(game_client_version) = 'FARCRYPC1.33';
=> [OK, count = 576318]

select count(*) from gslog_event
  where lower(player_username) = 'lythanhphu';
=> [CRASH]

I was thinking about nullable values, but in the end, you know what? I have strictly no idea! :-)

I'll look in the archives for how to run postgres under gdb and provide more accurate information.

Thanks,

--
Daniel

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, July 27, 2006 16:06
> To: Daniel Caune
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > My PostgreSQL server running on a Linux machine is terminated by signal
> > 11 whenever I try to create some indexes on a table, which contains
> > quite a lot of data.
>
> Judging from your examples it's got something to do with the partial
> index WHERE clause. What PG version is this exactly? If you leave out
> different parts of the WHERE, does it still crash? Does the crash
> happen immediately after you give the command, or does it run for
> awhile? It might be worth getting a stack trace from the failure
> (best way is to attach to the running backend with gdb, provoke the
> crash, and do "bt" --- search for "gdb" in the archives if you need
> details).
>
> regards, tom lane

It has been quite a long time since I last used gdb! :-) Anyway, I proceeded as described below; correct me if I did something wrong.

> ps -eaf | grep postgres
postgres  2792  2789  0 21:50 pts/2    00:00:00 su postgres
postgres  2793  2792  0 21:50 pts/2    00:00:00 bash
postgres  2902     1  7 22:17 ?        00:01:10 postgres: dbo agora [local] idle
postgres  2952     1  2 22:32 ?        00:00:00 /usr/lib/postgresql/8.1/bin/postmaster -D /var/lib/postgresql/8.1/main -c unix_socket_directory=/var/run/postgresql -c config_file=/etc/postgresql/8.1/main/postgresql.conf -c hba_file=/etc/postgresql/8.1/main/pg_hba.conf -c ident_file=/etc/postgresql/8.1/main/pg_ident.conf
postgres  2954  2952  0 22:32 ?        00:00:00 postgres: writer process
postgres  2955  2952  0 22:32 ?        00:00:00 postgres: stats buffer process
postgres  2956  2955  0 22:32 ?        00:00:00 postgres: stats collector process

I connected to the postgres server using psql and retrieved the backend pid by executing the statement "SELECT pg_backend_pid();"

I started gdb under the UNIX account postgres and attached to the backend process, providing the pid returned by the statement.

I ran the command responsible for creating the index and entered "continue" in gdb to let the command execute. After a while, the server crashed:

Program received signal SIGSEGV, Segmentation fault.
0x08079e2a in slot_attisnull ()
(gdb)
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

I can't do "bt" since the program no longer exists. How can I provide more information, stack trace, and so on?

--
Daniel

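The attach, continue, backtrace sequence described above can also be scripted so that no interactive keystrokes are involved. The sketch below only builds the command line; it relies on gdb's `-p`, `-batch` and `-ex` options, and has not been tried against this particular setup. The function name `gdb_backtrace_command` and the pid 12345 are placeholders invented for the example; the real pid is whatever `SELECT pg_backend_pid();` returns in the psql session that will run the failing statement.

```python
import shlex


def gdb_backtrace_command(pid, gdb="gdb"):
    """Build a gdb command that attaches to `pid`, resumes it, and prints a
    backtrace once the process stops (for example on SIGSEGV)."""
    pid = int(pid)
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        gdb,
        "-p", str(pid),     # attach to the running backend
        "-batch",           # run the -ex commands in order, then exit
        "-ex", "continue",  # let the backend run until it stops
        "-ex", "bt",        # print the stack at the point of the stop
    ]


if __name__ == "__main__":
    # 12345 is a hypothetical pid, used only to show the resulting command.
    print(shlex.join(gdb_backtrace_command(12345)))
```

The printed command would be run as the postgres user before issuing the CREATE INDEX in psql; gdb then waits in "continue" until the backend receives the signal.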
"Daniel Caune" <daniel.caune@ubisoft.com> writes: > I run the command responsible for creating the index and I entered "continue" in gdb for executing the command. Aftera while, the server crashes: > Program received signal SIGSEGV, Segmentation fault. > 0x08079e2a in slot_attisnull () > (gdb) > Continuing. > Program terminated with signal SIGSEGV, Segmentation fault. > The program no longer exists. > I can't do "bt" since the program no longer exists. I think you typed one carriage return too many and the thing re-executed the last command, ie, the continue. Try it again. The lack of arguments shown for slot_attisnull suggests that all we're going to get is a list of function names, without line numbers or argument values. If that's not enough to figure out the problem, can you rebuild with --enable-debug to get a more useful stack trace? regards, tom lane
On Thu, 27 Jul 2006 19:00:27 -0400
"Daniel Caune" <daniel.caune@ubisoft.com> wrote:
> I run the command responsible for creating the index and I entered "continue" in gdb for executing the command. After a while, the server crashes:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x08079e2a in slot_attisnull ()

That's a pretty small function. I don't see much room for error. This diff in src/backend/access/common/heaptuple.c seems like the most likely place to catch it.

RCS file: /cvsroot/pgsql/src/backend/access/common/heaptuple.c,v
retrieving revision 1.110
diff -u -p -u -r1.110 heaptuple.c
--- heaptuple.c 14 Jul 2006 14:52:16 -0000      1.110
+++ heaptuple.c 27 Jul 2006 23:37:54 -0000
@@ -1470,8 +1470,13 @@ slot_getsomeattrs(TupleTableSlot *slot,
 bool
 slot_attisnull(TupleTableSlot *slot, int attnum)
 {
-   HeapTuple   tuple = slot->tts_tuple;
-   TupleDesc   tupleDesc = slot->tts_tupleDescriptor;
+   HeapTuple   tuple;
+   TupleDesc   tupleDesc;
+
+   assert(slot != NULL);
+
+   tuple = slot->tts_tuple;
+   tupleDesc = slot->tts_tupleDescriptor;
 
    /*
     * system attributes are handled by heap_attisnull

Of course, you still have to find out what's calling it with slot set to NULL if that turns out to be the problem. It may also be that slot is not NULL but set to garbage. You could also add a notice there. Two, in fact. One to display the address of slot and one to display the value of slot->tts_tuple or slot->tts_tupleDescriptor. If the first shows a non NULL value and the second causes your crash that tells you that the value of slot is probably trashed before calling the function.

Do this in conjunction with Tom Lane's suggestion of "--enable-debug" for more information.

--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
> On behalf of Tom Lane
> Sent: Thursday, July 27, 2006 19:26
> To: Daniel Caune
> Cc: pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > I run the command responsible for creating the index and I entered
> "continue" in gdb for executing the command. After a while, the server
> crashes:
>
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08079e2a in slot_attisnull ()
> > (gdb)
> > Continuing.
>
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > The program no longer exists.
>
> > I can't do "bt" since the program no longer exists.
>
> I think you typed one carriage return too many and the thing re-executed
> the last command, ie, the continue. Try it again.
>

OK, I'll try that tomorrow morning. Perhaps I can set a conditional breakpoint on function slot_attisnull for when parameter slot is null (or slot->tts_tupleDescriptor is null).

> The lack of arguments shown for slot_attisnull suggests that all we're
> going to get is a list of function names, without line numbers or
> argument values. If that's not enough to figure out the problem, can
> you rebuild with --enable-debug to get a more useful stack trace?
>

Well, I installed PostgreSQL using apt-get, but it won't be a problem to get the source from the CVS repository and to build a postgres binary using the option you gave me. Just give me the time to do that. :-)

Thanks,

--
Daniel

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org]
> On behalf of D'Arcy J.M. Cain
> Sent: Thursday, July 27, 2006 19:49
> To: Daniel Caune
> Cc: tgl@sss.pgh.pa.us; pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> On Thu, 27 Jul 2006 19:00:27 -0400
> "Daniel Caune" <daniel.caune@ubisoft.com> wrote:
> > I run the command responsible for creating the index and I entered
> "continue" in gdb for executing the command. After a while, the server
> crashes:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08079e2a in slot_attisnull ()
>
> That's a pretty small function. I don't see much room for error. This
> diff in src/backend/access/common/heaptuple.c seems like the most
> likely place to catch it.
>
> RCS file: /cvsroot/pgsql/src/backend/access/common/heaptuple.c,v
> retrieving revision 1.110
> diff -u -p -u -r1.110 heaptuple.c
> --- heaptuple.c 14 Jul 2006 14:52:16 -0000      1.110
> +++ heaptuple.c 27 Jul 2006 23:37:54 -0000
> @@ -1470,8 +1470,13 @@ slot_getsomeattrs(TupleTableSlot *slot,
>  bool
>  slot_attisnull(TupleTableSlot *slot, int attnum)
>  {
> -   HeapTuple   tuple = slot->tts_tuple;
> -   TupleDesc   tupleDesc = slot->tts_tupleDescriptor;
> +   HeapTuple   tuple;
> +   TupleDesc   tupleDesc;
> +
> +   assert(slot != NULL);
> +
> +   tuple = slot->tts_tuple;
> +   tupleDesc = slot->tts_tupleDescriptor;
>
>     /*
>      * system attributes are handled by heap_attisnull
>
> Of course, you still have to find out what's calling it with slot set
> to NULL if that turns out to be the problem. It may also be that slot
> is not NULL but set to garbage. You could also add a notice there.
> Two, in fact. One to display the address of slot and one to display
> the value of slot->tts_tuple or slot->tts_tupleDescriptor. If the
> first shows a non NULL value and the second causes your crash that
> tells you that the value of slot is probably trashed before
> calling the function.
>

Yes, I was afraid to go that deep, but it's time! :-))

Actually it seems, from the source code, that a null slot->tts_tuple won't lead to a segmentation fault in function slot_attisnull, while a null slot or a null slot->tts_tupleDescriptor will. I will trace the function to try to discover what goes wrong behind the scenes.

> Do this in conjunction with Tom Lane's suggestion of "--enable-debug" for
> more information.
>

OK

--
Daniel

Daniel CAUNE <d.caune@free.fr> writes:
> Actually it seems, from the source code, that a null slot->tts_tuple
> won't lead to a segmentation fault in function slot_attisnull, while
> slot and slot->tts_tupleDescriptor will.

I'll bet on D'Arcy's theory that slot is being passed in as NULL. Exactly why remains to be seen ... we need that stack trace!

                        regards, tom lane

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, July 27, 2006 19:26
> To: Daniel Caune
> Cc: pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > I run the command responsible for creating the index and I entered
> "continue" in gdb for executing the command. After a while, the server
> crashes:
>
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08079e2a in slot_attisnull ()
> > (gdb)
> > Continuing.
>
> > Program terminated with signal SIGSEGV, Segmentation fault.
> > The program no longer exists.
>
> > I can't do "bt" since the program no longer exists.
>
> I think you typed one carriage return too many and the thing re-executed
> the last command, ie, the continue. Try it again.
>

You were right.

Program received signal SIGSEGV, Segmentation fault.
0x08079e2a in slot_attisnull ()
(gdb) bt
#0  0x08079e2a in slot_attisnull ()
#1  0x0807a1d0 in slot_getattr ()
#2  0x080c6c73 in FormIndexDatum ()
#3  0x080c6ef1 in IndexBuildHeapScan ()
#4  0x0809b44d in btbuild ()
#5  0x0825dfdd in OidFunctionCall3 ()
#6  0x080c4f95 in index_build ()
#7  0x080c68eb in index_create ()
#8  0x08117e36 in DefineIndex ()
#9  0x081db4ee in ProcessUtility ()
#10 0x081d8449 in PostgresMain ()
#11 0x081d99d5 in PortalRun ()
#12 0x081d509e in pg_parse_query ()
#13 0x081d6c33 in PostgresMain ()
#14 0x081aae91 in ClosePostmasterPorts ()
#15 0x081ac14c in PostmasterMain ()
#16 0x08168f22 in main ()

--
Daniel

"Daniel Caune" <daniel.caune@ubisoft.com> writes: > Program received signal SIGSEGV, Segmentation fault. > 0x08079e2a in slot_attisnull () > (gdb) bt > #0 0x08079e2a in slot_attisnull () > #1 0x0807a1d0 in slot_getattr () > #2 0x080c6c73 in FormIndexDatum () > #3 0x080c6ef1 in IndexBuildHeapScan () > #4 0x0809b44d in btbuild () > #5 0x0825dfdd in OidFunctionCall3 () > #6 0x080c4f95 in index_build () > #7 0x080c68eb in index_create () > #8 0x08117e36 in DefineIndex () Hmph. gdb is lying to you, because slot_getattr doesn't call slot_attisnull. This isn't too unusual in a non-debug build, because the symbol table is incomplete (no mention of non-global functions). Given that this doesn't happen right away, but only after it's been processing for awhile, we can assume that FormIndexDatum has been successfully iterated many times already, which seems to eliminate theories like the slot or the keycol value being bogus. I'm pretty well convinced now that we're looking at a problem with corrupted data. Can you do a SELECT * FROM (or COPY FROM) the table without error? regards, tom lane
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, July 28, 2006 09:38
> To: Daniel Caune
> Cc: pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" <daniel.caune@ubisoft.com> writes:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08079e2a in slot_attisnull ()
> > (gdb) bt
> > #0  0x08079e2a in slot_attisnull ()
> > #1  0x0807a1d0 in slot_getattr ()
> > #2  0x080c6c73 in FormIndexDatum ()
> > #3  0x080c6ef1 in IndexBuildHeapScan ()
> > #4  0x0809b44d in btbuild ()
> > #5  0x0825dfdd in OidFunctionCall3 ()
> > #6  0x080c4f95 in index_build ()
> > #7  0x080c68eb in index_create ()
> > #8  0x08117e36 in DefineIndex ()
>
> Hmph. gdb is lying to you, because slot_getattr doesn't call
> slot_attisnull.
> This isn't too unusual in a non-debug build, because the symbol table is
> incomplete (no mention of non-global functions).
>
> Given that this doesn't happen right away, but only after it's been
> processing for awhile, we can assume that FormIndexDatum has been
> successfully iterated many times already, which seems to eliminate
> theories like the slot or the keycol value being bogus. I'm pretty well
> convinced now that we're looking at a problem with corrupted data. Can
> you do a SELECT * FROM (or COPY FROM) the table without error?
>
> regards, tom lane

The statement "copy gslog_event to stdout;" leads to "ERROR: invalid memory alloc request size 4294967293" after a while.

(...)
354964834   2006-07-19 10:53:42.813+00   (...)
354964835   2006-07-19 10:53:44.003+00   (...)
ERROR:  invalid memory alloc request size 4294967293

I then tried "select * from gslog_event where gslog_event_id >= 354964834 and gslog_event_id <= 354964900;":

 354964834 | 2006-07-19 10:53:42.813+00 | (...)
 354964835 | 2006-07-19 10:53:44.003+00 | (...)
 354964837 | 2006-07-19 10:53:44.113+00 | (...)
 354964838 | 2006-07-19 10:53:44.223+00 | (...)
 (...)
(66 rows)

The statement "select * from gslog_event;" leads to "Killed"... Ouch! The psql client just exits (the postgres server crashes too)!

The statement "select * from gslog_event where gslog_event_id <= 354964834;" passed.

I did other tests on some other tables that contain less data but that also seem corrupted:

copy player to stdout
ERROR:  invalid memory alloc request size 1918988375

select * from player where id >=771042 and id<=771043;
ERROR:  invalid memory alloc request size 1918988375

select max(length(username)) from player;
ERROR:  invalid memory alloc request size 1918988375

select max(length(username)) from player where id <= 771042;
 max
-----
  15

select max(length(username)) from player where id >= 771050;
 max
-----
  15

select max(length(username)) from player where id >= 771044 and id <= 771050;
 max
-----
  13

Finally:

select * from player where id=771043;
ERROR:  invalid memory alloc request size 1918988375

select id from player where id=771043;
   id
--------
 771043
(1 row)

agora=> select username from player where id=771043;
ERROR:  invalid memory alloc request size 1918988375

I'm also pretty much convinced that some data is corrupted, especially varchar values. Before dropping the corrupted rows, is there a way to read part of the corrupted data?

Thanks Tom for your great support. I'm just afraid that I wasted your time... Anyway I'll write a FAQ that provides some information about this kind of problem we have faced.

Regards,

--
Daniel

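The two sizes in the errors above are easier to reason about when viewed as raw 32-bit words. The sketch below is only an inspection aid. It assumes a little-endian machine, which the thread does not state, and the helper name `describe_alloc_size` is made up for the example. The reported size is also not necessarily the exact on-disk length word, so the result is a hint rather than a diagnosis.

```python
import struct


def describe_alloc_size(size):
    """Show a reported alloc request size as hex, as a signed 32-bit value,
    and as the four bytes it would occupy on a little-endian machine."""
    raw = struct.pack("<I", size)          # unsigned 32-bit, little-endian
    signed = struct.unpack("<i", raw)[0]   # same bits, read as signed
    # Printable ASCII is shown as-is, everything else as a dot.
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return {"hex": f"0x{size:08x}", "signed": signed, "bytes": text}


if __name__ == "__main__":
    for size in (4294967293, 1918988375):  # values from the errors above
        print(size, describe_alloc_size(size))
```

Under these assumptions, 4294967293 is 0xfffffffd, that is -3 when read as a signed value, and 1918988375 is 0x72617057, whose bytes read "Wpar" as ASCII. Both look like garbage where a length was expected, which fits the corrupted-data theory without saying anything about the cause.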
"Daniel Caune" <daniel.caune@ubisoft.com> writes: > The statement "copy gslog_event to stdout;" leads to "ERROR: invalid memory alloc request size 4294967293" after awhile. > ... > I did other tests on some other tables that contain less data but that seem also corrupted: This is a bit scary as it suggests a systemic problem. You should definitely try to find out exactly what the corruption looks like. It's usually not hard to home in on where the first corrupted row is --- you do SELECT ctid, * FROM tab LIMIT n; and determine the largest value of n that won't trigger a failure. The corrupted region is then just after the last ctid you see. You can look at those blocks with "pg_filedump -i -f" and see if anything pops out. Check the PG archives for previous discussions of dealing with corrupted data. regards, tom lane