Thread: segfault on rollback
Hello -hackers, I recently have seen the segfault with Postgres 8.1.4. The situation in which the segfault occur is quite complicated (a lot of commands in one transaction from JDBC), but Here I show the gdb core dump of it. From looking in my application, it seems that the core dump occurs when the application issue the ROLLBACK command in the end of large transaction. (when I do Connection.rollback() from JDBC) Also I must notice that the segfault only occur if log_min_duration_statement is set to 0 if it is set to -1, the segfault do not occur. Also I must say that I specially removed all the C functions calls from the transaction, to be sure that I'm not corrupting the memory myself. Program received signal SIGSEGV, Segmentation fault. 0xb7dbd6db in strlen () from /lib/tls/libc.so.6 (gdb) bt #0 0xb7dbd6db in strlen () from /lib/tls/libc.so.6 #1 0xb7d91ef1 in vfprintf () from /lib/tls/libc.so.6 #2 0xb7db0700 in vsnprintf () from /lib/tls/libc.so.6 #3 0x08179c47 in appendStringInfoVA (str=0xbfd639f0, fmt=0x83d43c0 "duration: %ld.%03ld ms statement: %sEXECUTE %s [PREPARE: %s]", args=0xbfd63a24 "Y") at stringinfo.c:125 #4 0x08290471 in errmsg ( fmt=0x834b0c0 "duration: %ld.%03ld ms statement: %sEXECUTE %s [PREPARE: %s]") at elog.c:647 #5 0x08202bd2 in exec_execute_message (portal_name=0x8421a6c "", max_rows=1) at postgres.c:1825 #6 0x0820498a in PostgresMain (argc=4, argv=0x83d2754, username=0x83d2658 "cas_user_tmp") at postgres.c:3268 #7 0x081d3528 in BackendRun (port=0x83eaed0) at postmaster.c:2856 #8 0x081d2b3f in BackendStartup (port=0x83eaed0) at postmaster.c:2500 #9 0x081d0bc9 in ServerLoop () at postmaster.c:1230 #10 0x081d0474 in PostmasterMain (argc=1, argv=0x83bb000) at postmaster.c:941 #11 0x081821d8 in main (argc=1, argv=0x83bb000) at main.c:265 (gdb) bt full #0 0xb7dbd6db in strlen () from /lib/tls/libc.so.6 No symbol table info available. #1 0xb7d91ef1 in vfprintf () from /lib/tls/libc.so.6 No symbol table info available. #2 0xb7db0700 in vsnprintf () from /lib/tls/libc.so.6 No symbol table info available. #3 0x08179c47 in appendStringInfoVA (str=0xbfd639f0, fmt=0x83d43c0 "duration: %ld.%03ld ms statement: %sEXECUTE %s [PREPARE: %s]", args=0xbfd63a24 "Y") at stringinfo.c:125 avail = 255 nprinted = 256 #4 0x08290471 in errmsg ( fmt=0x834b0c0 "duration: %ld.%03ld ms statement: %sEXECUTE %s [PREPARE: %s]") at elog.c:647 args = 0xbfd63a24 "Y" success = -65 '©' fmtbuf = 0x83d43c0 "duration: %ld.%03ld ms statement:%sEXECUTE %s [PREPARE: %s]" buf = { data = 0x83d42b4 "duration: 89.073 ms statement: EXECUTE <unnamed> [PREPARE: ", '\177' <repeats 138 times>..., len = 0, maxlen = 256, cursor = 0} edata = (ErrorData *) 0x83ab3a0 oldcontext = 0x83d1f00 ---Type <return> to continue, or q <return> to quit--- #5 0x08202bd2 in exec_execute_message (portal_name=0x8421a6c "", max_rows=1) at postgres.c:1825 usecs = 89073 dest = DestRemoteExecute receiver = (DestReceiver *) 0x8421b78 portal = 0x842c114 completed= 1 '\001' completionTag = "ROLLBACK\000:ж©Ф\033\030\bэ:ж©m\032B\b\004\000\000\000%\n \b╓\201?\b\000\001\000\000Х:ж©Ё\031\030\b\000\000\000\001э:ж©\004\000\000\000\224\035\030\b" start_t = {tv_sec = 1155432847,tv_usec = 699143} stop_t = {tv_sec = 1155432847, tv_usec = 788216} save_log_duration = 0 '\0' save_log_min_duration_statement = 0 save_log_statement_stats = 0 '\0' execute_is_fetch = 0 '\0' #6 0x0820498a in PostgresMain (argc=4, argv=0x83d2754, username=0x83d2658 "cas_user_tmp") at postgres.c:3268 portal_name= 0x8421a6c "" max_rows = 1 flag = -1 dbname = 0x83dd198 "cas" userDoption = 0x0 ---Type <return> to continue, or q <return> to quit--- secure = 0 '\0' errs = 0 debug_flag = -1 guc_names = (List *) 0x0 guc_values = (List *) 0x0 ctx = PGC_USERSET gucsource = PGC_S_CLIENT am_superuser = 0 '\0' tmp = 0x83d232c "" firstchar = 69 stack_base = 0 '\0' input_message= {data = 0x8421a6c "", len = 5, maxlen = 256, cursor = 5} local_sigjmp_buf = {{__jmpbuf = {138325712,137883360, 1, -1076478888, -1076479248, 136332736}, __mask_was_saved = 1, __saved_mask = {__val = { 1073347075, 4294967294, 0, 0, 0, 0, 0, 7004, 0, 8, 0, 0, 138225460, 138224272, 138223360, 3218488264, 4294967295,138223360, 3085359680, 3084618868, 138225440, 138224272, 3084378772, 138326168, 3085312352, 200,138249720, 138241528, 3085367328, 137883360, 1, 3218488344}}}} send_rfq = 0 '\0' #7 0x081d3528 in BackendRun (port=0x83eaed0) at postmaster.c:2856 status = 0 remote_host = "127.0.0.1\000ж╥\000ЮФ╥\004\aГ╥\004\aГ╥xAж©bЦщ╥\004\aГ╥\00---Type<return> to continue, or q <return> to quit--- 0\000\000\000hAж©╠\201*\b<\032Г╥\000\000\000\000\004\aГ╥\002\024\000\000\000\024\000\000\000\000\000\000\000ЮФ╥╪\177=\bо]3\b\000ЮФ╥ ХФ╥\000шФ╥hAж©\b\212з╥ ХФ╥\005\000\000\000\000\000\000\000xEж©И╗ш╥\000ЮФ╥ЬAж©дБщ╥\210Aж©\000ЮФ╥\004\aГ╥\224Бв╥ЬAж©bЦщ╥\004\aГ╥\000\000\000\000\002\024\000\000\005\000\000\000<\032Г╥\000\000\000\000\004\aГ╥\003\024\000\000\000\024\000\000\000\000\000\000"... remote_port = "32989\000ж©╥\220У╥ГEш╥FD\a\bЭDж©И╗ш╥\000ЮФ" remote_ps_data = "127.0.0.1(32989)\000╚х\233pq©\233\034чФ╥p<ж©.фУ╥\000\000\000\000\000\000\000\000╦qЖ╥\000\000\000\000\000\000\000\000\003\000\000\000P<ж©ьoЖ╥\000\000\000\000\b©;\bL=ж©\"еУ╥\b©;\b\001\000\000\000дGж©ю!=\b╦\031ж╥Жiж╥╦qЖ╥╪tЖ╥\000\000\000\000дGж©\001\000\000\000\000ЮФ╥юlЖ╥ю!=\bл=ж©O\000Д╥ю!=\b\000\000\000\200\001\000\000\220\000\000\000\000 1\001\000Ю\236>\b\004\000\000\000\003\000\000\000ЬФ╥\000ЮФ╥ ЬФ╥юlЖ╥═=ж©"... av = (char **) 0x83d2754 maxac= 10 ac = 4 protobuf = "-v196608\000<ж©\000\024\000\000ЪЪЪЪ\000\000\000\000ПТ$\234\000\000\000" i = 4 #8 0x081d2b3f in BackendStartup (port=0x83eaed0) at postmaster.c:2500 bn = (Backend *) 0x83dd0c8 pid = 0 #9 0x081d0bc9 in ServerLoop () at postmaster.c:1230 timeout = {tv_sec = 59, tv_usec = 948000} ---Type <return> to continue, or q <return> to quit--- selres = 1 i = 0 port = (Port *) 0x83eaed0 rmask = {fds_bits = {8, 0 <repeats 31 times>}} readmask = {fds_bits = {24, 0 <repeats 31 times>}} nSockets= 5 now = 1155432807 last_touch_time = 1155432686 earlier = {tv_sec = 1155432686, tv_usec =49004} later = {tv_sec = 1155432765, tv_usec = 301709} #10 0x081d0474 in PostmasterMain (argc=1, argv=0x83bb000) at postmaster.c:941 opt = -1 status = 0 userDoption= 0x0 i = 64 #11 0x081821d8 in main (argc=1, argv=0x83bb000) at main.c:265 pw = (struct passwd *) 0xb7f66cc0 pw_name_persist= 0xb7e6e000 "\\}\021" Any ideas ? Or I should try to narrower the problem ?... (it is not very easy for my application)... Initially I thought that the problem can be due to JDBC, but it doesn't seem so... In any case, since JDBC for PG is pure java and is using libpq, Posgres shouldn't crash. Thanks, Regards, Sergey ******************************************************************* Sergey E. Koposov Max Planck Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: math@sai.msu.ru
"Sergey E. Koposov" <math@sai.msu.ru> writes: > I recently have seen the segfault with Postgres 8.1.4. I'm betting that portal->sourceText has already been deallocated when exec_execute_message tries to print the log message. Getting an actual segfault from that would very probably be hard to reproduce, but if you build with --enable-cassert it should not be too hard to reproduce corruption of the log message, ie, display of garbage instead of "ROLLBACK" as the command text. Please try that and see if you can generate a self-contained test case. regards, tom lane
On Sat, 12 Aug 2006, Tom Lane wrote: > "Sergey E. Koposov" <math@sai.msu.ru> writes: >> I recently have seen the segfault with Postgres 8.1.4. > > I'm betting that portal->sourceText has already been deallocated when > exec_execute_message tries to print the log message. Getting an actual > segfault from that would very probably be hard to reproduce, but if you > build with --enable-cassert it should not be too hard to reproduce > corruption of the log message, ie, display of garbage instead of > "ROLLBACK" as the command text. Please try that and see if you can > generate a self-contained test case. I succeeded to get the self contained case, but in java (it is just 15 lines, but it crashes the backend). (probably I should forward the mail pgsql-jdbc...) import java.sql.*; public class xx { public static void main(String args[]) throws Exception { Class.forName("org.postgresql.Driver"); Connection dbcon = DriverManager.getConnection("jdbc:postgresql://localhost:5432/template1","postgres",""); dbcon.setAutoCommit(false); Statement stmt = dbcon.createStatement(); stmt.execute("create table xx(a int, bdouble precision)"); dbcon.rollback(); } } I'm not sure that it is possible to reproduce without java, since JDBC do its own query preparing and a lot of other stuff internally which should be non trivial to reproduce without jdbc... Regards, Sergey ******************************************************************* Sergey E. Koposov Max Planck Institute for Astronomy/Sternberg Astronomical Institute Tel: +49-6221-528-349 Web: http://lnfm1.sai.msu.ru/~math E-mail: math@sai.msu.ru
"Sergey E. Koposov" <math@sai.msu.ru> writes: > I'm also sending The output from strace (during the execution of the test > program which I have sent before) showing what jdbc is sending to the backend. Great, I was just going to ask for that. I can reproduce it in HEAD pretty trivially using libpq: res = PQexecParams(conn, "BEGIN", 0, /* no params */ NULL, /* let the backend deduce param type */ NULL, NULL, /* don't needparam lengths since text */ NULL, /* default to all text params */ 0); /* ask for text results */ if (PQresultStatus(res) != PGRES_COMMAND_OK) { fprintf(stderr, "BEGIN failed: %s", PQerrorMessage(conn)); PQclear(res); exit_nicely(conn); } printf("cmd status = '%s'\n", PQcmdStatus(res)); PQclear(res); res = PQexecParams(conn, "ROLLBACK", 0, /* no params */ NULL, /* let the backend deduce param type */ NULL, NULL, /* don'tneed param lengths since text */ NULL, /* default to all text params */ 0); /* ask for text results */ if (PQresultStatus(res) != PGRES_COMMAND_OK) { fprintf(stderr, "ROLLBACK failed: %s", PQerrorMessage(conn)); PQclear(res); exit_nicely(conn); } printf("cmd status = '%s'\n", PQcmdStatus(res)); PQclear(res); It turns out that COMMIT dies just the same way. The problem is that any transaction-exiting command destroys portals at the finish_xact_command step, but the log_duration code is expecting the portal to still be there afterwards. We could "fix" this by emitting the log message before calling finish_xact_command, but I think that would result in seriously underreporting the time required for a COMMIT. Probably the right fix is to copy the data we might need out of the Portal before committing. Or, if people don't like the overhead for that, we could reduce the amount of info included in the log message ... but probably that would make it unhelpful :-( regards, tom lane
I wrote: > We could "fix" this by emitting the log message before calling > finish_xact_command, but I think that would result in seriously > underreporting the time required for a COMMIT. Probably the right fix > is to copy the data we might need out of the Portal before committing. Patch applied. This reminds me that even though the V3-protocol code has been in there since 7.4, we don't have nearly complete test coverage on it. It'd be a good idea to have some basic tests included somewhere in the standard or contrib regression tests. Any thoughts about an easy way to do it? regards, tom lane
Tom Lane wrote: > I wrote: > > We could "fix" this by emitting the log message before calling > > finish_xact_command, but I think that would result in seriously > > underreporting the time required for a COMMIT. Probably the right fix > > is to copy the data we might need out of the Portal before committing. > > Patch applied. > > This reminds me that even though the V3-protocol code has been in there > since 7.4, we don't have nearly complete test coverage on it. It'd be a > good idea to have some basic tests included somewhere in the standard or > contrib regression tests. Any thoughts about an easy way to do it? Do we have anything on core which speaks the v3 protocol? Since libpq doesn't expose functions to do all the stuff the v3 protocol can do, it would be difficult. Maybe the Python project on pgFoundry (I don't think it uses libpq, does it?), or maybe JDBC, can help here. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> This reminds me that even though the V3-protocol code has been in there >> since 7.4, we don't have nearly complete test coverage on it. It'd be a >> good idea to have some basic tests included somewhere in the standard or >> contrib regression tests. Any thoughts about an easy way to do it? > Do we have anything on core which speaks the v3 protocol? Since libpq > doesn't expose functions to do all the stuff the v3 protocol can do, it > would be difficult. libpq doesn't expose everything in v3, but it exposes quite a lot --- and will expose more once the Describe patch gets in. I think the main thing you still can't do is issue a Bind as a separate operation from an Execute, and that's not something I feel is real important for testing the backend. The main problem is to have something that actually calls the V3-protocol functions, which psql doesn't. I was thinking of suggesting that even making psql use PQexecParams instead of PQexec would help for testing purposes. But from a raw performance point of view that's probably a loss; also it would be the final blow to using a modern psql to connect to a pre-7.4 server. regards, tom lane
On Sun, Aug 13, 2006 at 07:01:13PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Tom Lane wrote: > >> This reminds me that even though the V3-protocol code has been in there > >> since 7.4, we don't have nearly complete test coverage on it. It'd be a > >> good idea to have some basic tests included somewhere in the standard or > >> contrib regression tests. Any thoughts about an easy way to do it? > > > Do we have anything on core which speaks the v3 protocol? Since libpq > > doesn't expose functions to do all the stuff the v3 protocol can do, it > > would be difficult. > > libpq doesn't expose everything in v3, but it exposes quite a lot --- > and will expose more once the Describe patch gets in. I think the main > thing you still can't do is issue a Bind as a separate operation from an > Execute, and that's not something I feel is real important for testing > the backend. The main problem is to have something that actually calls > the V3-protocol functions, which psql doesn't. > > I was thinking of suggesting that even making psql use PQexecParams > instead of PQexec would help for testing purposes. But from a raw > performance point of view that's probably a loss; also it would be the > final blow to using a modern psql to connect to a pre-7.4 server. How much cruft would it add to psql to allow use of V3 as an option? Presumably no one would use it in real life, but it would make it easier to test things. If it's not too invasive to psql it'd be a win to go that route over adding extra code just to test V3. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461