Thread: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
"invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
L.S. Situation: - PostgreSQL v8.0.0beta5 running on x86 hardware - a database schema that uses before/after/deferred triggers to do additional processing on stuff like orders and invoices based on the order- and invoice--lines; the deferred triggers are basically of the format 'update order set ..... from (select aggregates from order_line where order_id = ...) where id = ...' and use an updated_on field to make sure they do the necessary work only once The program causing the problem is a conversion tool that fills a database thus firing the various triggers. This process is fairly lengthy (couple of hours) and this morning I noticed that the conversion tool bailed out due to a failed transaction commit, the log mentions: PL/pgSQL function "tr_purchaseorder_def" line 42 at SQL statement ERROR: invalid memory alloc request size 3248857576 The statement on line 42 is the earlier mentioned update statement. Obviously, since the backend keeps running, it won't be easy to get my hands on the backtrace from when this error occurs and I'm wondering about the best way to proceed in order to produce some helpfull input to the developers. In order to find out whether this was some hardware glitch, I immediately restarted the process and just now it ended with exactly the same error on exactly the same spot. Any hints on how to dig deeper ? -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > ERROR: invalid memory alloc request size 3248857576 > Obviously, since the backend keeps running, it won't be easy to get my hands > on the backtrace from when this error occurs and I'm wondering about the best > way to proceed in order to produce some helpfull input to the developers. Provide a reproducible test case maybe? Also, building with --enable-cassert might help track it down. regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> > I'm wondering > > about the best way to proceed in order to produce some helpfull input to > > the developers. > > Provide a reproducible test case maybe? Sorry, won't be that easy. I didn't encounter the problem with earlier (smaller) data-sets and even if I would put in the hours needed in order to strip the schema and change the conversion tool both with the uncertainty that I'd be reproducing the proper sequence of (trigger) events, I wouldn't be able to provide the dataset itself (i.e. non-disclosure) > Also, building with --enable-cassert might help track it down. Will do that for starters. Hold on, I thought that this error message would be generated from numerous places in the code but it isn't. I'll recompile with debugging enabled and put breakpoints on the appropriate lines to produce a backtrace. <some work is done> Ok, using --enable-cassert might not be such a good idea for now, since it triggers a trap very early on in the conversion: TRAP: FailedAssertion("!(((ntp)->t_data)->t_infomask & 0x0010)", File: "catcache.c", Line: 1729) So for now I have disabled assertions again and will focus on creating the backtrace for the memory alloc problem first. When needed / wanted I can always try and go over any assertion failures later. Will be able to post a backtrace in a few hours, I hope. -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > Ok, using --enable-cassert might not be such a good idea for now, since it > triggers a trap very early on in the conversion: > TRAP: FailedAssertion("!(((ntp)->t_data)->t_infomask & 0x0010)", File: > "catcache.c", Line: 1729) [ raised eyebrow... ] That should definitely not be happening either. In fact, I find that considerably more surprising than your original report. I'd suggest chasing this first. regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> [ raised eyebrow... ] That should definitely not be happening either. > In fact, I find that considerably more surprising than your original > report. I'd suggest chasing this first. It was already well on its way, so.... Here's what's going on when the memory alloc problem occurs: (gdb) b mcxt.c:502 Breakpoint 1 at 0x81c5667: file mcxt.c, line 502. (gdb) b mcxt.c:523 Breakpoint 2 at 0x81c56af: file mcxt.c, line 523. (gdb) b mcxt.c:548 Breakpoint 3 at 0x81c572f: file mcxt.c, line 548. (gdb) b mcxt.c:612 Breakpoint 4 at 0x81c57ab: file mcxt.c, line 612. (gdb) cont Continuing. Breakpoint 1, MemoryContextAlloc (context=0x86a6fa8, size=3248857576) at mcxt.c:502 502 elog(ERROR, "invalid memory alloc request size %lu", (gdb) bt #0 MemoryContextAlloc (context=0x86a6fa8, size=3248857576) at mcxt.c:502 #1 0x81cb6ad in CopySnapshot (snapshot=0x8735038) at tqual.c:1205 #2 0x8101414 in postquel_start (es=0x87b52f8, fcache=0x86aa990) at functions.c:317 #3 0x810175f in postquel_execute (es=0x87b52f8, fcinfo=0xbfffe498, fcache=0x86aa990) at functions.c:464 #4 0x8101998 in fmgr_sql (fcinfo=0xbfffe498) at functions.c:633 #5 0x80fd261 in ExecMakeFunctionResult (fcache=0x86aa7d8, econtext=0x86aa920, isNull=0xbfffe5d3 "", isDone=0x0) at execQual.c:1038 #6 0x80fd9c1 in ExecEvalFunc (fcache=0x86aa7d8, econtext=0x86aa920, isNull=0xbfffe5d3 "", isDone=0x0) at execQual.c:1455 #7 0x80feee1 in ExecEvalExprSwitchContext (expression=0x86aa7d8, econtext=0x86aa920, isNull=0xbfffe5d3 "", isDone=0x0) at execQual.c:2708 #8 0x8134cf3 in evaluate_expr (expr=0x86f1388, result_type=23) at clauses.c:2390 #9 0x81345c7 in evaluate_function (funcid=8615860, result_type=23, args=0x86f1370, func_tuple=0x417d98b0, context=0xbfffe788) at clauses.c:2007 #10 0x81344cc in simplify_function (funcid=8615860, result_type=23, args=0x86f1370, allow_inline=1, context=0xbfffe788) at clauses.c:1913 #11 0x8133c2d in eval_const_expressions_mutator (node=0x87894b0, context=0xbfffe788) at clauses.c:1245 #12 0x81359e3 in expression_tree_mutator (node=0x87881c8, mutator=0x8133b6c <eval_const_expressions_mutator>, context=0xbfffe788) at clauses.c:3198 #13 0x8134060 in eval_const_expressions_mutator (node=0x87881a0, context=0xbfffe788) at clauses.c:1574 #14 0x8135959 in expression_tree_mutator (node=0x86f05c8, mutator=0x8133b6c <eval_const_expressions_mutator>, context=0xbfffe788) at clauses.c:3163 #15 0x8134343 in eval_const_expressions_mutator (node=0x86f05c8, context=0xbfffe788) at clauses.c:1756 #16 0x813597f in expression_tree_mutator (node=0x86f1058, mutator=0x8133b6c <eval_const_expressions_mutator>, context=0xbfffe788) at clauses.c:3184 #17 0x8134343 in eval_const_expressions_mutator (node=0x86f1058, context=0xbfffe788) at clauses.c:1756 #18 0x8133b49 in eval_const_expressions (node=0x86f1058) at clauses.c:1152 #19 0x812d925 in preprocess_expression (parse=0x8734248, expr=0x86f1058, kind=1) at planner.c:415 #20 0x812d6e5 in subquery_planner (parse=0x8734248, tuple_fraction=0) at planner.c:240 #21 0x812d5bd in planner (parse=0x8734248, isCursor=0 '\000', cursorOptions=0, boundParams=0x0) at planner.c:129 #22 0x815bb4f in pg_plan_query (querytree=0x8734248, boundParams=0x0) at postgres.c:647 #23 0x815bbeb in pg_plan_queries (querytrees=0x86f10c8, boundParams=0x0, needSnapshot=0 '\000') at postgres.c:715 #24 0x810b428 in _SPI_prepare_plan ( src=0x87c6a08 "UPDATE purchaseorder SET price_total_val = total_val, valuta_id = total_valuta_id, status_id = CASE WHEN count_lines = 0 OR count_pol_quotation > 0 THEN po_stat('PO_QUOTATION') WHEN count_pol_ontransp"..., plan=0xbfffe8a4) at spi.c:1276 #25 0x810a291 in SPI_prepare ( src=0x87c6a08 "UPDATE purchaseorder SET price_total_val = total_val, valuta_id = total_valuta_id, status_id = CASE WHEN count_lines = 0 OR count_pol_quotation > 0 THEN po_stat('PO_QUOTATION') WHEN count_pol_ontransp"..., nargs=1, argtypes=0x87b3480) at spi.c:407 #26 0x41500e02 in ?? () #27 0x41500fce in ?? () #28 0x414ffc2e in ?? () #29 0x414ffa4b in ?? () #30 0x414ff98b in ?? () #31 0x414ff36f in ?? () #32 0x414fbecc in ?? () #33 0x80ebadc in ExecCallTriggerFunc (trigdata=0xbfffec10, finfo=0x422da3b0, per_tuple_context=0x8769ba0) at trigger.c:1149 #34 0x80ec860 in AfterTriggerExecute (event=0x41b84b38, rel=0x418097f0, trigdesc=0x422d8ef0, finfo=0x422da200, per_tuple_context=0x8769ba0) at trigger.c:1979 #35 0x80eca7f in afterTriggerInvokeEvents (events=0x87053c8, firing_id=9977, delete_ok=1 '\001') at trigger.c:2175 #36 0x80ecc66 in AfterTriggerEndXact () at trigger.c:2376 #37 0x8090806 in CommitTransaction () at xact.c:1450 #38 0x8090c6d in CommitTransactionCommand () at xact.c:1938 #39 0x815ceba in finish_xact_command () at postgres.c:1843 #40 0x815bebf in exec_simple_query (query_string=0x82ebbd0 "COMMIT") at postgres.c:965 #41 0x815e0db in PostgresMain (argc=4, argv=0x82a5478, username=0x82a5448 "megaconv") at postgres.c:2986 #42 0x813b85e in BackendRun (port=0x82b8538) at postmaster.c:2817 #43 0x813b121 in BackendStartup (port=0x82b8538) at postmaster.c:2453 #44 0x81399b4 in ServerLoop () at postmaster.c:1198 #45 0x81394c5 in PostmasterMain (argc=3, argv=0x82a4038) at postmaster.c:917 #46 0x8111e11 in main (argc=3, argv=0xbffff914) at main.c:268 #47 0x400d4577 in __libc_start_main () from /lib/libc.so.6 In the mean time, I have compiled postgres on another machine (different version of slackware) with assertions on, but being a fresh install I needed to re-run the first phase of conversion which basically is a copy tables / create indexes loop. However, during the read of *some* of the tables I see this : Program received signal SIGUSR1, User defined signal 1. 0x40109ac1 in kill () from /lib/libc.so.6 (gdb) where #0 0x40109ac1 in kill () from /lib/libc.so.6 #1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at pmsignal.c:69 #2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x837aa94) at sinvaladt.c:229 #3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x837aa94) at sinval.c:122 #4 0x081ffd18 in ProcessInvalidationMessages (hdr=0x837a73c, func=0x818f7c0 <SendSharedInvalidMessage>) at inval.c:344 #5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674 #6 0x080a2548 in CommitTransaction () at xact.c:1543 #7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938 #8 0x0819caef in finish_xact_command () at postgres.c:1843 #9 0x0819b795 in exec_simple_query (query_string=0x834ee8c "create index lijst03_table_idx15 on lijst03_table(fn16)") at postgres.c:965 #10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc "megaconv") at postgres.c:2986 #11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817 #12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453 #13 0x0816ff6f in ServerLoop () at postmaster.c:1198 #14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917 #15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268 #16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6 I'm able to issue 'cont' to move on, and it's not like the server load is too high or something. Will proceed to the first assertion TRAP mentioned earlier. -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > Here's what's going on when the memory alloc problem occurs: AFAICS this can only be explained as a memory stomp on the static data structures SerializableSnapshotData or LatestSnapshotData. That seems a tad improbable. I wonder if what you have is a build problem. Have you tried a full "make distclean", configure, rebuild from scratch? regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> > Here's what's going on when the memory alloc problem occurs: > AFAICS this can only be explained as a memory stomp on the static data > structures SerializableSnapshotData or LatestSnapshotData. That seems a > tad improbable. I wonder if what you have is a build problem. Have you > tried a full "make distclean", configure, rebuild from scratch? Yep, I always do a distclean when changing configuration options, I've been bitten before ;) Mind you, I am using the earlier discussed patch to avoid the glibc bug : ****************************************************** $> more oracle_compat.diff --- src/backend/utils/adt/oracle_compat.c_orig Mon Nov 29 12:25:20 2004 +++ src/backend/utils/adt/oracle_compat.c Mon Nov 29 12:26:49 2004 @@ -43,7 +43,7 @@ * We assume if we have these two functions, we have their friends too, and * can use the wide-character method. */ -#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER) +#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER) && FALSE #define USE_WIDE_UPPER_LOWER #endif ****************************************************** I could reproduce this case tomorrow and inspect some variables or call abort() if you think that could be of any help? As far as the other problem is concerned, I just finished the first half of the conversion on a different machine that was build with assertions enabled and saw a number of reports like the ones below. What's causing these? I'll start the second half in a minute, which should reproduce the TRAP pretty soon. Will post a backtrace of that asap as well. Program received signal SIGUSR1, User defined signal 1. 0x40109ac1 in kill () from /lib/libc.so.6 (gdb) where #0 0x40109ac1 in kill () from /lib/libc.so.6 #1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at pmsignal.c:69 #2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x83e4af8) at sinvaladt.c:229 #3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x83e4af8) at sinval.c:122 #4 0x081ffcd8 in ProcessInvalidationMessages (hdr=0x83dd6a4, func=0x818f7c0 <SendSharedInvalidMessage>) at inval.c:343 #5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674 #6 0x080a2548 in CommitTransaction () at xact.c:1543 #7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938 #8 0x0819caef in finish_xact_command () at postgres.c:1843 #9 0x0819b795 in exec_simple_query (query_string=0x834ee8c "drop table lijst45_table") at postgres.c:965 #10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc "megaconv") at postgres.c:2986 #11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817 #12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453 #13 0x0816ff6f in ServerLoop () at postmaster.c:1198 #14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917 #15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268 #16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6 Program received signal SIGUSR1, User defined signal 1. 0x40109ac1 in kill () from /lib/libc.so.6 (gdb) where #0 0x40109ac1 in kill () from /lib/libc.so.6 #1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at pmsignal.c:69 #2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x8408e20) at sinvaladt.c:229 #3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x8408e20) at sinval.c:122 #4 0x081ffcd8 in ProcessInvalidationMessages (hdr=0x8407844, func=0x818f7c0 <SendSharedInvalidMessage>) at inval.c:343 #5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674 #6 0x080a2548 in CommitTransaction () at xact.c:1543 #7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938 #8 0x0819caef in finish_xact_command () at postgres.c:1843 #9 0x0819b795 in exec_simple_query ( query_string=0x834f0a4 "create table lijst55_table(invoice_id int,serial_id int,payment_cond text,old_debtor_id text,debtor_descr text,korting numeric,magazijncode int,pakbon_nr text,totaal_bruto numeric,totaal_netto numeric"...) at postgres.c:965 #10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc "megaconv") at postgres.c:2986 #11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817 #12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453 #13 0x0816ff6f in ServerLoop () at postmaster.c:1198 #14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917 #15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268 #16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6 (gdb) Program received signal SIGUSR1, User defined signal 1. 0x40109ac1 in kill () from /lib/libc.so.6 (gdb) where #0 0x40109ac1 in kill () from /lib/libc.so.6 #1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at pmsignal.c:69 #2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x842bb40) at sinvaladt.c:229 #3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x842bb40) at sinval.c:122 #4 0x081ffcd8 in ProcessInvalidationMessages (hdr=0x842514c, func=0x818f7c0 <SendSharedInvalidMessage>) at inval.c:343 #5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674 #6 0x080a2548 in CommitTransaction () at xact.c:1543 #7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938 #8 0x0819caef in finish_xact_command () at postgres.c:1843 #9 0x0819b795 in exec_simple_query (query_string=0x834ee8c "drop table lijst61_table") at postgres.c:965 #10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc "megaconv") at postgres.c:2986 #11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817 #12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453 #13 0x0816ff6f in ServerLoop () at postmaster.c:1198 #14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917 #15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268 #16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6 (gdb) cont -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Alvaro Herrera
Date:
On Fri, Dec 03, 2004 at 01:03:43AM +0100, Frank van Vugt wrote: > As far as the other problem is concerned, I just finished the first half of > the conversion on a different machine that was build with assertions enabled > and saw a number of reports like the ones below. What's causing these? This happens when a transaction finishes that creates, drops, or modifies a user. If it happens at a different time, something is wrong. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > I could reproduce this case tomorrow and inspect some variables or call > abort() if you think that could be of any help? Do you have hardware watchpoints on that machine? What I'd be inclined to try is tracing through the transaction to see where SerializableSnapshotData.xcnt or LatestSnapshotData.xcnt gets clobbered from. It'd be mighty tedious if you have to fall back on a software watchpoint though. Also, I gather from the xact.c traceback that the transaction looks like begin; savepoint a; ... commit; -- without releasing the savepoint Does anything happen before the SAVEPOINT? Now that you see which plpgsql function is failing, do you have a better shot at making a self-contained example? regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Fri, Dec 03, 2004 at 01:03:43AM +0100, Frank van Vugt wrote: >> As far as the other problem is concerned, I just finished the first half of >> the conversion on a different machine that was build with assertions enabled >> and saw a number of reports like the ones below. What's causing these? > This happens when a transaction finishes that creates, drops, or > modifies a user. If it happens at a different time, something is wrong. Given that the call site is SIInsertDataEntry, it's clear that SendPostmasterSignal is actually being told PMSIGNAL_WAKEN_CHILDREN ... the gdb traceback is reporting the wrong value for the parameter, which is no big surprise for values in registers. Which makes this a perfectly normal event; it's just trying to flush the SI messaging queue before it overflows. Nothing to see here, move along. regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> > #1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) > > at pmsignal.c:69 > > > > As far as the other problem is concerned, I just finished the first half > > of the conversion on a different machine that was build with assertions > > enabled and saw a number of reports like the ones below. What's causing > > these? > > This happens when a transaction finishes that creates, drops, or > modifies a user. If it happens at a different time, something is wrong. The code for the first part of the conversion does not contain any statements for create/alter/drop user, it doesn't even fire triggers. Basically it's just looping over the following set of statements for a number of textfiles: - read, validate and post-process the contents of a text file - write the result into another text file - (try to) drop the table - create the table - grant select on this table to public - copy table from a textfile That's it, so there is no explicit user-handling at all. Anything particular I can take a look at? -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> > I could reproduce this case tomorrow and inspect some variables or call > > abort() if you think that could be of any help? > > Do you have hardware watchpoints on that machine? Mmm, don't know for sure, will have a look tomorrow. > What I'd be inclined > to try is tracing through the transaction to see where > SerializableSnapshotData.xcnt or LatestSnapshotData.xcnt gets clobbered > from. It'd be mighty tedious if you have to fall back on a software > watchpoint though. A hint on the expression(s) to use would be appreciated ! > Also, I gather from the xact.c traceback that the transaction looks like > > begin; > savepoint a; > ... > commit; -- without releasing the savepoint > > Does anything happen before the SAVEPOINT? Actually, the SQL involved is v7.4.6, so there are no savepoints to begin with (or arguably, there's nothing 'before' a savepoint ;)) > Now that you see which plpgsql function is failing, do you have a better > shot at making a self-contained example? Not really, but if tracing the transaction won't reveal anything else I guess I will have no choice..... ;-( -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
Ok, here are the results of the assertion TRAP (mind you, same error on the same line, but a different machine than the original) database log: TRAP: FailedAssertion("!(((ntp)->t_data)->t_infomask & 0x0010)", File: "catcache.c", Line: 1729) gdb: Program received signal SIGABRT, Aborted. 0x40109ac1 in kill () from /lib/libc.so.6 (gdb) where #0 0x40109ac1 in kill () from /lib/libc.so.6 #1 0x401096eb in raise () from /lib/libc.so.6 #2 0x4010b127 in abort () from /lib/libc.so.6 #3 0x08208036 in ExceptionalCondition (conditionName=0x40212d24 "8,\023", errorType=0x6 <Address 0x6 out of bounds>, fileName=0x0, lineNumber=1729) at assert.c:51 #4 0x081ff88f in build_dummy_tuple (cache=0x41469048, nkeys=1, skeys=0xbfffe410) at catcache.c:1735 #5 0x081fe917 in SearchCatCache (cache=0x41469048, v1=0, v2=0, v3=0, v4=0) at catcache.c:1240 #6 0x08173908 in pgstat_vacuum_tabstat () at pgstat.c:817 #7 0x081156d9 in vacuum (vacstmt=0x8342b5c) at vacuum.c:272 #8 0x0819f7cc in PortalRunUtility (portal=0x835bf2c, query=0x8342bb8, dest=0x8342a3c, completionTag=0xbfffebd0 "") at pquery.c:934 #9 0x0819fb01 in PortalRunMulti (portal=0x835bf2c, dest=0x8342a3c, altdest=0x8342a3c, completionTag=0xbfffebd0 "") at pquery.c:1001 #10 0x0819f174 in PortalRun (portal=0x835bf2c, count=2147483647, dest=0x8342a3c, altdest=0x8342a3c, completionTag=0xbfffebd0 "") at pquery.c:617 #11 0x0819b760 in exec_simple_query (query_string=0x8342824 "vacuum full analyse ean_available") at postgres.c:933 #12 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b04, username=0x8308adc "vugtf") at postgres.c:2986 #13 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817 #14 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453 #15 0x0816ff6f in ServerLoop () at postmaster.c:1198 #16 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917 #17 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268 #18 0x400f5d06 in __libc_start_main () from /lib/libc.so.6 Obviously, continuing was not an option ;) -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: >> What I'd be inclined >> to try is tracing through the transaction to see where >> SerializableSnapshotData.xcnt or LatestSnapshotData.xcnt gets clobbered >> from. It'd be mighty tedious if you have to fall back on a software >> watchpoint though. > A hint on the expression(s) to use would be appreciated ! (gdb) watch SerializableSnapshotData.xcnt Hardware watchpoint 1: SerializableSnapshotData.xcnt (gdb) watch LatestSnapshotData.xcnt Hardware watchpoint 2: LatestSnapshotData.xcnt (gdb) cont Continuing. The only "authorized" place to set these is the assignment near the bottom of GetSnapshotData(), which would normally get hit about once per command. regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: >> Does anything happen before the SAVEPOINT? > Actually, the SQL involved is v7.4.6, Not with that backtrace, it isn't. regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> >> Does anything happen before the SAVEPOINT? > > > > Actually, the SQL involved is v7.4.6, > > Not with that backtrace, it isn't. Excuse me? I'm running v8.0.0beta5, but the sql-statements that are in my source-tree are all still 'based on v7.4.6', i.e. there is no occurence of the word 'savepoint' in the whole tree. -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> > A hint on the expression(s) to use would be appreciated ! > (gdb) watch SerializableSnapshotData.xcnt I could have thought of that ;) (I misinterpreted what you meant by SerializableSnapshotData.xcnt, wouldn't have asked otherwise) Anyway, I just confirmed that hardware watchpoints are available, so I'll follow this up first thing in the morning. -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: >>>> Does anything happen before the SAVEPOINT? > >>>> Actually, the SQL involved is v7.4.6, >> >> Not with that backtrace, it isn't. > Excuse me? > I'm running v8.0.0beta5, Oh, I thought you meant the backend was 7.4.6. > but the sql-statements that are in my source-tree are > all still 'based on v7.4.6', i.e. there is no occurence of the word > 'savepoint' in the whole tree. Hmm. The line number referenced for CommitTransactionCommand() is definitely in code that is only entered after a savepoint, but I see it is an identical code path to some other arms of the switch() command. Probably the compiler decided to merge those arms. You might want to back off the compiler optimization level a step so you can get more readable tracebacks ... regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> Hmm. The line number referenced for CommitTransactionCommand() is > definitely in code that is only entered after a savepoint, but I see > it is an identical code path to some other arms of the switch() > command. Probably the compiler decided to merge those arms. You might > want to back off the compiler optimization level a step so you can get > more readable tracebacks ... Yup, will do that as well. Will read any comments you may have on the TRAP backtrace in a couple of hours, need to take myself offline for a while ;) -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > Ok, here are the results of the assertion TRAP (mind you, same error on the > same line, but a different machine than the original) That's another can't-happen problem. As best I can tell, this suggests that the tuple descriptor associated with the RELOID catalog cache has gotten clobbered (it should have the HASOID bit set, but an Assert that says it does is failing). Either you've found some code path that clobbers random bits of memory, or you've got flaky hardware. But the same error on two different machines makes the second theory a bit hard to swallow. What's the hardware platform exactly? Is it 64-bit, or anything at all off the beaten path? regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > Meanwhile, one of the application developers here bumped into a way to > reproduce what looks like the same memory alloc problem (exactly the same > point in exactly the same trigger) using our application software > only, Oh good. Can you construct a self-contained test case then? > Here are both the query-set and the corresponding backtrace. The query set's not very interesting without a database to try it against :-( > I then got a firm set of results, all of which were looking like this: > Hardware access (read/write) watchpoint 1: SerializableSnapshotData.xcnt > Value = 1 > Hardware access (read/write) watchpoint 2: LatestSnapshotData.xcnt > Value = 1 > All were located at sinval.c:888 This is the expected case. The failure in CopySnapshot has got to indicate that somebody set one or the other field to some bizarrely large value, though. I take it you didn't run the watchpointed backend far enough to get the memory-alloc error? regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> > Meanwhile, one of the application developers here bumped into a way to > > reproduce what looks like the same memory alloc problem (exactly the same > > point in exactly the same trigger) using our application software > > only, > > Oh good. Can you construct a self-contained test case then? Will try to do just that when the rest won't work. > > All were located at sinval.c:888 > > This is the expected case. The failure in CopySnapshot has got to > indicate that somebody set one or the other field to some bizarrely > large value, though. I take it you didn't run the watchpointed backend > far enough to get the memory-alloc error? Oh, but I did..... All the breaks are at sinval.c:888 and at some point the memory-alloc simply occurs. Do you mean you want a backtrace of the last break at line 888 just before the error ? -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: >> This is the expected case. The failure in CopySnapshot has got to >> indicate that somebody set one or the other field to some bizarrely >> large value, though. I take it you didn't run the watchpointed backend >> far enough to get the memory-alloc error? > Oh, but I did..... [ scratches head ... ] AFAICS, the CopySnapshot that gets the error must be trying to copy one or the other of those snapshots. Maybe you should look at the address passed to CopySnapshot and see what it's really pointing at, and where it came from. regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> [ scratches head ... ] AFAICS, the CopySnapshot that gets the error > must be trying to copy one or the other of those snapshots. Maybe you > should look at the address passed to CopySnapshot and see what it's > really pointing at, and where it came from. Ok, stepping through the transaction didn't reveal any change in the CopySnapshot address, except for the last one when the error occurs, which is when the deferred trigger is executing an update statement. I guess the only thing left for me is to try and create a script that will reproduce the problem.... ;( Please allow me to describe the sequence of events of the one thing that comes to mind when I take another good look at the functions/triggers and try to think about what could be causing this: - db - a deferred trigger is defined on insert/update/delete of purchaseorder_line (pol) - this trigger is meant to update the corresponding purchaseorder(s) - begin transaction - select purchaseorder_line (pol) X for update (here it is determined whether or not to spawn the line) - call function spawn_pol() on pol X (inside this function marked as volatile) - select pol X for update (in this case for the second time, it's called from other places too) - insert new pol Y - update old pol X (with regards to the spawned amount) - update pol Y (with regards to a number of other fields) - commit transaction - this will fire the earlier deferred trigger on pol - update purchaseorder (po) to which pols X and Y belong, using aggregates (it is this update statement where the error occurs) The only thing I can think of is that the handling of pol X and creation of pol Y from within spawn_pol() is somehow messing things up, but...... If this doesn't ring any alarm bells, then I'll start working on a script first thing next Monday. -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > The only thing I can think of is that the handling of pol X and creation of > pol Y from within spawn_pol() is somehow messing things up, but...... Certainly the mere firing of a deferred trigger isn't the problem; we do that all the time. What struck me about the traceback you showed is that it involved an attempt by the planner to constant-fold an SQL-language function that was used in the trigger function's query (and was presumably marked IMMUTABLE and had constant arguments in the query). Because the trigger function is plpgsql, this could happen only the first time the trigger is fired in a particular session (unless you are using EXECUTE to invoke the update command?). If the problem is related to the planner constant-folding environment, then the first-time-only limitation would help make it hard to reproduce. regards, tom lane
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
> > The only thing I can think of is that the handling of pol X and creation > > of pol Y from within spawn_pol() is somehow messing things up, but...... > Certainly the mere firing of a deferred trigger isn't the problem; we do > that all the time. Me too ;) I was more trying to emphase the multiple select for update from both in- as well as outside the functions, but you'd probably have caught that already if it could have been the culprit. > What struck me about the traceback <cut> > Because the trigger function is plpgsql, this could happen only the > first time the trigger is fired in a particular session I've tried to run all immutable functions used at least once before running the query-set, this made no difference, same error on the same location. > (unless you are using EXECUTE to invoke the update command?) No, no, it's a plain call. -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
Hi Tom, > Because the trigger function is plpgsql, this could happen only the > first time the trigger is fired in a particular session (unless you are > using EXECUTE to invoke the update command?). If the problem is related > to the planner constant-folding environment, then the first-time-only > limitation would help make it hard to reproduce. It looks like your suspicion was right.... I've isolated one particulare case, simply run the script attached. If you remove the comment before the extra call of spawn_pol() at the bottom of the script, the error is avoided. Hopefully, solving this will also solve the other cases of mem-alloc error I've been seeing ;) I'm not using spawn_pol/sol() during the earlier mentioned conversion, but my hope is that the error generated during conversion will turn out to be related and caused by the constant-folding of some other function. But I'll know soon enough if it isn't ;) Looking forward to your assesment. -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > Looking forward to your assesment. Looks like the problem is associated with DEFERRED AFTER triggers: we don't normally set a snapshot for TransactionStmt commands, including COMMIT, but there needs to be a snapshot set when running trigger functions. The attached patch fixes the test case you sent; can you try it against your other problem? regards, tom lane *** src/backend/commands/trigger.c.orig Sun Nov 14 14:37:05 2004 --- src/backend/commands/trigger.c Mon Dec 6 13:20:43 2004 *************** *** 2365,2370 **** --- 2365,2378 ---- Assert(afterTriggers->query_depth == -1); /* + * If there are any triggers to fire, make sure we have set a snapshot + * for them to use. (Since PortalRunUtility doesn't set a snap for + * COMMIT, we can't assume ActiveSnapshot is valid on entry.) + */ + if (afterTriggers->events.head != NULL) + ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); + + /* * Run all the remaining triggers. Loop until they are all gone, * just in case some trigger queues more for us to do. */
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Frank van Vugt
Date:
Hi Tom, > Looks like the problem is associated with DEFERRED AFTER triggers > The attached patch fixes the test case you sent; can you > try it against your other problem? I patched RC1 with this and tried the conversion again. Though still running, it's passed the point where the mem-alloc error occured earlier, so it would seem we're in the clear on this one. Thanks for the quick fix ! I guess this patch is already in the RC2 tree? -- Best, Frank.
Re: "invalid memory alloc request size " in deferred trigger causes transaction to fail, but the backend keeps running
From
Tom Lane
Date:
Frank van Vugt <ftm.van.vugt@foxi.nl> writes: > I guess this patch is already in the RC2 tree? No, I was waiting for confirmation ... regards, tom lane