Thread: Memory leak with palloc
Hello, Let's say I want to run select count(column1) as column1, count(column2) as column2, ... count(column228) as column228 from aview; coulumni are really function calls that return a computed text that is palloc'd. Postgres seems to free the palloc'd memory at the end of the select statement, which means that the above query may of may not succeed depending on machine size, database size and amount of memory allocated. Is there a way (and if there isn't, should there be) to tell postgres either: 1. Please release this memory when you're done with this record. or 2. Here is a pointer to static memory. Please don't free. Both would solve my problem (and I guess 2 would be the more efficient). Cheers, Han Holl PS The above query failed very quickly indeed on Postgres 7.2.3, (due to memory exhaustion,) which seems to have a additional memory leak of it's own. On 7.3RC2 it's running fine but consuming memory at a steady rate (from 10M to 115M in 35 minutes, with 1.3G virtual memory I', confident that it will complete).
Han Holl <han.holl@prismant.nl> writes: > Postgres seems to free the palloc'd memory at the end of the > select statement, It should do so sooner than that. Can you provide a self-contained example? regards, tom lane
On Fri, Nov 29, 2002 at 05:24:20PM +0100, tgl@sss.pgh.pa.us wrote: > > Han Holl <han.holl@prismant.nl> writes: > > Postgres seems to free the palloc'd memory at the end of the > > select statement, > > It should do so sooner than that. Can you provide a self-contained > example? I have tried to do that and failed. The problem proved to have nothing to do with palloc (I was too hasty there), but there _is_ something fishy here: Given table: CREATE TABLE "rubdefs" ( "srto" character(1) NOT NULL, "titel" character varying(40), "rubnr" smallint ); and the function: CREATE FUNCTION "rubton" (text,text) RETURNS smallint AS ' select rubnr from rubdefs where srto = $1 and titel = $2 ' LANGUAGE 'sql' WITH ( iscachable ); A query like: select count( rubton(udps.srto, 'adres')) from udps where srto = 'T'; on a view will use (leak) approximately 64 bytes per tuples visited, per column. My original query had 230 columns and 450000 tuples, so it would have required about 6000 Mb. Rewriting the mapping function as: create function rubton2(text, text) returns int as ' declare rsl integer; begin select into rsl rubnr from rubdefs where srto = $1 and titel = $2; return rsl; end ' language 'plpgsql'; solves the problem (but is slightly slower). So it looks like using the pure SQL version here is not a good idea. I'm quite willing to produce an example of this behaviour, but I suspect that it's 'known behaviour' to experts. Cheers, Han Holl > > regards, tom lane >
"Han Holl" <han.holl@prismant.nl> writes: > So it looks like using the pure SQL version here is not a good idea. > I'm quite willing to produce an example of this behaviour, but I > suspect that it's 'known behaviour' to experts. No, not to me anyway. I would not be surprised if there's some leak in the SQL-function code, but could I trouble you for a complete example, rather than having to reverse-engineer one? regards, tom lane
On Sat, Nov 30, 2002 at 05:20:57PM +0100, tgl@sss.pgh.pa.us wrote: > > "Han Holl" <han.holl@prismant.nl> writes: > > So it looks like using the pure SQL version here is not a good idea. > > I'm quite willing to produce an example of this behaviour, but I > > suspect that it's 'known behaviour' to experts. > > No, not to me anyway. I would not be surprised if there's some leak in > the SQL-function code, but could I trouble you for a complete example, > rather than having to reverse-engineer one? Of course. I hope you have a Linux like system, otherwise the following might not run. If I rub this shellscript, I see with tom in a different terminal that the postmaster process grows to about 28Mb, and then collapses to the initial 1.5 Mb. To avoid quoting problems, I;ll attach the shellscript as well Cheers Han Holl #!/bin/bash PGLIB=/usr/local/pgsql/lib INC=/usr/src/postgresql-7.3rc2/src/include TABLE=/tmp/demotable.tbl MAPPING=/tmp/mapping.tbl SQL=/tmp/demo.sql set -e [ -f $TABLE ] || { perl -e 'for ($i = 0 ; $i < 300000 ; $i++) { printf("%s%d\n",chr(65 + ($i % 4)), $i); }' >$TABLE } [ -f $MAPPING ] || { for sr in A B C D do for i in 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 do echo -e "$sr\tstring$i\t$i" done done >$MAPPING } cat >$SQL <<EOF drop table demotable cascade; create table demotable ( column1 text ); drop table demorub cascade; create table demorub ( srto char, titel text, rnum integer ); copy demotable from '$TABLE'; copy demorub from '$MAPPING'; create or replace function mapping (char, text) returns int as ' select rnum from demorub where srto = \$1 and titel = \$2 ' language 'sql'; select count(mapping(substring(column1,1,1), 'string2')) from demotable; EOF psql <$SQL
Attachment
"Han Holl" <han.holl@prismant.nl> writes: >> No, not to me anyway. I would not be surprised if there's some leak in >> the SQL-function code, but could I trouble you for a complete example, >> rather than having to reverse-engineer one? > Of course. I hope you have a Linux like system, otherwise the following > might not run. Okay, it turns out this is indeed an aspect of a known problem, namely that SQL-language functions aren't good about cleaning up query-lifespan data created by the queries they run. The individual queries really ought to be run using a query memory context that's shorter-lived than that of the query that's calling the function. But they're not, yet. I was able to eliminate the memory leak in your example by adding "pfree(scanstate);" to the end of ExecEndSeqScan(). However, that's not a general solution, since there are comparable leaks in most of the other executor node types; and even if we eliminated them all today, it would be difficult to prevent new ones from appearing in future. The right long-term solution is to arrange for all query-lifetime data to be allocated in a specific context that can be freed in-toto when we end the query. To do this cleanly, we have to fix the executor to not scribble on plan trees (executor state nodes should point to plan nodes, never vice-versa). I've been intending to do that for awhile, but it's not done yet ... regards, tom lane
On Sun, Dec 01, 2002 at 09:45:14PM +0100, Han.Holl@prismant.nl wrote: > Of course. I hope you have a Linux like system, otherwise the following > might not run. > > If I rub this shellscript, I see with tom in a different terminal that Lousy spelling here! Please read as: If I run this shellscript, I see with top in a different terminal that > the postmaster process grows to about 28Mb, and then collapses to the > initial 1.5 Mb. Sorry, Han Holl
Thanks for the information. I must say the involvement of the developers of PostgreSQL is impressive. Cheers, Han Holl On Sunday 01 December 2002 11:09 pm, tgl@sss.pgh.pa.us wrote: > > Okay, it turns out this is indeed an aspect of a known problem, namely > that SQL-language functions aren't good about cleaning up query-lifespan > data created by the queries they run. The individual queries really > ought to be run using a query memory context that's shorter-lived than > that of the query that's calling the function. But they're not, yet. > > I was able to eliminate the memory leak in your example by adding > "pfree(scanstate);" to the end of ExecEndSeqScan(). However, that's > not a general solution, since there are comparable leaks in most of > the other executor node types; and even if we eliminated them all today, > it would be difficult to prevent new ones from appearing in future. > > The right long-term solution is to arrange for all query-lifetime data > to be allocated in a specific context that can be freed in-toto when > we end the query. To do this cleanly, we have to fix the executor to > not scribble on plan trees (executor state nodes should point to plan > nodes, never vice-versa). I've been intending to do that for awhile, > but it's not done yet ... > > regards, tom lane
A couple weeks ago I said: > The right long-term solution is to arrange for all query-lifetime data > to be allocated in a specific context that can be freed in-toto when > we end the query. To do this cleanly, we have to fix the executor to > not scribble on plan trees (executor state nodes should point to plan > nodes, never vice-versa). I've been intending to do that for awhile, > but it's not done yet ... Just FYI, this is now done. Your example doesn't leak memory anymore in CVS tip. regards, tom lane