Thread: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 18484
Logged by: Michael Banck
Email address: michael.banck@netapp.com
PostgreSQL version: 16.3
Operating system: Debian bookworm

Description:

A customer reported to us that a query/prepared statement on their database aborted with

ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 32 more bytes.

The underlying database schema was migrated from Oracle ("no problems there") and the base table is two-level partitioned: one range partition per day (currently 135), each of which is then list-partitioned into around 50 sub-partitions. Each partition has around 20 indexes. This leads to around 20k tables and 400k indexes in the schema.

The query consists of a simple SELECT on the table with a lot of OR conditions like

((((A_FOO = $1) OR (B_FOO = $2)) AND ((START_TIME <= $3) AND (END_TIME > $4))) OR
(((A_FOO = $5) OR (B_FOO = $6)) AND ((START_TIME <= $7) AND (END_TIME > $8))) OR
(((A_FOO = $9) OR (B_FOO = $10)) AND ((START_TIME <= $11) AND (END_TIME > $12))) OR
(((A_FOO = $13) OR (B_FOO = $14)) AND ((START_TIME <= $15) AND (END_TIME > $16))))

Overall there are around 100 parameters. The explain plan shows around 15k Parallel Seq Scans on the various partitions and is around a million lines long (in order to get the explain, max_connections and max_locks_per_transaction need to be bumped up considerably from the defaults).

They shared the schema with us and we were able to reproduce the error on the empty schema. If we set max_parallel_workers_per_gather=0, the prepared statement executes fine. Also, testing this on 17beta1 did not throw the error.
The backtrace is:

#0  errfinish (filename=0x55feb1d28100 "./build/../src/common/stringinfo.c", lineno=303, funcname=0x55feb1d28170 <__func__.0> "enlargeStringInfo") at ./build/../src/backend/utils/error/elog.c:480
#1  0x000055feb16c7292 in enlargeStringInfo (str=str@entry=0x7fff222fe4b0, needed=32) at ./build/../src/common/stringinfo.c:303
#2  0x000055feb1b9c666 in appendStringInfo (str=str@entry=0x7fff222fe4b0, fmt=fmt@entry=0x55feb1c9e3fa " :varlevelsup %u") at ./build/../src/common/stringinfo.c:110
#3  0x000055feb18e1810 in _outVar (node=0x560304a6c420, str=0x7fff222fe4b0) at ./build/../src/backend/nodes/outfuncs.funcs.c:110
#4  outNode (str=0x7fff222fe4b0, obj=0x560304a6c420) at ./build/../src/backend/nodes/outfuncs.switch.c:31
#5  0x000055feb18e4626 in _outList (str=0x7fff222fe4b0, node=0x560304a6c3d8) at ./build/../src/backend/nodes/outfuncs.c:294
#6  0x000055feb18e0b4a in outNode (str=<optimized out>, obj=<optimized out>) at ./build/../src/backend/nodes/outfuncs.c:843
#7  0x000055feb18e528c in _outOpExpr (str=str@entry=0x7fff222fe4b0, node=node@entry=0x560304a6c390) at ./build/../src/backend/nodes/outfuncs.funcs.c:239
#8  0x000055feb18e187d in outNode (str=0x7fff222fe4b0, obj=0x560304a6c390) at ./build/../src/backend/nodes/outfuncs.switch.c:58
#9  0x000055feb18e4626 in _outList (str=0x7fff222fe4b0, node=0x560304a6c348) at ./build/../src/backend/nodes/outfuncs.c:294
#10 0x000055feb18e0b4a in outNode (str=<optimized out>, obj=<optimized out>) at ./build/../src/backend/nodes/outfuncs.c:843
#11 0x000055feb18f6a81 in _outSeqScan (str=str@entry=0x7fff222fe4b0, node=node@entry=0x560304a6a2e8) at ./build/../src/backend/nodes/outfuncs.funcs.c:4610
#12 0x000055feb18e119b in outNode (str=0x7fff222fe4b0, obj=0x560304a6a2e8) at ./build/../src/backend/nodes/outfuncs.switch.c:961
#13 0x000055feb18e4626 in _outList (str=0x7fff222fe4b0, node=0x5602d5976a10) at ./build/../src/backend/nodes/outfuncs.c:294
#14 0x000055feb18e0b4a in outNode (str=<optimized out>, obj=<optimized out>) at ./build/../src/backend/nodes/outfuncs.c:843
#15 0x000055feb18f5dc5 in _outAppend (str=0x7fff222fe4b0, node=0x5602d50a0598) at ./build/../src/backend/nodes/outfuncs.funcs.c:4485
#16 0x000055feb18e11eb in outNode (str=0x7fff222fe4b0, obj=0x5602d50a0598) at ./build/../src/backend/nodes/outfuncs.switch.c:946
#17 0x000055feb18f50fd in _outPlannedStmt (str=str@entry=0x7fff222fe4b0, node=node@entry=0x56030f2d1da8) at ./build/../src/backend/nodes/outfuncs.funcs.c:4360
#18 0x000055feb18e122b in outNode (str=0x7fff222fe4b0, obj=0x56030f2d1da8) at ./build/../src/backend/nodes/outfuncs.switch.c:934
#19 0x000055feb18fc9e4 in nodeToString (obj=obj@entry=0x56030f2d1da8) at ./build/../src/backend/nodes/outfuncs.c:889
#20 0x000055feb18741f7 in ExecSerializePlan (estate=0x5602749b9010, plan=<optimized out>) at ./build/../src/backend/executor/execParallel.c:218
#21 ExecInitParallelPlan (planstate=0x5602749b9620, estate=0x5602749b9010, sendParams=0x0, nworkers=2, tuples_needed=-1) at ./build/../src/backend/executor/execParallel.c:633
#22 0x000055feb188b051 in ExecGather (pstate=0x5602749b9448) at ./build/../src/backend/executor/nodeGather.c:171
#23 0x000055feb18a1866 in ExecProcNode (node=0x5602749b9448) at ./build/../src/include/executor/executor.h:273
#24 ExecSort (pstate=0x5602749b9240) at ./build/../src/backend/executor/nodeSort.c:149
#25 0x000055feb1870c05 in ExecProcNode (node=0x5602749b9240) at ./build/../src/include/executor/executor.h:273
#26 ExecutePlan (execute_once=<optimized out>, dest=0x55feb2b3f6d8, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x5602749b9240, estate=0x5602749b9010) at ./build/../src/backend/executor/execMain.c:1670
#27 standard_ExecutorRun (queryDesc=0x55feb2b4ddd0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:365
#28 0x000055feb1a18c9e in PortalRunSelect (portal=portal@entry=0x55feb2bdcf88, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x55feb2b3f6d8) at ./build/../src/backend/tcop/pquery.c:924
#29 0x000055feb1a1a130 in PortalRun (portal=portal@entry=0x55feb2bdcf88, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=false, run_once=run_once@entry=true, dest=dest@entry=0x55feb2b3f6d8, altdest=altdest@entry=0x55feb2b3f6d8, qc=<optimized out>) at ./build/../src/backend/tcop/pquery.c:768
#30 0x000055feb1825abc in ExecuteQuery (pstate=<optimized out>, stmt=<optimized out>, intoClause=<optimized out>, params=<optimized out>, dest=<optimized out>, qc=<optimized out>) at ./build/../src/backend/commands/prepare.c:258
#31 0x000055feb1a1b748 in standard_ProcessUtility (pstmt=0x55feb2afc048, queryString=0x55feb2afab30 "execute t (\n'262010000000001', '262010000000001', '20100621131707', '20090420140642',\n'262010000000002','262010000000002', '20090420140642', '20081209084155', \n'262010000000003', '262010000000003', "..., readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=<optimized out>, qc=<optimized out>) at ./build/../src/backend/tcop/utility.c:757
#32 0x000055feb1a19bb1 in PortalRunUtility (portal=portal@entry=0x55feb2bdce80, pstmt=0x55feb2afc048, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x55feb2b3f6d8, qc=qc@entry=0x7fff222fe9d0) at ./build/../src/backend/tcop/pquery.c:1158
#33 0x000055feb1a19f77 in FillPortalStore (portal=portal@entry=0x55feb2bdce80, isTopLevel=isTopLevel@entry=true) at ./build/../src/backend/tcop/pquery.c:1031
#34 0x000055feb1a1a2bf in PortalRun (portal=portal@entry=0x55feb2bdce80, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55feb2afc150, altdest=altdest@entry=0x55feb2afc150, qc=<optimized out>) at ./build/../src/backend/tcop/pquery.c:763
#35 0x000055feb1a164c5 in exec_simple_query (query_string=<optimized out>) at ./build/../src/backend/tcop/postgres.c:1272
#36 0x000055feb1a17209 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at ./build/../src/backend/tcop/postgres.c:4654
#37 0x000055feb198f33c in BackendRun (port=0x55feb2b4e7f0) at ./build/../src/backend/postmaster/postmaster.c:4464
#38 BackendStartup (port=0x55feb2b4e7f0) at ./build/../src/backend/postmaster/postmaster.c:4192
#39 ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1782
#40 0x000055feb1990328 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x55feb2ab7ef0) at ./build/../src/backend/postmaster/postmaster.c:1466
#41 0x000055feb16c763d in main (argc=5, argv=0x55feb2ab7ef0) at ./build/../src/backend/main/main.c:198

So I guess the problem is somewhere here:

ExecInitParallelPlan - src/backend/executor/execParallel.c:633

    /* Fix up and serialize plan to be sent to workers. */
    pstmt_data = ExecSerializePlan(planstate->plan, estate);

ExecSerializePlan - src/backend/executor/execParallel.c:218

    /* Return serialized copy of our dummy PlannedStmt. */
    return nodeToString(pstmt);

I agree that the overall schema architecture should be rethought and much fewer partitions used (and they are looking into this), but as the error only happens with a parallel plan, I am reporting it here in case there is an easy fix for this, or at least maybe we can abort earlier with a better error message?
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
David Rowley
Date:
On Wed, 29 May 2024 at 04:42, PG Bug reporting form <noreply@postgresql.org> wrote:
> ERROR: out of memory
> DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 32 more
> bytes.

> I agree that the overall schema architecture should be rethought and much
> fewer partitions be used (and they are looking into this), but as the error
> does only happen with a parallel plan, I am reporting it here in case there
> is an easy fix for this, or at least maybe we can abort earlier with a
> better error message?

I think we should at least write a better error message. "out of memory" is misleading. I've attached a patch with some proposed better wording. The above will become:

ERROR: string buffer exceeds maximum allowed length (1073741823)
DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 32 more

I also see a few other places that report ERRCODE_PROGRAM_LIMIT_EXCEEDED and display "out of memory". Namely:

pg_do_encoding_conversion
perform_default_encoding_conversion
pvsnprintf

We should probably consider fixing those too.

I understand this isn't to the level that you'd like to see this fixed, but it at least should stop people from thinking that installing more RAM might help.

David
Attachment
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
Daniel Gustafsson
Date:
> On 30 May 2024, at 03:26, David Rowley <dgrowleyml@gmail.com> wrote:
> I think we should at least write a better error message. "out of
> memory" is misleading. I've attached a patch with some proposed better
> wording.
>
> The above will become:
> ERROR: string buffer exceeds maximum allowed length (1073741823)
> DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 32 more

Should we use a more human readable unit than bytes in the errmsg as well?

--
Daniel Gustafsson
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
David Rowley
Date:
On Thu, 30 May 2024 at 18:25, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 30 May 2024, at 03:26, David Rowley <dgrowleyml@gmail.com> wrote:
>> The above will become:
>> ERROR: string buffer exceeds maximum allowed length (1073741823)
>> DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 32 more
>
> Should we use a more human readable unit than bytes in the errmsg as well?

I wondered about that too. There were a couple of reasons I didn't:

1) It's easy to see that 1073741812 + 32 is greater than 1073741823. If we divided those down by 1024 a few times then that calculation can no longer be done.

2) I wanted to use MaxAllocSize rather than hard-wire anything into the string. That left just doing something like MaxAllocSize / 1024 / 1024. If you did that with the limit size and the current string size, it would likely just turn into the same number and make the complaint confusing, e.g.:

ERROR: string buffer exceeds maximum allowed length (1023MB)
DETAIL: Cannot enlarge string buffer containing 1023MB by 32 bytes more

or

ERROR: string buffer exceeds maximum allowed length (1023MB)
DETAIL: Cannot enlarge string buffer containing 1023MB by 0MB more

I know the bytes are hard to read, but I just don't feel either of the above is easier for a user to process. The first one gives them 2 different units that they need to convert between, and the 2nd is confusing because being unable to enlarge by zero seems like a strange limitation.

David
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
Daniel Gustafsson
Date:
> On 30 May 2024, at 11:10, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Thu, 30 May 2024 at 18:25, Daniel Gustafsson <daniel@yesql.se> wrote:
>>
>>> On 30 May 2024, at 03:26, David Rowley <dgrowleyml@gmail.com> wrote:
>>> The above will become:
>>> ERROR: string buffer exceeds maximum allowed length (1073741823)
>>> DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 32 more
>>
>> Should we use a more human readable unit than bytes in the errmsg as well?
>
> I wondered about that too. There were a couple of reasons I didn't. 1)
> It's easy to see that 1073741812 + 32 is greater than 1073741823. If
> we divided those down by 1024 a few times then that calculation can no
> longer be done. 2) I wanted to use MaxAllocSize rather than hard wire
> anything into the string. That left just doing something like
> MaxAllocSize / 1024 / 1024. If you did that with the limit size and
> the current string size, it would likely just turn into the same
> number and make the complaint confusing. e.g:
>
> ERROR: string buffer exceeds maximum allowed length (1023MB)
> DETAIL: Cannot enlarge string buffer containing 1023MB by 32 bytes more
>
> or
>
> ERROR: string buffer exceeds maximum allowed length (1023MB)
> DETAIL: Cannot enlarge string buffer containing 1023MB by 0MB more
>
> I know the bytes are hard to read, but I just don't feel either of the
> above are easier for a user to process. The first one gives them 2
> different units that they need to convert between

Fair enough, mixing units probably won't be more readable. The middle ground might be a bit much:

ERROR: string buffer exceeds maximum allowed length (1023MB)
DETAIL: Cannot enlarge string buffer containing 1023MB (1073741823 bytes) by 32 bytes more

Did you omit bytes from the errmsg on purpose? "length" of a string buffer can be interpreted as the number of characters, and that wouldn't be true for wide chars.

> and the 2nd is
> confusing because being unable to enlarge by zero seems like a strange
> limitation.

Yeah, that would no doubt be a source of endless bug reports and Stack Overflow searches.

--
Daniel Gustafsson
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
David Rowley
Date:
On Thu, 30 May 2024 at 21:27, Daniel Gustafsson <daniel@yesql.se> wrote:
> ERROR: string buffer exceeds maximum allowed length (1023MB)
> DETAIL: Cannot enlarge string buffer containing 1023MB (1073741823 bytes) by 32 bytes more

That's better than the two alternatives I showed. The problem now is that the actual limit (0x3fffffff) is 1 byte less than 1024MB, so showing 1023MB is likely debatable. It's just 1 byte in a ~billion short.

I kinda think we could just avoid any future revisits from people having concerns about which direction we round by just not rounding and doing bytes. It's not as if it's a message we expect many people to see, but if they do see it, it's probably going to be analyzed quite a bit. Having the highest level of detail for that seems like a good idea. I don't think it should leave any questions other than "Why is Postgres trying to build such a long string?". Of course, fixing that one is harder.

> Did you omit bytes from the errmsg on purpose? "length" of a string buffer can
> be interpreted as the number of characters, and that wouldn't be true for wide
> chars.

Not on purpose. I do think (%zu bytes) would be better than (%zu). I only thought about that while typing "MB" in the 2nd email.

David
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
Daniel Gustafsson
Date:
> On 30 May 2024, at 11:55, David Rowley <dgrowleyml@gmail.com> wrote:
> It's not as if it's a message we expect
> many people to see, but if they do see it, it's probably going to be
> analyzed quite a bit. Having the highest level of detail for that
> seems like a good idea. I don't think it should leave any other
> questions other than "Why is Postgres trying to build such a long
> string?".

Good point.

> Not on purpose. I do think (%zu bytes) would be better than (%zu). I
> only thought about that while typing "MB" in the 2nd email.

+1

--
Daniel Gustafsson
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 30 May 2024, at 11:55, David Rowley <dgrowleyml@gmail.com> wrote:
>> Not on purpose. I do think (%zu bytes) would be better than (%zu). I
>> only thought about that while typing "MB" in the 2nd email.

> +1

+1 for "(%zu bytes)", and also for not rounding off. This is a case where that won't make it more intelligible.

regards, tom lane
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
David Rowley
Date:
On Fri, 31 May 2024 at 03:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> +1 for "(%zu bytes)", and also for not rounding off. This is a case
> where that won't make it more intelligible.

Here's a revised patch with "(%zu bytes)".

Unsure if this needs to wait until we branch or if it's fine to sneak into v17. I'll leave it til after the branch unless anyone feels differently.

David
Attachment
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> Unsure if this needs to wait until we branch or if it's fine to sneak
> into v17. I'll leave it til after the branch unless anyone feels
> differently.

While I'd not fight against committing now, I'd lean to waiting for v18. It's not a bug fix, and it's changing translatable messages at a time when the translators don't need extra work.

regards, tom lane
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
Daniel Gustafsson
Date:
> On 31 May 2024, at 03:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> ..and it's changing translatable messages at a time when the translators
> don't need extra work.

This is a compelling argument for waiting for v18 I think.

--
Daniel Gustafsson
Re: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
From
David Rowley
Date:
On Fri, 31 May 2024 at 19:05, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 31 May 2024, at 03:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> ..and it's changing translatable messages at a time when the translators
>> don't need extra work.
>
> This is a compelling argument for waiting for v18 I think.

Since we've now branched for v18, I've pushed this. Thank you for the reviews.

David