Thread: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17145
Logged by: James Inform
Email address: james.inform@pharmapp.de
PostgreSQL version: 13.4
Operating system: Ubuntu 18.04 LTS / macOS 10.15.7

Description:

Hi,

while importing some log data into a PostgreSQL table I came across this issue. When I try to search a text field in a where clause that holds more than 250 MB of text, PostgreSQL runs out of memory for the request.

You can reproduce this behaviour with the following SQL statement:

with q1 as
(
    -- 260*1024*1024 = 260MB
    select repeat('x', 260*1024*1024) as mydata
)
select count(*) from q1 where lower(mydata) like '%a%';

ERROR: invalid memory alloc request size 1090519044

Using just a 250 MB string:

with q1 as
(
    -- 250*1024*1024 = 250MB
    select repeat('x', 250*1024*1024) as mydata
)
select count(*) from q1 where lower(mydata) like '%a%';

everything is fine. The alloc request size seems to be 4 times the length of the text field.

The issue is also reproducible when using a table with a text column and filling it with a string like the one above:

create table mytest as
with q1 as
(
    -- 260*1024*1024 = 260MB
    select repeat('x', 260*1024*1024) as mydata
)
select * from q1;

select count(*) from mytest where lower(mydata) like '%a%';
ERROR: invalid memory alloc request size 1090519044

Creating an index also gives the error, e.g.:

create extension pg_trgm;
create index on mytest using gin (lower(mydata) gin_trgm_ops);

Is this expected behaviour? Why is PostgreSQL allocating 4 times the column's memory when doing a search like the above? As it stands, nobody can use a text field with more than 256 MB of content with a function-based index, nor search for a record using that field in a where clause.

If using 4 times the memory is due to performance considerations, then maybe the underlying algorithm should be divided into an efficient one for < 256 MB and a less efficient but error-free one for >= 256 MB.

If we cannot change the behaviour, we should state this in the documentation: you can store up to 1 GB of string data in a text column, but only up to 256 MB if you want to use the column in more complex where clauses.

Cheers,
James
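For reference, the reported request size matches a worst case of 4 bytes per character plus a 4-byte length header (a back-of-the-envelope check, not taken from the error path itself):

```sql
-- 260 MB of 'x' characters, worst-case 4 bytes each after lower(),
-- plus the 4-byte varlena length header:
select 260*1024*1024 * 4 + 4 as alloc_request;
-- alloc_request = 1090519044, exactly the size in the error message
```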
Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB
From
Julien Rouhaud
Date:
On Mon, Aug 16, 2021 at 4:46 PM PG Bug reporting form <noreply@postgresql.org> wrote:
>
> When I try to search a text field in a where clause that holds more than 250
> MB of text, PostgreSQL runs out of memory for the request.
> [...]
> ERROR: invalid memory alloc request size 1090519044
> [...]
> The alloc request size seems to be 4 times the length of the text field.
> [...]
> Is this an expected behaviour?
> Why is PostgreSQL allocating 4 times the column's memory when doing a
> search like above?

This is unfortunately the expected behavior, assuming that you're not dealing with the C/POSIX encoding.

This is because in a multibyte encoding each character can occupy up to 4 bytes. Postgres needs to allocate a single chunk of memory to hold the resulting text, and it has no way to know how many multibyte characters are present in the input string or how many characters will have a different size when down-cased, so it has to allocate the maximum size that may be needed, which is 4 times the size of the input string. And there's a strict 1 GB limitation for a single field size, thus the ~256 MB limit.
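Given the limits described above (PostgreSQL's allocation ceiling of 2^30 - 1 bytes, known as MaxAllocSize in the source, a 4-byte varlena header, and a worst case of 4 bytes per character), the longest string that lower() can always process works out to just under 256*1024*1024 characters:

```sql
-- MaxAllocSize is 2^30 - 1 = 1073741823 bytes; subtract the 4-byte
-- header and divide by the 4-byte worst case per character:
select (1073741823 - 4) / 4 as max_safe_chars;   -- 268435454
select 256*1024*1024       as chars_256mb;       -- 268435456, just over it
```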
Re: BUG #17145: Invalid memory alloc request size, when searching in text column with big content > 250MB
From
James Inform
Date:
Well, I understand. But in my scenario I am using ASCII characters, which are just 1 byte in UTF-8. The system should be capable of handling that.

Maybe we could implement a different approach for text column content > 256 MB, something like a "streaming" read where memory is not allocated based on a fixed 4 bytes per character, but on the true length of the characters found in the text column or returned by a function like lower().

As a user I would expect to be able to store as many characters as possible in a text field without getting errors in any way. The following example gives an error although the amount of text easily fits into the column, because the 'x' is ASCII and takes 1 byte:

create temp table mytest (mydata text);
create index on mytest (lower(mydata));
insert into mytest select repeat('x', 300*1024*1024) as mydata;

Without the index I could easily store 800 MB in the text column:

select repeat('x', 800*1024*1024) as mydata;

So from what I see we should at least extend the documentation and tell people that although a text field can hold up to 1 GB of string data, only 256 MB can safely be used if one wants to use function-based indexes or functions on the column inside a where clause. In short, anyone using text columns in more than a basic way should not store more than 256*1024*1024 characters in a text column.

Julien Rouhaud wrote:
> On Mon, Aug 16, 2021 at 4:46 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> When I try to search a text field in a where clause that holds more than 250
>> MB of text, PostgreSQL runs out of memory for the request.
>> [...]
>> The alloc request size seems to be 4 times the length of the text field.
>> [...]
>
> This is unfortunately the expected behavior, assuming that you're not
> dealing with C/POSIX encoding.
>
> This is because in multibyte encoding each character can occupy up to
> 4B. Postgres needs to allocate a single chunk of memory to hold the
> resulting text, and it has no way to know how many multibyte
> characters are present in the input string or how many characters will
> have a different size when down-cased, so it has to allocate the
> maximum size that may be needed, which is 4 times the size of the
> input string. And there's a strict 1GB limitation for a single field
> size, thus the ~256MB limit.
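Until that changes, one possible client-side workaround (an untested sketch, assuming a plain substring search like the '%a%' example above) is to run lower() over fixed-size chunks so that each call stays well below the ~256 MB ceiling:

```sql
-- Search 100 MB chunks; the chunks overlap by a few characters so a
-- multi-character pattern spanning a chunk boundary is not missed.
select count(*)
from mytest
where exists (
    select 1
    from generate_series(1, length(mydata), 100*1024*1024) as off
    where lower(substr(mydata, off, 100*1024*1024 + 10)) like '%a%'
);
```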