Thread: Re: PostgreSQL 17 Segmentation Fault
Hi,

Thanks for the provided information. Per the backtrace, the failure happens in the LLVM JIT code in nestloop/seqscan, so it has to be in this part of the plan:

    ->  Nested Loop  (cost=0.42..6074.84 rows=117 width=641)
          ->  Parallel Seq Scan on tasks__projects  (cost=0.00..2201.62 rows=745 width=16)
                Filter: (gid = '1138791545416725'::text)
          ->  Index Scan using tasks_pkey on tasks tasks_1  (cost=0.42..5.20 rows=1 width=102)
                Index Cond: (gid = tasks__projects._sdc_source_key_gid)
                Filter: ((NOT completed) AND (name <> ''::text))

It's not clear why this should consume a lot of memory, though. It's possible the memory is consumed elsewhere, and this is simply the straw that breaks the camel's back ...

Presumably it takes a while for the query to consume a lot of memory and crash - can you attach a debugger to it after it allocates a lot of memory (but before the crash), and do this:

    call MemoryContextStats(TopMemoryContext)

That should write memory context stats to the server log. Perhaps that will tell us which part of the query allocates the memory.

Next, try running the query with jit=off. If that resolves the problem, maybe it's another JIT issue. But if it completes with lower shared_buffers, that doesn't seem likely.

The plan has a bunch of hash joins, and I wonder if those might be causing issues: the hash tables may be kept until the end of the query, and each may be up to 64MB (you have work_mem=32MB, but there's also a 2x multiplier since PG13). The row estimates are pretty low, but could it be that the real row counts are much higher? Did you run ANALYZE after the upgrade? Maybe try with lower work_mem?

One last thing you should check is memory overcommit. Chances are it's set just low enough for the query to hit it with SB=4GB, but not with SB=3GB. In that case you may need to tune this a bit. See /proc/meminfo and /proc/sys/vm/overcommit_*.

regards

--
Tomas Vondra
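PS: To spell out the debugger step, it would look roughly like this (a sketch - 12345 stands in for the actual backend PID, and the server needs debug symbols for the call to work):

    -- in the psql session, before running the query
    SELECT pg_backend_pid();

    # then, from a shell on the server, once memory usage climbs
    $ gdb -p 12345
    (gdb) call MemoryContextStats(TopMemoryContext)
    (gdb) detach
    (gdb) quit

The stats go to the server log (stderr), not to the gdb session.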
The query crashes less than a second after I run it, so there isn't much time for it to consume memory or for me to attach GDB mid-query. I tried decreasing work_mem from 32MB to 128kB, but I still get the error. I've also run VACUUM and ANALYZE, to no avail. When the query does succeed, it yields only 68 rows, so I don't think the row estimates are too far off.

I checked the files you mentioned for memory overcommit:
/proc/sys/vm/overcommit_memory = 0
/proc/sys/vm/overcommit_kbytes = 0
/proc/sys/vm/overcommit_ratio = 50
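For reference, overcommit_memory = 0 is the kernel's heuristic mode, so the CommitLimit derived from overcommit_ratio is advisory rather than a hard cap (it is only enforced in mode 2). The commit accounting can be eyeballed with something like:

    $ grep -E 'MemAvailable|Commit' /proc/meminfo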
Free RAM on the system starts at roughly 8GB and stays there while the crashing query executes.
Only two things have fixed the issue so far: turning JIT off or decreasing shared_buffers. I suppose, then, that it might be a JIT issue?
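For what it's worth, jit (like work_mem) can be flipped per session while testing, along these lines (a sketch):

    SET jit = off;            -- rule the JIT compiler out for this session
    SET work_mem = '128kB';   -- the lowered value mentioned above
    SHOW jit;                 -- confirm the setting took effect
    RESET jit;                -- restore the defaults afterwards
    RESET work_mem;

(shared_buffers is the exception: changing it means editing postgresql.conf and restarting the server.)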
Cameron Vogt | Software Developer
Direct: 314-756-2302 | Cell: 636-388-2050
1585 Fencorp Drive | Fenton, MO 63026
Automatic Controls Equipment Systems, Inc.
On Sat, Oct 5, 2024 at 11:30 AM Cameron Vogt <cvogt@automaticcontrols.net> wrote:
> I suppose then that it might be a JIT issue?

I see from your info.txt file that this is aarch64. Could it be an instance of LLVM's ARM relocation bug[1]? I'm planning to push that fix, taken from the LLVM project, soon. I have just been waiting to see if a more polished version would land in LLVM's main branch first, but I'm about to give up waiting for that so we get some testing time in-tree before our next minor release.

[1] https://www.postgresql.org/message-id/flat/CAO6_Xqr63qj%3DSx7HY6ZiiQ6R_JbX%2B-p6sTPwDYwTWZjUmjsYBg%40mail.gmail.com
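In the meantime, disabling JIT cluster-wide is a reasonable stopgap until a fixed minor release is out; roughly (assuming superuser access, with mydb as a placeholder database name):

    ALTER SYSTEM SET jit = off;   -- written to postgresql.auto.conf
    SELECT pg_reload_conf();      -- jit takes effect on reload, no restart needed

    -- or, scoped to a single database:
    ALTER DATABASE mydb SET jit = off;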
On 2024-10-05 01:40:17, Thomas Munro <thomas.munro@gmail.com> wrote:
> Could it be an instance of LLVM's ARM relocation bug?
After reading about the bug, I believe you are likely correct. That would explain the behavior I'm seeing with JIT and shared_buffers. When I migrated PostgreSQL versions, I also moved to a new aarch64 machine. The old machine was not aarch64, so that may explain the timing of the issue as well.
Cameron Vogt | Software Developer
Direct: 314-756-2302 | Cell: 636-388-2050
1585 Fencorp Drive | Fenton, MO 63026
Automatic Controls Equipment Systems, Inc.