Thread: server crash on Raspberry Pi for large queries
postgres version: PostgreSQL 15.8 (Raspbian 15.8-0+deb12u1) on arm-unknown-linux-gnueabihf, compiled by gcc (Raspbian 12.2.0-14+rpi1)
OS: Linux pi 6.6.31+rpt-rpi-v6 #1 Raspbian 1:6.6.31-1+rpt1 (2024-05-29) armv6l GNU/Linux
CPU (from /proc/cpuinfo):
name : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS : 697.95
Features : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xb76
CPU revision : 7
Hardware : BCM2835
Revision : 000e
Serial : 00000000d75e7c8b
Model : Raspberry Pi Model B Rev 2
Issue: when running a large query, e.g. "SELECT count(*) FROM temperature" on a table with more than 150M rows, the server crashes.
Example error: LOG: server process (PID 20037) was terminated by signal 4: Illegal instruction
No further diagnostic information is generated.
Workaround: setting jit_above_cost = -1 in postgresql.conf, which disables JIT compilation, makes the problem stop.
This has been reported in the past for macOS, and the same workaround was reported to work in that case.
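A minimal sketch of applying the same workaround from SQL instead of editing postgresql.conf by hand. The per-session SET is handy for switching JIT on and off while trying to reproduce the crash; the ALTER SYSTEM variant (in comments, superuser only) persists the setting the way the postgresql.conf edit does. SET jit = off is shown as an equivalent, blunter switch.

```sql
-- Disable JIT compilation for the current session only:
-- with jit_above_cost = -1 no plan is ever costly enough to be JIT-compiled.
SET jit_above_cost = -1;
SHOW jit_above_cost;

-- Alternative that switches JIT off entirely for this session:
-- SET jit = off;

-- Persistent, cluster-wide equivalent of editing postgresql.conf
-- (requires superuser; applies once the configuration is reloaded):
-- ALTER SYSTEM SET jit_above_cost = -1;
-- SELECT pg_reload_conf();
```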
Matthew Clark <mclark@drmatthewclark.com> writes:
> issue: when issuing a large query, e.g. "Select count(*) from temperature" on a table with > 150M rows the system crashes.
> example error: LOG: server process (PID 20037) was terminated by signal 4: Illegal instruction
> no more diagnostic information is generated.
> Workaround : this stops happening when jit_above_cost is set to -1 in postgresql.conf After changing this setting the problem stops.

What this sounds like is a memory leak in the JIT stuff. We fixed
one such issue last year, but perhaps there's more. Can you provide
a self-contained test case? Also, please be more specific about
which Linux version you are using, and which LLVM version.

regards, tom lane
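Some of the details Tom asks for can be collected from inside the server with the queries below. This is only a partial answer: none of them reports the LLVM version itself, which has to be found from the installed packages or from the libraries the llvmjit module links against.

```sql
-- Full server build string: PostgreSQL version, platform and compiler.
SELECT version();

-- true only if jit is on and the JIT provider library can be loaded.
SELECT pg_jit_available();

-- Name of the JIT provider library (llvmjit by default).
SHOW jit_provider;
```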
OS: Linux pi 6.6.31+rpt-rpi-v6 #1 Raspbian 1:6.6.31-1+rpt1 (2024-05-29) armv6l GNU/Linux
I can try to make a test case: essentially a large table, then "SELECT count(*)" from that table. The query works on smaller tables.
I'm using the PostgreSQL build packaged for the OS version above, installed from the apt repository with "apt install postgresql". Repository:
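A sketch of a self-contained test case along those lines. The table name jit_repro, its columns and the 1M row count are hypothetical stand-ins for the real temperature table (a separate name is used so the DROP cannot touch real data). Instead of loading 150M rows, it lowers all three JIT cost thresholds to 0, so even a small count(*) is compiled with inlining and optimization, as a 150M-row scan would be under the default thresholds. This assumes the crash depends on the JIT code path rather than on data volume; if the memory-leak theory is right, a small table may not reproduce it and the row count would need raising.

```sql
-- Hypothetical stand-in for the reporter's "temperature" table.
DROP TABLE IF EXISTS jit_repro;
CREATE TABLE jit_repro (
    recorded_at timestamptz      NOT NULL,
    reading     double precision NOT NULL
);

-- 1M synthetic rows; raise the upper bound to approach the real table size.
INSERT INTO jit_repro (recorded_at, reading)
SELECT now() - g * interval '1 second', random() * 40
FROM generate_series(1, 1000000) AS g;
ANALYZE jit_repro;

-- Force JIT compilation, inlining and optimization regardless of plan cost.
SET jit = on;
SET jit_above_cost = 0;
SET jit_inline_above_cost = 0;
SET jit_optimize_above_cost = 0;

-- The "JIT:" section of the EXPLAIN output shows whether compilation happened.
EXPLAIN (ANALYZE) SELECT count(*) FROM jit_repro;
SELECT count(*) FROM jit_repro;
```

If the server survives this, repeating it with SET jit = off gives a baseline to compare against.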
deb [ arch=armhf ] http://raspbian.raspberrypi.com/raspbian/ bookworm main contrib non-free rpi
postgres --version
postgres (PostgreSQL) 15.8 (Raspbian 15.8-0+deb12u1)
On Tuesday, August 20, 2024 at 09:16:16 AM EDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matthew Clark <mclark@drmatthewclark.com> writes:
> issue: when issuing a large query, e.g. "Select count(*) from temperature" on a table with > 150M rows the system crashes.
> example error: LOG: server process (PID 20037) was terminated by signal 4: Illegal instruction
> no more diagnostic information is generated.
> Workaround : this stops happening when jit_above_cost is set to -1 in postgresql.conf After changing this setting the problem stops.
What this sounds like is a memory leak in the JIT stuff. We fixed
one such issue last year, but perhaps there's more. Can you provide
a self-contained test case? Also, please be more specific about
which Linux version you are using, and which LLVM version.
regards, tom lane
On Wed, 21 Aug 2024 at 07:18, Matthew Clark <mclark@drmatthewclark.com> wrote:
> I can try to make a test case, essentially a large table, then "select count(*)" from table. The select works for smaller tables.

It would be good to figure out which instruction is being executed that's causing this. Would you be able to attach with gdb [1] and trigger the crash? I think gdb should print out the problem instruction.

Looking at master, I see we call LLVMGetHostCPUFeatures() to figure this stuff out. I've not yet looked to see if that's changed since PG15. If we knew which instruction is being executed here, we might be able to figure out whether it's down to the CPU advertising a feature that it doesn't fully support (maybe unlikely?) or to LLVM accidentally emitting code that does not work on this CPU.

Does it also trigger if you enable jit but do "set jit_optimize_above_cost = -1;"? Maybe the problem instruction is only emitted at higher optimisation levels.

Thomas mentioned to me that he has seen issues in this area before, albeit with x86 on a Celeron [2], when LLVM emitted an unsupported AVX instruction.

David

[1] https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
[2] https://www.postgresql.org/message-id/CAEepm%3D1oLBeRjGw9RS6n%3Du0fE4t0WZMMawcfJopkmTmxRoefGw%40mail.gmail.com
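A sketch of the experiment suggested above, run in a single psql session. JIT stays enabled and every query is made eligible for it, but the expensive optimization passes are skipped; disabling inlining as well is an extra assumption on top of David's suggestion, and can be dropped to test optimization alone. generate_series stands in for the real table so the snippet is self-contained; substitute the query that crashes (e.g. SELECT count(*) FROM temperature). The backend PID printed first is the process to attach gdb to, per the stack-trace wiki page [1].

```sql
-- Keep JIT on and make every query eligible for it...
SET jit = on;
SET jit_above_cost = 0;
-- ...but skip the expensive optimization passes (the suggestion above)
-- and, additionally, inlining.
SET jit_optimize_above_cost = -1;
SET jit_inline_above_cost = -1;

-- PID to attach to from another terminal, e.g. "gdb -p <pid>" followed by
-- "continue". After a SIGILL, "x/i $pc" in gdb disassembles the faulting
-- instruction.
SELECT pg_backend_pid();

-- Stand-in query; substitute the one that crashes.
SELECT count(*) FROM generate_series(1, 1000000) AS g;
```

If this runs cleanly but the same query crashes with jit_optimize_above_cost = 0, that points at code emitted only by the optimizing passes.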