Thread: server crash on raspberry pi for large queries

server crash on raspberry pi for large queries

From
Matthew Clark
Date:
postgres version:  PostgreSQL 15.8 (Raspbian 15.8-0+deb12u1) on arm-unknown-linux-gnueabihf, compiled by gcc (Raspbian 12.2.0-14+rpi1)
OS:  Linux pi 6.6.31+rpt-rpi-v6 #1 Raspbian 1:6.6.31-1+rpt1 (2024-05-29) armv6l GNU/Linux

CPU: (/proc/cpuinfo)

name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2835
Revision        : 000e
Serial          : 00000000d75e7c8b
Model           : Raspberry Pi Model B Rev 2

Issue: when running a large query, e.g. "SELECT count(*) FROM temperature" on a table with more than 150M rows, the server crashes.

  Example error:   LOG:  server process (PID 20037) was terminated by signal 4: Illegal instruction
  No further diagnostic information is generated.


Workaround: setting jit_above_cost to -1 in postgresql.conf stops the crash.

This was reported in the past for macOS, and the same workaround was reported to work in that case.
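
For reference, the workaround can also be applied from psql rather than by editing postgresql.conf by hand. This is only a sketch under two assumptions: the session has superuser rights, and it is acceptable that ALTER SYSTEM stores the value in postgresql.auto.conf instead of postgresql.conf.

```sql
-- Assumption: run as a superuser.
-- Setting jit_above_cost to -1 disables JIT compilation.
ALTER SYSTEM SET jit_above_cost = -1;

-- A configuration reload is sufficient for this setting; no restart is needed.
SELECT pg_reload_conf();

-- Check the value the session reports.
SHOW jit_above_cost;

-- Alternative for a single session only, without touching the server config:
SET jit = off;
```

The session-level SET is the less invasive choice when testing, since it does not change behaviour for other connections.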

Re: server crash on raspberry pi for large queries

From
Tom Lane
Date:
Matthew Clark <mclark@drmatthewclark.com> writes:
> issue: when issuing a large query, e.g. "Select count(*) from temperature" on a table with > 150M rows the system
> crashes.
> example error:   LOG:  server process (PID 20037) was terminated by signal 4: Illegal instruction  no more
> diagnostic information is generated.
> Workaround : this stops happening when jit_above_cost is set to -1 in postgresql.conf   After changing this setting
> the problem stops.

What this sounds like is a memory leak in the JIT stuff.  We fixed
one such issue last year, but perhaps there's more.  Can you provide
a self-contained test case?  Also, please be more specific about
which Linux version you are using, and which LLVM version.

            regards, tom lane



Re: server crash on raspberry pi for large queries

From
Matthew Clark
Date:
OS:  Linux pi 6.6.31+rpt-rpi-v6 #1 Raspbian 1:6.6.31-1+rpt1 (2024-05-29) armv6l GNU/Linux

I can try to make a test case: essentially a large table, then "select count(*)" from that table.  The select works for smaller tables.
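
A self-contained test case along these lines might look like the sketch below. The thread gives only the table name, so the table name temperature_test, its columns, and the generated values are all hypothetical; the row count is chosen to match the "more than 150M rows" in the report and will take a long time to load on a Pi.

```sql
-- Hypothetical schema: the original table's columns are not given in the thread.
CREATE TABLE temperature_test (
    reading_time timestamptz NOT NULL,
    celsius      real        NOT NULL
);

-- Fill with synthetic data. 150000000 mirrors the reported table size;
-- start smaller to find the row count at which the behaviour changes.
INSERT INTO temperature_test (reading_time, celsius)
SELECT timestamptz '2024-01-01 00:00:00+00' + g * interval '1 second',
       (20 + 5 * sin(g / 1000.0))::real
FROM generate_series(1, 150000000) AS g;

-- Refresh planner statistics so the cost estimate reflects the table size.
ANALYZE temperature_test;

-- These are the stock defaults, set explicitly so the run is reproducible.
SET jit = on;
SET jit_above_cost = 100000;

-- EXPLAIN (ANALYZE) executes the query and, if JIT was used, reports a JIT section.
EXPLAIN (ANALYZE) SELECT count(*) FROM temperature_test;
```

Since JIT is only considered once the estimated query cost exceeds jit_above_cost, this would also be consistent with the select working on smaller tables.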

I'm using the PostgreSQL build packaged with the OS version above, installed from the apt repository with "apt install postgresql".  Repository:

deb [ arch=armhf ] http://raspbian.raspberrypi.com/raspbian/ bookworm main contrib non-free rpi


postgres --version

postgres (PostgreSQL) 15.8 (Raspbian 15.8-0+deb12u1)

Re: server crash on raspberry pi for large queries

From
David Rowley
Date:
On Wed, 21 Aug 2024 at 07:18, Matthew Clark <mclark@drmatthewclark.com> wrote:
> I can try to make a test case, essentially a large table, then "select count(*)" from table.  The select works for
> smaller tables.

It would be good to figure out which instruction is causing this.
Would you be able to attach gdb to the backend and trigger the
crash [1]? I think gdb should print out the problem instruction.
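
As a rough sketch of that procedure: the SQL below finds the backend's PID and then runs the failing query, with the gdb steps shown as comments. It assumes gdb is installed and that you can attach as the postgres user or root. JIT-compiled frames may show up without symbol names, so the backtrace may be incomplete.

```sql
-- Run in the psql session that will issue the crashing query.
SELECT pg_backend_pid();

-- In a second terminal, attach to that backend:
--   gdb -p <pid returned above>
--   (gdb) continue

-- Back in this session, run the query that crashes:
SELECT count(*) FROM temperature;

-- When gdb stops on SIGILL:
--   (gdb) x/i $pc    disassembles the instruction at the program counter
--   (gdb) bt         prints the backtrace
```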

Looking at master, I see we call LLVMGetHostCPUFeatures() to determine
which CPU features to generate code for. I've not yet looked to see if
that's changed since PG15. If we knew which instruction is being
executed here, we might be able to figure out whether the CPU is
advertising a feature that it doesn't fully support (maybe unlikely?)
or whether LLVM is accidentally emitting code that does not work on
this CPU.

Does it also crash if you enable jit but do "set
jit_optimize_above_cost = -1;"? Maybe the problem instruction is only
emitted at higher optimisation levels.
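
Spelled out as a session, that experiment could look like the sketch below. The jit_inline_above_cost line is an extra step not suggested in this message, included on the assumption that it helps separate inlining from the other optimisation passes.

```sql
-- Keep JIT enabled with the default threshold, so the query is still JIT-compiled.
SET jit = on;
SET jit_above_cost = 100000;

-- -1 disables the expensive optimisation passes.
SET jit_optimize_above_cost = -1;

-- Assumption (not part of the suggestion above): also disable inlining.
SET jit_inline_above_cost = -1;

-- Re-run the query that crashes.
SELECT count(*) FROM temperature;
```

If the query completes with these settings, re-enabling one setting at a time should show which stage introduces the problem instruction.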

Thomas mentioned to me that he has seen issues in this area before,
albeit on x86, with a Celeron [2], when LLVM emitted an unsupported
AVX instruction.

David

[1] https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
[2] https://www.postgresql.org/message-id/CAEepm%3D1oLBeRjGw9RS6n%3Du0fE4t0WZMMawcfJopkmTmxRoefGw%40mail.gmail.com