Re: wrong query results on bf leafhopper - Mailing list pgsql-hackers

From Robins Tharakan
Subject Re: wrong query results on bf leafhopper
Date
Msg-id CAEP4nAwhtsZYFfzLGiq-tHJaEFw55TpnrKOxzMU1R+HsL3wjEg@mail.gmail.com
In response to Re: wrong query results on bf leafhopper  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi,

On Thu, 29 May 2025 at 02:32, Andres Freund <andres@anarazel.de> wrote:
> On 2025-05-28 22:51:14 +0930, Robins Tharakan wrote:
> > Recently leafhopper failed again on the same test. For now I've paused it.
> > To rule out the compiler (and its maturity on the architecture), I'll upgrade
> > gcc (to nightly, or something more recent) and then re-enable to see if it
> > changes anything.
>
> +1 to a gcc upgrade, gcc 11 is rather old and out of upstream support.


Ack. I've updated leafhopper to gcc master. For now (to get the machine
green / running), I've disabled some flags, which I'll revisit later;
hopefully those are unrelated to compiler maturity, which is what I'm after here.

 
> A kernel upgrade would be good too. My completely baseless gut feeling is that
> some SIMD registers occasionally get corrupted, e.g. due to a kernel
> interrupt / context switch not properly storing & restoring them. Weirdly
> enough the instrumentation code is among the pieces of PG code most
> vulnerable to that because we mostly don't do enough auto-vectorizable math,
> but InstrEndLoop(), InstrStopNode() etc are trivially auto-vectorizable.  I'm
> pretty sure I've previously analyzed problems around this, but don't remember
> the details (IA64 maybe?).

Fair point, I'll keep that option open. Originally, the machine was spun up to
evaluate the graviton4 ec2 instance and I'd like to explore whether the
stock-kernel / kernel-updates are able to keep the instance green (and resort
to updating the kernel only if I exhaust all other options - pg / compiler etc.).
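
For illustration only, the kind of "trivially auto-vectorizable" accumulation
mentioned above can be sketched in C as follows. This is not PostgreSQL source:
the struct DemoCounters and the function demo_accumulate are hypothetical names
made up for this sketch, standing in for the general shape of adding a few
adjacent double counters into running totals. Whether a compiler actually
emits SIMD instructions for it depends on the compiler version and flags.

```c
/*
 * Hypothetical stand-in for per-node instrumentation counters.
 * NOT the actual PostgreSQL Instrumentation struct.
 */
typedef struct DemoCounters
{
    double startup;
    double total;
    double ntuples;
    double nloops;
} DemoCounters;

/*
 * Add one cycle's values into the running totals.
 *
 * The four additions operate on adjacent doubles, so an optimizing
 * compiler may combine them into a small number of SIMD loads, adds,
 * and stores. If SIMD registers were clobbered between the load and
 * the store, the totals would come out wrong.
 */
static void
demo_accumulate(DemoCounters *totals, const DemoCounters *cycle)
{
    totals->startup += cycle->startup;
    totals->total   += cycle->total;
    totals->ntuples += cycle->ntuples;
    totals->nloops  += cycle->nloops;
}
```

With constant inputs a compiler can fold the arithmetic away at build time, so
this sketch only shows the code shape; it is not by itself a detector for
register corruption.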

-
robins
 
