Re: Batching in executor - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Batching in executor
Msg-id CA+HiwqFY6zfTcMYiT8eJQQWp7DvtGJc88Q6xAqA8pF8mb7ic=w@mail.gmail.com
In response to Re: Batching in executor  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
Hi,

On Mon, Sep 29, 2025 at 8:01 PM Tomas Vondra <tomas@vondra.me> wrote:
> I also tried running TPC-H. I don't have useful numbers yet, but I ran
> into a segfault - see the attached backtrace. It only happens with the
> batching, and only on Q22 for some reason. I initially thought it's a
> bug in clang, because I saw it with clang-22 built from git, and not
> with clang-14 or gcc. But since then I reproduced it with clang-19 (on
> debian 13). Still could be a clang bug, of course. I've seen ~20 of
> those segfaults so far, and the backtraces look exactly the same.

I can reproduce the Q22 segfault with clang-17 on macOS, and the
attached patch 0009 fixes it.

The issue I observed is that two EEOPs both called the same helper,
and that helper re-peeked ExecExprEvalOp(op) to choose its path. In
this particular build, the two EEOP cases in ExecInterpExpr() compiled
to identical code, so their dispatch labels had the same address
(reverse_dispatch_table logging in ExecInitInterpreter() showed the
duplicate). Because ExecEvalStepOp() maps by label address, the
reverse lookup could yield the other EEOP. I saw ExprInit select the
ROWLOOP EEOP while the ExprExec-time helper observed the DIRECT EEOP
and ran the code for it, which then crashed.
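
The aliasing described above can be illustrated with a small standalone
sketch. This is not PostgreSQL code: the `Demo*` types, `demo_*` functions,
and the single shared address are all hypothetical stand-ins, and the linear
search is only an assumed model of a reverse lookup by address. It shows
that once two opcodes share one "label address", a lookup by address cannot
tell them apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes standing in for the two batching EEOPs. */
typedef enum { DEMO_OP_DIRECT = 0, DEMO_OP_ROWLOOP = 1, DEMO_OP_COUNT } DemoOp;

/* Stand-in for a code address; both opcodes share it, mimicking two
 * interpreter cases that the compiler merged into identical code. */
static const char demo_merged_body = 0;

/* Forward table: opcode -> "label address". The two entries are equal. */
static const void *const demo_dispatch[DEMO_OP_COUNT] = {
    &demo_merged_body,  /* DEMO_OP_DIRECT */
    &demo_merged_body   /* DEMO_OP_ROWLOOP */
};

/* After initialization, a step remembers only the address, not the opcode. */
typedef struct { const void *opcode_addr; } DemoStep;

/* Init time: record the address for the opcode that was chosen. */
static DemoStep demo_init_step(DemoOp chosen)
{
    DemoStep step;

    assert(chosen < DEMO_OP_COUNT);
    step.opcode_addr = demo_dispatch[chosen];
    return step;
}

/* Exec time: recover the opcode from the address. With duplicate addresses
 * the first matching entry wins, whatever init actually chose. */
static DemoOp demo_reverse_lookup(const DemoStep *step)
{
    for (int i = 0; i < DEMO_OP_COUNT; i++)
    {
        if (demo_dispatch[i] == step->opcode_addr)
            return (DemoOp) i;
    }
    return DEMO_OP_COUNT;  /* address not found */
}
```

In this sketch, a step initialized with DEMO_OP_ROWLOOP is reported as
DEMO_OP_DIRECT by demo_reverse_lookup(), which mirrors the mismatch between
the init-time choice and what the helper observed at execution time.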

In 0009 (the fix), I split the helper into two functions, one per
EEOP, so the helper does not re-derive the opcode; with that change I
cannot reproduce the crash on macOS clang-17.
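
As a rough sketch of the shape of that fix (again with hypothetical
`Sketch*` and `sketch_*` names, not the code in 0009), each opcode gets its
own entry point, and any shared body receives the mode as an explicit
argument instead of asking which opcode is running. Whether the actual patch
keeps a shared inner function is an assumption made here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes and the code paths they are meant to run. */
typedef enum { SKETCH_OP_DIRECT, SKETCH_OP_ROWLOOP } SketchOp;
typedef enum { SKETCH_PATH_DIRECT, SKETCH_PATH_ROWLOOP } SketchPath;

/* Shared body (assumed): the mode is passed in by the caller, so nothing
 * here needs to re-derive the opcode from a dispatch address. */
static SketchPath sketch_helper_common(bool rowloop)
{
    return rowloop ? SKETCH_PATH_ROWLOOP : SKETCH_PATH_DIRECT;
}

/* One entry point per opcode; each one fixes the mode statically. */
static SketchPath sketch_helper_direct(void)
{
    return sketch_helper_common(false);
}

static SketchPath sketch_helper_rowloop(void)
{
    return sketch_helper_common(true);
}

/* Interpreter stand-in: each case calls its own helper, so the path taken
 * always matches the opcode that was selected at init time. */
static SketchPath sketch_exec(SketchOp selected_at_init)
{
    switch (selected_at_init)
    {
        case SKETCH_OP_DIRECT:
            return sketch_helper_direct();
        case SKETCH_OP_ROWLOOP:
            return sketch_helper_rowloop();
    }
    assert(!"unknown opcode");
    return SKETCH_PATH_DIRECT;
}
```

Because the two cases now call different functions, the path no longer
depends on a reverse lookup giving the right answer.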

--
Thanks, Amit Langote
