Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers

From KAZAR Ayoub
Subject Re: Speed up COPY FROM text/CSV parsing using SIMD
Msg-id CA+K2Rump8NoMRZRZ2r4jHXUJwByasy_c3_b0oaO+TLkSbMD-jw@mail.gmail.com
In response to Re: Speed up COPY FROM text/CSV parsing using SIMD  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Speed up COPY FROM text/CSV parsing using SIMD
List pgsql-hackers
Hello,
On Wed, Nov 19, 2025 at 10:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Tue, Nov 18, 2025 at 05:20:05PM +0300, Nazir Bilal Yavuz wrote:
> > Thanks, done.
>
> I took a look at the v3 patches.  Here are my high-level thoughts:
>
> +    /*
> +     * Parse data and transfer into line_buf. To get benefit from inlining,
> +     * call CopyReadLineText() with the constant boolean variables.
> +     */
> +    if (cstate->simd_continue)
> +        result = CopyReadLineText(cstate, is_csv, true);
> +    else
> +        result = CopyReadLineText(cstate, is_csv, false);
>
> I'm curious whether this actually generates different code, and if it does,
> if it's actually faster.  We're already branching on cstate->simd_continue
> here.
I've compiled both versions (passing the constant vs. passing cstate->simd_continue directly) with -O2 and confirmed that they generate different code. When simd_continue is passed to CopyReadLineText() as a constant, the compiler optimizes out the condition checks on the SIMD path.
A small benchmark on a 1GB+ file shows the expected benefit: around a 6% performance improvement.
I've attached the assembly outputs in case anyone wants to check anything else.
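To illustrate the pattern being discussed (branch once on a runtime flag, then call a static inline worker with a literal true or false so the compiler can specialize each copy), here is a minimal, self-contained C sketch. The functions count_newlines_impl() and count_newlines() are hypothetical and are not taken from the patch; memchr() merely stands in for the real SIMD path. Whether the branch is actually folded depends on the compiler inlining the worker, which GCC and Clang typically do for small static inline functions at -O2, but this is not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example; not code from the patch under discussion. */

/*
 * Count '\n' bytes in buf[0..len).  "use_fast_path" is intended to be a
 * compile-time constant at every call site: once this function is inlined,
 * the compiler can fold the "if (use_fast_path)" test and emit a separate
 * loop for each value instead of re-checking the flag on every iteration.
 */
static inline size_t
count_newlines_impl(const char *buf, size_t len, bool use_fast_path)
{
    size_t      count = 0;
    size_t      i = 0;

    while (i < len)
    {
        if (use_fast_path)
        {
            /* Stand-in for a SIMD path: jump to the next '\n' via memchr(). */
            const char *nl = memchr(buf + i, '\n', len - i);

            if (nl == NULL)
                break;          /* no more newlines */
            i = (size_t) (nl - buf);
        }
        else if (buf[i] != '\n')
        {
            /* Generic path: advance one byte at a time. */
            i++;
            continue;
        }

        /* On both paths, buf[i] is a newline at this point. */
        count++;
        i++;
    }
    return count;
}

/*
 * Public entry point: branch once on the runtime flag, then call the
 * inline worker with a literal constant in each arm.
 */
size_t
count_newlines(const char *buf, size_t len, bool fast_path_enabled)
{
    assert(buf != NULL || len == 0);

    if (fast_path_enabled)
        return count_newlines_impl(buf, len, true);
    else
        return count_newlines_impl(buf, len, false);
}
```

Both arms return the same result for the same input; only the generated loop differs. Compiling this with `gcc -O2 -S` (or `clang -O2 -S`) and inspecting the output for count_newlines() is one way to check whether two specialized loops were emitted, similar in spirit to the assembly comparison described above.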


Regards,
Ayoub Kazar
Attachment
