Re: Speed up COPY FROM text/CSV parsing using SIMD - Mailing list pgsql-hackers
| From | Manni Wood |
|---|---|
| Subject | Re: Speed up COPY FROM text/CSV parsing using SIMD |
| Date | |
| Msg-id | CAKWEB6o2WgXAJyWU-wdMOrv=VD26Hs3NutANhEcJbKbKWpEXEg@mail.gmail.com |
| In response to | Re: Speed up COPY FROM text/CSV parsing using SIMD (Mark Wong <markwkm@gmail.com>) |
| List | pgsql-hackers |
On Tue, Jan 13, 2026 at 1:12 PM Mark Wong <markwkm@gmail.com> wrote:
> On Fri, Jan 09, 2026 at 05:21:45PM +0300, Nazir Bilal Yavuz wrote:
> > Were you able to understand why Mark's benchmark results are different
> > from ours?
> Not yet... I had some guesses, which is why I suggested the processor pinning
> and using a ramdisk. But we haven't tried applying all of those to my laptop,
> which has 3 core types, or the POWER system, which may be interesting to use a
> ram disk on.
> I'm curious though, and admittedly haven't tried looking myself yet, about how
> the SIMD calls might look across different processor architectures. We'll try
> to get that on the POWER system soon...
> Regards,
> Mark
> --
> Mark Wong <markwkm@gmail.com>
> EDB https://enterprisedb.com
Hello!
Nazir, I'm glad you are finding the benchmarks useful. I have more! :-)
All of these benchmarks are all-in-RAM, because I think that is the best way to get closest to the theoretical best- and worst-case scenarios.
My laptop:
master: (852558b9)
text, no special: 14996
text, 1/3 special: 17270
csv, no special: 18274
csv, 1/3 special: 23852
v3
text, no special: 11282 (24.7% speedup)
text, 1/3 special: 15748 (8.8% speedup) <-- I don't believe this but it's what I got
csv, no special: 11571 (36.6% speedup)
csv, 1/3 special: 19934 (16.4% speedup) <-- I don't believe this but it's what I got
v4.2
text, no special: 11139 (25.7% speedup)
text, 1/3 special: 18900 (9.4% regression)
csv, no special: 11490 (37.1% speedup)
csv, 1/3 special: 26134 (9.5% regression)
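For reference, the speedup/regression percentages appear to be computed relative to master, as (master - patched) / master, with lower numbers being better. Below is a minimal sketch of that arithmetic. pct_change is a hypothetical helper name, not something from the thread.

```c
#include <assert.h>

/*
 * Percentage change of a patched run relative to master, assuming lower
 * numbers are better: positive means speedup, negative means regression.
 */
static double
pct_change(double master, double patched)
{
    assert(master > 0.0);
    return (master - patched) / master * 100.0;
}
```

For example, pct_change(17270, 18900) gives about -9.44, which matches the 9.4% regression reported for v4.2 on text with 1/3 special characters. A few reported figures look truncated rather than rounded. For instance, 14996 to 11282 works out to about 24.77%, which is reported as 24.7%.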
An AWS EC2 t2.2xlarge instance
master: (852558b9)
text, no special: 20677
text, 1/3 special: 22660
csv, no special: 24534
csv, 1/3 special: 30999
v3
text, no special: 17534 (15.2% speedup)
text, 1/3 special: 22816 (0.6% regression)
csv, no special: 17664 (28.0% speedup)
csv, 1/3 special: 29338 (5.3% speedup) <-- I don't believe this but it's what I got
v4.2
text, no special: 17459 (15.5% speedup)
text, 1/3 special: 25051 (10.5% regression)
csv, no special: 17574 (28.3% speedup)
csv, 1/3 special: 32092 (3.5% regression)
An AWS EC2 t4g.2xlarge instance (using an ARM processor; first test on an ARM processor!)
master: (852558b9)
text, no special: 22081
text, 1/3 special: 25100
csv, no special: 27296
csv, 1/3 special: 32344
v3
text, no special: 17724 (19.7% speedup)
text, 1/3 special: 27606 (9.9% regression) <-- yikes! We would want to test this more
csv, no special: 17597 (35.5% speedup)
csv, 1/3 special: 32597 (0.8% regression)
v4.2
text, no special: 17674 (20% speedup)
text, 1/3 special: 25773 (2.6% regression) <-- this regression is less than for the v3 patch? Atypical...
csv, no special: 17651 (35.3% speedup)
csv, 1/3 special: 34055 (5.3% regression)
Yes, I think I agree with you that the everything-in-RAM benchmarks make the regressions more pronounced, just as they make the improvements more pronounced.
I am not sure why the CSV regression is usually worse than the TXT regression (even for the v3 patch, which has smaller regressions than the v4.2 patch). I should probably look over some flame graphs to see if I can find where the CSV-parsing code is so much slower. The CSV regression (at around 5%) is a bit frustrating, because the TXT regression, at less than 1% (for the v3 patch), is so much easier to bear.
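The 1/3-special regressions are easier to reason about with the general technique in front of us. Below is a minimal sketch of scanning input 16 bytes at a time for characters that end the fast path. It is not the v3 or v4.2 patch code, and several details are assumptions:

- find_special_text is a hypothetical name.
- It uses raw SSE2 intrinsics rather than PostgreSQL's own SIMD wrappers.
- On other architectures, such as the ARM t4g instance, it falls back to a scalar loop.
- __builtin_ctz is a GCC/Clang builtin.
- The special set (newline, carriage return, backslash) is assumed for text format. CSV would presumably need its quote and escape characters instead.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper: return the index of the first byte in buf[0..len)
 * that is '\n', '\r' or '\\' (assumed text-format specials), or len if none.
 */
static size_t
find_special_text(const char *buf, size_t len)
{
    size_t      i = 0;

    assert(buf != NULL || len == 0);
#if defined(__SSE2__)
    const __m128i nl = _mm_set1_epi8('\n');
    const __m128i cr = _mm_set1_epi8('\r');
    const __m128i bs = _mm_set1_epi8('\\');

    /* Vector path: compare 16 bytes against each special at once. */
    for (; i + 16 <= len; i += 16)
    {
        __m128i     chunk = _mm_loadu_si128((const __m128i *) (buf + i));
        __m128i     hit = _mm_or_si128(_mm_cmpeq_epi8(chunk, nl),
                                       _mm_or_si128(_mm_cmpeq_epi8(chunk, cr),
                                                    _mm_cmpeq_epi8(chunk, bs)));
        int         mask = _mm_movemask_epi8(hit);  /* one bit per byte */

        if (mask != 0)
            return i + (size_t) __builtin_ctz((unsigned) mask);
    }
#endif
    /* Scalar tail (or whole buffer when SSE2 is unavailable). */
    for (; i < len; i++)
    {
        char        c = buf[i];

        if (c == '\n' || c == '\r' || c == '\\')
            return i;
    }
    return len;
}
```

If roughly one byte in three is special, almost every 16-byte chunk contains a hit. The caller then re-enters the scan after only a few bytes, so the vector load and compares mostly add overhead on top of the per-byte work master already does. That is one plausible reading of the numbers above, not something these benchmarks measured directly.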
Here are some copy-to benchmarks for the v4 patch that applies SIMD to the copy-to code.
These were all-in-RAM tests.
My laptop
master: (852558b9)
text, no special: 2948
text, 1/3 special: 11258
csv, no special: 6245
csv, 1/3 special: 11258
v4 (copy to)
text, no special: 2126 (27.9% speedup)
text, 1/3 special: 12080 (7.3% regression) <-- did not see such a big regression before
csv, no special: 2432 (61.0% speedup)
csv, 1/3 special: 12344 (4.0% regression) <-- did not see such a big regression before
An AWS EC2 t2.2xlarge instance
master: (852558b9)
text, no special: 4647
text, 1/3 special: 13865
csv, no special: 5421
csv, 1/3 special: 15284
v4 (copy to)
text, no special: 2460 (47.0% speedup)
text, 1/3 special: 14023 (1.1% regression)
csv, no special: 2667 (50.7% speedup)
csv, 1/3 special: 15251 (0.2% speedup)
An AWS EC2 t4g.2xlarge instance (using an ARM processor; first test on an ARM processor!)
master: (852558b9)
text, no special: 6951
text, 1/3 special: 17857
csv, no special: 7951
csv, 1/3 special: 18504
v4 (copy to)
text, no special: 3372 (51.4% speedup)
text, 1/3 special: 15713 (12.0% speedup)
csv, no special: 3233 (59.3% speedup)
csv, 1/3 special: 1622 (12.3% speedup)
Once again, the v4 patch for copy-to seems like a clearer win, though, to be fair, there were regressions when running on my laptop. (I'm starting to think servers or desktops are better than laptops for testing these things, though maybe that's my bias: it just seems like the server results are always less surprising.)
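The copy-to pattern (large wins with no special characters, little or no win at 1/3 special) fits a common output-escaping trick. The trick is to test a whole block for bytes that need escaping and copy it verbatim when there are none. The sketch below shows that idea with portable SWAR (SIMD within a register, 8 bytes per 64-bit word) rather than real vector instructions, so it runs on any architecture. It is not the v4 patch code, and several details are assumptions:

- escape_text_sketch and its helpers are hypothetical names.
- The escape set (backslash, newline, carriage return, tab) is assumed for illustration.
- COPY's real text output escapes a larger set of characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero iff at least one byte of v is zero (classic SWAR zero-byte test). */
static inline uint64_t
has_zero_byte(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* Nonzero iff at least one byte of w equals c (XOR turns matches into 0). */
static inline uint64_t
has_byte(uint64_t w, unsigned char c)
{
    return has_zero_byte(w ^ (ONES * c));
}

/* True if the 8-byte word contains any byte from the assumed escape set. */
static inline int
needs_escape(uint64_t w)
{
    return (has_byte(w, '\\') | has_byte(w, '\n') |
            has_byte(w, '\r') | has_byte(w, '\t')) != 0;
}

/* Write c to dst, escaped if needed; return bytes written (1 or 2). */
static size_t
escape_byte(char c, char *dst)
{
    char        e;

    switch (c)
    {
        case '\\': e = '\\'; break;
        case '\n': e = 'n'; break;
        case '\r': e = 'r'; break;
        case '\t': e = 't'; break;
        default:
            dst[0] = c;
            return 1;
    }
    dst[0] = '\\';
    dst[1] = e;
    return 2;
}

/* dst must have room for 2 * len bytes; returns the number of bytes written. */
static size_t
escape_text_sketch(const char *src, size_t len, char *dst)
{
    size_t      i = 0;
    size_t      out = 0;

    assert(src != NULL && dst != NULL);
    while (i < len)
    {
        size_t      end;

        if (i + 8 <= len)
        {
            uint64_t    w;

            memcpy(&w, src + i, 8);     /* alignment-safe 8-byte load */
            if (!needs_escape(w))
            {
                /* Fast path: nothing to escape, copy the block verbatim. */
                memcpy(dst + out, src + i, 8);
                i += 8;
                out += 8;
                continue;
            }
        }
        /* Slow path: handle the next (up to) 8 bytes one at a time. */
        end = (i + 8 <= len) ? i + 8 : len;
        for (; i < end; i++)
            out += escape_byte(src[i], dst + out);
    }
    return out;
}
```

With no special characters, nearly every block takes the verbatim-copy path. With a third of the characters special, almost every block falls through to the byte-at-a-time path, so the word test is plausibly pure overhead there.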
Hope you all continue to find these useful...
-- Manni Wood EDB: https://www.enterprisedb.com