Re: [POC] verifying UTF-8 using SIMD instructions - Mailing list pgsql-hackers
From: Heikki Linnakangas <hlinnaka@iki.fi>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Msg-id: adf8e27e-4729-007c-2e10-852202128ac9@iki.fi
In response to: Re: [POC] verifying UTF-8 using SIMD instructions (John Naylor <john.naylor@enterprisedb.com>)
List: pgsql-hackers
On 13/02/2021 03:31, John Naylor wrote:
> On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi
> <mailto:hlinnaka@iki.fi>> wrote:
>  >
>  > I also tested the fallback implementation from the simdjson library
>  > (included in the patch, if you uncomment it in simdjson-glue.c):
>  >
>  >   mixed | ascii
>  >  -------+-------
>  >     447 |    46
>  >  (1 row)
>  >
>  > I think we should at least try to adopt that. At a high level, it
>  > looks pretty similar to your patch: you load the data 8 bytes at a
>  > time and check if they are all ASCII. If there are any non-ASCII
>  > chars, you check the bytes one by one, otherwise you load the next
>  > 8 bytes. Your patch should be able to achieve the same performance,
>  > if done right. I don't think the simdjson code forbids \0 bytes, so
>  > that will add a few cycles, but still.
>
> Attached is a patch that does roughly what the simdjson fallback did,
> except that I use straight tests on the bytes and only calculate code
> points in assertion builds. In the course of doing this, I found that
> my earlier concerns about putting the ascii check in a static inline
> function were due to my suboptimal loop implementation. I had assumed
> that if the chunked ascii check failed, it had to check all those
> bytes one at a time. As it turns out, that's a waste of the branch
> predictor. In the v2 patch, we do the chunked ascii check every time
> we loop. With that, I can also confirm the claim in the Lemire paper
> that it's better to do the check on 16-byte chunks:
>
> (MacOS, Clang 10)
>
> master:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>     1081 |   761 |   366
>
> v2 patch, with 16-byte stride:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>      806 |   474 |    83
>
> patch but with 8-byte stride:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>      792 |   490 |   105
>
> I also included the fast path in all other multibyte encodings, and
> that is also pretty good performance-wise.

Cool.
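For readers following along, the chunked ASCII fast path being discussed can be sketched roughly like this. This is a minimal illustration, not the actual patch code: the function name is made up, and it folds in the \0-byte rejection mentioned above using the classic "has a zero byte" bit trick on two 8-byte words per 16-byte chunk.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if the next 16 bytes are all ASCII and contain no zero
 * bytes. Illustrative sketch only; not PostgreSQL's actual check_ascii().
 */
static bool
is_ascii_16(const unsigned char *s)
{
	uint64_t	highbits = 0;
	uint64_t	zeroes = 0;

	for (int i = 0; i < 16; i += 8)
	{
		uint64_t	chunk;

		/* memcpy avoids unaligned word loads, which some CPUs punish */
		memcpy(&chunk, s + i, sizeof(chunk));

		/* accumulate the high bit of every byte */
		highbits |= chunk;

		/* nonzero iff some byte in chunk is 0x00 */
		zeroes |= (chunk - UINT64_C(0x0101010101010101)) &
			~chunk &
			UINT64_C(0x8080808080808080);
	}

	/* all high bits clear => pure ASCII; zeroes clear => no NUL bytes */
	return (highbits & UINT64_C(0x8080808080808080)) == 0 && zeroes == 0;
}
```

If the check fails, the caller falls back to verifying bytes one at a time, then resumes the chunked check on the next loop iteration rather than crawling byte-by-byte to the end.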
> It regresses from master on pure multibyte input, but that case is
> still faster than PG13, which I simulated by reverting 6c5576075b0f9
> and b80e10638e3:

I thought the "chinese" numbers above are pure multibyte input, and it
seems to do well on that. Where does it regress? In multibyte encodings
other than UTF-8? How bad is the regression?

I tested this on my first generation Raspberry Pi (chipmunk). I had to
tweak it a bit to make it compile, since the SSE autodetection code was
not finished yet. And I used generate_series(1, 1000) instead of
generate_series(1, 10000) in the test script (mbverifystr-speed.sql)
because this system is so slow.

master:

 mixed | ascii
-------+-------
  1310 |  1041
(1 row)

v2-add-portability-stub-and-new-fallback.patch:

 mixed | ascii
-------+-------
  2979 |   910
(1 row)

I'm guessing that's because the unaligned access in check_ascii() is
expensive on this platform.

- Heikki
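To illustrate the unaligned-access concern: the ARM1176 core in a first-generation Raspberry Pi handles unaligned word loads poorly, whereas x86 does them cheaply. The usual portable idiom is a memcpy-based load, which is well-defined at any alignment and compiles to a single load on platforms where that is fast. A sketch (illustrative names, not the patch's code):

```c
#include <stdint.h>
#include <string.h>

/*
 * Casting a byte pointer to uint64_t* is undefined behavior when the
 * pointer is misaligned, and on older ARM cores the resulting load can
 * be slow or trap. Shown here only for contrast -- do not use.
 */
static uint64_t
load64_cast(const unsigned char *p)
{
	return *(const uint64_t *) p;	/* UB if p is misaligned */
}

/*
 * The portable idiom: memcpy into a local. Well-defined at any
 * alignment; compilers turn it into one load where that is safe.
 */
static uint64_t
load64_memcpy(const unsigned char *p)
{
	uint64_t	v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Even with the memcpy idiom, the hardware still performs an unaligned access when the data is unaligned, so on a platform like chipmunk the cost Heikki observes can remain; aligning the loop's starting pointer first is one common mitigation.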