Re: [POC] verifying UTF-8 using SIMD instructions - Mailing list pgsql-hackers
From: Heikki Linnakangas <hlinnaka@iki.fi>
Subject: Re: [POC] verifying UTF-8 using SIMD instructions
Msg-id: adf8e27e-4729-007c-2e10-852202128ac9@iki.fi
In response to: Re: [POC] verifying UTF-8 using SIMD instructions (John Naylor <john.naylor@enterprisedb.com>)
List: pgsql-hackers
On 13/02/2021 03:31, John Naylor wrote:
> On Mon, Feb 8, 2021 at 6:17 AM Heikki Linnakangas <hlinnaka@iki.fi
> <mailto:hlinnaka@iki.fi>> wrote:
>  >
>  > I also tested the fallback implementation from the simdjson library
>  > (included in the patch, if you uncomment it in simdjson-glue.c):
>  >
>  >   mixed | ascii
>  >  -------+-------
>  >     447 |    46
>  >  (1 row)
>  >
>  > I think we should at least try to adopt that. At a high level, it
>  > looks pretty similar to your patch: you load the data 8 bytes at a
>  > time and check if they are all ASCII. If there are any non-ASCII
>  > chars, you check the bytes one by one, otherwise you load the next
>  > 8 bytes. Your patch should be able to achieve the same performance,
>  > if done right. I don't think the simdjson code forbids \0 bytes, so
>  > that will add a few cycles, but still.
>
> Attached is a patch that does roughly what the simdjson fallback did,
> except that I use straight tests on the bytes and only calculate code
> points in assertion builds. In the course of doing this, I found that
> my earlier concerns about putting the ascii check in a static inline
> function were due to my suboptimal loop implementation. I had assumed
> that if the chunked ascii check failed, it had to check all those
> bytes one at a time. As it turns out, that's a waste of the branch
> predictor. In the v2 patch, we do the chunked ascii check every time
> we loop. With that, I can also confirm the claim in the Lemire paper
> that it's better to do the check on 16-byte chunks:
>
> (MacOS, Clang 10)
>
> master:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>     1081 |   761 |   366
>
> v2 patch, with 16-byte stride:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>      806 |   474 |    83
>
> patch but with 8-byte stride:
>
>  chinese | mixed | ascii
> ---------+-------+-------
>      792 |   490 |   105
>
> I also included the fast path in all other multibyte encodings, and
> that is also pretty good performance-wise.

Cool.
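For readers following along, the chunked ASCII fast path being discussed can be sketched roughly like this. This is a minimal illustration, not the actual patch code: the function name is made up, and it folds in the \0-byte rejection mentioned above using the classic "has a zero byte" bit trick on two 8-byte words per 16-byte chunk.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Return true if the next 16 bytes are all ASCII and contain no zero
 * bytes. Illustrative sketch only; not PostgreSQL's actual check_ascii().
 */
static bool
is_ascii_16(const unsigned char *s)
{
	uint64_t	highbits = 0;
	uint64_t	zeroes = 0;

	for (int i = 0; i < 16; i += 8)
	{
		uint64_t	chunk;

		/* memcpy avoids unaligned word loads, which some CPUs punish */
		memcpy(&chunk, s + i, sizeof(chunk));

		/* accumulate the high bit of every byte */
		highbits |= chunk;

		/* nonzero iff some byte in chunk is 0x00 */
		zeroes |= (chunk - UINT64_C(0x0101010101010101)) &
			~chunk &
			UINT64_C(0x8080808080808080);
	}

	/* all high bits clear => pure ASCII; zeroes clear => no NUL bytes */
	return (highbits & UINT64_C(0x8080808080808080)) == 0 && zeroes == 0;
}
```

If the check fails, the caller falls back to verifying bytes one at a time, then resumes the chunked check on the next loop iteration rather than crawling byte-by-byte to the end.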
> It regresses from master on pure multibyte input, but that case is
> still faster than PG13, which I simulated by reverting 6c5576075b0f9
> and b80e10638e3:

I thought the "chinese" numbers above are pure multibyte input, and it
seems to do well on that. Where does it regress? In multibyte encodings
other than UTF-8? How bad is the regression?

I tested this on my first generation Raspberry Pi (chipmunk). I had to
tweak it a bit to make it compile, since the SSE autodetection code was
not finished yet. And I used generate_series(1, 1000) instead of
generate_series(1, 10000) in the test script (mbverifystr-speed.sql)
because this system is so slow.

master:

 mixed | ascii
-------+-------
  1310 |  1041
(1 row)

v2-add-portability-stub-and-new-fallback.patch:

 mixed | ascii
-------+-------
  2979 |   910
(1 row)

I'm guessing that's because the unaligned access in check_ascii() is
expensive on this platform.

- Heikki
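To illustrate the unaligned-access concern: the ARM1176 core in a first-generation Raspberry Pi handles unaligned word loads poorly, whereas x86 does them cheaply. The usual portable idiom is a memcpy-based load, which is well-defined at any alignment and compiles to a single load on platforms where that is fast. A sketch (illustrative names, not the patch's code):

```c
#include <stdint.h>
#include <string.h>

/*
 * Casting a byte pointer to uint64_t* is undefined behavior when the
 * pointer is misaligned, and on older ARM cores the resulting load can
 * be slow or trap. Shown here only for contrast -- do not use.
 */
static uint64_t
load64_cast(const unsigned char *p)
{
	return *(const uint64_t *) p;	/* UB if p is misaligned */
}

/*
 * The portable idiom: memcpy into a local. Well-defined at any
 * alignment; compilers turn it into one load where that is safe.
 */
static uint64_t
load64_memcpy(const unsigned char *p)
{
	uint64_t	v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Even with the memcpy idiom, the hardware still performs an unaligned access when the data is unaligned, so on a platform like chipmunk the cost Heikki observes can remain; aligning the loop's starting pointer first is one common mitigation.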