Re: [HACKERS] SCRAM authentication, take three - Mailing list pgsql-hackers
From: Michael Paquier
Subject: Re: [HACKERS] SCRAM authentication, take three
Msg-id: CAB7nPqRcKJKmXTbV7og_A-ssW4JSQj6+vLE-tYKpwbOS-dqR8A@mail.gmail.com
In response to: Re: [HACKERS] SCRAM authentication, take three (Michael Paquier <michael.paquier@gmail.com>)
Responses: Re: [HACKERS] SCRAM authentication, take three
List: pgsql-hackers
On Tue, Feb 7, 2017 at 11:28 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Yes, I am actively working on this one now. I am trying to come up
> first with something in the shape of an extension to begin with, and
> get a patch out of it. That will be more simple for testing. For now
> the work that really remains in the patches attached on this thread is
> to get the internal work done, all the UTF8-related routines being
> already present in scram-common.c to work on the strings.

It took me a couple of days... And attached is the prototype implementing SASLprep(), or NFKC if you prefer, for UTF-8 strings. This extension is pretty useful to check the validity of the normalization forms defined in the Unicode specs, and it is also available on my github:
https://github.com/michaelpq/pg_plugins/tree/master/pg_sasl_prepare

In short, at build time this extension fetches UnicodeData.txt and builds a conversion table containing the fields necessary for NFKC:
- The code of the character.
- The combining class of the character.
- The decomposition set of the character.

I am aware that the implementation of this extension is the worst possible: many bytes are wasted, and the resulting shared library weighs in at 2.4MB. Still, as a study of how normalization forms work it has been very useful, and I am now aware of what needs to be done to reduce the size of the conversion tables.

This extension has two conversion functions for UTF-8 string <=> integer array (UTF-8 characters are stored on 4 bytes with pg_wchar):

=# SELECT array_to_utf8('{50064}');
 array_to_utf8
---------------
 Ð
(1 row)
=# SELECT utf8_to_array('ÐÐÐ');
    utf8_to_array
---------------------
 {50064,50064,50064}
(1 row)

Then, taking an integer array as input, SASLprep() can be done with pg_sasl_prepare(), which returns to the caller the recursively decomposed set, with the reordering done using the combining classes from the conversion table generated with UnicodeData.txt. Lookups into the conversion table are done using bsearch(), so that should be fast, at least I guess.

I have also implemented a function to query the whole conversion table as a SRF (character number, combining class and decomposition), which is useful for analysis:

=# select count(*) from utf8_conv_table();
 count
-------
 30590
(1 row)

Using this module I have arrived at the following conclusions on how to bring the size of the conversion tables down to a minimum without much impact on lookup performance:
- There are 24k characters with a combining class of 0 and no decomposition out of a total of 30k characters; those need to be dropped from the conversion table.
- Most characters have a single or double decomposition, and one has a decomposition of 18 characters. So we need to create two sets of conversion tables (see the sketch below):
-- A base table, with the character number (4 bytes), the combining class (1 byte) and the size of the decomposition (1 byte).
-- A set of decomposition tables, classified by size. So when decomposing a character, we first check the size of its decomposition, then get the set from the correct table.
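To make that a bit more concrete, here is a minimal sketch of the kind of layout I am thinking about, with throwaway names (pg_utf_entry, UtfBaseTable, utf_base_lookup and friends are not what the attached extension uses); both tables are kept sorted by code point so that bsearch() still works for lookups:

#include <stdint.h>
#include <stdlib.h>

typedef struct
{
    uint32_t    codepoint;      /* character number (4 bytes) */
    uint8_t     comb_class;     /* combining class (1 byte) */
    uint8_t     decomp_size;    /* size of the decomposition (1 byte) */
} pg_utf_entry;

typedef struct
{
    uint32_t    codepoint;
    uint32_t    decomp[2];      /* decomposed code points */
} pg_utf_decomp2;

/*
 * Base table: only the characters with a non-zero combining class or a
 * decomposition, generated from UnicodeData.txt, sorted by code point.
 */
static const pg_utf_entry UtfBaseTable[] = {
    {0x00C0, 0, 2},             /* LATIN CAPITAL LETTER A WITH GRAVE */
    /* ... */
};

/*
 * One decomposition table per decomposition size, size 2 shown here,
 * sorted by code point as well.
 */
static const pg_utf_decomp2 UtfDecompTable2[] = {
    {0x00C0, {0x0041, 0x0300}}, /* A + COMBINING GRAVE ACCENT */
    /* ... */
};

static int
codepoint_cmp(const void *key, const void *entry)
{
    uint32_t    c1 = *(const uint32_t *) key;
    uint32_t    c2 = *(const uint32_t *) entry; /* code point is first field */

    return (c1 > c2) - (c1 < c2);
}

/*
 * Find the base entry of a character; NULL means combining class 0 and
 * no decomposition, hence nothing to do for it.
 */
static const pg_utf_entry *
utf_base_lookup(uint32_t code)
{
    return bsearch(&code, UtfBaseTable,
                   sizeof(UtfBaseTable) / sizeof(UtfBaseTable[0]),
                   sizeof(pg_utf_entry), codepoint_cmp);
}

Each base entry then carries only 6 bytes of payload, and the 24k characters that normalization leaves untouched do not need to be stored at all.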
Now, regarding the shape of the implementation for SCRAM, we need one thing: a set of routines in src/common/ to build decompositions for a given UTF-8 string, covering the conversion UTF-8 string <=> pg_wchar array, the decomposition itself and the reordering. The extension attached roughly implements that. What we could actually do as well is have in contrib/ a module that does NFK[C|D] using the base APIs in src/common/.

Using arrays of pg_wchar (integers) to manipulate the characters, we can validate the result with a set of regression tests that do *not* have to print non-ASCII characters.
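As an illustration of the reordering step, still with placeholder names and building on the hypothetical lookup sketched above (pg_wchar being a 4-byte integer here), canonical reordering on such an array boils down to a stable sort of each run of non-starter characters by combining class:

/* Combining class of a character, 0 when it is not in the base table. */
static uint8_t
get_combining_class(uint32_t code)
{
    const pg_utf_entry *entry = utf_base_lookup(code);

    return entry ? entry->comb_class : 0;
}

/*
 * Reorder sequences of combining characters: within each run of
 * characters whose combining class is non-zero, sort by increasing
 * combining class, keeping the relative order of equal classes.
 * Characters with class 0 act as barriers and never move.
 */
static void
utf_reorder(uint32_t *chars, int len)
{
    int         i;

    for (i = 1; i < len; i++)
    {
        uint8_t     cc = get_combining_class(chars[i]);
        int         j = i;

        if (cc == 0)
            continue;

        while (j > 0 && get_combining_class(chars[j - 1]) > cc)
        {
            uint32_t    tmp = chars[j];

            chars[j] = chars[j - 1];
            chars[j - 1] = tmp;
            j--;
        }
    }
}

Keeping everything as integer arrays until the final conversion back to UTF-8 also means the decomposition and reordering code never has to worry about multi-byte boundaries.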
Let me know if this plan looks good; I think that I now have enough material to get SASLprep done cleanly, with a minimum memory footprint. Heikki, others, how does that sound to you?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers