Re: PATCH: Allow empty targets in unaccent dictionary - Mailing list pgsql-hackers
From | Abhijit Menon-Sen |
---|---|
Subject | Re: PATCH: Allow empty targets in unaccent dictionary |
Date | |
Msg-id | 20140625052012.GB28445@toroid.org Whole thread Raw |
In response to | PATCH: Allow empty targets in unaccent dictionary (Mohammad Alhashash <alhashash@alhashash.net>) |
Responses |
Re: PATCH: Allow empty targets in unaccent dictionary
Re: PATCH: Allow empty targets in unaccent dictionary |
List | pgsql-hackers |
Hi. At 2014-04-20 01:06:43 +0200, alhashash@alhashash.net wrote: > > To use unaccent dictionary for these languages, we need to allow empty > targets to remove diacritics instead of replacing them. Your patch should definitely add a test case or two to sql/unaccent.sql and expected/unaccent.out showing the behaviour that didn't work before the change. > The attached patch modfies unaacent.c so that dictionary parser uses > zero-length target when the line has no target. The basic idea seems sensible. > diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c > old mode 100644 > new mode 100755 > index a337df6..4e72829 > --- a/contrib/unaccent/unaccent.c > +++ b/contrib/unaccent/unaccent.c > @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, char *replaceTo, int r > { > curnode->replacelen = replacelen; > curnode->replaceTo = palloc(replacelen); > - memcpy(curnode->replaceTo, replaceTo, replacelen); > + /* palloc(0) returns a valid address, not NULL */ > + if (replaceTo) /* memcpy() is undefined for NULL pointers*/ > + memcpy(curnode->replaceTo, replaceTo, replacelen); > } > } I think these comments confuse the issue, and should be removed. In fact, I think this part of the code can remain unchanged (see below). > @@ -105,10 +107,10 @@ initTrie(char *filename) > while ((line = tsearch_readline(&trst)) != NULL) > { > /* > - * The format of each line must be "src trg" where src and trg > + * The format of each line must be "src [trg]" where src and trg > * are sequences of one or more non-whitespace characters, > * separated by whitespace. Whitespace at start or end of > - * line is ignored. > + * line is ignored. If no trg added, a zero-length string is used. > */ > int state; I suggest "If trg is empty, a zero-length string is used" for the last sentence. > @@ -160,6 +162,13 @@ initTrie(char *filename) > } > } > > + /* if no trg (loop stops at state 1 or 2), use zero-length target */ > + if (state == 1 || state == 2) > + { > + trglen = 0; > + state = 5; > + } If I understand the code correctly, "src" alone will leave state == 1, and "src " will leave state == 2, and in both cases trg and trglen will be unchanged (from NULL and 0 respectively). In that case, I think it would be clearer to do something like this: char *trg = ""; … /* It's OK to have a valid src and empty trg. */ if (state > 0 && trglen == 0) state = 5; That way, you don't have the NULL pointer, and you don't have to add a NULL-pointer test in placeChar() above. What do you think? If you submit a revised patch along these lines, I'll mark it ready for committer. Thank you. -- Abhijit
pgsql-hackers by date: