Re: [PROPOSAL] Improvements of Hunspell dictionaries support - Mailing list pgsql-hackers
From | Alvaro Herrera |
---|---|
Subject | Re: [PROPOSAL] Improvements of Hunspell dictionaries support |
Date | |
Msg-id | 20160108230434.GA652291@alvherre.pgsql Whole thread Raw |
In response to | Re: [PROPOSAL] Improvements of Hunspell dictionaries support (Artur Zakirov <a.zakirov@postgrespro.ru>) |
Responses |
Re: [PROPOSAL] Improvements of Hunspell dictionaries
support
|
List | pgsql-hackers |
Artur Zakirov wrote: > *** 77,83 **** typedef struct spell_struct > > typedef struct aff_struct > { > ! uint32 flag:8, > type:1, > flagflags:7, > issimple:1, > --- 74,80 ---- > > typedef struct aff_struct > { > ! uint32 flag:16, > type:1, > flagflags:7, > issimple:1, By doing this, you're using 40 bits of a 32-bits-wide field. What does this mean? Are the final 8 bits lost? Does the compiler allocate a second uint32 member for those additional bits? I don't know, but I don't think this is a very clean idea. > typedef struct spell_struct > { > ! union > { > ! /* > ! * flag is filled in by NIImportDictionary. After NISortDictionary, d > ! * is valid and flag is invalid. > ! */ > ! char flag[MAXFLAGLEN]; > ! struct > ! { > ! int affix; > ! int len; > ! } d; > ! } p; > char word[FLEXIBLE_ARRAY_MEMBER]; > } SPELL; > > --- 57,72 ---- > > typedef struct spell_struct > { > ! struct > { > ! int affix; > ! int len; > ! } d; > ! /* > ! * flag is filled in by NIImportDictionary. After NISortDictionary, d > ! * is valid and flag is invalid. > ! */ > ! char *flag; > char word[FLEXIBLE_ARRAY_MEMBER]; > } SPELL; Here you removed the union, with no rationale for doing so. Why did you do it? Can it be avoided? Because of the comment, I'd imagine that d and flag are valid at different times, so at any time we care about only one of them; but you haven't updated the comment stating the reason for that no longer to be the case. I suspect you need to keep flag valid after NISortDictionary has been called, but if so why? If "flag" is invalid as the comment says, what's the reason for keeping it? The routines decodeFlag and isAffixFlagInUse could do with more comments. Your patch adds zero. Actually the whole file has not nearly enough comments; adding some more would be very good. Actually, after some more reading, I think this code is pretty terrible. I have a hard time figuring out how the original works, which makes it even more difficult to figure out whether your changes make sense. I would have to take your patch on faith, which doesn't sound so great an idea. palloc / cpalloc / tmpalloc make the whole mess even more confusing. Why does this file have three ways to allocate memory? Not sure what's a good way to go about this. I am certainly not going to commit this as is, because if I do whatever bugs you have are going to become my problem; and with the severe lack of documentation and given how fiddly this stuff is, I bet there are going to be a bunch of bugs. I suspect most committers are going to be in the same position. I think you should start by adding a few comments here and there on top of the original to explain how it works, then your patch on top. I suppose it's going to be a lot of work for you but I don't see any other way. A top-level overview about it would be good, too. The current comment at top of file states: * spell.c* Normalizing word with ISpell which is, err, somewhat laconic. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: