Thread: Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
>I do understand the problem, but don't understand the decision you
>guys made. The fact that UPPER/LOWER and some other functions do not
>work in win32 is surely a problem for some languages, but not a
>problem for others. For example, Japanese (and probably Chinese and
>Korean) does not have a concept of upper/lower case. So the fact that
>UPPER/LOWER does not work with UTF-8/win32 is not a problem for
>Japanese (and for some other languages). Just using the C locale with
>UTF-8 is enough in this case.

The main issue is not with upper/lower, it's with ORDER BY (and doesn't
that affect indexes as well?). This affects Japanese as well, no?

I didn't consider the C locale. Do you know for a fact that it works
there on win32 as well, or is that an assumption? (I don't know either
way.)

>In summary, I think you guys are overreacting by killing the multibyte
>support functionality on UTF-8/win32 just because some languages do
>not work.

I was under the impression that *no* languages worked. If some do work,
then we definitely should not kill it.

It would be good to have some way of detecting if it worked or not at
the time of creation of the database. But I have no idea how to do
that in a reasonable way.

//Magnus

"Magnus Hagander" <mha@sollentuna.net> writes:
> I didn't consider the C locale. Do you know for a fact that it works
> there on win32 as well, or is that an assumption?

It should work. The only use of strcoll() in the backend is in
varstr_cmp, which uses strncmp() instead for the C locale. Lack of
working upper/lower is hardly a fatal objection, considering that
we never had that for UTF8 before 8.0 anyway. But you do have to
have a working varstr_cmp.

> It would be good to have some way of detecting if it worked or not at
> the time of creation of the database. But I have no idea how to do
> that in a reasonable way.

At this point I'd say that any combination of UTF8 encoding with a
non-C/POSIX locale probably isn't going to work on Windows. Tatsuo, do
you know of other cases that will work?

regards, tom lane

> "Magnus Hagander" <mha@sollentuna.net> writes:
> > I didn't consider the C locale. Do you know for a fact that it works
> > there on win32 as well, or is that an assumption?
>
> It should work. The only use of strcoll() in the backend is in
> varstr_cmp, which uses strncmp() instead for the C locale. Lack of
> working upper/lower is hardly a fatal objection, considering that
> we never had that for UTF8 before 8.0 anyway. But you do have to
> have a working varstr_cmp.
>
> > It would be good to have some way of detecting if it worked or not at
> > the time of creation of the database. But I have no idea how to do
> > that in a reasonable way.
>
> At this point I'd say that any combination of UTF8 encoding with a
> non-C/POSIX locale probably isn't going to work on Windows. Tatsuo, do
> you know of other cases that will work?

No. I think C is the only working locale.
--
Tatsuo Ishii

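For readers following along, the rule the thread converges on (UTF8 plus a non-C/POSIX locale is suspect on win32) and the comparison path Tom describes can be sketched roughly as below. This is only an illustration in Python, not the backend's C code: `is_suspect_combination` and `compare_sketch` are hypothetical names, the locale strings are assumed to arrive already normalized, and Python's bytes comparison stands in for strncmp().

```python
import locale


def is_suspect_combination(encoding: str, lc_collate: str, platform: str) -> bool:
    """Hypothetical check: True for UTF-8 with a non-C/POSIX locale on win32."""
    enc = encoding.upper().replace("-", "")
    is_utf8 = enc in ("UTF8", "UNICODE")
    return platform == "win32" and is_utf8 and lc_collate not in ("C", "POSIX")


def compare_sketch(a: bytes, b: bytes, lc_collate: str) -> int:
    """Three-way comparison loosely modelled on the varstr_cmp description."""
    if lc_collate in ("C", "POSIX"):
        # Bytewise comparison, standing in for strncmp(); no locale involved.
        return (a > b) - (a < b)
    # Any other locale goes through the platform's strcoll().
    return locale.strcoll(a.decode("utf-8"), b.decode("utf-8"))
```

A check like the first function could in principle run at database creation time, which is what Magnus was asking for, but whether that is the right place for it is left open in the thread.
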
> >I do understand the problem, but don't understand the decision you
> >guys made. The fact that UPPER/LOWER and some other functions do not
> >work in win32 is surely a problem for some languages, but not a
> >problem for others. For example, Japanese (and probably Chinese and
> >Korean) does not have a concept of upper/lower case. So the fact that
> >UPPER/LOWER does not work with UTF-8/win32 is not a problem for
> >Japanese (and for some other languages). Just using the C locale with
> >UTF-8 is enough in this case.
>
> The main issue is not with upper/lower, it's with ORDER BY (and doesn't
> that affect indexes as well?). This affects Japanese as well, no?

As long as they are used with the C locale, indexes should be OK. ORDER
BY is not perfect, but we can live with it. Since Japanese is written
with ideograms, we cannot rely on ordering by character codes to sort
Japanese text anyway. I believe the same can be said of Chinese.

> I didn't consider the C locale. Do you know for a fact that it works
> there on win32 as well, or is that an assumption? (I don't know either
> way.)

I have not tested 8.0 on win32, but I think it should work with the C
locale, since I know that PowerGres, which is based on 7.4, works.

> >In summary, I think you guys are overreacting by killing the multibyte
> >support functionality on UTF-8/win32 just because some languages do
> >not work.
>
> I was under the impression that *no* languages worked. If some do work,
> then we definitely should not kill it.
>
> It would be good to have some way of detecting if it worked or not at
> the time of creation of the database. But I have no idea how to do
> that in a reasonable way.
--
Tatsuo Ishii
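
To make the "ORDER BY is not perfect" point concrete: under the C locale, strings are compared byte by byte, and for UTF-8 that gives the same order as comparing Unicode code points. The small Python sketch below illustrates this with sample words chosen purely for the example. It does not touch PostgreSQL; it only shows the ordering property.

```python
# Sample strings (chosen for illustration): ASCII, hiragana, katakana, kanji.
words = ["東京", "abc", "あい", "Zebra", "カナ"]

# Ordering by raw UTF-8 bytes, which is what a bytewise comparison sees.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Python compares str values by code point, so this is code-point order.
by_codepoint = sorted(words)

print(by_bytes)
```

Both orderings come out identical, with uppercase ASCII first and kanji last. Kanji therefore sort by their position in the code charts rather than by reading, which is the limitation described above.
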