Re: New Chinese FAQ - Mailing list pgsql-www
From | Bruce Momjian |
---|---|
Subject | Re: New Chinese FAQ |
Date | |
Msg-id | 200505170312.j4H3CRY25080@candle.pha.pa.us |
In response to | Re: New Chinese FAQ ("Magnus Hagander" <mha@sollentuna.net>) |
Responses | Re: New Chinese FAQ |
List | pgsql-www |
Magnus Hagander wrote:
> >> ok, I'll fix the html tag problem ASAP.
> >>
> >
> >I fixed the tag problem and it now verifies fine:
> >
> >
> >http://validator.w3.org/check?uri=http%3A%2F%2Fwwwmaster.postgr
> >esql.org%2Fdocs%2Ffaqs.FAQ_chinese.html&charset=gb2312+%28Chine
> se%2C+simplified%29
> >
> >The only problem reported is that it says the encoding is incorrect for
> >a large number of lines.  The above encoding forces it to be
> >gb2312.  If
> >I make it Unicode I get even more failures.  However, I remember iconv
> >doing the conversion to UTF8 just fine, so maybe something is
> >wrong with
> >how we are validating it.
>
> The output should be UTF8, and it should autodetect it. The output from
> the *website* should *not* validate as gb2312, because it is no longer
> in that encoding.
>
> The reason that's the only error you get may be that it doesn't validate
> the document because of encoding errors. So this doesn't prove (or
> disprove for that matter) that the tags are fixed.
>

Yes, I was using the HTML 4.0 doctype when I tested; XHTML Transitional
was only tested once the file was on the web site.

> >Anyway, the HTML is OK so it seems we just have encoding issue now.
> >The current version in CVS is all fixed up so please submit updates
> >based on that version.  Thanks.
>
> I'm sorry to say, but it's invalid characters in it again :-(
> On svr2:
> svr2# iconv -f gb2312 -t utf-8 FAQ_chinese.html >/dev/null
> iconv: FAQ_chinese.html: cannot convert
>
>
> On developer.pgadmin.org:
> mha@developer:~/ext/faqs$ iconv -f gb2312 -t utf-8 FAQ_chinese.html -o
> /dev/null
> iconv: illegal input sequence at position 8182
>
>
> Could it be cvs that messes the encoding up? Can you mail me the file as
> you see it before you commit and I can see if that makes a difference?
>

The problem is that the document is clearly not XHTML, but when I use
htmltidy -raw -asxhtml to convert it to XHTML, it somehow messes up the
encodings and then iconv fails.
So, I either have to manually fix the HTML file to be XHTML, or I have
to figure out why htmltidy is changing the encoded text even though I am
using -raw.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
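
One way to narrow down where the conversion breaks is to locate the first byte sequence that does not decode as gb2312, once on the file before htmltidy runs and once after. The following is only a rough sketch in Python, not what was actually run on svr2 or developer.pgadmin.org. The function names are made up for illustration, and Python's gb2312 codec may not accept exactly the same byte sequences as iconv, so the reported offset may not match iconv's "position 8182".

```python
from typing import Optional


def first_invalid_offset(data: bytes, encoding: str = "gb2312") -> Optional[int]:
    """Return the byte offset of the first sequence that fails to decode,
    or None if the whole input decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte
        return err.start
    return None


def to_utf8(data: bytes, encoding: str = "gb2312") -> bytes:
    """Re-encode the input as UTF-8; raises UnicodeDecodeError on bad input."""
    return data.decode(encoding).encode("utf-8")


def check_file(path: str, encoding: str = "gb2312") -> Optional[int]:
    """Hypothetical helper: read a file as raw bytes and report the first
    offset that is not valid in the given encoding."""
    with open(path, "rb") as handle:
        return first_invalid_offset(handle.read(), encoding)
```

Calling check_file("FAQ_chinese.html") on the copy from before htmltidy and again on its output would show whether htmltidy is the step that introduces the bad bytes, or whether they are already present in the file as committed.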