Thread: UTF-8 docs?
Just out of curiosity, I wonder why we can't make the encoding of the SGML docs UTF-8 rather than the current ISO-8859-1. As long as everything is written in ASCII, the size of the docs will be almost the same even if UTF-8 is used. Plus, if the encoding is changed to UTF-8, it becomes very easy to translate the docs into local languages. As far as I know, all of the local-language docs under https://www.postgresql.org/docs/ use UTF-8.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
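The size claim above can be checked with a minimal Python sketch. The helper name and the sample strings are made up for illustration and are not taken from the PostgreSQL docs. It suggests that pure-ASCII text encodes to the same number of bytes in ISO-8859-1 and UTF-8, while each non-ASCII Latin-1 character costs one extra byte in UTF-8.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (ISO-8859-1 size, UTF-8 size) in bytes for the given text."""
    return len(text.encode("iso-8859-1")), len(text.encode("utf-8"))


# Hypothetical sample strings, not taken from the actual docs.
ascii_sample = "<para>SELECT version();</para>"
latin1_sample = "<para>Caf\u00e9</para>"  # contains one non-ASCII character

# Pure ASCII: both encodings produce the same byte sequence.
print(encoded_sizes(ascii_sample))

# One Latin-1 character: UTF-8 needs two bytes for it instead of one.
print(encoded_sizes(latin1_sample))
```

For documentation that is almost entirely ASCII, the difference in total size would therefore be limited to the handful of accented characters.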
On Mon, 22 Aug 2016 14:16:45 +0900 (JST) Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> Just out of curiosity, I wonder why we can't make the encoding of
> SGML docs to be UTF-8, rather than current ISO-8859-1.

What is the reason to "make the encoding of sgml docs" anything in particular? What actual change should be made, and what problems would it solve?

There are various translations of the PostgreSQL docs, and they use various encodings. The translated versions of the docs on http://postgresql.org/docs are just links to external sites where the translations are maintained. The English documentation uses ISO-8859-1 (actually ASCII); the Russian one uses UTF-8 (you can download our source tarball from http://repo.postgrespro.ru/pgpro-9.5/src and see the postgres source distribution with UTF-8 sgml files inside).

The Japanese documentation in HTML form is served from http://www.postgresql.jp/document/9.5/html/ in UTF-8 too.

I.e., everybody who needs UTF-8 to represent a translation of the documentation already uses it.

What exactly do you propose to be done?

Really, the change we need is the conversion from SGML to XML format. It would solve some real problems, such as the ability to include diagrams in the docs, and it would also let everyone explicitly specify the encoding in the XML declaration (and probably cause a switch to UTF-8 as a side effect, because most XML-based tools use UTF-8 as the default).
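To illustrate the point about the XML declaration, here is a small Python sketch using the standard-library xml.etree.ElementTree parser. The two documents and the helper name are hypothetical and are not taken from the PostgreSQL sources. The same paragraph is stored once as ISO-8859-1 and once as UTF-8, and the parser chooses the decoder from each document's own declaration.

```python
import xml.etree.ElementTree as ET

# The same hypothetical paragraph, stored in two different encodings.
# Each document states its own encoding in the XML declaration.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><para>Caf\u00e9</para>'
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><para>Caf\u00e9</para>'
).encode("utf-8")


def para_text(raw: bytes) -> str:
    """Parse raw XML bytes; the parser reads the encoding from the declaration."""
    return ET.fromstring(raw).text


# The byte sequences differ, but the decoded text is the same.
print(latin1_doc != utf8_doc)
print(para_text(latin1_doc) == para_text(utf8_doc))
```

No external setting is needed to tell the parser which encoding to expect, which is the property being contrasted with the SGML toolchain here.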
> On Mon, 22 Aug 2016 14:16:45 +0900 (JST)
> Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
>
>> Just out of curiosity, I wonder why we can't make the encoding of
>> SGML docs to be UTF-8, rather than current ISO-8859-1.
>
> What is the reason to "make the encoding of sgml docs" anything in
> particular? What actual change should be made, and what problems
> would it solve?

The problem is that the PostgreSQL docs are fixed to ISO-8859-1, and if I want to use another encoding, I need to change the build system, which is annoying. Ideally, anyone who wants to use something other than ISO-8859-1 would only have to change the contents of the sgml files. Just changing ISO-8859-1 to UTF-8 would solve most problems. (Allowing an arbitrary encoding to be specified would probably be better, but it needs some work.)

> There are various translations of the PostgreSQL docs, and they use
> various encodings. The translated versions of the docs on
> http://postgresql.org/docs are just links to external sites where the
> translations are maintained. The English documentation uses
> ISO-8859-1 (actually ASCII); the Russian one uses UTF-8 (you can
> download our source tarball from
> http://repo.postgrespro.ru/pgpro-9.5/src and see the postgres source
> distribution with UTF-8 sgml files inside).
>
> The Japanese documentation in HTML form is served from
> http://www.postgresql.jp/document/9.5/html/
> in UTF-8 too.
>
> I.e., everybody who needs UTF-8 to represent a translation of the
> documentation already uses it.
>
> What exactly do you propose to be done?

See above.

> Really, the change we need is the conversion from SGML to XML format.
> It would solve some real problems, such as the ability to include
> diagrams in the docs, and it would also let everyone explicitly
> specify the encoding in the XML declaration (and probably cause a
> switch to UTF-8 as a side effect, because most XML-based tools use
> UTF-8 as the default).

That's another story.

Best regards,
--
Tatsuo Ishii
On 8/22/16 1:16 AM, Tatsuo Ishii wrote:

> Just out of curiosity, I wonder why we can't make the encoding of
> SGML docs to be UTF-8, rather than current ISO-8859-1.

Encoding handling in DocBook SGML is weird, and making it work robustly will either fail or might be more work than just completing the conversion to XML.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 8/22/16 1:16 AM, Tatsuo Ishii wrote:
>> Just out of curiosity, I wonder why we can't make the encoding of
>> SGML docs to be UTF-8, rather than current ISO-8859-1.
>
> Encoding handling in DocBook SGML is weird, and making it work robustly
> will either fail or might be more work than just completing the
> conversion to XML.

I don't know what kind of problem you are seeing with encoding handling, but at least UTF-8 is working for Japanese, French and Russian.

Best regards,
--
Tatsuo Ishii
On 8/22/16 9:32 AM, Tatsuo Ishii wrote:

> I don't know what kind of problem you are seeing with encoding
> handling, but at least UTF-8 is working for Japanese, French and
> Russian.

Those translations are using DocBook XML.

--
Peter Eisentraut http://www.2ndQuadrant.com/
> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
>> I don't know what kind of problem you are seeing with encoding
>> handling, but at least UTF-8 is working for Japanese, French and
>> Russian.
>
> Those translations are using DocBook XML.

But in the meantime I can create UTF-8 HTML files like this:

    make html
    [snip]
    /bin/mkdir -p html
    SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index postgres.sgml

Best regards,
--
Tatsuo Ishii
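One way to confirm that output produced this way really is UTF-8 is to try decoding it strictly. The following is a minimal Python sketch; the helper name and the sample byte strings are hypothetical, and in practice the bytes would be read from the generated files under html/.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical samples standing in for generated HTML content.
utf8_bytes = "<p>\u65e5\u672c\u8a9e</p>".encode("utf-8")
latin1_bytes = "<p>caf\u00e9</p>".encode("iso-8859-1")

print(is_valid_utf8(utf8_bytes))    # Japanese text encoded as UTF-8
print(is_valid_utf8(latin1_bytes))  # Latin-1 bytes that are not valid UTF-8
```

Note that pure-ASCII content passes this check under either encoding, so a pass only means something for files that actually contain non-ASCII characters.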
On Mon, 22 Aug 2016 10:53:25 -0400 Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 8/22/16 9:32 AM, Tatsuo Ishii wrote:
> > I don't know what kind of problem you are seeing with encoding
> > handling, but at least UTF-8 is working for Japanese, French and
> > Russian.
>
> Those translations are using DocBook XML.

The Russian translation by Postgres Professional does use DocBook SGML, although it uses XML as an intermediate representation when applying gettext to the documentation.

I've already posted the URL where the PostgreSQL sources, with the Russian documentation in SGML format included, can be downloaded.