fixing bookindex.html bloat - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | fixing bookindex.html bloat |
Date | |
Msg-id | 20220213201618.qz6p6noon3wagr3f@alap3.anarazel.de |
Responses |
Re: fixing bookindex.html bloat
Re: fixing bookindex.html bloat |
List | pgsql-hackers |
Hi,

Sometime last year I was surprised to see (not on a public list unfortunately) that bookindex.html is 657kB, with > 200kB just being repetitions of
xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink"

Reminded of this by a proposal to automatically generate docs as part of cfbot runs (which'd be fairly likely to update bookindex.html), I spent a few painful hours last night trying to track this down.

The reasons for the two xmlns= are different. The xmlns="http://www.w3.org/1999/xhtml" is afaict caused by confusion on our part.

Some of our stylesheets use
xmlns="http://www.w3.org/TR/xhtml1/transitional"
others use
xmlns="http://www.w3.org/1999/xhtml"

It's noteworthy that the docbook xsl stylesheets end up with
<html xmlns="http://www.w3.org/1999/xhtml">
so it's a bit pointless to reference http://www.w3.org/TR/xhtml1/transitional afaict.

Adding xmlns="http://www.w3.org/1999/xhtml" to stylesheet-html-common.xsl gets rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in bookindex specific content.

Changing stylesheet.xsl from transitional to http://www.w3.org/1999/xhtml gets rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in navigation/footer.

Of course we should likely change all http://www.w3.org/TR/xhtml1/transitional references, rather than just the ones necessary to get rid of the xmlns= spam.

So far, so easy. It took me way longer to understand what's causing all the xmlns:xlink= appearances.

For a long time I was misdirected, because if I remove the <xsl:template name="generate-basic-index"> in stylesheet-html-common.xsl, the number of xmlns:xlink drastically reduces to a handful. That made me think that their existence is somehow our fault. And I tried and tried to find the cause.

But it turns out that this is originally caused by a still existing buglet in the docbook xsl stylesheets, specifically autoidx.xsl. It doesn't exclude xlink via exclude-result-prefixes, but uses ids etc from xlink.
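The overhead from the two declarations quoted at the top of this message can be measured with a few lines of Python. This is only a sketch: the function name namespace_overhead and the sample markup are made up for illustration, and the byte figure assumes each declaration is preceded by exactly one space.

```python
# Namespace declarations quoted in the message above.
DECLARATIONS = (
    'xmlns="http://www.w3.org/1999/xhtml"',
    'xmlns:xlink="http://www.w3.org/1999/xlink"',
)


def namespace_overhead(html):
    """Map each declaration to (occurrences, bytes spent on them)."""
    report = {}
    for decl in DECLARATIONS:
        count = html.count(decl)
        # +1 for the space separating the attribute from what precedes it.
        report[decl] = (count, count * (len(decl) + 1))
    return report


# Tiny in-memory stand-in for bookindex.html (made-up markup, illustration only).
sample = (
    '<dt xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:xlink="http://www.w3.org/1999/xlink">entry</dt>\n'
) * 3

for decl, (count, size) in namespace_overhead(sample).items():
    print(f"{count:6d} x {decl}  ({size} bytes)")
```

To check a real build, read the generated file (for example html/bookindex.html) into a string and pass it to namespace_overhead() instead of the sample.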
The reason that we end up with so many more xmlns:xlink is just that without our customization there ends up being a single
<div xmlns:xlink="http://www.w3.org/1999/xlink" class="index">
and then everything below that doesn't need the xmlns:xlink anymore. But because stylesheet-html-common.xsl emits the div, the xmlns:xlink is emitted for each element that autoidx.xsl has "control" over.

Waiting for docbook to fix this seems a bit futile. I eventually found a bug report about this, from 2016:
https://sourceforge.net/p/docbook/bugs/1384/

But we can easily reduce the "impact" of the issue by just adding a single xmlns:xlink to <div class="index">, which is sufficient to convince xsltproc to not repeat it.

Before:
-rw-r--r-- 1 andres andres 683139 Feb 13 04:31 html-broken/bookindex.html
After:
-rw-r--r-- 1 andres andres 442923 Feb 13 12:03 html/bookindex.html

While most of the savings are in bookindex, the rest of the files are reduced by another ~100kB.

WIP patch attached. For now I just adjusted the minimal set of xmlns="http://www.w3.org/TR/xhtml1/transitional", but I think we should update all.

Greetings,

Andres Freund
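As an illustration of why a single xmlns:xlink on <div class="index"> is enough, here is a toy model of the serialization rule described above: a declaration is written only where an element carries the namespace and no ancestor has already declared it. This is not xsltproc's code. The names Node, carries_xlink and serialize are hypothetical, and the three-entry tree is made up.

```python
from dataclasses import dataclass, field

XLINK = 'xmlns:xlink="http://www.w3.org/1999/xlink"'


@dataclass
class Node:
    name: str
    carries_xlink: bool = False  # element was produced with the xlink namespace attached
    children: list = field(default_factory=list)


def serialize(node, in_scope=False):
    """Write xmlns:xlink only where it is attached and not declared by an ancestor."""
    declare = node.carries_xlink and not in_scope
    attrs = f" {XLINK}" if declare else ""
    inner = "".join(serialize(child, in_scope or declare) for child in node.children)
    return f"<{node.name}{attrs}>{inner}</{node.name}>"


# Index entries as produced by the stylesheet that leaks the xlink namespace.
entries = [Node("dt", carries_xlink=True) for _ in range(3)]

# The wrapping div comes from our customization and lacks the declaration.
without_fix = Node("div", False, entries)
# Same tree, but the div now declares xmlns:xlink itself.
with_fix = Node("div", True, entries)

print(serialize(without_fix).count(XLINK))  # one declaration per entry
print(serialize(with_fix).count(XLINK))     # a single declaration on the div
```

In this model the cost grows with the number of index entries until the outermost element declares the namespace, which matches the shape of the size reduction reported above.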