Thread: Broken links in mailing list archive due to percent-encoding
I've just sent a mail [1] to pgsql-general, and the mailing list archive shows a broken link [2]. I included the correct link [3] in my message and also received my message with the correct link from the list. It looks like the archive percent-encodes subcomponent delimiters in the query component. Perhaps the encoding is allowed and it's just git.postgresql.org that can't handle it. But I'm pretty sure that links to git.postgresql.org from the archive worked in the past.

[1] https://www.postgresql.org/message-id/1550267563.330669.1693335893138%40office.mailbox.org
[2] https://git.postgresql.org/gitweb/?p=postgresql.git%3Ba%3Dblob%3Bf%3Dsrc%2Fbin%2Fpsql%2Fdescribe.c%3Bh%3Dbac94a338cfbc497200f0cf960cbabce2dadaa33%3Bhb%3D9b581c53418666205938311ef86047aa3c6b741f#l1149
[3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/psql/describe.c;h=bac94a338cfbc497200f0cf960cbabce2dadaa33;hb=9b581c53418666205938311ef86047aa3c6b741f#l1420

--
Erik
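One way to confirm that link [2] is just link [3] with its query percent-encoded is to decode it with Python's urllib.parse. This is only a sketch under two assumptions: a recent Python (3.10+) for the `separator` argument of `parse_qsl`, and that gitweb, like typical CGI query parsers, splits the query on the literal `;` or `&` before percent-decoding. Under that assumption the encoded link reaches gitweb as a single oversized `p` parameter.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

# Link [2] as rendered by the archive, and link [3] as written in the mail.
BROKEN = (
    "https://git.postgresql.org/gitweb/?p=postgresql.git%3Ba%3Dblob%3Bf%3D"
    "src%2Fbin%2Fpsql%2Fdescribe.c%3Bh%3Dbac94a338cfbc497200f0cf960cbabce2dadaa33"
    "%3Bhb%3D9b581c53418666205938311ef86047aa3c6b741f#l1149"
)
CORRECT = (
    "https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;"
    "f=src/bin/psql/describe.c;h=bac94a338cfbc497200f0cf960cbabce2dadaa33;"
    "hb=9b581c53418666205938311ef86047aa3c6b741f#l1420"
)

# The fragments (#l1149 vs #l1420) differ, so compare only the query components.
broken_query = urlsplit(BROKEN).query
correct_query = urlsplit(CORRECT).query

# Percent-decoding the archive's query gives back the original one.
print(unquote(broken_query) == correct_query)  # True

# Splitting on the literal ';' before decoding finds five pairs in the
# correct link but only a single 'p' parameter in the broken one.
broken_pairs = parse_qsl(broken_query, separator=";")
correct_pairs = parse_qsl(correct_query, separator=";")
print(len(correct_pairs), len(broken_pairs))  # 5 1
print(broken_pairs[0][0])  # p
```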
On 29/08/2023 21:38 CEST Erik Wienhold <ewie@ewie.name> wrote:
> It looks like the archive percent-encodes subcomponent delimiters in the query
> component. Perhaps the encoding is allowed and it's just git.postgresql.org
> that can't handle it. But I'm pretty sure that links to git.postgresql.org
> from the archive worked in the past.

I've been digging around a bit more because this is an odd bug. It turns out to be the result of applying Django's urlize filter to the message body [1]:

    >>> from django.template.defaultfilters import urlize
    >>> urlize('http://example.net/foo?bar=baz;abc=123')
    '<a href="http://example.net/foo?bar=baz%3Babc%3D123" rel="nofollow">http://example.net/foo?bar=baz;abc=123</a>'

This looks like a bug in Django: it percent-encodes sub-delimiters in the query component but not anywhere else in the URL:

    >>> urlize('http://example.net/foo;bar=baz')
    '<a href="http://example.net/foo;bar=baz" rel="nofollow">http://example.net/foo;bar=baz</a>'

As for git.postgresql.org: gitweb generates URLs with semicolon instead of ampersand as the separator of query pairs [2], although the W3C no longer recommends semicolon. But gitweb also handles query components that use ampersand instead of semicolon. This means that links [1] and [3] work after I've manually replaced all semicolons with ampersands.

[1] https://git.postgresql.org/gitweb/?p=pgarchives.git&a=blob&f=django/archives/mailarchives/templates/_message.html&h=c90a80afea418fc4800ae81bb517978fa56f7a4d&hb=HEAD#l64
[2] https://git.kernel.org/pub/scm/git/git.git/tree/gitweb/gitweb.perl#n1505
[3] https://git.postgresql.org/gitweb/?p=postgresql.git&a=blob&f=src/bin/psql/describe.c&h=bac94a338cfbc497200f0cf960cbabce2dadaa33&hb=9b581c53418666205938311ef86047aa3c6b741f#l1420

--
Erik
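The query re-encoding can be reproduced with the standard library alone. The sketch assumes that urlize quotes each URL via `django.utils.html.smart_urlquote`, which rebuilds the query with a `parse_qsl`/`urlencode` round-trip. `requote_query` is a hypothetical, simplified stand-in for that step, not Django's actual code. `semicolons_to_ampersands` is a hypothetical helper that automates the manual workaround.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def requote_query(url: str) -> str:
    """Re-encode only the query component, roughly like smart_urlquote.

    Hypothetical simplified stand-in; the path and fragment are left alone.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    if query:
        # In current Python, parse_qsl splits only on '&' by default, so a
        # ';'-separated query comes back as one pair: ('bar', 'baz;abc=123').
        pairs = parse_qsl(query, keep_blank_values=True)
        # urlencode quotes every reserved character in keys and values,
        # including ';', '=' and '/'.
        query = urlencode(pairs)
    return urlunsplit((scheme, netloc, path, query, fragment))


def semicolons_to_ampersands(url: str) -> str:
    """The manual workaround: use '&' instead of ';' between query pairs."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, query.replace(";", "&"), fragment))


print(requote_query("http://example.net/foo?bar=baz;abc=123"))
# http://example.net/foo?bar=baz%3Babc%3D123
print(requote_query("http://example.net/foo;bar=baz"))
# http://example.net/foo;bar=baz  (no query, so nothing is re-encoded)
print(requote_query("http://example.net/foo?bar=baz&abc=123"))
# http://example.net/foo?bar=baz&abc=123
```

In current Python versions `parse_qsl` splits only on `&` by default (see its `separator` parameter). As a result, a `;`-separated query survives the round-trip as one value with its delimiters encoded, while the `&`-separated example comes back unchanged. Older Python releases also split on `;`, so the same round-trip would have turned `;` into `&`. Whether that explains why archive links used to work is only a guess.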