Thread: Broken links in mailinglist archive due to percent-encoding

Broken links in mailinglist archive due to percent-encoding

From: Erik Wienhold
Date: 29/08/2023 21:38 CEST
I've just sent a mail [1] to pgsql-general, and the mailing list archive shows
a broken link [2].  I included the correct link [3] in my message and also
received my message with the correct link from the list.

It looks like the archive percent-encodes subcomponent delimiters in the query
component.  Perhaps the encoding is allowed and it's just git.postgresql.org
that can't handle it.  But I'm pretty sure that links to git.postgresql.org
from the archive worked in the past.
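
A quick way to see why [2] breaks is to compare the two query strings with Python's urllib.parse. The sketch below assumes that gitweb, like typical CGI-style parsers, splits the query on ";" (or "&") before percent-decoding each pair. Under that assumption the escaped delimiters in [2] become part of a value, and the server sees a single parameter "p".

```python
from urllib.parse import parse_qsl, unquote

# Query components of the archived link [2] and of the original link [3].
broken = ("p=postgresql.git%3Ba%3Dblob%3Bf%3Dsrc%2Fbin%2Fpsql%2Fdescribe.c"
          "%3Bh%3Dbac94a338cfbc497200f0cf960cbabce2dadaa33"
          "%3Bhb%3D9b581c53418666205938311ef86047aa3c6b741f")
original = ("p=postgresql.git;a=blob;f=src/bin/psql/describe.c"
            ";h=bac94a338cfbc497200f0cf960cbabce2dadaa33"
            ";hb=9b581c53418666205938311ef86047aa3c6b741f")

# Percent-decoding the broken query gives back the original text...
print(unquote(broken) == original)  # True

# ...but splitting on ";" BEFORE decoding (assumed gitweb behavior) finds
# no separator in the broken query, so everything lands in the value of "p".
pairs_broken = parse_qsl(broken, separator=";")
pairs_original = parse_qsl(original, separator=";")
print(len(pairs_broken), [k for k, _ in pairs_broken])      # 1 ['p']
print(len(pairs_original), [k for k, _ in pairs_original])  # 5 ['p', 'a', 'f', 'h', 'hb']
```

So the information in [2] is intact; only the order of splitting and decoding makes it unusable.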

[1] https://www.postgresql.org/message-id/1550267563.330669.1693335893138%40office.mailbox.org
[2]
https://git.postgresql.org/gitweb/?p=postgresql.git%3Ba%3Dblob%3Bf%3Dsrc%2Fbin%2Fpsql%2Fdescribe.c%3Bh%3Dbac94a338cfbc497200f0cf960cbabce2dadaa33%3Bhb%3D9b581c53418666205938311ef86047aa3c6b741f#l1149
[3]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/psql/describe.c;h=bac94a338cfbc497200f0cf960cbabce2dadaa33;hb=9b581c53418666205938311ef86047aa3c6b741f#l1420

--
Erik



Re: Broken links in mailinglist archive due to percent-encoding

From: Erik Wienhold
On 29/08/2023 21:38 CEST Erik Wienhold <ewie@ewie.name> wrote:

> It looks like the archive percent-encodes subcomponent delimiters in the query
> component.  Perhaps the encoding is allowed and it's just git.postgresql.org
> that can't handle it.  But I'm pretty sure that links to git.postgresql.org
> from the archive worked in the past.

I've been digging around a bit more because this is an odd bug.

Turns out it's the result of applying Django's urlize filter to the message
body [1]:

    >>> from django.template.defaultfilters import urlize
    >>> urlize('http://example.net/foo?bar=baz;abc=123')
    '<a href="http://example.net/foo?bar=baz%3Babc%3D123" rel="nofollow">http://example.net/foo?bar=baz;abc=123</a>'

This looks like a bug in Django: it percent-encodes sub-delimiters inside the
query component but not outside it:

    >>> urlize('http://example.net/foo;bar=baz')
    '<a href="http://example.net/foo;bar=baz" rel="nofollow">http://example.net/foo;bar=baz</a>'
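
As far as I can tell from Django's source, urlize quotes URLs via django.utils.html.smart_urlquote, which rebuilds the query by parsing it with urllib.parse.parse_qsl and re-serializing it with urlencode. Since a 2021 Python security fix (3.9.2 and later), parse_qsl splits only on "&" by default, so "bar=baz;abc=123" becomes the single pair ("bar", "baz;abc=123") and urlencode then escapes the ";" and "=". The hypothetical requote_query below is a simplified stand-in for that step (it skips Django's path and fragment quoting), not Django's actual code, but it reproduces both href values above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def requote_query(url: str) -> str:
    """Simplified, hypothetical stand-in for the query step of smart_urlquote."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    if query:
        # parse_qsl splits on "&" only, so "bar=baz;abc=123" is ONE pair,
        # and urlencode then percent-encodes the ";" and "=" in its value.
        query = urlencode(parse_qsl(query, keep_blank_values=True))
    # The path is left untouched, so ";" and "=" there survive unencoded.
    return urlunsplit((scheme, netloc, path, query, fragment))


print(requote_query("http://example.net/foo?bar=baz;abc=123"))
# http://example.net/foo?bar=baz%3Babc%3D123
print(requote_query("http://example.net/foo;bar=baz"))
# http://example.net/foo;bar=baz
```

On older Python versions parse_qsl also split on ";", so the same round trip would have produced "bar=baz&abc=123" instead, which may explain why such links used to work.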

And regarding git.postgresql.org: gitweb generates URLs with semicolons as the
separator of query pairs [2] instead of ampersands, although the W3C no longer
recommends semicolons.  But gitweb also accepts ampersands instead of
semicolons in the query component.  That is why links [1] and [3] in this
message work: I manually replaced all their semicolons with ampersands.
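
The workaround can be sketched the same way. The hypothetical helper semicolons_to_ampersands below rewrites ";" to "&" in the query component only; after that, parse_qsl finds real separators, so each gitweb parameter survives the archive's parse/re-encode step as its own key. urlencode still escapes the "/" in f as "%2F", but that is encoding inside a value, which a CGI-style parser should decode again.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def semicolons_to_ampersands(url: str) -> str:
    """Hypothetical helper: use "&" instead of ";" between query pairs."""
    parts = urlsplit(url)
    # Only the query component is rewritten; path and fragment are kept.
    return urlunsplit(parts._replace(query=parts.query.replace(";", "&")))


link = ("https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob"
        ";f=src/bin/psql/describe.c"
        ";h=bac94a338cfbc497200f0cf960cbabce2dadaa33"
        ";hb=9b581c53418666205938311ef86047aa3c6b741f#l1420")
fixed = semicolons_to_ampersands(link)

# Approximate the archive's parse/re-encode step on the rewritten query.
requoted = urlencode(parse_qsl(urlsplit(fixed).query, keep_blank_values=True))

print(fixed)
print(dict(parse_qsl(requoted)))  # separate keys: p, a, f, h, hb
```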

[1]
https://git.postgresql.org/gitweb/?p=pgarchives.git&a=blob&f=django/archives/mailarchives/templates/_message.html&h=c90a80afea418fc4800ae81bb517978fa56f7a4d&hb=HEAD#l64
[2] https://git.kernel.org/pub/scm/git/git.git/tree/gitweb/gitweb.perl#n1505
[3]
https://git.postgresql.org/gitweb/?p=postgresql.git&a=blob&f=src/bin/psql/describe.c&h=bac94a338cfbc497200f0cf960cbabce2dadaa33&hb=9b581c53418666205938311ef86047aa3c6b741f#l1420

--
Erik