Thread: [PATCH] Fix hostaddr crash during non-blocking cancellation

[PATCH] Fix hostaddr crash during non-blocking cancellation

From
Jacob Champion
Date:
Hi all,

A connection with only a hostaddr (no host) can't be cancelled via
PQcreateCancel(), because we'll crash in emitHostIdentityInfo(). The
problem is that the synthetic connhost entry we've created for
cancellation has an incorrect type field, which causes the following
code to make bad decisions if connhost[0].host is NULL:

> emitHostIdentifyInfo(...)
> {
>     ...
>     if (conn->connhost[conn->whichhost].type == CHT_HOST_ADDRESS)
>         displayed_host = conn->connhost[conn->whichhost].hostaddr;
>     else
>         displayed_host = conn->connhost[conn->whichhost].host;
> ...
>     if (conn->connhost[conn->whichhost].type != CHT_HOST_ADDRESS &&
>         host_addr[0] &&
>         strcmp(displayed_host, host_addr) != 0)  <- crashes here

I think the solution is just to copy over the correct type, as done in
the attached 0001.

Putting in a regression test requires us to once again answer the
question of "where do we test TCP-only features". I'd like to have a
PG_TEST_EXTRA entry for these, so 0002 adds a `tcp` group. That's
going to need more debate, and 002_tcp.pl is very quick-and-dirty, but
I also _really_ want to stop throwing tests away just because we don't
have a nice place to put them... [1]

So: if 0001 looks good, I propose that I backpatch it after beta1, but
hold onto 0002 until REL_18_STABLE is split off. Then we can figure
out the "test TCP" semantics for the full 19 cycle, and maybe
eventually backpatch tests once we're happy with how they work.

WDYT?

--Jacob

[1] https://postgr.es/m/flat/CAOYmi%2Bkx8eOmKj01dV4vSBeq9pvqR8dt6rGw%2BB_pBOE2_GOj%2Bg%40mail.gmail.com

Attachment
Jacob Champion <jacob.champion@enterprisedb.com> writes:
> A connection with only a hostaddr (no host) can't be cancelled via
> PQcreateCancel(), because we'll crash in emitHostIdentityInfo(). The
> problem is that the synthetic connhost entry we've created for
> cancellation has an incorrect type field, which causes the following
> code to make bad decisions if connhost[0].host is NULL:

I hadn't noticed (or maybe I forgot) this thread, so when the
same problem was reported at [1] I just went ahead and pushed the
submitted patch, which is only cosmetically different from your 0001.
Apologies for treading on your toes.

As for the question of how to test this sort of thing, I'm not
too excited about the narrow-gauge test case your 0002 proposes.
What I did for manual testing in [1] was to hack the postgres_fdw
tests to connect using hostaddr instead of the default.  I think
formalizing that sort of approach would yield much better coverage.
I don't have any specific ideas about how to do it, though.
Maybe get our tests to respond to an environment variable that
allows overriding the default choices of connection properties?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/18974-575f02b2168b36b3%40postgresql.org



Re: [PATCH] Fix hostaddr crash during non-blocking cancellation

From
Jacob Champion
Date:
On Thu, Jul 3, 2025 at 11:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I hadn't noticed (or maybe I forgot) this thread, so when the
> same problem was reported at [1] I just went ahead and pushed the
> submitted patch, which is only cosmetically different from your 0001.
> Apologies for treading on your toes.

No worries, as long as it's fixed I'm happy!

(And many thanks to Greg for the review; sorry for not getting to it
fast enough.)

> As for the question of how to test this sort of thing, I'm not
> too excited about the narrow-gauge test case your 0002 proposes.
> What I did for manual testing in [1] was to hack the postgres_fdw
> tests to connect using hostaddr instead of the default.  I think
> formalizing that sort of approach would yield much better coverage.

I agree that overriding connection defaults probably gets us better
overall coverage -- I just think I got pushback in the past for adding
"multipliers" in that way. But I won't argue against test coverage as
long as we get it in the end. :D

That said, I am planning to get noisier about the lack of "TCP suite".
The number of tests we've discarded just because we don't have a
current place to put them keeps slowly growing, and my long-term
intent with 0002 was to actually add a new place for them. Whatever
formalization we choose, let's please keep a TCP-only cluster
somewhere instead of forcing people to try to find a least-bad suite
to slot new tests into.

Thanks!
--Jacob