Thread: Direct SSL connection and ALPN loose ends

Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

29 April 2024, 15:24:04

There have been a bunch of bugs, and discussion on the intended behavior of
sslnegotiation and ALPN. This email summarizes the current status:

## Status and loose ends for beta1

All reported bugs have now been fixed. We now enforce ALPN in all the
right places. Please let me know if I missed something.

There are two open items remaining that I intend to address in the next
few days, before beta1:

- I am going to rename sslnegotiation=requiredirect to
sslnegotiation=directonly. I acknowledge that there is still some debate
on this: Jacob (and Robert?) would prefer to change the behavior
instead, so that sslnegotiation=requiredirect would also imply or
require sslmode=require, while IMHO the settings should be orthogonal so
that sslmode controls whether SSL is used or not, and sslnegotiation
controls how the SSL layer is negotiated when SSL is used. Given that
they are orthogonal, "directonly" is a better name. I will also take
another look at the documentation, to see if it needs clarification on that
point. If you have more comments on whether this is a good idea or not
or how sslnegotiation should work, please reply on the other thread,
let's keep this one focused on the overall status. [1]

- The registration of the ALPN name with IANA hasn't been finished yet
[2]. I originally requested the name "pgsql", but after Peter's comment,
I changed the request to "postgresql". The string we have in 'master' is
currently "TBD-pgsql". I'm very confident that the registration will go
through with "postgresql", so my plan is to commit that change before
beta1, even if the IANA process hasn't completed by then.

## V18 material

- Add an option to disable traditional SSL negotiation in the server.
There was discussion on doing this via HBA rules or as a global option,
and the consensus seems to be for a global option. This would be just to
reduce the attack surface; there are no known vulnerabilities or other
issues with the traditional negotiation. And maybe to help with testing. [3]

These are not directly related to sslnegotiation, but came up in the
discussion:

- Clarify the situation with sslmode=require and gssencmode=require
combination, by replacing sslmode and gssencmode options with a single
"encryption=[ssl|gss|none], [...]" option. [4]

- Make sslmode=require the default. This is orthogonal to the SSL
negotiation, but I think the root cause for the disagreements on
sslnegotiation is actually that we'd like SSL to be the default. [5]

The details of these need to be hashed out, in particular the
backwards-compatibility and migration aspects, but the consensus seems
to be that it's the right direction.

## V19 and beyond

In the future, once v17 is ubiquitous and the ecosystem (pgbouncer etc.)
has added direct SSL support, we can change the default sslnegotiation
from 'postgres' to 'direct'. I'm thinking 3-5 years from now. In the
more distant future, we could remove the traditional SSLRequest
negotiation altogether and always use direct SSL negotiation.

There's no rush on these.

## Retrospective

There were a lot more cleanups required for this work than I expected,
given that there were few changes to the patches between the January and
March commitfests. I was mostly worried about the refactoring of the
retry logic in libpq (and about the pre-existing logic too to be honest,
it was complicated before these changes already). That's why I added a
lot more tests for that. However, I did not foresee all the ALPN related
issues. In hindsight, it would have been good to commit most of the ALPN
changes first, and with more tests. Jacob wrote a Python test suite; I
should've played more with that, as it could have demonstrated the ALPN
issues earlier.

[1]
https://www.postgresql.org/message-id/CA%2BTgmobV9JEk4AFy61Xw%2B2%2BcCTBqdTsDopkeB%2Bgb81kq3f-o6A%40mail.gmail.com

[2]
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/

[3]
https://www.postgresql.org/message-id/CA%2BTgmoaLpDVY2ywqQUfxvKEQZ%2Bnwkabcw_f%3Di4Zyivt9CLjcmA%40mail.gmail.com

[4]
https://www.postgresql.org/message-id/3a6f126c-e1aa-4dcc-9252-9868308f6cf0%40iki.fi

[5]
https://www.postgresql.org/message-id/CA%2BTgmoaNkRerEmB9JPgW0FhcJAe337AA%3D5kp6je9KekQhhRbmA%40mail.gmail.com

--
Heikki Linnakangas
Neon (https://neon.tech)

Re: Direct SSL connection and ALPN loose ends

From

Ranier Vilela

Date:

29 April 2024, 17:10:44

Hi,

With TLS 1.3 and others there is possibly a security flaw using ALPN [1].

It seems to me that the ALPN protocol can be bypassed if the client does not correctly inform the ClientHello header.

So, the suggestion is to check the ClientHello header in the server and terminate the TLS handshake early.

Patch attached.

best regards,

Ranier Vilela

[1] terminate-tlsv1-3-handshake-if-alpn-is-missing

Attachment

terminate-tls-handshake-if-no-alpn.patch

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

29 April 2024, 17:56:42

On 29/04/2024 20:10, Ranier Vilela wrote:
> Hi,
> 
> With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
> 
> It seems to me that the ALPN protocol can be bypassed if the client does 
> not correctly inform the ClientHello header.
> 
> So, the suggestion is to check the ClientHello header in the server and
> terminate the TLS handshake early.

Sounds to me like it's working as designed. ALPN in general is optional; 
if the client doesn't request it, then you proceed without it. We do 
require ALPN for direct SSL connections though. We can, because direct 
SSL connections is a new feature in Postgres. But we cannot require it 
for the connections negotiated with SSLRequest, or we break 
compatibility with old clients that don't use ALPN.

There is a check in direct SSL mode that ALPN was used
(ProcessSSLStartup in backend_startup.c):

>         if (!port->alpn_used)
>         {
>                 ereport(COMMERROR,
>                                 (errcode(ERRCODE_PROTOCOL_VIOLATION),
>                                  errmsg("received direct SSL connection request without ALPN protocol negotiation extension")));
>                 goto reject;
>         }

That happens immediately after the SSL connection has been established.

Hmm. I guess it would be better to abort the connection earlier, without 
completing the TLS handshake. Otherwise the client might send the first 
message in wrong protocol to the PostgreSQL server. That's not a 
security issue for the PostgreSQL server: the server disconnects without 
reading the message. And I don't see any way for an ALPACA attack when 
the server ignores the client's message. Nevertheless, from the point of 
view of keeping the attack surface as small as possible, aborting 
earlier seems better.
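
The ALPN requirement discussed above comes down to inspecting the protocol list that the client offers in its ClientHello. Here is a minimal sketch of that check, assuming the wire format from RFC 7301 (each entry is a one-byte length followed by the name) and the protocol name "postgresql" mentioned earlier in the thread. The function name alpn_list_offers_postgresql is hypothetical, and this is not the PostgreSQL server code. With OpenSSL, logic of this kind would normally run inside a handshake callback so that the handshake can be aborted before it completes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Protocol name from the thread; treated here as an assumption. */
#define PG_ALPN_NAME "postgresql"

/*
 * Scan a client's ALPN protocol list in TLS wire format (each entry is a
 * one-byte length followed by that many bytes) for PG_ALPN_NAME.
 *
 * Returns 1 if the name is offered, 0 if it is absent (including an empty
 * list), and -1 if the list is malformed.
 */
int
alpn_list_offers_postgresql(const unsigned char *list, size_t len)
{
    const size_t want = strlen(PG_ALPN_NAME);
    size_t      pos = 0;

    assert(list != NULL || len == 0);

    while (pos < len)
    {
        size_t      entry_len = list[pos++];

        /* RFC 7301 forbids empty names; also reject truncated entries */
        if (entry_len == 0 || entry_len > len - pos)
            return -1;
        if (entry_len == want &&
            memcmp(list + pos, PG_ALPN_NAME, want) == 0)
            return 1;
        pos += entry_len;
    }
    return 0;
}
```

A result of 0 or -1 is the point at which a server that wants to fail early would reject the handshake instead of completing it and checking afterwards.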

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Direct SSL connection and ALPN loose ends

From

Ranier Vilela

Date:

29 April 2024, 18:06:52

On Mon, 29 Apr 2024 at 14:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 29/04/2024 20:10, Ranier Vilela wrote:
>> Hi,
>>
>> With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
>>
>> It seems to me that the ALPN protocol can be bypassed if the client does
>> not correctly inform the ClientHello header.
>>
>> So, the suggestion is to check the ClientHello header in the server and
>> terminate the TLS handshake early.
>
> Sounds to me like it's working as designed. ALPN in general is optional;
> if the client doesn't request it, then you proceed without it. We do
> require ALPN for direct SSL connections though. We can, because direct
> SSL connections is a new feature in Postgres. But we cannot require it
> for the connections negotiated with SSLRequest, or we break
> compatibility with old clients that don't use ALPN.

Ok.

But what if I have a server configured for TLS 1.3 and that requires ALPN to allow access?
What about a client configured without ALPN requiring connection?

> There is a check in direct SSL mode that ALPN was used
> (ProcessSSLStartup in backend_startup.c):
>
>>         if (!port->alpn_used)
>>         {
>>                 ereport(COMMERROR,
>>                                 (errcode(ERRCODE_PROTOCOL_VIOLATION),
>>                                  errmsg("received direct SSL connection request without ALPN protocol negotiation extension")));
>>                 goto reject;
>>         }
>
> That happens immediately after the SSL connection has been established.
>
> Hmm. I guess it would be better to abort the connection earlier, without
> completing the TLS handshake. Otherwise the client might send the first
> message in wrong protocol to the PostgreSQL server. That's not a
> security issue for the PostgreSQL server: the server disconnects without
> reading the message. And I don't see any way for an ALPACA attack when
> the server ignores the client's message. Nevertheless, from the point of
> view of keeping the attack surface as small as possible, aborting
> earlier seems better.

So the ClientHello callback is the right place to terminate the handshake.

best regards,

Ranier Vilela

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

29 April 2024, 18:36:34

On 29/04/2024 21:06, Ranier Vilela wrote:
> On Mon, 29 Apr 2024 at 14:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> 
>     On 29/04/2024 20:10, Ranier Vilela wrote:
>      > Hi,
>      >
>      > With TLS 1.3 and others there is possibly a security flaw using
>     ALPN [1].
>      >
>      > It seems to me that the ALPN protocol can be bypassed if the
>     client does
>      > not correctly inform the ClientHello header.
>      >
>      > So, the suggestion is to check the ClientHello header in the
>     server and
>      > terminate the TLS handshake early.
> 
>     Sounds to me like it's working as designed. ALPN in general is
>     optional;
>     if the client doesn't request it, then you proceed without it. We do
>     require ALPN for direct SSL connections though. We can, because direct
>     SSL connections is a new feature in Postgres. But we cannot require it
>     for the connections negotiated with SSLRequest, or we break
>     compatibility with old clients that don't use ALPN.
> 
> Ok.
> But what if I have a server configured for TLS 1.3 and that requires 
> ALPN to allow access?
> What about a client configured without ALPN requiring connection?

Sorry, I don't understand the questions. What about them?

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Direct SSL connection and ALPN loose ends

From

Ranier Vilela

Date:

29 April 2024, 19:19:07

On Mon, 29 Apr 2024 at 15:36, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 29/04/2024 21:06, Ranier Vilela wrote:
>> On Mon, 29 Apr 2024 at 14:56, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>>> On 29/04/2024 20:10, Ranier Vilela wrote:
>>>> Hi,
>>>>
>>>> With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
>>>>
>>>> It seems to me that the ALPN protocol can be bypassed if the client does
>>>> not correctly inform the ClientHello header.
>>>>
>>>> So, the suggestion is to check the ClientHello header in the server and
>>>> terminate the TLS handshake early.
>>>
>>> Sounds to me like it's working as designed. ALPN in general is optional;
>>> if the client doesn't request it, then you proceed without it. We do
>>> require ALPN for direct SSL connections though. We can, because direct
>>> SSL connections is a new feature in Postgres. But we cannot require it
>>> for the connections negotiated with SSLRequest, or we break
>>> compatibility with old clients that don't use ALPN.
>>
>> Ok.
>> But what if I have a server configured for TLS 1.3 and that requires
>> ALPN to allow access?
>> What about a client configured without ALPN requiring connection?
>
> Sorry, I don't understand the questions. What about them?

Sorry, I'll try to be clearer.
The way it is designed, can we require TLS 1.3 and ALPN for access to a public server?

And if a client on the other side is configured without ALPN, will the server refuse its connection request?

best regards,

Ranier Vilela

Re: Direct SSL connection and ALPN loose ends

From

Jacob Champion

Date:

17 June 2024, 14:11:06

On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I was mostly worried about the refactoring of the
> retry logic in libpq (and about the pre-existing logic too to be honest,
> it was complicated before these changes already).

Some changes in the v17 negotiation fallback order caught my eye:

1. For sslmode=prefer, a modern v3 error during negotiation now
results in a fallback to plaintext. For v16 this resulted in an
immediate failure. (v2 errors retain the v16 behavior.)
2. For gssencmode=prefer, a legacy v2 error during negotiation now
results in an immediate failure. In v16 it allowed fallback to SSL or
plaintext depending on sslmode.

Are both these changes intentional/desirable? Change #1 seems to
partially undo the decision made in a49fbaaf:

>     Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server.
>
>     These days, such a response is far more likely to signify a server-side
>     problem, such as fork failure. [...]
>
>     Hence, it seems best to just eliminate the assumption that backing off
>     to non-SSL/2.0 protocol is the way to recover from an "E" response, and
>     instead treat the server error the same as we would in non-SSL cases.

Thanks,
--Jacob

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

17 June 2024, 15:23:54

On 17/06/2024 17:11, Jacob Champion wrote:
> On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I was mostly worried about the refactoring of the
>> retry logic in libpq (and about the pre-existing logic too to be honest,
>> it was complicated before these changes already).
> 
> Some changes in the v17 negotiation fallback order caught my eye:
> 
> 1. For sslmode=prefer, a modern v3 error during negotiation now
> results in a fallback to plaintext. For v16 this resulted in an
> immediate failure. (v2 errors retain the v16 behavior.)
> 2. For gssencmode=prefer, a legacy v2 error during negotiation now
> results in an immediate failure. In v16 it allowed fallback to SSL or
> plaintext depending on sslmode.
> 
> Are both these changes intentional/desirable? Change #1 seems to
> partially undo the decision made in a49fbaaf:
> 
>>      Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server.
>>
>>      These days, such a response is far more likely to signify a server-side
>>      problem, such as fork failure. [...]
>>
>>      Hence, it seems best to just eliminate the assumption that backing off
>>      to non-SSL/2.0 protocol is the way to recover from an "E" response, and
>>      instead treat the server error the same as we would in non-SSL cases.

They were not intentional. Let me think about the desirable part :-).

By "negotiation", which part of the protocol are we talking about 
exactly? In the middle of the TLS handshake? After sending the startup 
packet?

I think the behavior with v2 and v3 errors should be the same. And I 
think an immediate failure is appropriate on any v2/v3 error during 
negotiation, assuming we don't use those errors for things like "TLS not 
supported", which would warrant a fallback.
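
The policy described above (an error reply to the SSLRequest is a hard failure, and only an explicit "not supported" allows a fallback) can be sketched as a small decision function. The single-byte replies 'S', 'N' and 'E' are what the protocol uses. The enum, the function name on_sslrequest_reply and the plaintext_allowed flag are hypothetical, and this is not libpq's actual state machine.

```c
#include <stdbool.h>

typedef enum
{
    ACT_START_TLS,              /* server agreed: begin the TLS handshake */
    ACT_FALLBACK_PLAINTEXT,     /* server declined and plaintext is acceptable */
    ACT_FAIL                    /* give up on this connection attempt */
} ssl_action;

/*
 * Decide what to do with the single byte the server sends in reply to an
 * SSLRequest: 'S' (willing), 'N' (unwilling) or 'E' (start of an error
 * message).  plaintext_allowed would be true for a setting such as
 * sslmode=prefer and false for sslmode=require.
 */
ssl_action
on_sslrequest_reply(char reply, bool plaintext_allowed)
{
    switch (reply)
    {
        case 'S':
            return ACT_START_TLS;
        case 'N':
            return plaintext_allowed ? ACT_FALLBACK_PLAINTEXT : ACT_FAIL;
        case 'E':
        default:
            /* an error (or garbage) is treated as fatal: no fallback */
            return ACT_FAIL;
    }
}
```

Treating 'E' the same way regardless of plaintext_allowed is what makes the v2 and v3 error cases behave identically in this sketch.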

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Direct SSL connection and ALPN loose ends

From

Jacob Champion

Date:

17 June 2024, 16:23:09

On Mon, Jun 17, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> By "negotiation", which part of the protocol are we talking about
> exactly? In the middle of the TLS handshake? After sending the startup
> packet?

By "negotiation" I mean the server's response to the startup packet.
I.e. "supported"/"not supported"/"error".

> I think the behavior with v2 and v3 errors should be the same. And I
> think an immediate failure is appropriate on any v2/v3 error during
> negotiation, assuming we don't use those errors for things like "TLS not
> supported", which would warrant a fallback.

For GSS encryption, it was my vague understanding that older servers
respond with an error rather than the "not supported" indication. For
TLS, though, the decision in a49fbaaf (immediate failure) seemed
reasonable.

Thanks,
--Jacob

Re: Direct SSL connection and ALPN loose ends

From

Andres Freund

Date:

17 June 2024, 18:33:35

Hi,

On 2024-04-29 18:24:04 +0300, Heikki Linnakangas wrote:
> All reported bugs have now been fixed. We now enforce ALPN in all the right
> places. Please let me know if I missed something.

Very minor and not really your responsibility:

If provided with the necessary key information, wireshark can decode TLS
exchanges when using sslnegotiation=postgres but not with direct. Presumably
it needs to be taught postgres' ALPN id or something.

Example with direct:

  476 6513.310308457 192.168.0.113 → 192.168.0.200 48978 5432 142 TLSv1.3 Finished
  477 6513.310341492 192.168.0.113 → 192.168.0.200 48978 5432 151 TLSv1.3 Application Data
  478 6513.320730295 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket
  479 6513.320745684 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket
  480 6513.321175713 192.168.0.113 → 192.168.0.200 48978 5432 68 TCP 48978 → 5432 [ACK] Seq=915 Ack=1665 Win=62848 Len=0 TSval=3779915421 TSecr=3469016093
  481 6513.323161553 192.168.0.200 → 192.168.0.113 5432 48978 518 TLSv1.3 Application Data
  482 6513.323626180 192.168.0.113 → 192.168.0.200 48978 5432 125 TLSv1.3 Application Data
  483 6513.333977769 192.168.0.200 → 192.168.0.113 5432 48978 273 TLSv1.3 Application Data
  484 6513.334581920 192.168.0.113 → 192.168.0.200 48978 5432 95 TLSv1.3 Application Data
  485 6513.334666116 192.168.0.113 → 192.168.0.200 48978 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify)

Example with postgres:

  502 6544.752799560 192.168.0.113 → 192.168.0.200 46300 5432 142 TLSv1.3 Finished
  503 6544.752842863 192.168.0.113 → 192.168.0.200 46300 5432 151 PGSQL >?
  504 6544.763152222 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket
  505 6544.763163155 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket
  506 6544.763587595 192.168.0.113 → 192.168.0.200 46300 5432 68 TCP 46300 → 5432 [ACK] Seq=923 Ack=1666 Win=62848 Len=0 TSval=3779946864 TSecr=3469047536
  507 6544.765024827 192.168.0.200 → 192.168.0.113 5432 46300 518 PGSQL <R/S/S/S/S/S/S/S/S/S/S/S/S/S/S/K/Z
  508 6544.766288155 192.168.0.113 → 192.168.0.200 46300 5432 125 PGSQL >Q
  509 6544.776974164 192.168.0.200 → 192.168.0.113 5432 46300 273 PGSQL <T/D/D/D/D/D/D/D/D/D/D/C/Z
  510 6544.777597927 192.168.0.113 → 192.168.0.200 46300 5432 95 PGSQL >X
  511 6544.777631520 192.168.0.113 → 192.168.0.200 46300 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify)

Note that in the second one it knows what's inside the "Application Data"
messages and decodes them (S: startup, authentication ok, parameters, cancel key,
ready for query, C: simple query, S: description, 10 rows, command complete,
ready for query).

In the GUI you can obviously go into the "postgres messages" in more detail
than I know how to do on the console.

A second aspect is that I'm not super happy about the hack of stashing data
into Port.  I think medium term we'd be better off separating out the
buffering for unencrypted and encrypted data properly. It turns out that not
having any buffering *below* openssl (i.e. the encrypted data) hurts both for
the send and receive side, due to a) increased number of syscalls b) too many
small packets being sent, as we use TCP_NODELAY c) kernel memory copies being
slower due to the small increments.
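
To illustrate why buffering below the TLS layer reduces the number of syscalls and small packets, here is a minimal sketch of an output buffer that coalesces small writes and hands them on only when full or explicitly flushed. Everything in it (the out_buffer struct, the 8192-byte size, the flush_cb callback standing in for a send() call) is an assumption for illustration and not the structure used in the PostgreSQL backend.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OUT_BUF_SIZE 8192       /* assumed size, for illustration only */

typedef struct
{
    unsigned char data[OUT_BUF_SIZE];
    size_t      used;           /* bytes currently buffered */
    size_t      nflushes;       /* number of times the buffer was emptied */
    /* stand-in for the real send path; may be NULL in this sketch */
    void        (*flush_cb) (const unsigned char *buf, size_t len);
} out_buffer;

/* Hand all buffered bytes to flush_cb in one call and empty the buffer. */
void
out_buffer_flush(out_buffer *ob)
{
    assert(ob != NULL);
    if (ob->used == 0)
        return;
    if (ob->flush_cb != NULL)
        ob->flush_cb(ob->data, ob->used);
    ob->nflushes++;
    ob->used = 0;
}

/* Append len bytes, flushing only when the buffer becomes full. */
void
out_buffer_write(out_buffer *ob, const void *src, size_t len)
{
    const unsigned char *p = src;

    assert(ob != NULL);
    while (len > 0)
    {
        size_t      room = OUT_BUF_SIZE - ob->used;
        size_t      n = len < room ? len : room;

        memcpy(ob->data + ob->used, p, n);
        ob->used += n;
        p += n;
        len -= n;
        if (ob->used == OUT_BUF_SIZE)
            out_buffer_flush(ob);
    }
}
```

With this arrangement, a hundred 100-byte records result in two flushes (one when the buffer fills and one final explicit flush) instead of a hundred separate sends.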

Greetings,

Andres Freund

Re: Direct SSL connection and ALPN loose ends

From

Jacob Champion

Date:

20 June 2024, 17:02:41

On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> > I think the behavior with v2 and v3 errors should be the same. And I
> > think an immediate failure is appropriate on any v2/v3 error during
> > negotiation, assuming we don't use those errors for things like "TLS not
> > supported", which would warrant a fallback.
>
> For GSS encryption, it was my vague understanding that older servers
> respond with an error rather than the "not supported" indication. For
> TLS, though, the decision in a49fbaaf (immediate failure) seemed
> reasonable.

Would an open item for this be appropriate?

Thanks,
--Jacob

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

20 June 2024, 23:13:05

On 20/06/2024 20:02, Jacob Champion wrote:
> On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion
> <jacob.champion@enterprisedb.com> wrote:
>>> I think the behavior with v2 and v3 errors should be the same. And I
>>> think an immediate failure is appropriate on any v2/v3 error during
>>> negotiation, assuming we don't use those errors for things like "TLS not
>>> supported", which would warrant a fallback.
>>
>> For GSS encryption, it was my vague understanding that older servers
>> respond with an error rather than the "not supported" indication. For
>> TLS, though, the decision in a49fbaaf (immediate failure) seemed
>> reasonable.
> 
> Would an open item for this be appropriate?

Added.

> By "negotiation" I mean the server's response to the startup packet.
> I.e. "supported"/"not supported"/"error".

Ok, I'm still a little confused, probably a terminology issue. The
server doesn't respond with "supported" or "not supported" to the
startup packet, that happens earlier. I think you mean the SSLRequest /
GSSRequest packet, which is sent *before* the startup packet?

>> I think the behavior with v2 and v3 errors should be the same. And I
>> think an immediate failure is appropriate on any v2/v3 error during
>> negotiation, assuming we don't use those errors for things like "TLS not
>> supported", which would warrant a fallback.
> 
> For GSS encryption, it was my vague understanding that older servers
> respond with an error rather than the "not supported" indication. For
> TLS, though, the decision in a49fbaaf (immediate failure) seemed
> reasonable.

Hmm, right, GSS encryption was introduced in v12, and older versions 
respond with an error to a GSSRequest.

We probably could make the same assumption for GSS as we did for TLS in 
a49fbaaf, i.e. that an error means that something's wrong with the 
server, rather than that it's just very old and doesn't support GSS. But
the case for that is a lot weaker than with TLS. There are still
pre-v12 servers out there in the wild.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Direct SSL connection and ALPN loose ends

From

Jacob Champion

Date:

20 June 2024, 23:32:45

On Thu, Jun 20, 2024 at 4:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > By "negotiation" I mean the server's response to the startup packet.
> > I.e. "supported"/"not supported"/"error".
>
> Ok, I'm still a little confused, probably a terminology issue. The
> server doesn't respond with "supported" or "not supported" to the
> startup packet, that happens earlier. I think you mean the SSLRequest /
> GSSRequest packet, which is sent *before* the startup packet?

Yes, sorry. (I'm used to referring to those as startup packets too, ha.)

> Hmm, right, GSS encryption was introduced in v12, and older versions
> respond with an error to a GSSRequest.
>
> We probably could make the same assumption for GSS as we did for TLS in
> a49fbaaf, i.e. that an error means that something's wrong with the
> server, rather than that it's just very old and doesn't support GSS. But
> the case for that is a lot weaker case than with TLS. There are still
> pre-v12 servers out there in the wild.

Right. Since we default to gssencmode=prefer, if you have Kerberos
creds in your environment, I think this could potentially break
existing software that connects to v11 servers once you upgrade libpq.

Thanks,
--Jacob

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

24 June 2024, 20:30:53

On 21/06/2024 02:32, Jacob Champion wrote:
> On Thu, Jun 20, 2024 at 4:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> By "negotiation" I mean the server's response to the startup packet.
>>> I.e. "supported"/"not supported"/"error".
>>
>> Ok, I'm still a little confused, probably a terminology issue. The
>> server doesn't respond with "supported" or "not supported" to the
>> startup packet, that happens earlier. I think you mean the SSLRequest /
>> GSSRequest packet, which is sent *before* the startup packet?
> 
> Yes, sorry. (I'm used to referring to those as startup packets too, ha.)

Yeah I'm not sure what the right term would be.

>> Hmm, right, GSS encryption was introduced in v12, and older versions
>> respond with an error to a GSSRequest.
>>
>> We probably could make the same assumption for GSS as we did for TLS in
>> a49fbaaf, i.e. that an error means that something's wrong with the
>> server, rather than that it's just very old and doesn't support GSS. But
>> the case for that is a lot weaker case than with TLS. There are still
>> pre-v12 servers out there in the wild.
> 
> Right. Since we default to gssencmode=prefer, if you have Kerberos
> creds in your environment, I think this could potentially break
> existing software that connects to v11 servers once you upgrade libpq.

When you connect to a V11 server and attempt to perform GSSAPI 
authentication, it will respond with a V3 error that says: "unsupported 
frontend protocol 1234.5680: server supports 2.0 to 3.0". That was a 
surprise to me until I tested it just now. I thought that it would 
respond with a protocol V2 error, but it is not so. The backend sets 
FrontendProtocol to 1234.5680 before sending the error, and because it 
is >= 3, the error is sent with protocol version 3.

Given that, I think it is a good thing to fail the connection completely 
on receiving a V2 error.
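
The version arithmetic behind that observation can be sketched as follows. The macro and constant names mirror those that appear in the PostgreSQL source and in this thread, but treat this as an illustrative, stand-alone sketch and not a copy of the real headers; error_format_for is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* A protocol version packs the major number into the high 16 bits. */
#define PG_PROTOCOL(major, minor) \
    (((uint32_t) (major) << 16) | (uint32_t) (minor))
#define PG_PROTOCOL_MAJOR(v)    ((v) >> 16)
#define PG_PROTOCOL_MINOR(v)    ((v) & 0x0000ffff)

/* Special request codes that reuse the version field. */
#define NEGOTIATE_SSL_CODE      PG_PROTOCOL(1234, 5679)
#define NEGOTIATE_GSS_CODE      PG_PROTOCOL(1234, 5680)

/* "1234.5680" from the error message is this 32-bit value on the wire. */
static_assert(NEGOTIATE_GSS_CODE == 80877104, "GSS request code");
static_assert(NEGOTIATE_SSL_CODE == 80877103, "SSL request code");

/*
 * Returns 3 if an error for this "protocol version" would be framed as a
 * v3 message, or 2 otherwise.  Because the major part of the GSS request
 * code is 1234, it lands on the v3 side of the comparison.
 */
int
error_format_for(uint32_t frontend_protocol)
{
    return PG_PROTOCOL_MAJOR(frontend_protocol) >= 3 ? 3 : 2;
}
```

This is why a server that does not recognize the GSS request code still answers with a v3-format error: the unrecognized code is interpreted as protocol version 1234.5680.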

Attached is a patch to fix the other issue, with falling back from SSL 
to plaintext. And some tests and comment fixes I spotted while at it.

0001: A small comment fix
0002: This is the main patch that fixes the SSL fallback issue

0003: This adds fault injection tests to exercise these early error 
codepaths. It is not ready to be merged, as it contains a hack to skip 
locking. See thread at 
https://www.postgresql.org/message-id/e1ffb822-054e-4006-ac06-50532767f75b%40iki.fi.

0004: More tests, for what happens if the server sends an error after 
responding "yes" to the SSLRequest or GSSRequest, but before performing 
the SSL/GSS handshake.

Attached is also a little stand-alone perl program that listens on a 
socket, and when you connect to it, it immediately sends a V2 or V3 
error, depending on the argument. That's useful for testing. It could be 
used as an alternative strategy to the injection points I used in the 
0003-0004 patches, but for now I just used it for manual testing.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Direct SSL connection and ALPN loose ends

From

Vladimir Sitnikov

Date:

25 June 2024, 13:36:58

I reviewed the documentation for "direct ALPN connections", and it looks like it could be improved.

Here's the link: https://www.postgresql.org/docs/17/protocol-flow.html#PROTOCOL-FLOW-SSL

The currently suggested values for "sslnegotiation" are "direct" and "postgres".

The project name is PostgreSQL and the ALPN name is postgresql. Is there a reason why the property value uses "postgres"?

Can the value be renamed to postgresql for consistency?

"SSL". Technically, the proper term is TLS, and even the document refers to "IANA TLS ALPN Protocol IDs" (TLS, not SSL).

I would not die on that hill; however, going for tlsnegotiation would look better than sslnegotiation.

Vladimir

Re: Direct SSL connection and ALPN loose ends

From

Dave Cramer

Date:

25 June 2024, 14:20:30

On Tue, 25 Jun 2024 at 09:37, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:

> I reviewed the documentation for "direct ALPN connections", and it looks like it could be improved.
> Here's the link: https://www.postgresql.org/docs/17/protocol-flow.html#PROTOCOL-FLOW-SSL
>
> The currently suggested values for "sslnegotiations" are "direct" and "postgres".
> The project name is PostgreSQL and the ALPN name is postgresql. Is there a reason why property value uses "postgres"?
> Can the value be renamed to postgresql for consistency?

+1 I found it strange that we are not using postgresql

> "SSL". Technically, the proper term is TLS, and even the document refers to "IANA TLS ALPN Protocol IDs" (TLS, not SSL).
> I would not die on that hill, however, going for tlsnegotiation would look better than sslnegotiation.

+1 again, unusual to use SSL when this really is TLS.

Dave

Re: Direct SSL connection and ALPN loose ends

From

Jacob Champion

Date:

25 June 2024, 15:51:03

On Thu, Jun 20, 2024 at 4:32 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> Thanks,
> --Jacob

Hey Heikki,

[sending this to the list in case it's not just me]

I cannot for the life of me get GMail to deliver your latest message,
even though I see it on postgresql.org. It's not in spam; it's just
gone. I wonder if it's possibly the Perl server script causing
virus-scanner issues?

--Jacob

Re: Direct SSL connection and ALPN loose ends

From

Jacob Champion

Date:

25 June 2024, 16:05:19

On Tue, Jun 25, 2024 at 7:20 AM Dave Cramer <davecramer@postgres.rocks> wrote:
>
> On Tue, 25 Jun 2024 at 09:37, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>>
>> "SSL". Technically, the proper term is TLS, and even the document refers to "IANA TLS ALPN Protocol IDs" (TLS, not SSL).
>> I would not die on that hill, however, going for tlsnegotiation would look better than sslnegotiation.
>
> +1 again, unusual to use SSL when this really is TLS.

This was sort of litigated last ye-(checks notes) oh no, three years ago:

    https://www.postgresql.org/message-id/flat/CE12DD5C-4BB3-4166-BC9A-39779568734C%40yesql.se

I'm on your side when it comes to the use of the TLS acronym, personally,
but I think introducing a brand new option that interfaces with
sslmode, sslrootcert, etc. while not being named like them would
be outright unhelpful. And the idea of switching everything to use TLS
in docs seemed to be met with a solid "meh" on the other thread.

--Jacob

Re: Direct SSL connection and ALPN loose ends

From

Michael Paquier

Date:

16 July 2024, 06:54:22

On Mon, Jun 24, 2024 at 11:30:53PM +0300, Heikki Linnakangas wrote:
> Given that, I think it is a good thing to fail the connection completely on
> receiving a V2 error.
>
> Attached is a patch to fix the other issue, with falling back from SSL to
> plaintext. And some tests and comment fixes I spotted while at it.
>
> 0001: A small comment fix

Already committed as of cc68ca6d420e.

> 0002: This is the main patch that fixes the SSL fallback issue

+     conn->failed_enc_methods |= conn->allowed_enc_methods & (~conn->current_enc_method);

Sounds reasonable to me.

It's a bit annoying to have to guess that current_enc_method is
tracking only one method at a time (aka these three fields are not
documented in libpq-int.h), while allowed_enc_methods and
failed_enc_methods are bitwise combinations of the methods that are
still allowed or that have already failed.
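
To make the roles of the three fields concrete, here is a small stand-alone sketch. The flag values and the enc_state struct are assumptions for illustration (the real fields live in libpq's connection struct), and fail_other_methods / remaining_methods are hypothetical helper names wrapped around the statement quoted from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag values, for illustration only. */
#define ENC_PLAINTEXT   0x01
#define ENC_GSSAPI      0x02
#define ENC_SSL         0x04

typedef struct
{
    uint8_t     allowed_enc_methods;    /* bitmask: methods the settings permit */
    uint8_t     failed_enc_methods;     /* bitmask: methods tried and failed */
    uint8_t     current_enc_method;     /* exactly one ENC_* flag at a time */
} enc_state;

/* Mark every allowed method other than the current one as failed. */
void
fail_other_methods(enc_state *st)
{
    /* current_enc_method must be zero or a single bit */
    assert((st->current_enc_method & (st->current_enc_method - 1)) == 0);

    st->failed_enc_methods |=
        (uint8_t) (st->allowed_enc_methods & ~st->current_enc_method);
}

/* Bitmask of methods that could still be attempted. */
uint8_t
remaining_methods(const enc_state *st)
{
    return (uint8_t) (st->allowed_enc_methods & ~st->failed_enc_methods);
}
```

In this sketch, starting from allowed = ENC_PLAINTEXT | ENC_SSL with ENC_SSL as the current method, calling fail_other_methods leaves only ENC_SSL in remaining_methods, so no plaintext fallback is possible afterwards.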

> 0003: This adds fault injection tests to exercise these early error
> codepaths. It is not ready to be merged, as it contains a hack to skip
> locking. See thread at
> https://www.postgresql.org/message-id/e1ffb822-054e-4006-ac06-50532767f75b%40iki.fi.

Locking when running an injection point has been replaced by some
atomics in 86db52a5062a.

+    if (IsInjectionPointAttached("backend-initialize-v2-error"))
+    {
+        FrontendProtocol = PG_PROTOCOL(2,0);
+        elog(FATAL, "protocol version 2 error triggered");
+    }

This is an attempt to do stack manipulation with an injection point
set.  FrontendProtocol is a global variable, so you could have a new
callback setting up this global variable directly, then FATAL (I
really don't mind if modules/injection_points finishes with a library
of callbacks).

I'm not sure I like this new IsInjectionPointAttached() much, as it
does a search in the existing injection point pool.  This leads to
more code footprint in the core backend, and I'm trying to minimize
that.  Not everybody agrees with this view, I'd guess, which is also
fine.

> 0004: More tests, for what happens if the server sends an error after
> responding "yes" to the SSLRequest or GSSRequest, but before performing the
> SSL/GSS handshake.

No objections to these two additions.

> Attached is also a little stand-alone perl program that listens on a socket,
> and when you connect to it, it immediately sends a V2 or V3 error, depending
> on the argument. That's useful for testing. It could be used as an
> alternative strategy to the injection points I used in the 0003-0004
> patches, but for now I just used it for manual testing.

Nice toy.
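
For readers without the attachment, the difference between the two error
formats can be sketched as follows. This is not the Perl program from the
thread; the function names are hypothetical, and the layouts follow the
documented message formats: a V2 error is 'E' plus a NUL-terminated
string, while a V3 ErrorResponse is 'E', a 4-byte big-endian length that
counts itself, a list of one-byte field codes each followed by a
NUL-terminated string, and a final zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* V2-style error: 'E' + NUL-terminated text. Returns bytes written, 0 if no room. */
static size_t
build_v2_error(unsigned char *buf, size_t buflen, const char *msg)
{
    size_t mlen = strlen(msg) + 1;      /* include the NUL */

    if (buflen < 1 + mlen)
        return 0;
    buf[0] = 'E';
    memcpy(buf + 1, msg, mlen);
    return 1 + mlen;
}

/* V3 ErrorResponse with severity (S), SQLSTATE (C) and message (M) fields. */
static size_t
build_v3_error(unsigned char *buf, size_t buflen,
               const char *severity, const char *sqlstate, const char *msg)
{
    size_t slen = strlen(severity) + 1;
    size_t clen = strlen(sqlstate) + 1;
    size_t mlen = strlen(msg) + 1;
    /* length word (4) + three coded fields + terminating zero byte */
    size_t len = 4 + (1 + slen) + (1 + clen) + (1 + mlen) + 1;
    size_t pos = 0;

    if (buflen < 1 + len)
        return 0;
    buf[pos++] = 'E';
    buf[pos++] = (unsigned char) ((len >> 24) & 0xFF);
    buf[pos++] = (unsigned char) ((len >> 16) & 0xFF);
    buf[pos++] = (unsigned char) ((len >> 8) & 0xFF);
    buf[pos++] = (unsigned char) (len & 0xFF);
    buf[pos++] = 'S';
    memcpy(buf + pos, severity, slen);
    pos += slen;
    buf[pos++] = 'C';
    memcpy(buf + pos, sqlstate, clen);
    pos += clen;
    buf[pos++] = 'M';
    memcpy(buf + pos, msg, mlen);
    pos += mlen;
    buf[pos++] = 0;                     /* end of field list */
    return pos;
}
```

A listener in the spirit of the one described would write one of these
buffers to the socket right after accept(), without reading the client's
startup packet first.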
--
Michael

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

23 July 2024, 17:32:29

On 16/07/2024 09:54, Michael Paquier wrote:
> On Mon, Jun 24, 2024 at 11:30:53PM +0300, Heikki Linnakangas wrote:
>> 0002: This is the main patch that fixes the SSL fallback issue
> 
> +     conn->failed_enc_methods |= conn->allowed_enc_methods & (~conn->current_enc_method);
> 
> Sounds reasonable to me.
> 
> It's a bit annoying to have to guess that current_enc_method tracks
> only one method at a time, while allowed_enc_methods and
> failed_enc_methods are bitwise combinations of the methods that are
> still allowed or that have already failed (none of these three fields
> is documented in libpq-int.h).

Yeah. In hindsight I'm still not very happy with the code structure with 
"allowed_enc_methods" and "current_enc_method" and all that. The 
fallback logic is still complicated. It's better than in v16, IMHO, but 
still not great. This patch seems like the best fix for v17, but I 
wouldn't mind another round of refactoring for v18, if anyone's got some 
good ideas on how to structure it better. All these new tests are a 
great asset when refactoring this again.

> +    if (IsInjectionPointAttached("backend-initialize-v2-error"))
> +    {
> +        FrontendProtocol = PG_PROTOCOL(2,0);
> +        elog(FATAL, "protocol version 2 error triggered");
> +    }
> 
> This is an attempt to do stack manipulation with an injection point
> set.  FrontendProtocol is a global variable, so you could have a new
> callback setting up this global variable directly, then FATAL (I
> really don't mind if modules/injection_points ends up with a library
> of callbacks).
> 
> I'm not sure I like this new IsInjectionPointAttached() much, as it
> does a search in the existing injection point pool.  This leads to
> more code footprint in the core backend, and I'm trying to minimize
> that.  Not everybody agrees with this view, I'd guess, which is also
> fine.

Yeah, I'm also not too excited about the additional code in the backend, 
but I'm also not excited about writing another test C module just for 
this. I'm inclined to commit this as it is, but we can certainly revisit 
this later, since it's just test code.

Here's a new rebased version with some minor cleanup. Notably, I added 
docs for the new IS_INJECTION_POINT_ATTACHED() macro.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Direct SSL connection and ALPN loose ends

From

Michael Paquier

Date:

23 July 2024, 23:37:31

On Tue, Jul 23, 2024 at 08:32:29PM +0300, Heikki Linnakangas wrote:
> All these new tests are a great asset when refactoring this again.

Thanks for doing that.  The coverage, especially with v2, is going to
be really useful.

> Yeah, I'm also not too excited about the additional code in the backend, but
> I'm also not excited about writing another test C module just for this. I'm
> inclined to commit this as it is, but we can certainly revisit this later,
> since it's just test code.

The point would be to rely on the existing injection_points module,
with a new callback in it.  The callbacks could be in a file of their
own in the module, for clarity.  What you have is OK for me anyway; it
is good to give developers more options in this area, and this gets
used in core.  That's also enough to manipulate the stack in or even
out of core.

> Here's a new rebased version with some minor cleanup. Notably, I added docs
> for the new IS_INJECTION_POINT_ATTACHED() macro.

0001 looks OK.

+       push @events, "backenderror" if $line =~ /error triggered for injection point backend-/;
+       push @events, "v2error" if $line =~ /protocol version 2 error triggered/;

Perhaps append an "injection_" for these two keywords?

+#include "storage/proc.h"

This inclusion in injection_point.c should not be needed.

> sets the FrontendProtocol global variable, but I think it's more
> straightforward to have the test code

The last sentence in the commit message of 0002 seems to be
unfinished.

Could you run a perltidy on 005_negotiate_encryption.pl?  There are a
bunch of new diffs in it.
--
Michael

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

26 July 2024, 12:40:50

On 24/07/2024 02:37, Michael Paquier wrote:
> On Tue, Jul 23, 2024 at 08:32:29PM +0300, Heikki Linnakangas wrote:
>> All these new tests are a great asset when refactoring this again.
> 
> Thanks for doing that.  The coverage, especially with v2, is going to
> be really useful.
> 
>> Yeah, I'm also not too excited about the additional code in the backend, but
>> I'm also not excited about writing another test C module just for this. I'm
>> inclined to commit this as it is, but we can certainly revisit this later,
>> since it's just test code.
> 
> The point would be to rely on the existing injection_points module,
> with a new callback in it.  The callbacks could be in a file of their
> own in the module, for clarity.

Hmm, do we want the injection_points module to be a dumping ground for 
callbacks that are only useful for very specific injection points, in 
specific tests? I view it as a more general purpose module, containing 
callbacks that are useful for many different tests. Don't get me wrong, 
I'm not necessarily against it, and it would be expedient; that's just 
not how I see the purpose of injection_points.

> What you have is OK for me anyway; it
> is good to give developers more options in this area, and this gets
> used in core.  That's also enough to manipulate the stack in or even
> out of core.

Ok, I kept it that way.

>> Here's a new rebased version with some minor cleanup. Notably, I added docs
>> for the new IS_INJECTION_POINT_ATTACHED() macro.
> 
> 0001 looks OK.
> 
> +       push @events, "backenderror" if $line =~ /error triggered for injection point backend-/;
> +       push @events, "v2error" if $line =~ /protocol version 2 error triggered/;
> 
> Perhaps append an "injection_" for these two keywords?
> 
> +#include "storage/proc.h"
> 
> This inclusion in injection_point.c should not be needed.
> 
>> sets the FrontendProtocol global variable, but I think it's more
>> straightforward to have the test code
> 
> The last sentence in the commit message of 0002 seems to be
> unfinished.

Fixed.

> Could you run a perltidy on 005_negotiate_encryption.pl?  There are a
> bunch of new diffs in it.

Fixed.

Committed, thanks for the review, and thanks Jacob for the testing!

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Direct SSL connection and ALPN loose ends

From

Heikki Linnakangas

Date:

26 July 2024, 14:34:06

On 17/06/2024 21:33, Andres Freund wrote:
> If provided with the necessary key information, wireshark can decode TLS
> exchanges when using sslnegotiation=postgres but not with direct. Presumably
> it needs to be taught postgres' ALPN id or something.

I opened https://gitlab.com/wireshark/wireshark/-/merge_requests/16612 
to fix that in the wireshark pgsql protocol dissector.
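
For context on what a dissector has to recognize: in the TLS ALPN
extension each protocol name travels as a one-byte length followed by the
name bytes. The sketch below encodes a single name that way. The function
name is hypothetical, and it assumes the ALPN id ended up as "postgresql",
as planned at the top of this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Encode one ALPN protocol name in wire format: a length byte followed by
 * the name, with no terminating NUL. Returns bytes written, or 0 if the
 * name is empty, longer than 255 bytes, or does not fit in the buffer.
 */
static size_t
encode_alpn_name(unsigned char *out, size_t outlen, const char *name)
{
    size_t n = strlen(name);

    if (n == 0 || n > 255 || outlen < n + 1)
        return 0;
    out[0] = (unsigned char) n;
    memcpy(out + 1, name, n);
    return n + 1;
}
```

For "postgresql" this yields 11 bytes: 0x0a followed by the ten name
characters, which is also the form OpenSSL's SSL_CTX_set_alpn_protos()
expects for its protocol list.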

-- 
Heikki Linnakangas
Neon (https://neon.tech)