Thread: Direct SSL connection and ALPN loose ends
There's been a bunch of bugs, and discussion on the intended behavior of sslnegotiation and ALPN. This email summarizes the current status: ## Status and loose ends for beta1 All reported bugs have now been fixed. We now enforce ALPN in all the right places. Please let me know if I missed something. There are two open items remaining that I intend to address in the next few days, before beta1: - I am going to rename sslnegotiation=requiredirect to sslnegotiation=directonly. I acknowledge that there is still some debate on this: Jacob (and Robert?) would prefer to change the behavior instead, so that sslnegotiation=requiredirect would also imply or require sslmode=require, while IMHO the settings should be orthogonal so that sslmode controls whether SSL is used or not, and sslnegotiation controls how the SSL layer is negotiated when SSL is used. Given that they are orthogonal, "directonly" is a better name. I will also take another look at the documentation, if it needs clarification on that point. If you have more comments on whether this is a good idea or not or how sslnegotiation should work, please reply on the other thread, let's keep this one focused on the overall status. [1] - The registration of the ALPN name with IANA hasn't been finished yet [2]. I originally requested the name "pgsql", but after Peter's comment, I changed the request to "postgresql". The string we have in 'master' is currently "TBD-pgsql". I'm very confident that the registration will go through with "postgresql", so my plan is to commit that change before beta1, even if the IANA process hasn't completed by then. ## V18 material - Add an option to disable traditional SSL negotiation in the server. There was discussion on doing this via HBA rules or as a global option, and the consensus seems to be for a global option. This would be just to reduce the attach surface, there is no known vulnerabilities or other issues with the traditional negotiation. And maybe to help with testing. [3] These are not directly related to sslnegotiation, but came up in the discussion: - Clarify the situation with sslmode=require and gssencmode=require combination, by replacing sslmode and gssencmode options with a single "encryption=[ssl|gss|none], [...]" option. [4] - Make sslmode=require the default. This is orthogonal to the SSL negotiation, but I think the root cause for the disagreements on sslnegotiation is actually that we'd like SSL to be the default. [5] The details of these need to be hashed out, in particular the backwards-compatibility and migration aspects, but the consensus seems to be that it's the right direction. ## V19 and beyond In the future, once v17 is ubiquitous and the ecosystem (pgbouncer etc) have added direct SSL support, we can change the default sslnegotiation from 'postgres' to 'direct'. I'm thinking 3-5 years from now. In the more distant future, we could remove the traditional SSLRequest negotiation altogether and always use direct SSL negotiation. There's no rush on these. ## Retrospective There were a lot more cleanups required for this work than I expected, given that there were little changes to the patches between January and March commitfests. I was mostly worried about the refactoring of the retry logic in libpq (and about the pre-existing logic too to be honest, it was complicated before these changes already). That's why I added a lot more tests for that. However, I did not foresee all the ALPN related issues. In hindsight, it would have been good to commit most of the ALPN changes first, and with more tests. Jacob wrote a python test suite; I should've played more with that, that could have demonstrated the ALPN issues earlier. [1] https://www.postgresql.org/message-id/CA%2BTgmobV9JEk4AFy61Xw%2B2%2BcCTBqdTsDopkeB%2Bgb81kq3f-o6A%40mail.gmail.com [2] https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/ [3] https://www.postgresql.org/message-id/CA%2BTgmoaLpDVY2ywqQUfxvKEQZ%2Bnwkabcw_f%3Di4Zyivt9CLjcmA%40mail.gmail.com [4] https://www.postgresql.org/message-id/3a6f126c-e1aa-4dcc-9252-9868308f6cf0%40iki.fi [5] https://www.postgresql.org/message-id/CA%2BTgmoaNkRerEmB9JPgW0FhcJAe337AA%3D5kp6je9KekQhhRbmA%40mail.gmail.com -- Heikki Linnakangas Neon (https://neon.tech)
Hi,
With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
It seems to me that the ALPN protocol can be bypassed if the client does not correctly inform the ClientHello header.
So, the suggestion is to check the ClientHello header in the server and
terminate the TLS handshake early.
Patch attached.
best regards,
Ranier Vilela
Attachment
On 29/04/2024 20:10, Ranier Vilela wrote: > Hi, > > With TLS 1.3 and others there is possibly a security flaw using ALPN [1]. > > It seems to me that the ALPN protocol can be bypassed if the client does > not correctly inform the ClientHello header. > > So, the suggestion is to check the ClientHello header in the server and > terminate the TLS handshake early. Sounds to me like it's working as designed. ALPN in general is optional; if the client doesn't request it, then you proceed without it. We do require ALPN for direct SSL connections though. We can, because direct SSL connections is a new feature in Postgres. But we cannot require it for the connections negotiated with SSLRequest, or we break compatibility with old clients that don't use ALPN. There is a check in direct SSL mode that ALPN was used (ProcessSSLStartup in backend_startup.c): > if (!port->alpn_used) > { > ereport(COMMERROR, > (errcode(ERRCODE_PROTOCOL_VIOLATION), > errmsg("received direct SSL connection request without ALPN protocol negotiation extension"))); > goto reject; > } That happens immediately after the SSL connection has been established. Hmm. I guess it would be better to abort the connection earlier, without completing the TLS handshake. Otherwise the client might send the first message in wrong protocol to the PostgreSQL server. That's not a security issue for the PostgreSQL server: the server disconnects without reading the message. And I don't see any way for an ALPACA attack when the server ignores the client's message. Nevertheless, from the point of view of keeping the attack surface as small as possible, aborting earlier seems better. -- Heikki Linnakangas Neon (https://neon.tech)
Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas <hlinnaka@iki.fi> escreveu:
On 29/04/2024 20:10, Ranier Vilela wrote:
> Hi,
>
> With TLS 1.3 and others there is possibly a security flaw using ALPN [1].
>
> It seems to me that the ALPN protocol can be bypassed if the client does
> not correctly inform the ClientHello header.
>
> So, the suggestion is to check the ClientHello header in the server and
> terminate the TLS handshake early.
Sounds to me like it's working as designed. ALPN in general is optional;
if the client doesn't request it, then you proceed without it. We do
require ALPN for direct SSL connections though. We can, because direct
SSL connections is a new feature in Postgres. But we cannot require it
for the connections negotiated with SSLRequest, or we break
compatibility with old clients that don't use ALPN.
Ok.
But what if I have a server configured for TLS 1.3 and that requires ALPN to allow access?
What about a client configured without ALPN requiring connection?
What about a client configured without ALPN requiring connection?
There is a check in direct SSL mode that ALPN was used
(ProcessSSLStartup in backend_startup.c):
> if (!port->alpn_used)
> {
> ereport(COMMERROR,
> (errcode(ERRCODE_PROTOCOL_VIOLATION),
> errmsg("received direct SSL connection request without ALPN protocol negotiation extension")));
> goto reject;
> }
That happens immediately after the SSL connection has been established.
Hmm. I guess it would be better to abort the connection earlier, without
completing the TLS handshake. Otherwise the client might send the first
message in wrong protocol to the PostgreSQL server. That's not a
security issue for the PostgreSQL server: the server disconnects without
reading the message. And I don't see any way for an ALPACA attack when
the server ignores the client's message. Nevertheless, from the point of
view of keeping the attack surface as small as possible, aborting
earlier seems better.
So the ClientHello callback is the correct way to determine the end.
best regards,
Ranier Vilela
On 29/04/2024 21:06, Ranier Vilela wrote: > Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas > <hlinnaka@iki.fi <mailto:hlinnaka@iki.fi>> escreveu: > > On 29/04/2024 20:10, Ranier Vilela wrote: > > Hi, > > > > With TLS 1.3 and others there is possibly a security flaw using > ALPN [1]. > > > > It seems to me that the ALPN protocol can be bypassed if the > client does > > not correctly inform the ClientHello header. > > > > So, the suggestion is to check the ClientHello header in the > server and > > terminate the TLS handshake early. > > Sounds to me like it's working as designed. ALPN in general is > optional; > if the client doesn't request it, then you proceed without it. We do > require ALPN for direct SSL connections though. We can, because direct > SSL connections is a new feature in Postgres. But we cannot require it > for the connections negotiated with SSLRequest, or we break > compatibility with old clients that don't use ALPN. > > Ok. > But what if I have a server configured for TLS 1.3 and that requires > ALPN to allow access? > What about a client configured without ALPN requiring connection? Sorry, I don't understand the questions. What about them? -- Heikki Linnakangas Neon (https://neon.tech)
Em seg., 29 de abr. de 2024 às 15:36, Heikki Linnakangas <hlinnaka@iki.fi> escreveu:
On 29/04/2024 21:06, Ranier Vilela wrote:
> Em seg., 29 de abr. de 2024 às 14:56, Heikki Linnakangas
> <hlinnaka@iki.fi <mailto:hlinnaka@iki.fi>> escreveu:
>
> On 29/04/2024 20:10, Ranier Vilela wrote:
> > Hi,
> >
> > With TLS 1.3 and others there is possibly a security flaw using
> ALPN [1].
> >
> > It seems to me that the ALPN protocol can be bypassed if the
> client does
> > not correctly inform the ClientHello header.
> >
> > So, the suggestion is to check the ClientHello header in the
> server and
> > terminate the TLS handshake early.
>
> Sounds to me like it's working as designed. ALPN in general is
> optional;
> if the client doesn't request it, then you proceed without it. We do
> require ALPN for direct SSL connections though. We can, because direct
> SSL connections is a new feature in Postgres. But we cannot require it
> for the connections negotiated with SSLRequest, or we break
> compatibility with old clients that don't use ALPN.
>
> Ok.
> But what if I have a server configured for TLS 1.3 and that requires
> ALPN to allow access?
> What about a client configured without ALPN requiring connection?
Sorry, I don't understand the questions. What about them?
Sorry, I'll try to be clearer.
The way it is designed, can we impose TLS 1.3 and ALPN to allow access to a public server?
And if on the other side we have a client, configured without ALPN,
The way it is designed, can we impose TLS 1.3 and ALPN to allow access to a public server?
And if on the other side we have a client, configured without ALPN,
when requesting access, the server will refuse?
best regards,
Ranier Vilela
On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I was mostly worried about the refactoring of the > retry logic in libpq (and about the pre-existing logic too to be honest, > it was complicated before these changes already). Some changes in the v17 negotiation fallback order caught my eye: 1. For sslmode=prefer, a modern v3 error during negotiation now results in a fallback to plaintext. For v16 this resulted in an immediate failure. (v2 errors retain the v16 behavior.) 2. For gssencmode=prefer, a legacy v2 error during negotiation now results in an immediate failure. In v16 it allowed fallback to SSL or plaintext depending on sslmode. Are both these changes intentional/desirable? Change #1 seems to partially undo the decision made in a49fbaaf: > Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server. > > These days, such a response is far more likely to signify a server-side > problem, such as fork failure. [...] > > Hence, it seems best to just eliminate the assumption that backing off > to non-SSL/2.0 protocol is the way to recover from an "E" response, and > instead treat the server error the same as we would in non-SSL cases. Thanks, --Jacob
On 17/06/2024 17:11, Jacob Champion wrote: > On Mon, Apr 29, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> I was mostly worried about the refactoring of the >> retry logic in libpq (and about the pre-existing logic too to be honest, >> it was complicated before these changes already). > > Some changes in the v17 negotiation fallback order caught my eye: > > 1. For sslmode=prefer, a modern v3 error during negotiation now > results in a fallback to plaintext. For v16 this resulted in an > immediate failure. (v2 errors retain the v16 behavior.) > 2. For gssencmode=prefer, a legacy v2 error during negotiation now > results in an immediate failure. In v16 it allowed fallback to SSL or > plaintext depending on sslmode. > > Are both these changes intentional/desirable? Change #1 seems to > partially undo the decision made in a49fbaaf: > >> Don't assume that "E" response to NEGOTIATE_SSL_CODE means pre-7.0 server. >> >> These days, such a response is far more likely to signify a server-side >> problem, such as fork failure. [...] >> >> Hence, it seems best to just eliminate the assumption that backing off >> to non-SSL/2.0 protocol is the way to recover from an "E" response, and >> instead treat the server error the same as we would in non-SSL cases. They were not intentional. Let me think about the desirable part :-). By "negotiation", which part of the protocol are we talking about exactly? In the middle of the TLS handshake? After sending the startup packet? I think the behavior with v2 and v3 errors should be the same. And I think an immediate failure is appropriate on any v2/v3 error during negotiation, assuming we don't use those errors for things like "TLS not supported", which would warrant a fallback. -- Heikki Linnakangas Neon (https://neon.tech)
On Mon, Jun 17, 2024 at 8:24 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > By "negotiation", which part of the protocol are we talking about > exactly? In the middle of the TLS handshake? After sending the startup > packet? By "negotiation" I mean the server's response to the startup packet. I.e. "supported"/"not supported"/"error". > I think the behavior with v2 and v3 errors should be the same. And I > think an immediate failure is appropriate on any v2/v3 error during > negotiation, assuming we don't use those errors for things like "TLS not > supported", which would warrant a fallback. For GSS encryption, it was my vague understanding that older servers respond with an error rather than the "not supported" indication. For TLS, though, the decision in a49fbaaf (immediate failure) seemed reasonable. Thanks, --Jacob
Hi, On 2024-04-29 18:24:04 +0300, Heikki Linnakangas wrote: > All reported bugs have now been fixed. We now enforce ALPN in all the right > places. Please let me know if I missed something. Very minor and not really your responsibility: If provided with the necessary key information, wireshark can decode TLS exchanges when using sslnegotiation=postgres but not with direct. Presumably it needs to be taught postgres' ALPN id or something. Example with direct: 476 6513.310308457 192.168.0.113 → 192.168.0.200 48978 5432 142 TLSv1.3 Finished 477 6513.310341492 192.168.0.113 → 192.168.0.200 48978 5432 151 TLSv1.3 Application Data 478 6513.320730295 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket 479 6513.320745684 192.168.0.200 → 192.168.0.113 5432 48978 147 TLSv1.3 New Session Ticket 480 6513.321175713 192.168.0.113 → 192.168.0.200 48978 5432 68 TCP 48978 → 5432 [ACK] Seq=915 Ack=1665 Win=62848 Len=0TSval=3779915421 TSecr=3469016093 481 6513.323161553 192.168.0.200 → 192.168.0.113 5432 48978 518 TLSv1.3 Application Data 482 6513.323626180 192.168.0.113 → 192.168.0.200 48978 5432 125 TLSv1.3 Application Data 483 6513.333977769 192.168.0.200 → 192.168.0.113 5432 48978 273 TLSv1.3 Application Data 484 6513.334581920 192.168.0.113 → 192.168.0.200 48978 5432 95 TLSv1.3 Application Data 485 6513.334666116 192.168.0.113 → 192.168.0.200 48978 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify) Example with postgres: 502 6544.752799560 192.168.0.113 → 192.168.0.200 46300 5432 142 TLSv1.3 Finished 503 6544.752842863 192.168.0.113 → 192.168.0.200 46300 5432 151 PGSQL >? 504 6544.763152222 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket 505 6544.763163155 192.168.0.200 → 192.168.0.113 5432 46300 147 TLSv1.3 New Session Ticket 506 6544.763587595 192.168.0.113 → 192.168.0.200 46300 5432 68 TCP 46300 → 5432 [ACK] Seq=923 Ack=1666 Win=62848 Len=0TSval=3779946864 TSecr=3469047536 507 6544.765024827 192.168.0.200 → 192.168.0.113 5432 46300 518 PGSQL <R/S/S/S/S/S/S/S/S/S/S/S/S/S/S/K/Z 508 6544.766288155 192.168.0.113 → 192.168.0.200 46300 5432 125 PGSQL >Q 509 6544.776974164 192.168.0.200 → 192.168.0.113 5432 46300 273 PGSQL <T/D/D/D/D/D/D/D/D/D/D/C/Z 510 6544.777597927 192.168.0.113 → 192.168.0.200 46300 5432 95 PGSQL >X 511 6544.777631520 192.168.0.113 → 192.168.0.200 46300 5432 92 TLSv1.3 Alert (Level: Warning, Description: Close Notify) Note that in the second one it knows what's inside the "Application Data" messages and decodes them (S: startup, authentication ok, parameters, cancel key, ready for query, C: simple query, S: description, 10 rows, command complete, ready for query). In the GUI you can obviously go into the "postgres messages" in more detail than I know how to do on the console. A second aspect is that I'm not super happy about the hack of stashing data into Port. I think medium term we'd be better off separating out the buffering for unencrypted and encrypted data properly. It turns out that not having any buffering *below* openssl (i.e. the encrypted data) hurts both for the send and receive side, due to a) increased number of syscalls b) too many small packets being sent, as we use TCP_NODELAY c) kernel memory copies being slower due to the small increments. Greetings, Andres Freund
On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > > I think the behavior with v2 and v3 errors should be the same. And I > > think an immediate failure is appropriate on any v2/v3 error during > > negotiation, assuming we don't use those errors for things like "TLS not > > supported", which would warrant a fallback. > > For GSS encryption, it was my vague understanding that older servers > respond with an error rather than the "not supported" indication. For > TLS, though, the decision in a49fbaaf (immediate failure) seemed > reasonable. Would an open item for this be appropriate? Thanks, --Jacob
On 20/06/2024 20:02, Jacob Champion wrote: > On Mon, Jun 17, 2024 at 9:23 AM Jacob Champion > <jacob.champion@enterprisedb.com> wrote: >>> I think the behavior with v2 and v3 errors should be the same. And I >>> think an immediate failure is appropriate on any v2/v3 error during >>> negotiation, assuming we don't use those errors for things like "TLS not >>> supported", which would warrant a fallback. >> >> For GSS encryption, it was my vague understanding that older servers >> respond with an error rather than the "not supported" indication. For >> TLS, though, the decision in a49fbaaf (immediate failure) seemed >> reasonable. > > Would an open item for this be appropriate? Added. > By "negotiation" I mean the server's response to the startup packet. > I.e. "supported"/"not supported"/"error". Ok, I'm still a little confused, probably a terminology issue. The server doesn't respond with "supported" or "not supported" to the startup packet, that happens earlier. I think you mean the SSLRequst / GSSRequest packet, which is sent *before* the startup packet? >> I think the behavior with v2 and v3 errors should be the same. And I >> think an immediate failure is appropriate on any v2/v3 error during >> negotiation, assuming we don't use those errors for things like "TLS not >> supported", which would warrant a fallback. > > For GSS encryption, it was my vague understanding that older servers > respond with an error rather than the "not supported" indication. For > TLS, though, the decision in a49fbaaf (immediate failure) seemed > reasonable. Hmm, right, GSS encryption was introduced in v12, and older versions respond with an error to a GSSRequest. We probably could make the same assumption for GSS as we did for TLS in a49fbaaf, i.e. that an error means that something's wrong with the server, rather than that it's just very old and doesn't support GSS. But the case for that is a lot weaker case than with TLS. There are still pre-v12 servers out there in the wild. -- Heikki Linnakangas Neon (https://neon.tech)
On Thu, Jun 20, 2024 at 4:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > By "negotiation" I mean the server's response to the startup packet. > > I.e. "supported"/"not supported"/"error". > > Ok, I'm still a little confused, probably a terminology issue. The > server doesn't respond with "supported" or "not supported" to the > startup packet, that happens earlier. I think you mean the SSLRequst / > GSSRequest packet, which is sent *before* the startup packet? Yes, sorry. (I'm used to referring to those as startup packets too, ha.) > Hmm, right, GSS encryption was introduced in v12, and older versions > respond with an error to a GSSRequest. > > We probably could make the same assumption for GSS as we did for TLS in > a49fbaaf, i.e. that an error means that something's wrong with the > server, rather than that it's just very old and doesn't support GSS. But > the case for that is a lot weaker case than with TLS. There are still > pre-v12 servers out there in the wild. Right. Since we default to gssencmode=prefer, if you have Kerberos creds in your environment, I think this could potentially break existing software that connects to v11 servers once you upgrade libpq. Thanks, --Jacob
On 21/06/2024 02:32, Jacob Champion wrote: > On Thu, Jun 20, 2024 at 4:13 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> By "negotiation" I mean the server's response to the startup packet. >>> I.e. "supported"/"not supported"/"error". >> >> Ok, I'm still a little confused, probably a terminology issue. The >> server doesn't respond with "supported" or "not supported" to the >> startup packet, that happens earlier. I think you mean the SSLRequst / >> GSSRequest packet, which is sent *before* the startup packet? > > Yes, sorry. (I'm used to referring to those as startup packets too, ha.) Yeah I'm not sure what the right term would be. >> Hmm, right, GSS encryption was introduced in v12, and older versions >> respond with an error to a GSSRequest. >> >> We probably could make the same assumption for GSS as we did for TLS in >> a49fbaaf, i.e. that an error means that something's wrong with the >> server, rather than that it's just very old and doesn't support GSS. But >> the case for that is a lot weaker case than with TLS. There are still >> pre-v12 servers out there in the wild. > > Right. Since we default to gssencmode=prefer, if you have Kerberos > creds in your environment, I think this could potentially break > existing software that connects to v11 servers once you upgrade libpq. When you connect to a V11 server and attempt to perform GSSAPI authentication, it will respond with a V3 error that says: "unsupported frontend protocol 1234.5680: server supports 2.0 to 3.0". That was a surprise to me until I tested it just now. I thought that it would respond with a protocol V2 error, but it is not so. The backend sets FrontendProtocol to 1234.5680 before sending the error, and because it is >= 3, the error is sent with protocol version 3. Given that, I think it is a good thing to fail the connection completely on receiving a V2 error. Attached is a patch to fix the other issue, with falling back from SSL to plaintext. And some tests and comment fixes I spotted while at it. 0001: A small comment fix 0002: This is the main patch that fixes the SSL fallback issue 0003: This adds fault injection tests to exercise these early error codepaths. It is not ready to be merged, as it contains a hack to skip locking. See thread at https://www.postgresql.org/message-id/e1ffb822-054e-4006-ac06-50532767f75b%40iki.fi. 0004: More tests, for what happens if the server sends an error after responding "yes" to the SSLRequest or GSSRequest, but before performing the SSL/GSS handshake. Attached is also a little stand-alone perl program that listens on a socket, and when you connect to it, it immediately sends a V2 or V3 error, depending on the argument. That's useful for testing. It could be used as an alternative strategy to the injection points I used in the 0003-0004 patches, but for now I just used it for manual testing. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
I reviewed the documentation for "direct ALPN connections' ', and it looks like it could be improved.
The currently suggested values for "sslnegotiations" are "direct" and "postgres".
The project name is PostgreSQL and the ALPN name is postgresql. Is there a reason why property value uses "postgres"?
Can the value be renamed to postgresql for consistency?
"SSL". Technically, the proper term is TLS, and even the document refers to "IANA TLS ALPN Protocol IDs" (TLS, not SSL).
I would not die on that hill, however, going for tlsnegotiation would look better than sslnegotiation.
Vladimir
On Tue, 25 Jun 2024 at 09:37, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
I reviewed the documentation for "direct ALPN connections' ', and it looks like it could be improved.The currently suggested values for "sslnegotiations" are "direct" and "postgres".The project name is PostgreSQL and the ALPN name is postgresql. Is there a reason why property value uses "postgres"?Can the value be renamed to postgresql for consistency?
+1 I found it strange that we are not using postgresql
"SSL". Technically, the proper term is TLS, and even the document refers to "IANA TLS ALPN Protocol IDs" (TLS, not SSL).I would not die on that hill, however, going for tlsnegotiation would look better than sslnegotiation.
+1 again, unusual to use SSL when this really is TLS.
Dave
On Thu, Jun 20, 2024 at 4:32 PM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > Thanks, > --Jacob Hey Heikki, [sending this to the list in case it's not just me] I cannot for the life of me get GMail to deliver your latest message, even though I see it on postgresql.org. It's not in spam; it's just gone. I wonder if it's possibly the Perl server script causing virus-scanner issues? --Jacob
On Tue, Jun 25, 2024 at 7:20 AM Dave Cramer <davecramer@postgres.rocks> wrote: > > On Tue, 25 Jun 2024 at 09:37, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote: >> >> "SSL". Technically, the proper term is TLS, and even the document refers to "IANA TLS ALPN Protocol IDs" (TLS, not SSL). >> I would not die on that hill, however, going for tlsnegotiation would look better than sslnegotiation. > > +1 again, unusual to use SSL when this really is TLS. This was sort of litigated last ye-(checks notes) oh no, three years ago: https://www.postgresql.org/message-id/flat/CE12DD5C-4BB3-4166-BC9A-39779568734C%40yesql.se I'm your side when it comes to the use of the TLS acronym, personally, but I think introducing a brand new option that interfaces with sslmode and sslrootcert and etc. while not being named like them would be outright unhelpful. And the idea of switching everything to use TLS in docs seemed to be met with a solid "meh" on the other thread. --Jacob
On Mon, Jun 24, 2024 at 11:30:53PM +0300, Heikki Linnakangas wrote: > Given that, I think it is a good thing to fail the connection completely on > receiving a V2 error. > > Attached is a patch to fix the other issue, with falling back from SSL to > plaintext. And some tests and comment fixes I spotted while at it. > > 0001: A small comment fix Already committed as of cc68ca6d420e. > 0002: This is the main patch that fixes the SSL fallback issue + conn->failed_enc_methods |= conn->allowed_enc_methods & (~conn->current_enc_method); Sounds reasonable to me. It's a bit annoying to have to guess that current_enc_method is tracking only one method at a time (aka these three fields are not documented in libpq-int.h), while allowed_enc_methods and failed_enc_methods is a bitwise combination of the methods that are still allowed or that have already failed. > 0003: This adds fault injection tests to exercise these early error > codepaths. It is not ready to be merged, as it contains a hack to skip > locking. See thread at > https://www.postgresql.org/message-id/e1ffb822-054e-4006-ac06-50532767f75b%40iki.fi. Locking when running an injection point has been replaced by some atomics in 86db52a5062a. + if (IsInjectionPointAttached("backend-initialize-v2-error")) + { + FrontendProtocol = PG_PROTOCOL(2,0); + elog(FATAL, "protocol version 2 error triggered"); + } This is an attempt to do stack manipulation with an injection point set. FrontendProtocol is a global variable, so you could have a new callback setting up this global variable directly, then FATAL (I really don't mind is modules/injection_points finishes with a library of callbacks). Not sure to like much this new IsInjectionPointAttached() that does a search in the existing injection point pool, though. This leads to more code footprint in the core backend, and I'm trying to minimize that. Not everybody agrees with this view, I'd guess, which is also fine. > 0004: More tests, for what happens if the server sends an error after > responding "yes" to the SSLRequest or GSSRequest, but before performing the > SSL/GSS handshake. No objections to these two additions. > Attached is also a little stand-alone perl program that listens on a socket, > and when you connect to it, it immediately sends a V2 or V3 error, depending > on the argument. That's useful for testing. It could be used as an > alternative strategy to the injection points I used in the 0003-0004 > patches, but for now I just used it for manual testing. Nice toy. -- Michael
Attachment
On 16/07/2024 09:54, Michael Paquier wrote: > On Mon, Jun 24, 2024 at 11:30:53PM +0300, Heikki Linnakangas wrote: >> 0002: This is the main patch that fixes the SSL fallback issue > > + conn->failed_enc_methods |= conn->allowed_enc_methods & > (~conn->current_enc_method); > > Sounds reasonable to me. > > It's a bit annoying to have to guess that current_enc_method is > tracking only one method at a time (aka these three fields are not > documented in libpq-int.h), while allowed_enc_methods and > failed_enc_methods is a bitwise combination of the methods that are > still allowed or that have already failed. Yeah. In hindsight I'm still not very happy with the code structure with "allowed_enc_methods" and "current_enc_methods" and all that. The fallback logic is still complicated. It's better than in v16, IMHO, but still not great. This patch seems like the best fix for v17, but I wouldn't mind another round of refactoring for v18, if anyone's got some good ideas on how to structure it better. All these new tests are a great asset when refactoring this again. > + if (IsInjectionPointAttached("backend-initialize-v2-error")) > + { > + FrontendProtocol = PG_PROTOCOL(2,0); > + elog(FATAL, "protocol version 2 error triggered"); > + } > > This is an attempt to do stack manipulation with an injection point > set. FrontendProtocol is a global variable, so you could have a new > callback setting up this global variable directly, then FATAL (I > really don't mind is modules/injection_points finishes with a library > of callbacks). > > Not sure to like much this new IsInjectionPointAttached() that does a > search in the existing injection point pool, though. This leads to > more code footprint in the core backend, and I'm trying to minimize > that. Not everybody agrees with this view, I'd guess, which is also > fine. Yeah, I'm also not too excited about the additional code in the backend, but I'm also not excited about writing another test C module just for this. I'm inclined to commit this as it is, but we can certainly revisit this later, since it's just test code. Here's a new rebased version with some minor cleanup. Notably, I added docs for the new IS_INJECTION_POINT_ATTACHED() macro. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
On Tue, Jul 23, 2024 at 08:32:29PM +0300, Heikki Linnakangas wrote: > All these new tests are a great asset when refactoring this again. Thanks for doing that. The coverage, especially with v2, is going to be really useful. > Yeah, I'm also not too excited about the additional code in the backend, but > I'm also not excited about writing another test C module just for this. I'm > inclined to commit this as it is, but we can certainly revisit this later, > since it's just test code. The point would be to rely on the existing injection_points module, with a new callback in it. The callbacks could be on a file of their own in the module, for clarity. What you have is OK for me anyway, it is good to add more options to developers in this area and this gets used in core. That's also enough to manipulate the stack in or even out of core. > Here's a new rebased version with some minor cleanup. Notably, I added docs > for the new IS_INJECTION_POINT_ATTACHED() macro. 0001 looks OK. + push @events, "backenderror" if $line =~ /error triggered for injection point backend-/; + push @events, "v2error" if $line =~ /protocol version 2 error triggered/; Perhaps append an "injection_" for these two keywords? +#include "storage/proc.h" This inclusion in injection_point.c should not be needed. > sets the FrontendProtocol global variable, but I think it's more > straightforward to have the test code The last sentence in the commit message of 0002 seems to be unfinished. Could you run a perltidy on 005_negotiate_encryption.pl? There are a bunch of new diffs in it. -- Michael
Attachment
On 24/07/2024 02:37, Michael Paquier wrote: > On Tue, Jul 23, 2024 at 08:32:29PM +0300, Heikki Linnakangas wrote: >> All these new tests are a great asset when refactoring this again. > > Thanks for doing that. The coverage, especially with v2, is going to > be really useful. > >> Yeah, I'm also not too excited about the additional code in the backend, but >> I'm also not excited about writing another test C module just for this. I'm >> inclined to commit this as it is, but we can certainly revisit this later, >> since it's just test code. > > The point would be to rely on the existing injection_points module, > with a new callback in it. The callbacks could be on a file of their > own in the module, for clarity. Hmm, do we want injection_points module to be a dumping ground for callbacks that are only useful for very specific injection points, in specific tests? I view it as a more general purpose module, containing callbacks that are useful for many different tests. Don't get me wrong, I'm not necessarily against it, and it would be expedient, that's just not how I see the purpose of injection_points. > What you have is OK for me anyway, it > is good to add more options to developers in this area and this gets > used in core. That's also enough to manipulate the stack in or even > out of core. Ok, I kept it that way. >> Here's a new rebased version with some minor cleanup. Notably, I added docs >> for the new IS_INJECTION_POINT_ATTACHED() macro. > > 0001 looks OK. > > + push @events, "backenderror" if $line =~ /error triggered for > injection point backend-/; > + push @events, "v2error" if $line =~ /protocol version 2 error > triggered/; > > Perhaps append an "injection_" for these two keywords? > > +#include "storage/proc.h" > > This inclusion in injection_point.c should not be needed. > >> sets the FrontendProtocol global variable, but I think it's more >> straightforward to have the test code > > The last sentence in the commit message of 0002 seems to be > unfinished. Fixed. > Could you run a perltidy on 005_negotiate_encryption.pl? There are a > bunch of new diffs in it. Fixed. Committed, thanks for the review, and thanks Jacob for the testing! -- Heikki Linnakangas Neon (https://neon.tech)
On 17/06/2024 21:33, Andres Freund wrote: > If provided with the necessary key information, wireshark can decode TLS > exchanges when using sslnegotiation=postgres but not with direct. Presumably > it needs to be taught postgres' ALPN id or something. I opened https://gitlab.com/wireshark/wireshark/-/merge_requests/16612 to fix that in the wireshark pgsql protocol dissector. -- Heikki Linnakangas Neon (https://neon.tech)