Thread: BUG #18907: SSL error: bad length failure during transfer data in pipeline mode with libpq

The following bug has been logged on the website:

Bug reference:      18907
Logged by:          Dorjpalam Batbaatar
Email address:      htgn.dbat.95@gmail.com
PostgreSQL version: 16.4
Operating system:   AlmaLinux 9
Description:

When using libpq to send large amounts of data to the server in pipeline
mode (registering the data with COPY), the error "SSL error: bad length"
sometimes occurs. It is most often raised from libpq's
PQsendQueryParams(). The PostgreSQL version is 16.4.

I looked into this, and the cause appears to be that OpenSSL's
SSL_write() is not retried with the same data when it should be.
According to the OpenSSL documentation for SSL_write(), if
SSL_get_error() returns SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE,
the call must be repeated with the same data.
https://docs.openssl.org/3.0/man3/SSL_write/#warnings

libpq's message-sending function pqPutMsgEnd(PGconn *conn) simply returns
when the connection is in non-blocking mode and not all data has been sent.
However, libpq's exported functions call pqPutMsgEnd() several times
(for example, PQsendQueryParams() does so through PQsendQueryGuts()), so I
think the data offered for sending changes between calls.
In that situation the write needs to be retried with the same data, and the
error seems to occur because the data has changed instead.

As a test, I changed pqSendSome() to retry when pqsecure_write() returns 0,
and the program then ran in pipeline mode without errors. pqSendSome() is
called from pqPutMsgEnd(PGconn *conn), and it calls pqsecure_write(),
which in turn performs the SSL_write().
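The retry rule described above can be illustrated with a small, self-contained sketch. It does not use OpenSSL or libpq: MockTls, mock_tls_write() and send_all_with_retry() are hypothetical names invented for this illustration, and the rule that a retry offering fewer bytes than the pending write is rejected is a simplifying assumption about how "bad length" could arise, not a description of OpenSSL's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a TLS connection; not an OpenSSL type. */
typedef struct
{
    size_t pending_len;   /* length of a write that must be retried, 0 if none */
    int    busy_calls;    /* how many more calls will report "want write" */
    size_t total_written; /* bytes accepted so far */
} MockTls;

enum { MOCK_OK = 0, MOCK_WANT_WRITE = 1, MOCK_BAD_LENGTH = 2 };

/*
 * Toy model of the retry rule (an assumption for illustration): once a
 * write of len bytes has reported MOCK_WANT_WRITE, the next call must
 * offer at least that many bytes again, otherwise it is rejected.
 */
int
mock_tls_write(MockTls *tls, const void *buf, size_t len, size_t *written)
{
    assert(tls != NULL && written != NULL);
    (void) buf;                 /* the model only tracks lengths */
    *written = 0;

    if (tls->pending_len != 0 && len < tls->pending_len)
        return MOCK_BAD_LENGTH; /* retry did not repeat the pending write */

    if (tls->busy_calls > 0)
    {
        tls->busy_calls--;
        tls->pending_len = len; /* remember what has to be retried */
        return MOCK_WANT_WRITE;
    }

    tls->pending_len = 0;
    tls->total_written += len;
    *written = len;
    return MOCK_OK;
}

/* Retry with the same buffer and the same length until it is accepted. */
int
send_all_with_retry(MockTls *tls, const void *buf, size_t len)
{
    size_t written;

    for (;;)
    {
        int rc = mock_tls_write(tls, buf, len, &written);

        if (rc == MOCK_WANT_WRITE)
            continue;           /* real code would wait for the socket here */
        return rc;
    }
}
```

In this model, calling mock_tls_write() with 16 bytes while the connection is busy and then calling it again with only 8 bytes returns MOCK_BAD_LENGTH, whereas send_all_with_retry() repeats the identical call and eventually succeeds.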
Below is the patch I tried.
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index 488f7d6e55..bbafb189c9 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -914,22 +914,43 @@ pqSendSome(PGconn *conn, int len)
                         * Note that errors here don't result in write_failed becoming
                         * set.
                         */
-                       if (pqReadData(conn) < 0)
+                       if (sent > 0)
                        {
-                               result = -1;    /* error message already set up */
-                               break;
-                       }
+                               if (pqReadData(conn) < 0)
+                               {
+                                       result = -1;    /* error message already set up */
+                                       break;
+                               }
-                       if (pqIsnonblocking(conn))
-                       {
-                               result = 1;
-                               break;
-                       }
+                               if (pqIsnonblocking(conn))
+                               {
+                                       result = 1;
+                                       break;
+                               }
-                       if (pqWait(true, true, conn))
+                               if (pqWait(true, true, conn))
+                               {
+                                       result = -1;
+                                       break;
+                               }
+                       }
+                       else
                        {
-                               result = -1;
-                               break;
+                               /*
+                                * When sent is 0, retry the write. Before writing
+                                * again, read any responses that have arrived from the server.
+                                */
+                               if (pqWait(true, true, conn))
+                               {
+                                       result = -1;
+                                       break;
+                               }
+
+                               if (pqReadData(conn) < 0)
+                               {
+                                       result = -1;    /* error message already set up */
+                                       break;
+                               }
                        }
                }
        }


PG Bug reporting form <noreply@postgresql.org> writes:
> When using libpq to transfer large amounts of data to the server in pipeline
> mode (registering with COPY), an error "SSL error: bad length"
> sometimes occurs.

Could you provide a self-contained test case demonstrating such
failures?  This is not the kind of code that we like to change
on the basis of undocumented claims.

            regards, tom lane



On Tue, Apr 29, 2025 at 11:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Could you provide a self-contained test case demonstrating such
> failures?  This is not the kind of code that we like to change
> on the basis of undocumented claims.

Agreed -- but also, let us know if the answer is "no, I can't", or if
you get stuck and need some additional collaboration. These corner
cases can be really nasty to track down and record.

Thanks,
--Jacob



I am sending a sample program that reproduces this problem.
The attached archive contains a Makefile for building against PostgreSQL 17.
To run the program, all you need is a PostgreSQL 17 server that accepts SSL
connections.
After building, you will have an executable named
query-data-send-error.
Run it as follows:

./query-data-send-error -i 200 -u 200 -c "postgres://postgres:postgres@192.168.0.10/postgres?sslmode=require"

-i is the number of times a batch of test data records is created,
-u is the number of times the test data records are updated, and
-c is the connection string of the PostgreSQL server to connect to.

The sample program does the following:
1) Creates the test_data table.
2) Registers test data in batches of 100 records, once for each of the
number of times specified by -i.
3) Updates the registered records repeatedly, for the number of times
specified by -u.

My environment is as follows:
PostgreSQL Server: 17.2
OS: Rocky Linux 9.5 (Blue Onyx)
Kernel: Linux 5.14.0-503.22.1.el9_5.x86_64
Spec: CPU 4vCore/Memory 8G/HDD 400G

At runtime, the following error occurs during the update step:
Line : 552
SSL error: bad length
SSL SYSCALL error: EOF detected

Depending on the timing, the error does not always occur, but with higher
iteration counts it occurs on almost every run.

On 2025/04/30 3:48, Jacob Champion wrote:
> On Tue, Apr 29, 2025 at 11:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Could you provide a self-contained test case demonstrating such
>> failures?  This is not the kind of code that we like to change
>> on the basis of undocumented claims.
> Agreed -- but also, let us know if the answer is "no, I can't", or if
> you get stuck and need some additional collaboration. These corner
> cases can be really nasty to track down and record.
>
> Thanks,
> --Jacob
