Thread: BUG #18096: In edge-triggered epoll and kqueue, PQconsumeInput/PQisBusy are insufficient for correct async ops.
BUG #18096: In edge-triggered epoll and kqueue, PQconsumeInput/PQisBusy are insufficient for correct async ops.
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 18096
Logged by: Masatoshi Fukunaga
Email address: mah0x211@gmail.com
PostgreSQL version: 14.4
Operating system: macOS 13.5.1, Ubuntu 22.04
Description:

When processing asynchronous commands, I call the `PQconsumeInput` and `PQisBusy` functions to check if data has arrived, as shown below, but this does not work correctly in edge trigger mode for epoll and kqueue.

In the edge trigger mode of epoll and kqueue, calls to the `PQconsumeInput()` and `PQisBusy()` funct

I believe the following code is correct according to the instructions in the manual.

> 34.4. Asynchronous Command Processing, the following is written.
> https://www.postgresql.org/docs/current/libpq-async.html

```C
#include <libpq-fe.h>

// in edge-triggered mode, this code does not work correctly
/**
 * check if the result is readable or not
 * @return 1: readable, 0: not readable, -1: error
 */
int is_readable(PGconn *conn)
{
    if (!PQconsumeInput(conn)) {
        // caller should call PQerrorMessage to get the error message
        return -1;
    } else if (!PQisBusy(conn)) {
        // caller can call PQgetResult to get the result
        return 1;
    }
    // caller should wait for the socket to become readable
    return 0;
}
```

The `PQconsumeInput()` function reads input data by calling the `pqReadData()` function internally, which uses the `pqsecure_read()` function to read the input data.

However, the `pqReadData()` function will not call the `pqsecure_read()` function until the `errno` is set to `EAGAIN` or `EWOULDBLOCK`, so if you poll after the `PQisBusy()` call returns `1`, the readable event will not fire and the caller will be permanently in a wait state.

By the way, I am aware that even if a read by `pqsecure_read()` does not end in `EAGAIN` or `EWOULDBLOCK`, the event will still be raised if all the data in the socket has been read.

The problem seems to be that `PQisBusy()` is returning `1`, but the preceding call to `PQconsumeInput()` has not read all the data in the socket.
So, if I check `errno` and branch as follows, it works fine.

```C
#include <errno.h>
#include <libpq-fe.h>

/**
 * check if the result is readable or not
 * @return 1: readable, 0: not readable, -1: error
 */
int is_readable(PGconn *conn)
{
    int should_retry = 0;

RETRY:
    errno = 0;
    if (!PQconsumeInput(conn)) {
        // caller should call PQerrorMessage to get the error message
        return -1;
    }
    // if the last read did not hit EAGAIN/EWOULDBLOCK, data may remain in the socket
    should_retry = errno != EAGAIN && errno != EWOULDBLOCK;
    if (!PQisBusy(conn)) {
        // caller can call PQgetResult to get the result
        return 1;
    } else if (should_retry) {
        // it is necessary to retry because the data has not been read completely
        goto RETRY;
    }
    // caller should wait for the socket to become readable
    return 0;
}
```
Re: BUG #18096: In edge-triggered epoll and kqueue, PQconsumeInput/PQisBusy are insufficient for correct async ops.
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> When processing asynchronous commands, I call the `PQconsumeInput` and
> `PQisBusy` functions to check if data has arrived, as shown below, but this
> does not work correctly in edge trigger mode for epoll and kqueue.

You have not really provided any evidence of a bug. The contract
for PQconsumeInput is that it will consume *some* input if any
is available, not that it will consume *all* available input.
(I don't think there is much reason to try to change that. In the
first place, there might not be enough buffer space, and in the
second place, even if it did consume all input, more might arrive
immediately after it looks.)

Without a self-contained test case, it's hard to be sure what is going
wrong for you; but my guess is that this is a bug in the way you are
checking for more available input rather than something libpq did
wrong.

> However, the `pqReadData()` function will not call the `pqsecure_read()`
> function until the `errno` is set to `EAGAIN` or `EWOULDBLOCK`,

Uh ... what?

regards, tom lane
Re: BUG #18096: In edge-triggered epoll and kqueue, PQconsumeInput/PQisBusy are insufficient for correct async ops.
From
mah0x211
Date:
Thanks for the reply.
I'm not good at English, so I am using machine translation to correct it;
some sentences may be difficult to understand.
The following test code performs asynchronous operations using libpq along
with either epoll or kqueue. I wasn't sure if it's appropriate to include
a large amount of code in an email, so I've uploaded the test code to a gist.
I can easily reproduce the issue on my macOS system, which uses kqueue, but
it takes many runs to reproduce on my Linux system using epoll.
In edge-triggered mode, `consume_input()` may fail depending on
the amount of data received. This happens when `PQconsumeInput()` doesn't
read all the data received on the socket (the received data is larger
than the available buffer space) and the subsequent
call to `PQisBusy()` returns `1`. The code then waits for a socket read
event that never arrives, and fails with a timeout.
In the case of level-triggered mode, there's no problem as events will be
continuously generated while data remains in the socket.
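
The difference between the two modes can be observed without libpq. The following is a minimal, Linux-only sketch (epoll; kqueue would need `EV_CLEAR` instead) that uses a socket pair in place of a database connection. The function name `events_after_partial_read` is hypothetical and chosen only for this illustration: it reads half of the pending data and then asks epoll, without blocking, whether the descriptor is still reported as readable.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Writes 8 bytes to one end of a socket pair, waits for the first readiness
 * event on the other end, reads only 4 bytes, then checks epoll again with a
 * zero timeout. Returns the number of events from that second check, or -1
 * if any setup step fails.
 */
int events_after_partial_read(int edge_triggered)
{
    int sv[2];
    int result = -1;
    char buf[4];
    struct epoll_event ev = {0}, out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int ep = epoll_create1(0);
    if (ep < 0)
        goto done;

    ev.events = EPOLLIN | (edge_triggered ? EPOLLET : 0);
    ev.data.fd = sv[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) != 0)
        goto done;
    if (write(sv[1], "12345678", 8) != 8)
        goto done;

    /* First wait: both modes report the descriptor as readable. */
    if (epoll_wait(ep, &out, 1, 1000) != 1)
        goto done;
    assert(out.data.fd == sv[0]);

    /* Partial read: 4 of the 8 bytes stay in the socket buffer. */
    if (read(sv[0], buf, sizeof buf) != (ssize_t) sizeof buf)
        goto done;

    /* Second, non-blocking check: only level-triggered reports again. */
    result = epoll_wait(ep, &out, 1, 0);
done:
    if (ep >= 0)
        close(ep);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling it with `edge_triggered = 1` is expected to return 0 even though 4 bytes are still unread, while `edge_triggered = 0` is expected to return 1. This matches the timeout described above when `PQconsumeInput()` leaves data in the socket.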
The main issue here is whether to wait for data to arrive in the main loop or
to call `PQconsumeInput()` again. This decision requires checking errno on
the application side.
Is there any other way to resolve this issue?
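
One possible approach that does not depend on `errno` is to ask the kernel directly whether unread data remains, using `poll()` with a zero timeout, and to keep consuming until either a result is ready or the socket is empty. The following is only a sketch under stated assumptions, not a libpq-endorsed recipe. The names `drain_input`, `consume_fn`, `busy_fn`, and the `demo_*` helpers are hypothetical; the callbacks stand in for `PQconsumeInput()` and `PQisBusy()` so that the example compiles without libpq. It also assumes that unread data can only hide in the kernel socket buffer, which may not hold for every transport.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback types. With libpq, `consume` would wrap
 * PQconsumeInput() and `busy` would wrap PQisBusy(). */
typedef int (*consume_fn)(void *ctx); /* 0 on failure, nonzero on success */
typedef int (*busy_fn)(void *ctx);    /* nonzero while the result is incomplete */

/*
 * Returns 1 when a result is ready, 0 when the socket has been drained and
 * the caller should wait for the next edge, -1 on error.
 */
int drain_input(int fd, void *ctx, consume_fn consume, busy_fn busy)
{
    assert(consume != NULL && busy != NULL);
    for (;;) {
        if (!consume(ctx))
            return -1;
        if (!busy(ctx))
            return 1;

        /* Zero-timeout poll: is there still unread data on the socket? */
        struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
        int n = poll(&pfd, 1, 0);
        if (n < 0 && errno != EINTR)
            return -1;
        if (n == 0)
            return 0; /* empty: the next arrival produces a new edge */
        /* Data (or EOF/error) is pending, or poll was interrupted: retry. */
    }
}

/* Stand-in for a connection that reads at most 4 bytes per call. */
struct demo_conn { int fd; size_t got; size_t want; };

int demo_consume(void *ctx)
{
    struct demo_conn *c = ctx;
    char buf[4];
    ssize_t n = read(c->fd, buf, sizeof buf);
    if (n > 0) {
        c->got += (size_t) n;
        return 1;
    }
    /* Nothing to read right now counts as success; EOF or other errors fail. */
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}

int demo_busy(void *ctx)
{
    struct demo_conn *c = ctx;
    return c->got < c->want;
}
```

With libpq, the descriptor would come from `PQsocket(conn)` and the two callbacks would be thin wrappers that cast `ctx` back to `PGconn *`. Returning 0 only after `poll()` reports an empty socket means any later arrival is a fresh transition, so an edge-triggered notification should follow.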
On Fri, Sep 8, 2023 at 23:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> When processing asynchronous commands, I call the `PQconsumeInput` and
> `PQisBusy` functions to check if data has arrived, as shown below, but this
> does not work correctly in edge trigger mode for epoll and kqueue.
You have not really provided any evidence of a bug. The contract
for PQconsumeInput is that it will consume *some* input if any
is available, not that it will consume *all* available input.
(I don't think there is much reason to try to change that. In the
first place, there might not be enough buffer space, and in the
second place, even if it did consume all input, more might arrive
immediately after it looks.)
Without a self-contained test case, it's hard to be sure what is going
wrong for you; but my guess is that this is a bug in the way you are
checking for more available input rather than something libpq did
wrong.
> However, the `pqReadData()` function will not call the `pqsecure_read()`
> function until the `errno` is set to `EAGAIN` or `EWOULDBLOCK`,
Uh ... what?
regards, tom lane