Thread: Libpq single-row mode slowness

Libpq single-row mode slowness

From

Daniele Varrazzo

Date:

01 May 2022, 20:35:52

Hello,

a Psycopg 3 user has tested what boils down pretty much to a
"generate_series(100K)" and has reported a 100x difference between
fetching it normally and in single-row mode. I have repeated the test
myself and I have found a 50x difference (the OP doesn't specify which
platform is using, mine is Linux).

Of course calling PQconsumeInput 100K times has some overhead compared
to calling it only once. However, I would like to know if this level
of overhead is expected, or if instead anyone smells some wasted
cycles.

According to some profiling, the largest part of the time is spent
inside a libc function I don't know the symbol of, called by
pqReadData(). Details and pretty graphs are available at
https://github.com/psycopg/psycopg/issues/286

The operations we perform, for every row, are PQconsumeInput,
PQisBusy, PQgetResult. Every PQconsumeInput results in a recvfrom()
syscall, of which the first one returns the whole recordset, the
following ones return EAGAIN. There are two extra cycles: one to get
the TUPLES_OK result, one to get the end-of-stream NULL. It seems the
documented usage pattern, but I'm not sure if I'm not misreading it,
especially in the light of this libpq grumble [1].

[1] https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-misc.c#L681

Our connection is in non-blocking mode and we see the need for waiting
(using epoll here) only on the first call. The resulting strace of the
entire query process (of two rows) is:

21:36:53.659529 sendto(3, "P\0\0\0>\0select 'test' a, t b from "...,
108, MSG_NOSIGNAL, NULL, 0) = 108
21:36:53.660236 recvfrom(3, 0x1f6a870, 16384, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.660589 epoll_create1(EPOLL_CLOEXEC) = 4
21:36:53.660848 epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLONESHOT,
{u32=3, u64=3}}) = 0
21:36:53.661099 epoll_wait(4, [{EPOLLIN, {u32=3, u64=3}}], 1023, 100000) = 1
21:36:53.661941 recvfrom(3,
"1\0\0\0\0042\0\0\0\4T\0\0\0.\0\2a\0\0\0\0\0\0\0\0\0\0\31\377\377\377"...,
16384, 0, NULL, NULL) = 117
21:36:53.662506 close(4)                = 0
21:36:53.662830 recvfrom(3, 0x1f6a898, 16344, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.663162 recvfrom(3, 0x1f6a884, 16364, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)
21:36:53.663359 recvfrom(3, 0x1f6a876, 16378, 0, NULL, NULL) = -1
EAGAIN (Resource temporarily unavailable)

The test is on a localhost connection with sslmode disabled using libpq 14.2.

Is this the correct usage? Any insight is welcome.

Thank you very much!

-- Daniele

Re: Libpq single-row mode slowness

From

Tom Lane

Date:

01 May 2022, 21:12:26

Daniele Varrazzo <daniele.varrazzo@gmail.com> writes:
> The operations we perform, for every row, are PQconsumeInput,
> PQisBusy, PQgetResult. Every PQconsumeInput results in a recvfrom()
> syscall, of which the first one returns the whole recordset, the
> following ones return EAGAIN. There are two extra cycles: one to get
> the TUPLES_OK result, one to get the end-of-stream NULL. It seems the
> documented usage pattern, but I'm not sure if I'm not misreading it,
> especially in the light of this libpq grumble [1].

The usual expectation is that you call PQconsumeInput to get rid of
a read-ready condition on the socket.  If you don't have a poll() or
select() or the like in the loop, you might be wasting a lot of
pointless recvfrom calls.  You definitely don't need to call
PQconsumeInput if PQisBusy is already saying that a result is available,
and in single-row mode it's likely that several results can be consumed
per recvfrom call.

            regards, tom lane

Re: Libpq single-row mode slowness

From

Daniele Varrazzo

Date:

01 May 2022, 23:51:41

On Sun, 1 May 2022 at 23:12, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> The usual expectation is that you call PQconsumeInput to get rid of
> a read-ready condition on the socket.  If you don't have a poll() or
> select() or the like in the loop, you might be wasting a lot of
> pointless recvfrom calls.  You definitely don't need to call
> PQconsumeInput if PQisBusy is already saying that a result is available,
> and in single-row mode it's likely that several results can be consumed
> per recvfrom call.

This makes sense and, with some refactoring of our fetch loop, the
overhead of using single-row mode is now down to about 3x, likely
caused by the greater overhead in Python calls.

Please note that the insight you gave in your answer seems to
contradict the documentation. Some excerpts of
https://www.postgresql.org/docs/current/libpq-async.html:

"""
PQconsumeInput:  "After calling PQconsumeInput , the application can
check PQisBusy and/or PQnotifies to see if their state has changed"

PQisBusy: "will not itself attempt to read data from the server;
therefore PQconsumeInput must be invoked first, or the busy state will
never end."

...
A typical application [will use select()]. When the main loop detects
input ready, it should call PQconsumeInput to read the input. It can
then call PQisBusy, followed by PQgetResult if PQisBusy returns false
(0).
"""

All these indications give the impression that there is a sort of
mandatory order, requiring to call first PQconsumeInput, then
PQisBusy. As a consequence, the core of our function to fetch a single
result was implemented as:

```
def fetch(pgconn):
    while True:
        pgconn.consume_input()
        if not pgconn.is_busy():
            break
        yield Wait.R

    return pgconn.get_result()
```

(Where the `yield Wait.R` suspends this execution to call into
select() or whatever waiting policy the program is using.)

Your remarks suggest that PQisBusy() can be called before
PQconsumeInput(), and that the latter doesn't need to be called if not
busy. As such I have modified the loop to be:

```
def fetch(pgconn):
    if pgconn.is_busy():
        yield Wait.R
        while True:
            pgconn.consume_input()
            if not pgconn.is_busy():
                break
            yield Wait.R

    return pgconn.get_result()
```

which seems to work well: tests don't show regressions and single-row
mode doesn't waste recvfrom() anymore.

Is this new fetching pattern the expected way to interact with the libpq?

If so, should we improve the documentation to suggest that there are
reasons to call PQisBusy before PQconsumeInput? Especially in the
single-row mode docs page, which doesn't make relevant mentions to the
use of these functions.

Thank you very much for your help, really appreciated.

-- Daniele