Latest developments in psycopg3 - Mailing list psycopg
From: Daniele Varrazzo
Subject: Latest developments in psycopg3
Msg-id: CA+mi_8btbC_-0KheQ3WYZW81fS4B0MyPyuALihmqHTvqq=6VyA@mail.gmail.com
List: psycopg
Hello,

(this message is getting long enough to require titles, so here they are)

# Latest adaptation woes

I have finally broken the adaptation model where a Python class gets adapted to a Postgres type. That model is what psycopg2 is based on, but for psycopg3 it was proving unsatisfactory. The problems were some of the ones discussed in a previous thread, revolving around the different behaviour of server-side binding (psycopg3) vs. client-side binding (psycopg2). With client-side binding, psycopg2 could get away with a lot that is no longer allowed by the server; trying to use binary parameters makes things even trickier.

Numbers are the most prominent area: "select current_date + %s" only works if the parameter is specified as either int2 or int4. If it's int8 or numeric, Postgres will complain that there is no cast; if it's unknown, Postgres will complain that it cannot choose whether %s is a number or an interval.

Other thorny problems are with arrays: in Python a list can contain items of any type (we hope they are uniform here), while in Postgres an array has a single, specific element type. The list type alone doesn't say enough, and empty lists are an even more special case: you cannot have an array of unknowns, but without an item in it you cannot really know its oid. You can specify the whole array to be unknown, with the '{}' syntax and the unknown oid, but only in text format, not in binary.

Another funny area is ranges: Postgres can cast between timestamp and timestamptz transparently, but there is no cast between tsrange and tstzrange, and Python has a single datetime object, which you have to inspect to know whether it is tz-aware or naive...

# Introducing two-steps adaptation

For some time, in order to accommodate some of these quirks, there were special cases in the adaptation functions; now there is a generic mechanism which can be used for those Python types whose Postgres type cannot be determined by the class alone, but whose value must be observed. Choosing what adapter class to use for the object 'obj' goes this way [1] (a sketch in code closes this section):

1) start from key = type(obj)
2) look the key up in the adapters map. If it is not found, it's an error ("cannot adapt object 'obj'"). If found, you have a dumper, an instance of a Dumper subclass.
3) ask the dumper if it needs an upgrade by calling dumper.get_key(obj). If it returns the same key as in 1) (which is the base implementation), there is nothing to do.
4) if it returns a different key, call dumper.upgrade(obj), which returns a new dumper instance.

Details omitted from this description are caching (the reason for the get_key/upgrade twins) and managing the binary/text format: they can be seen in the implementation [1].

[1]: https://github.com/psycopg/psycopg3/blob/68547b8c16d3b3b093fba8d6df1f56cf52b37f08/psycopg3/psycopg3/_transform.py#L131

This scheme allows all the pretty things: a Python int can be dumped as int2, int4, int8, or numeric according to its value; empty lists can be adapted as unknown '{}' if the query is willing to accept text params; there is a single Range object, instead of DateRange, IntRange, etc...

I have to go back to the project of implementing the Django db backend for psycopg3, but a lot of what I had done so far in order to accommodate its differences (often consisting in finding the right place to add a ::integer cast...) can probably be simplified. I am absolutely delighted, because this goes in the direction of making a migration from psycopg2 to psycopg3 smoother for many users: the amount of things I had to do for the Django backend was telling me that the thing wasn't trivial yet, but I think it's improving now.
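To make the two-step lookup concrete, here is a toy sketch of the mechanism in Python. It is a simplification with made-up names (IntDumper, get_dumper, the int2 threshold, string oids): the real code, which also handles the text/binary format choice, is in [1].

    class Dumper:
        """Base dumper: by default an object's key is just its class."""
        def get_key(self, obj):
            return type(obj)

        def upgrade(self, obj):
            return self  # base implementation: no upgrade needed


    class BigNumberDumper(Dumper):
        oid = "numeric"


    class IntDumper(Dumper):
        """Registered for int: dumps values fitting an int2."""
        oid = "int2"

        def get_key(self, obj):
            if -(2 ** 15) <= obj < 2 ** 15:
                return int  # same key as the lookup: no upgrade needed
            return (int, "big")  # refined key: this value needs an upgrade

        def upgrade(self, obj):
            return BigNumberDumper()


    adapters = {int: IntDumper}  # the adapters map: class -> dumper class
    _upgraded = {}               # upgraded dumpers, cached by refined key


    def get_dumper(obj):
        # 1) start from the object's class
        key = type(obj)
        # 2) look the key up in the adapters map
        try:
            dumper = adapters[key]()
        except KeyError:
            raise TypeError(f"cannot adapt object {obj!r}")
        # 3) ask the dumper whether the key needs an upgrade
        new_key = dumper.get_key(obj)
        if new_key == key:
            return dumper
        # 4) different key: upgrade, caching the result by the refined key
        if new_key not in _upgraded:
            _upgraded[new_key] = dumper.upgrade(obj)
        return _upgraded[new_key]


    print(get_dumper(42).oid)        # -> int2
    print(get_dumper(10 ** 20).oid)  # -> numeric

The real implementation also keys the cache per format and per adaptation context; this sketch only shows the class -> key -> upgraded dumper dance.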
# Automatic use of binary parameters

Another change, related to the above, is the automatic selection of the text/binary format for the query arguments. Previously, parameters passed to %s placeholders were adapted in text format, and people could opt in to binary parameters by specifying a %b placeholder. This rigidity didn't allow the adaptation fluidity required by the scheme above, so now %s performs an automatic selection: if a type has a binary dumper it will be used, otherwise a text dumper will be looked up, and adaptation will fail if neither is available. People who have reasons to use a specific format can specify %b or %t as the placeholders for the binary or the text format (a usage sketch closes this message).

# Connecting oid catalog and adaptation contexts

The next changes in the adaptation area will be around unifying the oids map (mapping type names to oids) with the adaptation context (mapping types to adaptation objects). At the moment the former is global and intended for builtins only, while the latter is attached to a connection or a cursor. This causes a few problems. One is around looking up custom types in the catalog and registering adapters for them, because the two things happen in different places and the global map is not supposed to be modified. Another is in certain areas where the types are not dictated by the Python types of the arguments, as in normal queries, but by what Postgres is willing to accept: it happens when casting composite types in binary and in binary COPY FROM, where Postgres doesn't perform any cast. In these places the array of dumpers to use should be configured in other ways, for example by specifying the list of type names to pass, which may contain custom data types too.

# jsonb binary loading

The thing I've worked on in the last few days is instead something entirely new and exciting: finding a way to transfer jsonb in binary (at the moment jsonb is only passed to the client as text). Preliminary results suggest there is a lot of performance to be gained (https://github.com/dvarrazzo/jsonb_parser), and it's something I will explore further with the help of some of the PostgresPro people, who keep developing jsonb in the backend.

# single-row mode

The libpq allows a single-row mode but, given the batching nature of the fetch*() methods, it's not something of obvious utility. However, a few alternative databases are emerging (Materialize, CockroachDB) which use the query protocol as a stream: pretty much a never-ending query. I have added a `stream()` method [2] to the cursor, allowing these results too to be consumed in single-row mode.

[2]: https://www.psycopg.org/psycopg3/docs/cursor.html#psycopg3.Cursor.stream

# Thank you for the support

Thank you very much to the companies and the individuals sponsoring this project: https://www.psycopg.org/sponsors/

If anyone would like to contribute in ways other than funding, there is also fun code to write (strictly typed, async, Cython if you wish) and you are welcome to get in touch: I am sure there are practical ways to contribute!

Cheers

-- Daniele
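PS: for the curious, a small usage sketch of the new placeholders and of stream(). The connection string and the queries are made up for the example, but %s/%t/%b and the cursor's stream() method behave as described above [2]:

    import psycopg3

    conn = psycopg3.connect("dbname=test")  # hypothetical connection string
    cur = conn.cursor()

    # %s: format chosen automatically (binary dumper if available,
    # otherwise text); %t and %b force the text or the binary format
    cur.execute("select current_date + %s", [1])
    print(cur.fetchone())
    cur.execute("select current_date + %t", [1])
    print(cur.fetchone())

    # stream(): consume results in single-row mode, one row at a time
    for row in cur.stream("select generate_series(1, 5)"):
        print(row)

    conn.close()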