Latest developments in psycopg3 - Mailing list psycopg
From: Daniele Varrazzo
Subject: Latest developments in psycopg3
Msg-id: CA+mi_8btbC_-0KheQ3WYZW81fS4B0MyPyuALihmqHTvqq=6VyA@mail.gmail.com
List: psycopg
Hello,

(this message is getting long enough to require titles, so here they are)

# Latest adaptation woes

I have finally broken the adaptation model where a Python class gets adapted to a Postgres type. That model is what psycopg2 is based on, but for psycopg3 it was proving unsatisfactory. The problems were some of the ones discussed in a previous thread, revolving around the different behaviour of server-side binding (psycopg3) vs. client-side binding (psycopg2). With client-side binding, psycopg2 could get away with a lot that is no longer allowed by the server; trying to use binary parameters makes things even trickier.

Numbers are the most prominent area: "select current_date + %s" only works if the parameter is specified as either int2 or int4. If it's int8 or numeric, Postgres will complain that there is no cast; if it's unknown, Postgres will complain that it cannot choose whether %s is a number or an interval.

Other thorny problems are with arrays: in Python a list can contain items of any type (we hope they are uniform here), while in Postgres an array has a single, specific element type. The list type alone doesn't say enough, and empty lists are an even more special case: you cannot have an array of unknowns, but without an item in it you cannot really know its oid. You can specify the whole array to be unknown, with the '{}' syntax and the unknown oid, but only in text format, not in binary.

Another funny area is ranges: Postgres can cast between timestamp and timestamptz transparently, but there is no cast between tsrange and tstzrange, and Python has a single datetime object, which you have to inspect to know whether it is tz-aware or naive...

# Introducing two-steps adaptation

For some time, in order to accommodate some of these quirks, there were special cases in the adaptation functions; now there is a generic mechanism which can be used for those Python types whose Postgres type cannot be determined by the class alone, but whose value must be observed. Choosing what adapter class to use for the object 'obj' goes this way [1] (a sketch in code closes this section):

1) start from key = type(obj)
2) look the key up in the adapters map. If it is not found, it's an error ("cannot adapt object 'obj'"). If found, you have a dumper, an instance of a Dumper subclass.
3) ask the dumper if it needs an upgrade by calling dumper.get_key(obj). If it returns the same key as in 1) (which is the base implementation), there is nothing to do.
4) if it returns a different key, call dumper.upgrade(obj), which returns a new dumper instance.

Details omitted from this description are caching (the reason for the get_key/upgrade twins) and managing the binary/text format: they can be seen in the implementation [1].

[1]: https://github.com/psycopg/psycopg3/blob/68547b8c16d3b3b093fba8d6df1f56cf52b37f08/psycopg3/psycopg3/_transform.py#L131

This scheme allows all the pretty things: a Python int can be dumped as int2, int4, int8, or numeric according to its value; empty lists can be adapted as unknown '{}' if the query is willing to accept text params; there is a single Range object, instead of DateRange, IntRange, etc...

I have to go back to the project of implementing the Django db backend for psycopg3, but a lot of what I had done so far in order to accommodate its differences (often consisting in finding the right place to add a ::integer cast...) can probably be simplified. I am absolutely delighted, because this goes in the direction of making a migration from psycopg2 to psycopg3 smoother for many users: the amount of things I had to do for the Django backend was telling me that the thing wasn't trivial yet, but I think it's improving now.
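To make the two-step lookup concrete, here is a toy sketch of the mechanism in Python. It is a simplification with made-up names (IntDumper, get_dumper, the int2 threshold, string oids): the real code, which also handles the text/binary format choice, is in [1].

    class Dumper:
        """Base dumper: by default an object's key is just its class."""
        def get_key(self, obj):
            return type(obj)

        def upgrade(self, obj):
            return self  # base implementation: no upgrade needed


    class BigNumberDumper(Dumper):
        oid = "numeric"


    class IntDumper(Dumper):
        """Registered for int: dumps values fitting an int2."""
        oid = "int2"

        def get_key(self, obj):
            if -(2 ** 15) <= obj < 2 ** 15:
                return int  # same key as the lookup: no upgrade needed
            return (int, "big")  # refined key: this value needs an upgrade

        def upgrade(self, obj):
            return BigNumberDumper()


    adapters = {int: IntDumper}  # the adapters map: class -> dumper class
    _upgraded = {}               # upgraded dumpers, cached by refined key


    def get_dumper(obj):
        # 1) start from the object's class
        key = type(obj)
        # 2) look the key up in the adapters map
        try:
            dumper = adapters[key]()
        except KeyError:
            raise TypeError(f"cannot adapt object {obj!r}")
        # 3) ask the dumper whether the key needs an upgrade
        new_key = dumper.get_key(obj)
        if new_key == key:
            return dumper
        # 4) different key: upgrade, caching the result by the refined key
        if new_key not in _upgraded:
            _upgraded[new_key] = dumper.upgrade(obj)
        return _upgraded[new_key]


    print(get_dumper(42).oid)        # -> int2
    print(get_dumper(10 ** 20).oid)  # -> numeric

The real implementation also keys the cache per format and per adaptation context; this sketch only shows the class -> key -> upgraded dumper dance.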
# Automatic use of binary parameters

Another change, related to the above, is the automatic selection of the text/binary format for the query arguments. Previously, parameters passed to %s placeholders were adapted in text format, and people could opt in to binary parameters by specifying a %b placeholder. This rigidity didn't allow the adaptation fluidity required by the scheme above, so now %s performs an automatic selection: if a type has a binary dumper it will be used, otherwise a text dumper will be looked up, and adaptation will fail if neither is available. People who have reasons to use a specific format can specify %b or %t as the placeholders for the binary or the text format (a usage sketch closes this message).

# Connecting oid catalog and adaptation contexts

The next changes in the adaptation area will be around unifying the oids map (mapping type names to oids) with the adaptation context (mapping types to adaptation objects). At the moment the former is global and intended for builtins only, while the latter is attached to a connection or a cursor. This causes a few problems. One is around looking up custom types in the catalog and registering adapters for them, because the two things happen in different places and the global map is not supposed to be modified. Another is in certain areas where the types are not dictated by the Python types of the arguments, as in normal queries, but by what Postgres is willing to accept: it happens when casting composite types in binary and in binary COPY FROM, where Postgres doesn't perform any cast. In these places the array of dumpers to use should be configured in other ways, for example by specifying the list of type names to pass, which may contain custom data types too.

# jsonb binary loading

The thing I've worked on in the last few days is instead something entirely new and exciting: finding a way to transfer jsonb in binary (at the moment jsonb is only passed to the client as text). Preliminary results suggest there is a lot of performance to be gained (https://github.com/dvarrazzo/jsonb_parser), and it's something I will explore further with the help of some of the PostgresPro people, who keep developing jsonb in the backend.

# single-row mode

The libpq allows a single-row mode but, given the batching nature of the fetch*() methods, it's not something of obvious utility. However, a few alternative databases are emerging (Materialize, CockroachDB) which use the query protocol as a stream: pretty much a never-ending query. I have added a `stream()` method [2] to the cursor, allowing these results too to be consumed in single-row mode.

[2]: https://www.psycopg.org/psycopg3/docs/cursor.html#psycopg3.Cursor.stream

# Thank you for the support

Thank you very much to the companies and the individuals sponsoring this project: https://www.psycopg.org/sponsors/

If anyone would like to contribute in ways other than funding, there is also fun code to write (strictly typed, async, Cython if you wish) and you are welcome to get in touch: I am sure there are practical ways to contribute!

Cheers

-- Daniele
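PS: for the curious, a small usage sketch of the new placeholders and of stream(). The connection string and the queries are made up for the example, but %s/%t/%b and the cursor's stream() method behave as described above [2]:

    import psycopg3

    conn = psycopg3.connect("dbname=test")  # hypothetical connection string
    cur = conn.cursor()

    # %s: format chosen automatically (binary dumper if available,
    # otherwise text); %t and %b force the text or the binary format
    cur.execute("select current_date + %s", [1])
    print(cur.fetchone())
    cur.execute("select current_date + %t", [1])
    print(cur.fetchone())

    # stream(): consume results in single-row mode, one row at a time
    for row in cur.stream("select generate_series(1, 5)"):
        print(row)

    conn.close()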