Re: about client-side cursors - Mailing list psycopg
From | Daniele Varrazzo |
---|---|
Subject | Re: about client-side cursors |
Date | |
Msg-id | CA+mi_8a1ery2xvb+AdxMtVXydOdGC7wRMnfXNAvk8v73HsLFyw@mail.gmail.com Whole thread Raw |
In response to | about client-side cursors (Denis Laxalde <denis.laxalde@dalibo.com>) |
Responses |
Re: about client-side cursors
Re: about client-side cursors Re: about client-side cursors |
List | psycopg |
On Wed, 3 Feb 2021 at 17:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote: > I'd like to discuss about the concept of client-side cursor as defined > in psycopg. Welcome Denis! > This is a concept I've always been quite uncomfortable with; > I'll try to explain why. > > First, from a practical point of view: > > cur = connection.cursor() > cur.execute(...) > result = cur.fetchxxx() > > This typical pattern seems awkward to me. Where do IO operations > actually take place? (on execute()? on fetch()? on both?) For a client-side cursor, it happens on execute, but for a server-side cursor it happens in both (execute creates the cursor, I have no idea as to when the server decides to hit the disks, but for the client every fetch is a roundtrip with the server). > Then, and besides the terminology, the fact that named cursors are very > different from client-side cursors while being constructed by the same > API is confusing. I understand that this may come from the DB-API: > > https://www.python.org/dev/peps/pep-0249/#cursor-objects > > [Cursor] objects represent a database cursor. [...] > > But this does not seem quite right to me since a client-side cursor is > actually not a PostgreSQL cursor. (The documention on psycopg2 is clear > on the difference, though.) Yes, that's correct: the name comes from the dbapi. The dbapi cursor does two different things indeed: it runs queries and manages results. I understand that the overlap is sort of weird and everyone who has said "psycopg sucks: let me write my own driver" have used the mental model - much more familiar - of "I ask a query to a connection which gives me a result". So usually what is created is something which can be used as: result = conn.execute(query, vars) for record in result: ... This is understandable, so much that in psycopg3 I've added conn.execute, which is implemented as: def execute(self, query, vars) cur = self.cursor() cur.execute(query, vars) return cur and allow you to use the cursor only to iterate on the data returned, ignoring the execute*() methods. > In asyncio context, while playing around with psycopg3, the situation is > even worse because of the additional async/await keywords; for instance: > > conn = await psycopg3.AsyncConnection.connect() > async with conn: > cur = await conn.execute("SELECT * FROM test") > r = await cur.fetchall() > > Why do we need two 'await' (i.e. two IO operations) on conn.execute() > and cur.fetchall() if 'cur' is a client-side cursor? (Back to my first > point about where IO happen, this appears to depend on whether the > connection has 'autocommit' or not...) I am going a bit back and forth on this mentally, although not in the code. The forces pulling in different directions are: 1) if a method may possibly do I/O activity, then it should be async. I don't know where I have read it and how strong is this guideline, however it makes sense: it's better to have a function to call with await where the await is actually not needed than a function which should have been await but now you are stuck with a blocking interface. 2) if we want to have sync and async cursor, it would be better if they had the same interface The point 2 is pretty true in psycopg2 because there is a single cursor class, and every method is if'ed like: def fetchmany(self, nrecs): if not self.name: # client-side cursor res = self.result[self.current, self.current + nrecs] self.current += nrecs return res else: # server-side cursor result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}") return result I don't think this is a good implementation, so in psycopg3 I would like to have a Cursor interface with a ClientCursor and a ServerCursor implementation, without the if for each method. In async mode, AsyncClientCursor.fetchmany() doesn't actually need to be async, but AsyncServerCursor.fetchmany() does I/O activity. However, would two different implementation classes demand two different interfaces? > All in all, my main point is: (why) do we need client-side cursors? I think client-side cursors are the bread and butter of querying the database. They are a lightweight and stateless way to receive a result: you ask a question, you receive an answer, and all is over. You can put caches between client server, connection pools... Server-side cursors impy a state persisted across calls and shared between clients and server, which is much more heavyweight, scales less for the server, and makes the identity of the connection precious. If any I would do without the server-side cursor, but it would be abdicating a responsibility that the driver should handle IMO, so although I'm not a fan I will definitely implement them from psycopg3. There are choices to be made, and it's one of those points for which I'd be happy to hear other experiences; I haven't asked around yet because I first wanted to implement a sketch of server-side cursors and then look at the 4 class side to side (to side to side) and decide what to do. Let's call them: - SyncClientCursor - AsyncClientCursor - SyncServerCursor - AsyncServerCursor The Server versions have I/O done on fetch*() and on close(). So the first three objects might have non-async fetch(). I was preferring to give all them an uniform interface so the documentation would be "on async communication, always use async when fetching", as opposite as "on async use "await execute()", no await on fetch() for client-side cursor, but "await fetch()` for server-side ones". In practice, I am envisioning a SyncCursor object interface, with SyncClient and SyncServer being two different implementations, and an AsyncCursor interface, with AsyncClient and AsyncServer being two different implementations: @abstract class SyncCursor: def execute() def fetch() class SyncClientCursor(SyncCursor): def execute() def fetch() class SyncClientCursor(SyncCursor): def execute() def fetch() @abstract class AsyncCursor: async def execute() async def fetch() class AsyncClientCursor(AsyncCursor): await def execute() await def fetch() class AsyncClientCursor(AsyncCursor): await def execute() await def fetch() Doing without await on AsyncClientCursor.fetch() creates an asymmetry: class AsyncClientCursor: await def execute() def fetch() class AsyncClientCursor: await def execute() await def fetch() so you couldn't write a function "processResults(acur)" independent of whether the cursor is client or server, whereas you can do it no problem for sync cursors. That's my brain dump! @Denis: it's just more to explain current implementation and mental processes, however in pg3 there is still space to make changes and I'm happy to hear yours and others input. I think that using await uniformly makes things easier to think about and code to compose but I'm happy to hear if there are different opinions. -- Daniele