Re: about client-side cursors - Mailing list psycopg
From | Denis Laxalde |
---|---|
Subject | Re: about client-side cursors |
Date | |
Msg-id | 20210204172135.bl2boudpyw4vfahv@dalibo.com Whole thread Raw |
In response to | about client-side cursors (Denis Laxalde <denis.laxalde@dalibo.com>) |
Responses |
Re: about client-side cursors
|
List | psycopg |
Daniele Varrazzo a écrit : > On Thu, 4 Feb 2021 at 12:16, Denis Laxalde <denis.laxalde@dalibo.com> wrote: > > > Daniele Varrazzo a écrit : > > > This is understandable, so much that in psycopg3 I've added > > > conn.execute, which is implemented as: > > > I am aware of that and that's indeed a nice addition. As I mentioned in > > the thread introducing this feature, I still think that returning a > > (client-side) cursor here is confusing. > > You can think about the cursor as a result in that case. I understand > your dissatisfaction with it is a) the names b) the async. Let's see > if there is something we can do around these points. > > [...] > > It seems to me that connection.execute() provides an implementation to > > that case, except that it'd better if it did not return a misnamed > > cursor but some kind of "result set" or "rows" object. As you mentioned, > > some other Python drivers already do this. > > > > Alternatively (or in complement), how about renaming connection.cursor() > > as connection.client() for client-side cursors (while keep cursor() for > > server-side ones)? That would fulfill the second statement above, I > > think. (Or is it something closer to a "portal", as defined in the > > Extended Query protocol? Maybe not...) > > ... while I want to use the psycopg3 chance to improve what can be > done better than psycopg2, given the ways Python and Postgres have > changed in several years, I don't want to introduce gratuitous > changes. I would still like for psycopg3 to be a drop-in replacement > for psycopg2, and while there are differences enough to require > testing your application, I don't want to require the users to change > *every single call* to psycopg2. I understand that getone(), > getmany(), getall() would be more apt names than fetch*() in case > their implementation says they are client-side, but is it really worth > to ask the users to grep their codebases to replace all the > occurrences? As things stand now, a lot of relatively simple code can > run unmodified with either pg2 and 3 I understand that; keeping the user base by providing a smooth migration path is highly important. (As I said in another message, I'm not suggesting to entirely drop the compatibility.) So renaming client-side cursors to something else is not an option (too much of a "gratuitous" change), that's a fair point. However, could we have connection.execute() not return a cursor (with fetch*() methods)? It seems like a thin adaptation to me, apparently not needing bw compat, and a good step forwards "cursorless querying". > With the async objects we can have more freedom because there is no > precedent code (I believe the interface is the same exposed by aiopg, > the async psycopg2 wrapper, but maintaining compatibility with it is > not a thing I feel required). > > > > Looking around in other programming languages, the situation is similar: > > > > * Golang's libpq https://godocs.io/github.com/lib/pq follows > > https://github.com/golang/go/wiki/SQLInterface : queries are executed > > on the connection object and Rows object are returned. > > > > * In Rust, there is https://docs.rs/crate/postgres/0.19.0 (sync) which > > is quite similar and https://docs.rs/tokio-postgres/0.7.0/tokio_postgres/ > > (async), which exposes a Client type to execute queries. > > > > With psycopg3 coming up, I believe it'd be a good opportunity to improve > > that. > > tokio_postgres makes me curious about pipelining... > > However those seem all interfaces which don't hide the nature of > cursorless querying. If we have an adapter which can work both cursor > and cursorless I think it is good that they offer the same interface > to obtain the results, even if the way they obtain has a different > implementation. This concept of "cursorless querying" is new to me and I somehow see the situation reversed: why would we expose an artificial thing (the client-side cursor)? (Rhetorical question, feel free to ignore.) We should make "cursorless querying" more explicit! (Building on connection.execute().) > > > 1) if a method may possibly do I/O activity, then it should be async. > > > I don't know where I have read it and how strong is this guideline, > > > however it makes sense: it's better to have a function to call with > > > await where the await is actually not needed than a function which > > > should have been await but now you are stuck with a blocking > > > interface. > > > > I gently disagree here. This is essentially the same issue that having > > cursor.fetch*() not fetching anything, but it's worse than "just" a > > naming problem because it's on the control flow. > > > > For instance, consider the following examples with and without > > autocommit on the connection to demonstrate where IO takes place: > > > > conn = await psycopg3.AsyncConnection.connect(autocommit=False) > > async with conn: > > cur = await conn.execute("SELECT 1") > > rows = await cur.fetchall() # IO happens here > > > > conn = await psycopg3.AsyncConnection.connect(autocommit=True) > > async with conn: > > cur = await conn.execute("SELECT 1") # IO happens here > > rows = await cur.fetchall() > > > > This is quite unexpected, and behaves differently on sync cursors (IO > > always happens on execute()). > > I think that's wrong, I don't believe it changes with autocommit. > No-autocommit fires an extra begin together with the execute but > doesn't change the fetch behaviour. IO on execute happens regardless > of the type of connection and cursor, IO on fetch happens only for > server-side cursors. Well, maybe I'm missing something... In the examples above, (written down explicitly to understand where IO happens), if I shut down postgres between 'await conn.execute()' and 'await cur.fetchall()', the first example breaks but the second doesn't. Perhaps the autocommit mention was misleading; it's enough to insert 'await conn.commit()' before 'await cur.fetchall()' to reproduce. So (and again, unless I'm missing something), if this is not "by design", maybe this is bug? > > > 2) if we want to have client and server cursor, it would be better if > > > they had the same interface > > > > (Unless we drop the client-side cursor, or rename it.) > > > > > The point 2 is pretty true in psycopg2 because there is a single > > > cursor class, and every method is if'ed like: > > > > > > def fetchmany(self, nrecs): > > > if not self.name: > > > # client-side cursor > > > res = self.result[self.current, self.current + nrecs] > > > self.current += nrecs > > > return res > > > else: > > > # server-side cursor > > > result = self.connection.execute(f"FETCH {nrecs} FROM {self.name}") > > > return result > > > > > > I don't think this is a good implementation, so in psycopg3 I would > > > like to have a Cursor interface with a ClientCursor and a ServerCursor > > > implementation, without the if for each method. In async mode, > > > AsyncClientCursor.fetchmany() doesn't actually need to be async, but > > > AsyncServerCursor.fetchmany() does I/O activity. However, would two > > > different implementation classes demand two different interfaces? > > > > These are not just two different implementations, these are two > > different things. So, yes, different interfaces (and names) would be > > better, IMHO. > > But for an user who is not overly concerned with when I/O happen > (assuming that they know where it happens and that they are fine with > it, not wanting to keep them in the dark) having the same interface > means that their application can be written without being concerned by > whether a server or client cursor was used. > > You seem to want to implement two diverging and leaky interfaces, > modeled on what effectively happens in I/O and not providing any > leeway. The price to pay for it is to have the async keyword on > methods which in a specific implementation don't actually do I/O but > 1) it's cheap, you don't pay much for it, it doesn't really do I/O and > 2) it allows the possibility to write a cursor subclass which actually > does IO on input, for instance a catalog lookup on fetch(). But even > logging to a remote logger is an I/O operation, so if you don't have > `async fetchone()` you cannot write a subclass of these objects to do > it: you have to write a wrapper instead, with a necessarily different > interface. I think we would be painting ourselves in a corner. Sorry, I don't buy that point. To me, slightly exaggerating, it seems like we should make any methods of an interface 'async' just in case someones wants to fire some IO in a subclass. In other words, having IO entry-points properly identified follows the principle of least surprise (or "explicit is better than implicit", to take over from the famous zen). > > > > All in all, my main point is: (why) do we need client-side cursors? > > > > > > I think client-side cursors are the bread and butter of querying the > > > database. They are a lightweight and stateless way to receive a > > > result: you ask a question, you receive an answer, and all is over. > > > > Ok. But then, they are just a proxy to the actual results already > > fetched upon execute() with no added value, aren't they? > > Yes, they are just a useful container for a resultset. It's a useful > object to have, regardless of its name. > > > > > You can put caches between client server, connection pools... > > > > How is this related to client-side cursors? > > They are things you cannot do with a server-side cursor, and in my > experience much more frequently used. When I say "cursor" I mean > "result" here, really, as in "you can have these things if you fetch > your result entirely". > > > > So my humble opinion is that naming and control flow should be explicit. > > For instance, we could drop the fetch*() method from client-side cursors > > (to be renamed) and add result access methods: > > > > class AsyncClient: > > await def execute() > > def first() # return first row > > def all() # return all rows > > > > while keeping the cursor term for real cursors: > > > > class AsyncCursor: > > await def execute() > > await def fetch() > > This requires everyone to rewrite their program in order to use > psycopg3. I think that the adoption of the adapter would be zero if > there isn't a frictionless way to use the new version, and I am trying > to design the adapter to be easy to upgrade. This might imply some non > trivial work internally but I think it's necessary if we want a > non-zero chance of the adapter to be considered by the industry. As far as the async interface is concerned, I think there is no adoption issue because there's no precedent use from psycopg2. So we could expose two API: cursorless querying ('await conn.execute()') and have a single server-side cursor class. Cheers, Denis