Thread: Using real libpq parameters
Hello,

I've tried to make order in the ideas about pros and cons of using real query parameters in libpq. Here is my analysis and a possible plan.

Currently Psycopg uses only the PQexec [1] libpq function to send queries to the backend: when a query with parameters is executed, psycopg adapts every parameter value to an SQL literal (the protocol name is ISQLQuote) [2], merges the parameters into the query and sends everything to the backend using PQexec. This strategy has largely proven satisfactory, providing the greatest flexibility for the adapter to write the query.

Using PQexecParams [3] instead of PQexec would have several problems, and the behavior of Psycopg would change in a non-backward-compatible way. PQexecParams only accepts one non-empty command, while cur.execute() allows any number of queries separated by semicolons. Also, passing complex data types seems more... complex. An example is arrays: because psycopg sends a complete query to the backend, it can use the array[...] syntax where every element is escaped in a standard way [4], but the array[] constructor doesn't seem to be accepted as a parameter. In other places psycopg uses constructs that wouldn't be valid as parameter values (e.g. '2010-01-01'::date for date representation): in some cases the Python type could be converted into the proper paramTypes entry for disambiguation (in execute(), but probably not in executemany()). The "default" literal is another handy feature not accepted as a parameter.

The bottom line is that adapting Python values for PQexec and for PQexecParams are two entirely different businesses, and the adapters currently implemented for ISQLQuote wouldn't be compatible. So we would have incompatibilities both in the queries that can be executed and in the adapters that users may have extended. I think this rules out entirely the idea of using PQexecParams as the default exec function, at least until psycopg3000.

What would be the advantages? One is that large strings or binary values would be handled more efficiently by the backend, as they wouldn't bloat the parser. Another is that we could start supporting prepared statements: I would avoid an explicit .prepare() method exposed to Python (as the PREPARE/EXECUTE SQL statements can already be used for this task), but for instance .executemany() could internally use PQprepare/PQexecPrepared, whose interface is similar to PQexecParams.

Luckily (well, it wasn't luck at all: thanks to Federico's design) psycopg already has two extension mechanisms to cope with having both execution methods. One is the possibility to subclass connections/cursors and thus have a cursor supporting PQexecParams in execute() and PQprepare in executemany() (let's call it ParamCursor). The other is the completely generic adaptation mechanism for Python objects [5]: there could be a separate protocol (let's say ISQLParam) to which Python objects passed to the query should conform; conforming objects may expose different methods from the ones required by ISQLQuote, as needed to interface with PQexecParams.

Keeping the two protocols well separated would also allow us to use the PQexec-based side in ways we preferred to avoid before. For instance, there has often been the request for a way to pass identifiers to a query, e.g. to represent table or field names: I remember the request being discarded precisely on the grounds that, switching to "proper parameters", the method would have broken.
Keeping the two protocols we may have, for instance, an "Identifier" wrapper allowing execute("insert into %s values (%s)", [Identifier('foo'), 'bar']): the method would only work on the classic cursor, not on the param-based one, and it would be implemented by making the Identifier object conform to ISQLQuote but not to ISQLParam (a minimal sketch of such a wrapper follows below).

PQexecParams also allows a binary representation for the parameters. I know it would be more efficient than the textual format, but it is an implementation leak, probably useful for a specific program talking to a specific server, not for a generic library. So I would rule out its usage for every data type except bytea.

This of course will not be implemented tonight and won't be released in Psycopg 2.4 :)

Comments and discussion are appreciated. Cheers.

-- Daniele

[1] http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-PQEXEC
[2] http://initd.org/psycopg/docs/extensions.html#sql-adaptation-protocol-objects
[3] http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-PQEXECPARAMS
[4] http://www.postgresql.org/docs/9.0/static/arrays.html#ARRAYS-IO
[5] http://www.python.org/dev/peps/pep-0246/
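To make the Identifier idea concrete, here is a minimal Python sketch. The Identifier class is purely hypothetical (it is not part of psycopg2); the __conform__/ISQLQuote machinery it relies on is psycopg2's existing adaptation protocol [2][5].

    # Hypothetical Identifier wrapper: it conforms to ISQLQuote only, so it
    # would work with the classic PQexec-based cursor but not with a
    # ParamCursor.  Not psycopg2 code.
    from psycopg2.extensions import ISQLQuote

    class Identifier(object):
        def __init__(self, name):
            self.name = name

        def __conform__(self, proto):
            # Offer the ISQLQuote interface; a hypothetical ISQLParam request
            # would get None back, i.e. "not usable as a real parameter".
            if proto is ISQLQuote:
                return self

        def getquoted(self):
            # Quote as an SQL identifier: double any embedded double quotes.
            return '"%s"' % self.name.replace('"', '""')

    # cur.execute("insert into %s values (%s)", [Identifier('foo'), 'bar'])
    # would be merged into:  insert into "foo" values ('bar')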
On Feb 26, 2011, at 6:40 PM, Daniele Varrazzo wrote:

> Hello,
>
> I've tried to make order in the ideas about pros and cons of using
> real query parameters in libpq. Here is my analysis and a possible
> plan.
>
<snip>
> Comments and discussion are appreciated. Cheers.

Have you looked at libpqtypes? It makes dealing with complex datatypes trivial.

http://libpqtypes.esilo.com/

Cheers,
M
On Sunday 27 February 2011, Daniele Varrazzo wrote:

> Hello,
>
> I've tried to make order in the ideas about pros and cons of using
> real query parameters in libpq. Here is my analysis and a possible
> plan.
> ...

Sorry not to have answered this thread earlier, I've had some trouble with kmail+akonadi which kept me silent for a couple of days :S

Yes, we have /already/ done a few experiments on using binary data types with psycopg2. A few remarks (largely echoing what Daniele has said):

1. When used properly, binary datatypes give a significant speed gain over text ones. But not in all cases.

2. Not all datatypes can go binary. That's not a problem in the send direction, because we can control that with a boolean flag.

3. There is NO universally backwards-compatible way of switching from text params to binary ones. If we ever support binary, it will need an explicit attribute/method, like the ISQLParam adaptation that Daniele has proposed. Since breaking compatibility of psycopg2 is out of the question, so is moving to binary mode by default.

4. In the receive direction, things are even more complex. It wouldn't be recommended except for cases of extreme data transfers (always with the option to fall back to the old text code).

5. libpqtypes doesn't seem to help in the way you suggested. It does text->net-bin adaptations, which we don't need, since Python's internal representation is already binary. We want a bin->net-bin library (where net-bin is the network Postgres-native format of data, heavily dependent on the server's version). OTOH, reading the code of libpqtypes /could/ help us write similar adapters ourselves.

Code at: http://git.hellug.gr/?p=xrg/psycopg2;a=shortlog;h=refs/heads/execparams

--
Say NO to spam and viruses. Stop using Microsoft Windows!
On Sat, Feb 26, 2011 at 11:47 PM, A.M. <agentm@themactionfaction.com> wrote:

> Have you looked at libpqtypes? It makes dealing with complex datatypes trivial.
>
> http://libpqtypes.esilo.com/

Hi A.M.,

David Blewett had suggested the same just a few hours before you :)

Maybe I'm missing something about libpqtypes: in this case some explanation would be appreciated.

What I see is that it offers printf/scanf-style parameter passing. This is greatly helpful if you have C variables containing the parameters: it saves building the arrays to be passed to PQexecParams. From their example:

PGresult *res = PQexecf(conn,
    "INSERT INTO t VALUES (%int4, %text)", 654321, "some text");

This seems to save getting the OIDs of the types, converting the non-string parameters to strings and handling the arrays PQexecParams requires. But I don't see how psycopg could use this handy function, because there aren't separate C variables containing the parameters. To simplify what we have, let's say there is a Python tuple 't' containing the values, all as strings: using such a structure would be something like:

char *paramValues[PySequence_Size(t)];
for (i = 0; i < PySequence_Size(t); i++) {
    paramValues[i]
}
PQexecParams(query, ..., paramValues, ...);

(omitting everything about type handling). How could PQexecf be called given the parameters in the Python tuple t instead of in separate C variables? Would it be an easier pattern than the one above?

About the types: using the bare libpq, psycopg should iterate over the Python values in input and map every Python type to a numeric OID (about which psycopg already has knowledge), pack them into a C array and send it as paramTypes. Using libpqtypes, instead, psycopg should map the Python types to strings - the Postgres names of the types - and then mangle the type names into the query string. I don't see a particular saving in doing the latter instead of the former: there is still a type mapping to do, and the result has to be sprintf'd into a new query instead of put into a C array - it seems clumsier.

So, I think libpqtypes is a huge saving for a program that would otherwise need a lot of bureaucracy to operate with the libpq (knowledge of the Postgres types etc.). But psycopg has already sorted out this kind of layer.

Then, you may be referring to some different libpqtypes functions and I may have misunderstood both David's and your advice. In this case I would be grateful if you could illuminate me about what part of libpqtypes would be great for psycopg to use: I may be too focused on PQexecf and completely missing the forest for the trees.

Thank you very much.

-- Daniele
On Sun, Feb 27, 2011 at 10:49 AM, Daniele Varrazzo <daniele.varrazzo@gmail.com> wrote:

> char *paramValues[PySequence_Size(t)];
> for (i = 0; i < PySequence_Size(t); i++) {
>     paramValues[i]

That should have been something like:

paramValues[i] = PyString_AsString(PySequence_GetItem(t, i));

> }
> PQexecParams(query, ..., paramValues, ...);

-- Daniele
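To sketch the type-handling part that the snippets above leave out: psycopg already knows the OIDs of the types it adapts, so filling paramTypes would essentially be a lookup like the one below. This is an illustrative Python sketch only (the real code would live in psycopg's C layer, and PG_OIDS/param_types are made-up names); the OIDs are the standard built-in ones, and 0 is libpq's documented "let the server resolve the type".

    # Illustrative only: the kind of mapping psycopg would use to fill
    # paramTypes.  PG_OIDS and param_types are made-up names, not psycopg2 API.
    import datetime
    import decimal

    PG_OIDS = {
        bool: 16,                 # bool
        int: 23,                  # int4
        float: 701,               # float8
        decimal.Decimal: 1700,    # numeric
        str: 25,                  # text
        datetime.date: 1082,      # date
        datetime.datetime: 1114,  # timestamp
    }

    def param_types(values):
        # Unknown types fall back to OID 0: the value is passed as text and
        # the server infers the type from the query context.
        return [PG_OIDS.get(type(v), 0) for v in values]

    # e.g. param_types([42, 'foo', None]) -> [23, 25, 0]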
On 27/02/11 00:40, Daniele Varrazzo wrote:
[snip]
> Luckily (well, it wasn't luck at all: thanks to Federico's design)
> psycopg already has two extension mechanisms to cope with having both
> execution methods. One is the possibility to subclass
> connections/cursors and thus have a cursor supporting PQexecParams in
> execute() and PQprepare in executemany() (let's call it ParamCursor).
> The other is the completely generic adaptation mechanism for Python
> objects [5]: there could be a separate protocol (let's say ISQLParam)
> to which Python objects passed to the query should conform; conforming
> objects may expose different methods from the ones required by
> ISQLQuote, as needed to interface with PQexecParams.

If you have a look at the ISQLQuote empty implementation you'll see a getbinary() method that was put there exactly for the purpose of supporting PQexecParams & co. At the time the idea was that an adapter *should* provide a method to convert the adapted object to the PostgreSQL binary representation.

As Daniele explained, the main problem is that not every value can be converted to a binary representation, so, probably, a separate protocol would be better.

About the API, I'd probably go for complete transparency: if every parameter supports the new protocol, then adapt to binary, else just fall back to the current implementation. Checking the parameters should be quite fast (the code doesn't really adapt anything yet).

federico

--
Federico Di Gregorio                         federico.digregorio@dndg.it
Studio Associato Di Nunzio e Di Gregorio                  http://dndg.it
 Lord, defend me from my friends; I can account for my enemies.
                                                  -- Charles D'Hericault
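A minimal sketch of the "transparent" dispatch federico describes; ISQLParam and getparam() are hypothetical (only ISQLQuote exists in psycopg2), and the function name is made up:

    # Hypothetical: choose the execution path depending on whether every
    # parameter supports the proposed ISQLParam protocol.  Nothing is adapted
    # at this point, so the check is cheap.
    def choose_execution_path(values):
        if all(hasattr(v, 'getparam') for v in values):
            return 'PQexecParams'   # every parameter supports the new protocol
        return 'PQexec'             # fall back to the current implementation

    # e.g. choose_execution_path(['foo', 42]) -> 'PQexec'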
On Feb 27, 2011, at 5:49 AM, Daniele Varrazzo wrote:

> On Sat, Feb 26, 2011 at 11:47 PM, A.M. <agentm@themactionfaction.com> wrote:
>
>> Have you looked at libpqtypes? It makes dealing with complex datatypes trivial.
>>
>> http://libpqtypes.esilo.com/
>
> Hi A.M.,
>
> David Blewett had suggested the same just a few hours before you :)
>
> Maybe I'm missing something about libpqtypes: in this case some
> explanation would be appreciated.
>
> What I see is that it offers printf/scanf-style parameter passing.
> This is greatly helpful if you have C variables containing the
> parameters: it saves building the arrays to be passed to PQexecParams.
> From their example:
>
> PGresult *res = PQexecf(conn,
>     "INSERT INTO t VALUES (%int4, %text)", 654321, "some text");

One uses PQparamExec after constructing a PQparam with PQputf - certainly not elegant, and easy to mess up, but hey, it's C. There's an example here: http://libpqtypes.esilo.com/man3/PQputf.html

PQexecf is just a convenience wrapper around these functions.

> About the types: using the bare libpq, psycopg should iterate over the
> Python values in input and map every Python type to a numeric OID
> (about which psycopg already has knowledge), pack them into a C array
> and send it as paramTypes. Using libpqtypes, instead, psycopg should
> map the Python types to strings - the Postgres names of the types -
> and then mangle the type names into the query string. I don't see
> a particular saving in doing the latter instead of the former: there is
> still a type mapping to do, and the result has to be sprintf'd into a
> new query instead of put into a C array - it seems clumsier.

If psycopg2 doesn't know the type, then it can pass it as text and set the OID to 0 so the server will try to figure it out. And yes, psycopg2 would be able to delete the OID juggling code, but I see that as a positive step. For backwards compatibility, the register_type function and friends could hook into the libpqtypes OID handler table: http://libpqtypes.esilo.com/browse_source.html?file=handler.c

For numeric types, psycopg would add ints and doubles to the PQparam, saving the server that parsing step.

> So, I think libpqtypes is a huge saving for a program that would
> otherwise need a lot of bureaucracy to operate with the libpq
> (knowledge of the Postgres types etc.). But psycopg has already sorted
> out this kind of layer.

Postgresql has a problem with prepared statements in that the execution plan is determined before the values are bound, so I suspect the old escape-string methods will need to be preserved and pumped through the prepare for '_'. (I would hope that all queries would go through the extended query protocol if libpqtypes were to be used.) Clearly, some runtime option would need to be provided through Python.

However, I believe (as a matter of opinion) that the multi-statement execution string is the worst misfeature of the v1 protocol. As you mentioned, there is no such option in v2, but I don't believe that to be a loss. However, it would definitely be a backwards-compatibility issue.

I suspect that using libpqtypes would allow psycopg2 to delete a lot of its bureaucracy.

> Then, you may be referring to some different libpqtypes functions and
> I may have misunderstood both David's and your advice. In this case I
> would be grateful if you could illuminate me about what part of
> libpqtypes would be great for psycopg to use: I may be too focused on
> PQexecf and completely missing the forest for the trees.
What is most exciting about libpqtypes is that it uses the binary protocol, which can bring orders-of-magnitude performance benefits for some workloads.

So, to summarize, libpqtypes:
- is a utility wrapper around libpq
- would allow psycopg to delete a bunch of code surrounding escaping and special type handling while supporting C implementations of user-defined types (fast!)
- is actively developed and maintained (as a license-compatible project) with developers who would be receptive to assisting this project
- can offer surprising performance benefits
- would give psycopg2 a strong competitive advantage over the other 3000 python postgresql client libraries
- would include some backwards-compatibility issues for this project (though nothing insurmountable)

Cheers,
M
On Sun, Feb 27, 2011 at 10:30 AM, P. Christeas <p_christ@hol.gr> wrote:

> 5. libpqtypes doesn't seem to help in the way you suggested. It does
> text->net-bin adaptations, which we don't need, since Python's internal
> representation is already binary. We want a bin->net-bin library (where
> net-bin is the network Postgres-native format of data, heavily dependent
> on the server's version).

I have re-(re-re-)read the comments about the binary types in the libpq documentation, and it's not as dire as I thought it was:

"""
Values passed in binary format require knowledge of the internal representation expected by the backend. For example, integers must be passed in network byte order. Passing numeric values requires knowledge of the server storage format, as implemented in src/backend/utils/adt/numeric.c::numeric_send() and src/backend/utils/adt/numeric.c::numeric_recv().
"""

So *it is* an implementation leak, but it doesn't promise breakage at every version. I will ask for further details on the postgres MLs to try to get a better picture.

> OTOH, reading the code of libpqtypes /could/ help us write similar
> adapters ourselves.

This is true of course: even if we decided not to use libpqtypes there's surely a lot to learn from it.

-- Daniele
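As a concrete illustration of the "network byte order" requirement quoted above (a hedged sketch, not psycopg code): the binary form of an int4 parameter is simply its 4-byte big-endian representation, which Python's struct module can produce.

    # Illustration only: what a binary (paramFormats = 1) int4 parameter looks
    # like on the wire - the value in network (big-endian) byte order.
    import struct

    value = 654321
    wire = struct.pack('!i', value)   # 4 bytes, big-endian
    print(repr(wire))                 # b'\x00\t\xfb\xf1'  (0x0009FBF1)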
On Sun, Feb 27, 2011 at 4:11 PM, A.M. <agentm@themactionfaction.com> wrote:
>
> On Feb 27, 2011, at 5:49 AM, Daniele Varrazzo wrote:
>> What I see is that it offers printf/scanf-style parameter passing.
>> This is greatly helpful if you have C variables containing the
>> parameters: it saves building the arrays to be passed to PQexecParams.
>> From their example:
>>
>> PGresult *res = PQexecf(conn,
>>     "INSERT INTO t VALUES (%int4, %text)", 654321, "some text");
>
> One uses PQparamExec after constructing a PQparam with PQputf - certainly
> not elegant, and easy to mess up, but hey, it's C. There's an example here:
> http://libpqtypes.esilo.com/man3/PQputf.html
>
> PQexecf is just a convenience wrapper around these functions.

I see, but PQexecf is another variadic function, so my point is the same. If you have arguments in variables a, b, c, you can call PQputf("%text, %text, %text", a, b, c) and that's handy. But if you have an array containing three variables v[] and its length L, how do you pass them to PQputf?

> Postgresql has a problem with prepared statements in that the execution
> plan is determined before the values are bound, so I suspect the old
> escape-string methods will need to be preserved and pumped through the
> prepare for '_'. (I would hope that all queries would go through the
> extended query protocol if libpqtypes were to be used.) Clearly, some
> runtime option would need to be provided through Python.

I know about the suboptimal plan Postgres generates in case of prepared statements. Does it apply to PQexecParams too or just to PQprepare? I've asked on -general about this.

> What is most exciting about libpqtypes is that it uses the binary
> protocol, which can bring orders-of-magnitude performance benefits for
> some workloads.

If there is a guarantee that it is as stable as the textual representation of the data types, we'll want plenty of it. But maintainability has priority over performance in psycopg and I don't want to cause problems in future client-server interoperability to gain some performance.

> So, to summarize, libpqtypes:
> - is a utility wrapper around libpq
> - would allow psycopg to delete a bunch of code surrounding escaping and special type handling while supporting C implementations of user-defined types (fast!)
> - is actively developed and maintained (as a license-compatible project) with developers who would be receptive to assisting this project
> - can offer surprising performance benefits
> - would give psycopg2 a strong competitive advantage over the other 3000 python postgresql client libraries
> - would include some backwards-compatibility issues for this project (though nothing insurmountable)

A few things in the library are really interesting, and I have a lot to learn about the binary protocol. I will study it closer to see if it can be helpful: I see its utility from the PoV of an end-user program, but because psycopg is a generic library, and doesn't deal directly with C variables (but with Python values in C structures rich in metadata), I have to understand whether interfacing with it is really an improvement with respect to interfacing directly with the libpq.

Cheers,

-- Daniele
On Feb 27, 2011, at 6:35 PM, Daniele Varrazzo wrote:

> On Sun, Feb 27, 2011 at 4:11 PM, A.M. <agentm@themactionfaction.com> wrote:
>>
>> On Feb 27, 2011, at 5:49 AM, Daniele Varrazzo wrote:
>
> I see, but PQexecf is another variadic function, so my point is the
> same. If you have arguments in variables a, b, c, you can call
> PQputf("%text, %text, %text", a, b, c) and that's handy. But if you
> have an array containing three variables v[] and its length L, how do
> you pass them to PQputf?

One calls it multiple times. Hopefully this example will clear up the confusion:

#include <stdio.h>
#include <libpq-fe.h>
#include <libpqtypes.h>
#include <assert.h>

int main(int argc, char *argv[])
{
    PGconn *conn = PQconnectdb("dbname=test");
    assert(conn != NULL);
    PQtypesRegister(conn);

    PGparam *param = PQparamCreate(conn);
    PQputf(param, "%int4", 5);        // calling PQputf multiple times
    PQputf(param, "%text", "spam");

    PGresult *res = PQparamExec(conn, param, "SELECT $1,$2;", 0);

    PGint4 val1;
    PGtext val2;
    assert(PQgetf(res, 0, "%int4", 0, &val1) > 0);
    assert(PQgetf(res, 0, "%text", 1, &val2) > 0);
    printf("%d %s\n", val1, val2);

    PQclear(res);
    PQparamClear(param);
    PQfinish(conn);
    return 0;
}

> I know about the suboptimal plan Postgres generates in case of
> prepared statements. Does it apply to PQexecParams too or just to
> PQprepare? I've asked on -general about this.

It applies to all prepared statements regardless of the API. However, as a special case, the "" (empty string) prepared statement follows this rule:

"The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates."

http://www.postgresql.org/docs/9.0/interactive/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

This allows one to better mimic the behavior of the simple query protocol.

>> What is most exciting about libpqtypes is that it uses the binary
>> protocol, which can bring orders-of-magnitude performance benefits for
>> some workloads.
>
> If there is a guarantee that it is as stable as the textual
> representation of the data types, we'll want plenty of it. But
> maintainability has priority over performance in psycopg and I don't
> want to cause problems in future client-server interoperability to gain
> some performance.

libpqtypes supports ALL possible PostgreSQL types (user-defined as well). Right now, creating psycopg2 types involves parsing the textual representations from the result; however, using libpqtypes, psycopg2 could offer a much more natural API like this:

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

psycopg2.register_type(Point, ["x", "y"])

or

psycopg2.register_type(Point, {"x": int, "y": int})

or even

psycopg2.register_type(Point)  # psycopg2 extracts all properties as a tuple to pass to libpqtypes

or something like that - I'm just brainstorming here. Obviously, this wouldn't work if a Point instance needed to be interpolated into a query string.

>> So, to summarize, libpqtypes:
>> - is a utility wrapper around libpq
>> - would allow psycopg to delete a bunch of code surrounding escaping and special type handling while supporting C implementations of user-defined types (fast!)
>> - is actively developed and maintained (as a license-compatible project) with developers who would be receptive to assisting this project
>> - can offer surprising performance benefits
>> - would give psycopg2 a strong competitive advantage over the other 3000 python postgresql client libraries
>> - would include some backwards-compatibility issues for this project (though nothing insurmountable)
>
> A few things in the library are really interesting, and I have a lot
> to learn about the binary protocol. I will study it closer to see if
> it can be helpful: I see its utility from the PoV of an end-user
> program, but because psycopg is a generic library, and doesn't deal
> directly with C variables (but with Python values in C structures rich
> in metadata), I have to understand whether interfacing with it is really
> an improvement with respect to interfacing directly with the libpq.

The best thing about libpqtypes is that you don't need to learn about the binary protocol because it is all cleanly wrapped up. For example, the library has to account for the endianness of the server and other hurdles. It is all handled transparently.

Cheers,
M
On Mon, Feb 28, 2011 at 6:49 PM, A.M. <agentm@themactionfaction.com> wrote:
>
> On Feb 27, 2011, at 6:35 PM, Daniele Varrazzo wrote:
>> I see, but PQexecf is another variadic function, so my point is the
>> same. If you have arguments in variables a, b, c, you can call
>> PQputf("%text, %text, %text", a, b, c) and that's handy. But if you
>> have an array containing three variables v[] and its length L, how do
>> you pass them to PQputf?
>
> One calls it multiple times. Hopefully this example will clear up the confusion:
> [...]
>     PQputf(param, "%int4", 5);        // calling PQputf multiple times
>     PQputf(param, "%text", "spam");
>
>     PGresult *res = PQparamExec(conn, param, "SELECT $1,$2;", 0);
> [...]

Yes it does, thank you.

>> I know about the suboptimal plan Postgres generates in case of
>> prepared statements. Does it apply to PQexecParams too or just to
>> PQprepare? I've asked on -general about this.
>
> It applies to all prepared statements regardless of the API. However, as a
> special case, the "" (empty string) prepared statement follows this rule:
> [...]
> http://www.postgresql.org/docs/9.0/interactive/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

This is interesting. This means that if we wanted to use PQprepare to avoid repeated planning in executemany() we should set a non-empty name for the statement. Good to know...

> libpqtypes supports ALL possible PostgreSQL types (user-defined as well).
> Right now, creating psycopg2 types involves parsing the textual
> representations from the result; however, using libpqtypes, psycopg2 could
> offer a much more natural API like this:
>
> class Point(object):
>     def __init__(self, x, y):
>         self.x = x
>         self.y = y
> [...]
> psycopg2.register_type(Point)  # psycopg2 extracts all properties as a tuple to pass to libpqtypes

FYI, in psycopg 2.4 we have added register_composite(), which does about the same, converting user-defined composite types into [named]tuples. Of course it parses the textual representation. The Point was just an example of a simple adapter; it wasn't meant as an example of a composite (e.g. if you wanted to write an adapter for *yuck* money, that's the recipe).

>>> So, to summarize, libpqtypes:

> The best thing about libpqtypes is that you don't need to learn about the
> binary protocol because it is all cleanly wrapped up. For example, the
> library has to account for the endianness of the server and other hurdles.
> It is all handled transparently.

Yup, it seems the good way to go indeed if we wanted to introduce binary communication. Note anyway that using PQexecParams doesn't automatically imply using binary parameters, and the benefit of binary is not proven for all data types. It definitely is for bytea; I'd argue that for text it makes no difference. I've chatted with Christeas at FOSDEM and he said (correct me if I'm wrong), after some benchmarks he has made, that the performance with numbers was mixed: sending 8 bytes to represent '3' is slower than sending and parsing the textual representation. Datetime types would definitely benefit from binary, but they are probably the only other types for which there is a sure gain.

I see two sequential steps in this development: the first is to lay out the ParamCursor (that should convert the Python query into the correct representation for PQexecParams)* and define the ISQLParam protocol. Within this framework we could introduce the binary protocol. So we could proceed:

1. text for everything except bytea (first step)
2a. binary only for selected data types, with our code, or
2b. binary for everything using libpqtypes

In case we went for 2b, help from people "into" libpqtypes would be greatly appreciated, of course :)

* What query conversion? To make ParamCursor easier to use (migrating previous code, using Python dicts) I envisage ParamCursor using the current format ("%s") and pyformat ("%(name)s") placeholders, not the libpq $1, $2 format. This will require some funny conversion of the query string:

- "SELECT %s, %s, %s" would become "SELECT $1, $2, $3"
- "SELECT %(foo)s, %(bar)s, %(foo)s" would become "SELECT $1, $2, $1"

plus a remapping of the parameters used in the dict into a sequence of 2 items.

Cheers,

-- Daniele
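A hedged sketch of the conversion described above (illustrative Python only; convert_placeholders is a made-up name, the real implementation would live in psycopg's C layer, and '%%' escapes and other edge cases are ignored):

    # Illustrative sketch of converting %s / %(name)s placeholders to $n and
    # remapping a parameter dict to a sequence.  Not psycopg2 code.
    import re

    PLACEHOLDER = re.compile(r'%(?:\(([^)]+)\))?s')

    def convert_placeholders(query, params):
        """Return (query with $n placeholders, parameters as a sequence)."""
        order = []   # maps positional index to the key used to fetch the value

        def repl(m):
            name = m.group(1)
            if name is None:              # positional "%s": always a new $n
                order.append(len(order))
                return '$%d' % len(order)
            if name not in order:         # named "%(name)s": reuse the same $n
                order.append(name)
            return '$%d' % (order.index(name) + 1)

        new_query = PLACEHOLDER.sub(repl, query)
        return new_query, [params[k] for k in order]

    # convert_placeholders("SELECT %s, %s, %s", ['a', 'b', 'c'])
    #   -> ("SELECT $1, $2, $3", ['a', 'b', 'c'])
    # convert_placeholders("SELECT %(foo)s, %(bar)s, %(foo)s", {'foo': 1, 'bar': 2})
    #   -> ("SELECT $1, $2, $1", [1, 2])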