More protocol discussion: breaking down query processing - Mailing list pgsql-interfaces
From | Tom Lane |
---|---|
Subject | More protocol discussion: breaking down query processing |
Date | |
Msg-id | 24009.1049325358@sss.pgh.pa.us Whole thread Raw |
Responses |
FE/BE - Side effects
Re: More protocol discussion: breaking down query processing Re: [HACKERS] More protocol discussion: breaking down query processing |
List | pgsql-interfaces |
Last month's discussions about a revised FE/BE protocol make it clear that there is usefulness in breaking down the query-processing cycle and making individual steps available at the protocol level. This surfaced in several forms in the discussion, including requests for protocol-level PREPARE functionality, concerns about the inefficiency of fetching a RowDescription with each row pulled from a cursor, etc. Here are some thoughts about how to do this. There are two intermediate objects involved in processing a query: a parsed query and a cursor (perhaps we can find better names for them). The parsed query contains all the info needed to execute a query, except for actual values of any parameter placeholders used in the query. A cursor is a parsed query instantiated with actual parameter values; it's ready to execute and return rows (if the query returns any --- a "cursor" for an UPDATE, say, wouldn't return anything). Query processing can then be broken down into several steps. The PARSE step takes a query text string, and optionally some indication of the types of the parameters needed, and produces a parsed query object (this is exactly equivalent to the existing PREPARE command's functionality). Parameter types are induced from the query if not specified. The BIND step takes a parsed-query object and some actual parameter values and produces a ready-to-execute cursor. The FETCH step fetches some or all rows from a cursor. One may CLOSE a cursor when done with it, and may DEALLOCATE a parsed query object when it's no longer needed. We will also want a DESCRIBE operation that can be applied to either cursors or parsed queries (these perhaps should be thought of as two separate operations). Describing a cursor yields info about the output columns (equivalent to RowDescription message in the present protocol). Describing a parsed query yields RowDescription plus some description of the parameter(s). The existing Q"sql statement" command can now be thought of as a macro that invokes PARSE, BIND no parameters, DESCRIBE cursor, FETCH all from cursor, CLOSE, DEALLOCATE. We will still offer that all-in-one functionality, but we'll also offer an extended query command that allows any combination of these functions to be requested at the protocol level. These protocol operations are interoperable with the SQL-level commands for prepared queries and cursors, in the sense that a prepared query or open cursor can be created either way and then used either way. (Alternatively we might think that this is a bad idea, and that protocol-level operations should use a different namespace from SQL commands, so that application-requested operations can't interfere with state that the client-side library has set up. Any opinions?) The BIND operation will allow actual parameter values to be sent in either text or binary form; in either case we'll use a bytecount-followed-by-data format so that it's fully eight-bit-clean. I am inclined to think also that the FETCH operation should allow specification of whether it wants the results in text or binary form. It's not real clear how that should interoperate with the existing DECLARE BINARY CURSOR command, though --- which one should win if there's a conflict? All of these functions can specify the relevant parsed-query and cursor objects by name. A named parsed-query object lives until end of session unless deallocated. A named cursor object lives until end of transaction unless closed. (This might be extended to allow holdable cursors, though I'm more inclined to say that such things can only be created by SQL commands.) Alternatively, the protocol functions can reference nameless (zero-length-name) parsed-query and cursor objects. There can be only one of each of these, living only till the next one is created. The system can optimize nameless objects a little since it knows it need not save them beyond the end of the current query. The all-in-one query command always uses nameless objects. A difficulty with this scheme is that it doesn't scale very well to querystrings containing more than one SQL statement. For the all-in-one query command, we can just define that the BIND-DESCRIBE-FETCH-CLOSE cycle is automatically repeated for each parsetree constructed from the string; this is backwards-compatible with what happens now. It seems fairly messy to extend that to the case where the steps are individually commanded ... especially if you want to assume that the steps are being commanded by a client library that hasn't parsed the querystring and doesn't know how many commands exist therein. Perhaps we can get away with disallowing multiple commands per string when using the extended protocol. Comments? regards, tom lane
pgsql-interfaces by date: