Re: Speed dblink using alternate libpq tuple storage - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Speed dblink using alternate libpq tuple storage |
Date | |
Msg-id | 16223.1333061790@sss.pgh.pa.us |
In response to | Re: Speed dblink using alternate libpq tuple storage (Marko Kreen <markokr@gmail.com>) |
Responses | Re: Speed dblink using alternate libpq tuple storage |
List | pgsql-hackers |
Marko Kreen <markokr@gmail.com> writes:
> My conclusion is that row-processor API is low-level expert API and
> quite easy to misuse.  It would be preferable to have something more
> robust as end-user API, the PQgetRow() is my suggestion for that.
> Thus I see 3 choices:

> 1) Push row-processor as main API anyway and describe all dangerous
>    scenarios in documentation.
> 2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
>    PQgetRow() as preferred API and row-processor for expert usage,
>    with proper documentation what works and what does not.
> 3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.

I still am failing to see the use-case for PQgetRow.  ISTM the entire
point of a special row processor is to reduce the per-row processing
overhead, but PQgetRow greatly increases that overhead.  And it doesn't
reduce complexity much either IMO: you still have all the primary risk
factors arising from processing rows in advance of being sure that the
whole query completed successfully.  Plus it conflates "no more data"
with "there was an error receiving the data" or "there was an error on
the server side".  PQrecvRow alleviates the per-row-overhead aspect of
that but doesn't really do a thing from the complexity standpoint;
it doesn't look to me to be noticeably easier to use than a row
processor callback.  I think PQgetRow and PQrecvRow just add more API
calls without making any fundamental improvements, and so we could do
without them.  "There's more than one way to do it" is not necessarily
a virtue.

> Second conclusion is that current dblink row-processor usage is broken
> when user uses multiple SELECTs in SQL as dblink uses plain PQexec().

Yeah.  Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There's multiple ways to express that but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any rows
have arrived.  So maybe we could express that as a callback with valid
"res" but "columns" set to NULL?

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0.  This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.

			regards, tom lane
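For readers following the last part of the message, here is a minimal sketch of the row-counter idea. It assumes a callback signature along the lines of the proposal discussed in this thread (result handle, column values, column count, a per-resultset row counter, and a user parameter); none of these names are a shipped libpq API, and the state struct and counting logic are invented purely for illustration of how a dblink-style processor could reset itself when a new resultset begins.

```c
#include <libpq-fe.h>

/*
 * Hypothetical row-processor callback, sketched from the proposal in this
 * thread; the signature (and the rowcounter argument in particular) is an
 * assumption, not an existing libpq API.  A dblink-style processor could
 * use rowcounter == 0 to notice that a new resultset has started and
 * discard whatever it accumulated for the previous one, mimicking
 * PQexec's keep-only-the-last-result behavior.
 */
typedef struct
{
	int			rows_stored;	/* stand-in for dblink's tuplestore state */
	int			ncols;
} RowProcState;

static int
dblink_like_row_processor(PGresult *res, const char **columns, int ncolumns,
						  int rowcounter, void *param)
{
	RowProcState *state = (RowProcState *) param;

	if (rowcounter == 0)
	{
		/* New resultset: throw away rows collected for the previous one. */
		state->rows_stored = 0;
		state->ncols = ncolumns;
	}

	/* Real code would copy "columns" into a tuplestore; just count here. */
	state->rows_stored++;

	return 1;					/* nonzero: continue receiving rows */
}
```

With the alternative approach mentioned above (a callback fired right after RowDescription, before any rows, with "columns" set to NULL), the same reset would simply live in an `if (columns == NULL)` branch instead of the `rowcounter == 0` check.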