Re: Speed dblink using alternate libpq tuple storage - Mailing list pgsql-hackers
From | Marko Kreen |
---|---|
Subject | Re: Speed dblink using alternate libpq tuple storage |
Date | |
Msg-id | 20120213233905.GA6225@gmail.com Whole thread Raw |
In response to | Re: Speed dblink using alternate libpq tuple storage (Marko Kreen <markokr@gmail.com>) |
Responses |
Re: Speed dblink using alternate libpq tuple storage
Re: Speed dblink using alternate libpq tuple storage |
List | pgsql-hackers |
On Tue, Feb 07, 2012 at 04:44:09PM +0200, Marko Kreen wrote: > Although it seems we could allow exceptions, at least when we are > speaking of Postgres backend, as the connection and result are > internally consistent state when the handler is called, and the > partial PGresult is stored under PGconn->result. Even the > connection is in consistent state, as the row packet is > fully processed. > > So we have 3 variants, all work, but which one do we want to support? > > 1) Exceptions are not allowed. > 2) Exceptions are allowed, but when it happens, user must call > PQfinish() next, to avoid processing incoming data from > potentially invalid state. > 3) Exceptions are allowed, and further row processing can continue > with PQisBusy() / PQgetResult() > 3.1) The problematic row is retried. (Current behaviour) > 3.2) The problematic row is skipped. I converted the patch to support 3.2), that is - skip row on exception. That required some cleanups of getAnotherTuple() API, plus I made some other cleanups. Details below. But during this I also started to think what happens if the user does *not* throw exceptions. Eg. Python does exceptions via regular return values, which means complications when passing them upwards. The main case I'm thinking of is actually resultset iterator in high-level language. Current callback-only style API requires that rows are stuffed into temporary buffer until the network blocks and user code gets control again. This feels clumsy for a performance-oriented API. Another case is user code that wants to do complex processing. Doing lot of stuff under callback seems dubious. And what if some part of code calls PQfinish() during some error recovery? I tried imaging some sort of getFoo() style API for fetching in-flight row data, but I always ended up with "rewrite libpq" step, so I feel it's not productive to go there. Instead I added simple feature: rowProcessor can return '2', in which case getAnotherTuple() does early exit without setting any error state. In user side it appears as PQisBusy() returned with TRUE result. All pointers stay valid, so callback can just stuff them into some temp area. ATM there is not indication though whether the exit was due to callback or other reasons, so user must detect it based on whether new temp pointers appeares, which means those must be cleaned before calling PQisBusy() again. This actually feels like feature, those must not stay around after single call. It's included in main patch, but I also attached it as separate patch so that it can be examined separately and reverted if not acceptable. But as it's really simple, I recommend including it. It's usage might now be obvious though, should we include example code in doc? New feature: * Row processor can return 2, then PQisBusy() does early exit. Supportde only when connection is in non-blocking mode. Cleanups: * Row data is tagged as processed when rowProcessor is called, so exceptions will skip the row. This simplifies non-exceptional handling as well. * Converted 'return EOF' in V3 getAnotherTuple() to set error instead. AFAICS those EOFs are remnants from V2 getAnotherTuple() not something that is coded deliberately. Note that when v3 getAnotherTuple() is called the row packet is fully in buffer. The EOF on broken packet to signify 'give me more data' does not result in any useful behaviour, instead you can simply get into infinite loop. Fix bugs in my previous changes: * Split the OOM error handling from custom error message handling, previously the error message in PGresult was lost due to OOM logic early free of PGresult. * Drop unused goto label from experimental debug code. -- marko
Attachment
pgsql-hackers by date: