Re: [psycopg] Turbo ODBC - Mailing list psycopg
From | Uwe L. Korn |
---|---|
Subject | Re: [psycopg] Turbo ODBC |
Date | |
Msg-id | 1484650266.264445.850189824.3538CE31@webmail.messagingengine.com |
In response to | Re: [psycopg] Turbo ODBC (Jim Nasby <Jim.Nasby@BlueTreble.com>) |
Responses | Re: [psycopg] Turbo ODBC |
List | psycopg |
One important thing for fast columnar data access is that you don't want the data to exist as Python objects before it is turned into a DataFrame. Besides much better buffering, this was one of the main advantages we have with Turbodbc. Given that the ODBC drivers for Postgres seem to be in a miserable state, it would be much preferable to have such functionality directly in psycopg2. From meetings with people at some PyData conferences to whom I showed turbodbc, I can definitely say that there are some users out there who would like a fast path for Postgres-to-Pandas.

In turbodbc, two additional functions are added to the DB-API cursor object: fetchallnumpy and fetchallarrow. These mostly suffice for typical pandas workloads. My experience from implementing this is that with Arrow it was quite simple to add a columnar interface, as most of the data conversions were handled by Arrow. There was also no need for me to interface with any Python types, as the language "barrier" was handled transparently by Arrow.

CC'ing Michael König, the creator of Turbodbc; he might be able to give some more input.

--
Uwe L. Korn
uwelk@xhochy.com

On Tue, Jan 17, 2017, at 03:07 AM, Jim Nasby wrote:
> On 1/16/17 7:32 PM, Adrian Klaver wrote:
> > All of this is very interesting and definitely worth exploring, just not
> > sure how much of it ties back to psycopg2 and this list. Not trying to
> > rain on anyone's parade, I am wondering if this might not be better
> > explored on a 'meta' list, something like the various Python projects
> > that deal with Excel do:
>
> Since this is a user mailing list that might make sense. Though, I'm
> getting the impression that there's some disconnect between what data
> science users are doing and this list. Tuple-based results vs
> vector-based (ie: columnar) results is an example of that.
>
> I do think there's 3 items that would best be handled at the "bottom" of
> the stack (namely, psycopg2), because they'll enable every higher level
> as well as make life easier for direct users of psycopg2:
>
> 1) Performance, both in low-latency (ie: filesystem socket) and
> high-latency environments.
> 2) Type conversion (in particular, getting rid of strings as the
> intermediate representation).
> 3) Optionally providing a columnar result set.
>
> #3 might be in direct opposition to the standard Python DB accessor
> stuff, so maybe that would need to be a separate module on top of
> psycopg2, but psycopg2 would certainly still need to support it. (IE:
> you certainly do NOT want psycopg2 to build a list of dicts only to then
> try and convert that to a columnar format).
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)
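
The tuple-based versus columnar distinction raised in this thread can be roughly illustrated with the Python standard library alone. This is only a sketch under stated assumptions: the byte buffer `raw` is a made-up stand-in for column data as a driver might hold it, not the actual wire format of psycopg2 or turbodbc, and the `array` module stands in for the NumPy or Arrow buffers a real columnar interface would fill.

```python
import struct
from array import array

# Hypothetical column buffer: three 64-bit integers in native byte order.
# It stands in for raw column data held by a driver (an assumption made
# for illustration, not a real psycopg2 or turbodbc format).
raw = struct.pack("=3q", 10, 20, 30)

# Tuple-style path: every value is materialised as a separate Python int
# object before anything columnar can be built from it.
as_objects = list(struct.unpack("=3q", raw))

# Columnar path: the bytes are copied into one typed, contiguous buffer
# without creating a Python object per value.
column = array("q")
column.frombytes(raw)

print(as_objects)
print(column)
```

Both paths hold the same values, but only the second avoids the per-value Python objects that the message describes as the cost to avoid. In turbodbc, the fetchallnumpy and fetchallarrow cursor methods named in the message take the columnar role; the sketch only mirrors the idea.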