Re: [psycopg] speed concerns with executemany() - Mailing list psycopg
From | Adrian Klaver |
---|---|
Subject | Re: [psycopg] speed concerns with executemany() |
Date | |
Msg-id | 9e943705-466b-c317-87fd-9e87e66f0338@aklaver.com Whole thread Raw |
In response to | [psycopg] speed concerns with executemany() (mike bayer <mike_mp@zzzcomputing.com>) |
Responses |
Re: [psycopg] speed concerns with executemany()
|
List | psycopg |
On 12/23/2016 04:05 PM, mike bayer wrote: > I'm getting more and more regular complaints among users of SQLAlchemy > of the relatively slow speed of the cursor.executemany() call in > psycopg2. In almost all cases, these users have discovered that > Postgresql is entirely capable of running an INSERT or UPDATE of many > values with a high degree of speed using a single statement with a form > like this: > > INSERT INTO table (a, b, c) > VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9), ... > > whereas if they run the same form using a single VALUES insert and > sending the parameters as a sequence via executemany(), they will > observe hyperbolically slow speeds - today, someone claims they can run > approximately 200 sets of three integers each using the multiple VALUES > approach in approximately .02 seconds, whereas running 200 values into a > single executemany() call of the otherwise identical INSERT statement, > they are claiming takes 20 seconds; that is, 100000% slower. I'm not > really sure how that's even possible, considering the single INSERT with > many VALUES is a much larger string to send over the network and be > parsed by the server, if the overhead of a single INSERT is .02 seconds, > we would think an executemany() of 200 INSERT statements each with a > single parameter set would be at most four seconds. > > Obviously something is wrong with these users' environment, although I > will note that the relative speed of psycopg2 executemany() over a 1G > network is still pretty bad, compared to both sending a single INSERT > with a large VALUES clause as well as compared to the executemany() > speed of DBAPIs (even pure Python) for other databases like MySQL, I can > provide some quick benchmarks if that's helpful. > > I understand that psycopg2 does not use prepared statements, and I have > dim recollections that internal use of prepared statements for > executemany is not on the roadmap for psycopg2. However, I'm still not > sure what I should be telling my users when I get reports of these > vastly slower results with executemany(). > > I'm not asking that psycopg2 change anything, I'm just looking to > understand what the heck is going on when people are reporting this. > Should I: > > 1. tell them they have a network issue that is causing executemany() to > have a problem? (even though I can also observe executemany() is kind > of slow, though not as slow as these people are reporting) > > 2. tell them there's some known issue, vacuuming / indexes/ or something > that is known to have this effect? > > 3. tell them that yes, they should use multiple-VALUES within a single > INSERT (which would eventually lead to strong pressure on me to reinvent > executemany() within SQLAlchemy's psycopg2 dialect to use this form) ? > (also if so, why is this the case? can this claimed 100000% slowdown > be real?) I have to go with Christophe's explanation. They are seeing the effects of 200 separate transactions, though like he stated later this more an assumption then something I have tested. > > 4. use a hack to actually make my own prepared statements within > executemany() (I vaguely recall some recipe that you can get a prepared > statement going with psycopg2 by rolling it on the outside) ? > > 5. Other reasons that executemany() is known to sometimes be extremely > slow? > > I'm purposely trying to stay out of the realm of picking apart the libpq > internals, assuming psycopg2 devs can shed some light what's going on > here. Thanks for any guidance you can offer! > > > > > -- Adrian Klaver adrian.klaver@aklaver.com