Thread: avoid prepared statements on complex queries?
I'm running into an issue where a complex query is performing significantly slower if I use a prepared statement.
I'm using:
-postgres 9.0.3
-postgresql-9.0-801.jdbc4.jar JDBC driver
Some notes about the query:
-I'm calling a postgres function, via: select * from foo(?,?,?,?,?,?,?)
-I'm getting back 4 million results
-the underlying query is quite complex joining several tables, and the core table is partitioned over a couple hundred tables.
Here's the java code:
conn.setAutoCommit(false);
pstmt = conn.prepareStatement("select * from foo(?,?,?,?,?,?,?)");
pstmt.setFetchSize(1000);
//and, then I do the usual thing to set parameters in the prepared statement
stmt.setInt(1, SOME_NUMBER);
pstmt.setArray(2, genepoolConn.createArrayOf("integer", ARRAY_OF_INTEGERS));
//and so on
What I have found:
-if ARRAY_OF_INTEGERS has a length of 3, the query performs fast
-if ARRAY_OF_INTEGERS has a length of 150, then query takes 1200 seconds
-if ARRAY_OF_INTEGERS has a length of 150, and I don't use a prepared statement, the query takes the expected 30 seconds
So, obviously when using a prepared statement, postgres is coming up with the wrong execution plan when the the parameter list is unexpectedly large. Pretty understandable.
My question: is there a work around to this? Can I force it not set the execution plan until I bind the variables?
The only reasons I'm using a prepared statement:
-parameterized queries are far easier to work with than building my query via string concatenation
-minimize chances of SQL Injection
If only there was a way to have parameterized queries without using prepared statements....
Thanks!
> If only there was a way to have parameterized queries without using prepared > statements.... Take a look at the prepareThreshold connection string parameter. --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
Hi Maciek,
I tried playing around with the prepareThreshold parameter, by setting it in the JDBC connection string. I tried setting it to zero, which should cause it to not use a prepared statement, right? Query is still slow. I also tried setting it to 1 and 3, and in all cases it's slow. Is there any way to verify if postgres is using a planned query?
I found this old post:
http://archives.postgresql.org/pgsql-jdbc/2008-02/msg00077.php
"However even when plans are not cached, you will still run into the issue that an unnamed statement using out-of-line parameter values may generate a less efficient plan than an unnamed statement using inline parameter values, because there is no way to tell the planner at the protocol level "I am really only ever using this query once, please give me a specific plan for these values and don't worry about generating a plan that is correct for other values too"."
Maybe that's what's happening?
I tried a different tactic: setting the protocolVersion=2. I can see that forces it to not use a prepared statement, but it seems to ignore that I set the fetch size to 1000 for the prepared statement. I need to be able to set the fetch size, otherwise I run out of memory since I'm getting back 4 million results.
thanks for your help.
Anish
From: Maciek Sakrejda <msakrejda@truviso.com>
To: Anish Kejariwal <anishkej@yahoo.com>
Cc: "pgsql-jdbc@postgresql.org" <pgsql-jdbc@postgresql.org>
Sent: Tuesday, November 15, 2011 5:31 PM
Subject: Re: [JDBC] avoid prepared statements on complex queries?
> If only there was a way to have parameterized queries without using prepared
> statements....
Take a look at the prepareThreshold connection string parameter.
---
Maciek Sakrejda | System Architect | Truviso
1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com
>I tried playing around with the prepareThreshold parameter, by setting it in the JDBC connection string. I tried settingit to zero, which should cause it to not use a prepared statement, right? Query is still >slow. I also tried settingit to 1 and 3, and in all cases it's slow. Is there any way to verify if postgres is using a planned query? I think 0 should do it, yes (technically, it's using an unnamed prepared statement rather than not using one at all, but this is the behavior you're looking for). To verify, you should be able to change log_min_messages (to debug2?) and see all plan/execute steps in the server logs. Alternately, if you're not using SSL, you could just fire up Wireshark and see what's on the wire (it has a built-in PostgreSQL protocol plugin, so this is pretty straightforward). In either case, the unnamed prepared statements should have no name (in Prepare, Bind, and Execute messages); if memory serves, the named ones are something like S_1, S_2, etc.. > "However even when plans are not cached, you will still run into the issue > that an unnamed statement using out-of-line parameter values may generate a > less efficient plan than an unnamed statement using inline parameter values, > because there is no way to tell the planner at the protocol level "I am > really only ever using this query once, please give me a specific plan for > these values and don't worry about generating a plan that is correct for > other values too"." > Maybe that's what's happening? Maybe, although I've never seen that happen and as I understand, the driver always sends parameters along with the statement when using unnamed statements, so I'm not sure what would trigger this. Perhaps Oliver will chime in (he's still active on the list). You may also want to try an EXPLAIN ANALYZE on your query in both protocol versions and see what the plan differences are. --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
On Wed, 16 Nov 2011, Maciek Sakrejda wrote: > Maybe, although I've never seen that happen and as I understand, the > driver always sends parameters along with the statement when using > unnamed statements, so I'm not sure what would trigger this. The standard example is "WHERE col LIKE ?". If the parameter is something like 'abcdef%' then the query can potentially be converted to a range scan, but only if knows the parameter value. When preparing this with an unnamed statement the server has the parameter value, but cannot assume that the same unnamed statement won't be used with a different parameter value later and must come up with a plan that is correct for all possible parameters even if it's not ideal for the current one. Kris Jurka
On Tue, 15 Nov 2011, Anish Kejariwal wrote: > I'm running into an issue where a complex query is performing significantly > slower if I use a prepared statement. > > Some notes about the query: > -I'm calling a postgres function, via: select * from foo(?,?,?,?,?,?,?) > -I'm getting back 4 million results > -the underlying query is quite complex joining several tables, and the core > table is partitioned over a couple hundred tables. > > Here's the java code: > > conn.setAutoCommit(false); > pstmt = conn.prepareStatement("select * from foo(?,?,?,?,?,?,?)"); > pstmt.setFetchSize(1000); > > //and, then I do the usual thing to set parameters in the prepared statement > stmt.setInt(1, SOME_NUMBER); > pstmt.setArray(2, genepoolConn.createArrayOf("integer", ARRAY_OF_INTEGERS)); > //and so on > > So, obviously when using a prepared statement, postgres is coming up with > the wrong execution plan when the the parameter list is unexpectedly large. You haven't really shown that. Unless you issue the same query, manually interpolating in the query parameters and get a much faster time, all you've shown is that things get slower with a larger number of values. What language is your function written in? Unless it's a sql function which could be inlined, I wouldn't have expected any difference between prepared vs inline query execution. Kris Jurka
Hi Kris,
Sorry, if I wasn't clear. I'm doing everything in JDBC.
If I use a prepared statement, and set my main parameter to be an array of 150 integers, the query takes for ever (1200 seconds). If I instead skip using a prepared statement by doing the following, the query takes 30 seconds:
Statement stmt = conn.createStatement();
stmt.setFetchSize(1000);
ResultSet rs = stmt.executeQuery(QUERY_STRING);
where QUERY_STRING is the entire query with the 150 integers.
In both cases, I'm calling a SQL function:
create or replace function foo(int,int[],int[],int, boolean, boolean, double precision)
.
.
where
dataset_id = any($2)
.
.
$$ language 'sql' immutable;
the second parameter is the one where the array will verify anywhere from 1 to 500 integers.
I'm going to try Maciek's suggestions (server logging, and wire shark), to see if I can verify that when I set ?prepareThreshold=0 in the JDBC string, it is in fact forcing it to be an unnamed prepared statement.
thanks,
Anish
From: Kris Jurka <books@ejurka.com>
To: Anish Kejariwal <anishkej@yahoo.com>
Cc: "pgsql-jdbc@postgresql.org" <pgsql-jdbc@postgresql.org>
Sent: Wednesday, November 16, 2011 1:20 PM
Subject: Re: [JDBC] avoid prepared statements on complex queries?
On Tue, 15 Nov 2011, Anish Kejariwal wrote:
> I'm running into an issue where a complex query is performing significantly
> slower if I use a prepared statement.
>
> Some notes about the query:
> -I'm calling a postgres function, via: select * from foo(?,?,?,?,?,?,?)
> -I'm getting back 4 million results
> -the underlying query is quite complex joining several tables, and the core
> table is partitioned over a couple hundred tables.
>
> Here's the java code:
>
> conn.setAutoCommit(false);
> pstmt = conn.prepareStatement("select * from foo(?,?,?,?,?,?,?)");
> pstmt.setFetchSize(1000);
>
> //and, then I do the usual thing to set parameters in the prepared statement
> stmt.setInt(1, SOME_NUMBER);
> pstmt.setArray(2, genepoolConn.createArrayOf("integer", ARRAY_OF_INTEGERS));
> //and so on
>
> So, obviously when using a prepared statement, postgres is coming up with
> the wrong execution plan when the the parameter list is unexpectedly large.
You haven't really shown that. Unless you issue the same query, manually
interpolating in the query parameters and get a much faster time, all
you've shown is that things get slower with a larger number of values.
What language is your function written in? Unless it's a sql function
which could be inlined, I wouldn't have expected any difference between
prepared vs inline query execution.
Kris Jurka
Kris Jurka <books@ejurka.com> writes: > On Wed, 16 Nov 2011, Maciek Sakrejda wrote: >> Maybe, although I've never seen that happen and as I understand, the >> driver always sends parameters along with the statement when using >> unnamed statements, so I'm not sure what would trigger this. > The standard example is "WHERE col LIKE ?". If the parameter is something > like 'abcdef%' then the query can potentially be converted to a range > scan, but only if knows the parameter value. When preparing this with an > unnamed statement the server has the parameter value, but cannot assume > that the same unnamed statement won't be used with a different parameter > value later and must come up with a plan that is correct for all possible > parameters even if it's not ideal for the current one. BTW, this should be all better in 9.2 ... so if anyone is thinking of expending lots of effort to fix it on the JDBC side, don't bother. regards, tom lane
Thanks Tom. That's great to know this will be addressed in 9.2.
Btw - I turned logging on and verified that by setting by setting prepareThreshold the prepared statement became an unnamed statement. This did not fix the performance issue I've been seeing. I'm going to have to resort to constructing the query myself by building the query string....
thanks for everyones help.
Anish
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Kris Jurka <books@ejurka.com>
Cc: Maciek Sakrejda <msakrejda@truviso.com>; Anish Kejariwal <anishkej@yahoo.com>; "pgsql-jdbc@postgresql.org" <pgsql-jdbc@postgresql.org>
Sent: Wednesday, November 16, 2011 4:35 PM
Subject: Re: [JDBC] avoid prepared statements on complex queries?
Kris Jurka <books@ejurka.com> writes:
> On Wed, 16 Nov 2011, Maciek Sakrejda wrote:
>> Maybe, although I've never seen that happen and as I understand, the
>> driver always sends parameters along with the statement when using
>> unnamed statements, so I'm not sure what would trigger this.
> The standard example is "WHERE col LIKE ?". If the parameter is something
> like 'abcdef%' then the query can potentially be converted to a range
> scan, but only if knows the parameter value. When preparing this with an
> unnamed statement the server has the parameter value, but cannot assume
> that the same unnamed statement won't be used with a different parameter
> value later and must come up with a plan that is correct for all possible
> parameters even if it's not ideal for the current one.
BTW, this should be all better in 9.2 ... so if anyone is thinking of
expending lots of effort to fix it on the JDBC side, don't bother.
regards, tom lane