Prepared Statement Query Planning - Mailing list pgsql-jdbc
From | Brett Henderson |
---|---|
Subject | Prepared Statement Query Planning |
Date | |
Msg-id | 4A9892E5.4010509@bretth.com Whole thread Raw |
Responses |
Re: Prepared Statement Query Planning
Re: Prepared Statement Query Planning |
List | pgsql-jdbc |
Hi, I'm trying to use a PreparedStatement to query a small number of rows from a table containing over 500 million rows. I'm encountering bad query plans due to the query plan being constructed without access to the bind variables. I've read this link which suggests disabling server side prepared statements but it doesn't appear to be helping. I am happy to disable all server side prepared statements because I am effectively performing batch queries where the cost of a query plan is relatively small compared to the execution of the query. http://jdbc.postgresql.org/documentation/83/server-prepare.html I've added "prepareThreshold=0", and "prepareThreshold=3" to the connection URL but neither appear to have any effect. I've checked the prepare threshold on the statement objects and found that they are being set to the values I specify. Is there any way of verifying what type of statement is being issued to the server? This is the query: SELECT e.id, e.version, e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude FROM nodes e INNER JOIN ( SELECT id, version FROM nodes WHERE timestamp > ? AND timestamp <= ? ) t ON e.id = t.id AND e.version = t.version INNER JOIN changesets c ON e.changeset_id = c.id INNER JOIN users u ON c.user_id = u.id The node table contains approx 500 million rows, the user table approx 100 thousand rows, and the changeset table somewhere in between but closer to the lower end. The server has 32 GB of RAM so can fit smaller tables in RAM, but not the node table. The query retrieves all rows within a timestamp range. I realise the query could be re-written without the sub-select, but it is implemented in this way as a result of the way the query is dynamically constructed and allows the sub-select portion can be switched out for other row selection criteria. The timestamp range varies anywhere from 1 minute to 1 day, but no larger than that. The table contains data over a period of several years. The full JDBC trace with loglevel=2 is below. Is it possible to tell whether server side prepared statements are being used from this? Note that this has been obtained from a windows laptop running PostgreSQL 8.3.5, but the real problem is occurring on an Ubuntu Linux server running PostgreSQL 8.3.7. The same JDBC driver is being used across the board. 17:48:46.077 (1) PostgreSQL 8.3 JDBC4 with SSL (build 603) 17:48:46.091 (1) Trying to establish a protocol version 3 connection to localhost:5432 17:48:46.204 (1) FE=> StartupPacket(user=osm, database=api06_test, client_encoding=UNICODE, DateStyle=ISO, extra_float_digits=2) 17:48:46.382 (1) <=BE AuthenticationReqMD5(salt=xxxxxxxx) 17:48:46.391 (1) FE=> Password(md5digest=md5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) 17:48:46.397 (1) <=BE AuthenticationOk 17:48:46.408 (1) <=BE ParameterStatus(client_encoding = UNICODE) 17:48:46.408 (1) <=BE ParameterStatus(DateStyle = ISO, DMY) 17:48:46.409 (1) <=BE ParameterStatus(integer_datetimes = off) 17:48:46.409 (1) <=BE ParameterStatus(is_superuser = on) 17:48:46.409 (1) <=BE ParameterStatus(server_encoding = UTF8) 17:48:46.409 (1) <=BE ParameterStatus(server_version = 8.3.5) 17:48:46.409 (1) <=BE ParameterStatus(session_authorization = osm) 17:48:46.409 (1) <=BE ParameterStatus(standard_conforming_strings = off) 17:48:46.409 (1) <=BE ParameterStatus(TimeZone = Australia/Canberra) 17:48:46.409 (1) <=BE BackendKeyData(pid=8292,ckey=224285055) 17:48:46.409 (1) <=BE ReadyForQuery(I) 17:48:46.409 (1) compatible = 8.3 17:48:46.409 (1) loglevel = 2 17:48:46.409 (1) prepare threshold = 0 17:48:46.442 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@1b8e059, maxRows=0, fetchSize=10000, flags=9 17:48:46.442 (1) FE=> Parse(stmt=S_1,query="BEGIN",oids={}) 17:48:46.443 (1) FE=> Bind(stmt=S_1,portal=null) 17:48:46.443 (1) FE=> Execute(portal=null,limit=0) 17:48:46.443 (1) FE=> Parse(stmt=S_2,query="SELECT e.id, e.version, e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude FROM nodes e LEFT OUTER JOIN changesets c ON e.changeset_id = c.id LEFT OUTER JOIN users u ON c.user_id = u.id WHERE e.timestamp > $1 AND e.timestamp <= $2 ORDER BY e.id, e.version",oids={0,0}) 17:48:46.444 (1) FE=> Describe(statement=S_2) 17:48:46.444 (1) FE=> Bind(stmt=S_2,portal=C_3,$1=<1970-01-01 10:00:00.000000 +10:00:00>,$2=<2009-08-28 17:48:45.932000 +10:00:00>) 17:48:46.445 (1) FE=> Execute(portal=C_3,limit=10000) 17:48:46.445 (1) FE=> Sync 17:48:46.636 (1) <=BE ParseComplete [S_1] 17:48:46.636 (1) <=BE BindComplete [null] 17:48:46.636 (1) <=BE CommandStatus(BEGIN) 17:48:46.636 (1) <=BE ParseComplete [S_2] 17:48:46.636 (1) <=BE ParameterDescription 17:48:46.637 (1) <=BE RowDescription(10) 17:48:46.638 (1) <=BE BindComplete [C_3] 17:48:46.638 (1) <=BE CommandStatus(SELECT) 17:48:46.650 (1) <=BE ReadyForQuery(T) 17:48:46.651 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Connection$TransactionCommandHandler@7b6889, maxRows=0, fetchSize=0, flags=22 17:48:46.651 (1) FE=> CloseStatement(S_2) 17:48:46.651 (1) FE=> ClosePortal(C_3) 17:48:46.651 (1) FE=> Parse(stmt=S_4,query="COMMIT",oids={}) 17:48:46.651 (1) FE=> Bind(stmt=S_4,portal=null) 17:48:46.651 (1) FE=> Execute(portal=null,limit=1) 17:48:46.651 (1) FE=> Sync 17:48:46.652 (1) <=BE CloseComplete 17:48:46.652 (1) <=BE CloseComplete 17:48:46.652 (1) <=BE ParseComplete [S_4] 17:48:46.652 (1) <=BE BindComplete [null] 17:48:46.652 (1) <=BE CommandStatus(COMMIT) 17:48:46.652 (1) <=BE ReadyForQuery(I) 17:48:46.652 (1) FE=> Terminate If it helps, the following blog entry includes two query plan diagrams with and without bind variables. http://www.odecee.com.au/blogs/?p=134 Any suggestions on what I'm doing wrong? Brett
pgsql-jdbc by date: