Thread: ResultSet.getBytes() of bytea speedup batch
Hi, I spent two lonely nights trying to optimise the retrieving of binary data from bytea column when using ResultSet.getBytes(). The attached patch gives 20-400% speedup depending on the bytea column size. The speedup comes from switching to the binary transfer mode instead of the text transfer mode for bytea columns. The patch also contains the test code which I used to test the performance (ByteaBenchTest.java). The benchmark results are: cvs: column size: transfer speed: 512b speed: 3,8MB/s 2048b speed: 10,0MB/s 4096b speed: 13,1MB/s 16384b speed: 22,0MB/s 65536b speed: 22,4MB/s 1048576b speed: 21,2MB/s patch: column size: transfer speed: 512b speed: 4,6MB/s - 21% speedup 2048b speed: 16,8MB/s - 68% speedup 4096b speed: 27,9MB/s - 112% speedup 16384b speed: 79,2MB/s - 260% speedup 65536b speed: 111,1MB/s - 395% speedup (optimal for my L2 cache?) 1048576b speed: 74,8MB/s - 252% speedup The benchmark was run on Java6rc build 104 with postgresql 8.1.5 running on localhost with Athlon64 2x2GHz, 64bit mode. --- What the patch does: After executing a prepared statement the first time the result set field information (Field[]) is cached into the prepared query object. Subsequent executions of the prepared statements use the field types to decide when to request binary encoding from the server. Currently only bytea types request binary encoding. The caching seems safe because the Field objects are immutable. Still open questions: - is there any better way to obtain the result set field information in QueryExecutor? * could the result set type information be obtained during the prepare? - in which conditions should a prepared statement drop the cached fields * do the result set fields ever change? * any other conditions? - if caching of Field information is allowed (at least in some circumstances), could the driver stop requesting a Describe from the backend? -Mikko
Attachment
Is it possible to use the driver to create a new database (inside a java program), or is it necessary to first create the database using the createdb utility? If it is possible to use the driver, what is the correct syntax? Thanks Charlie Kelly
Mikko Tiihonen wrote: > The attached patch gives 20-400% speedup depending on the bytea column > size. The speedup comes from switching to the binary transfer mode > instead of the text transfer mode for bytea columns. Impressive results! > The patch also contains the test code which I used to test the > performance (ByteaBenchTest.java). The benchmark results are: > > cvs: > column size: transfer speed: > 512b speed: 3,8MB/s > 2048b speed: 10,0MB/s > 4096b speed: 13,1MB/s > 16384b speed: 22,0MB/s > 65536b speed: 22,4MB/s > 1048576b speed: 21,2MB/s > > patch: > column size: transfer speed: > 512b speed: 4,6MB/s - 21% speedup > 2048b speed: 16,8MB/s - 68% speedup > 4096b speed: 27,9MB/s - 112% speedup > 16384b speed: 79,2MB/s - 260% speedup > 65536b speed: 111,1MB/s - 395% speedup (optimal for my L2 cache?) > 1048576b speed: 74,8MB/s - 252% speedup I'm getting similar results, though the drop after 65536b is even steeper: cvs: [junit] 512b speed: 1.8MB/s memory: 2.5MB [junit] 2048b speed: 5.0MB/s memory: 2.0MB [junit] 4096b speed: 7.5MB/s memory: 2.0MB [junit] 16384b speed: 11.8MB/s memory: 2.4MB [junit] 65536b speed: 12.6MB/s memory: 2.4MB [junit] 1048576b speed: 5.3MB/s memory: 5.8MB patch: [junit] 512b speed: 2.1MB/s memory: 2.0MB [junit] 2048b speed: 7.4MB/s memory: 2.2MB [junit] 4096b speed: 13.0MB/s memory: 2.1MB [junit] 16384b speed: 35.9MB/s memory: 2.3MB [junit] 65536b speed: 56.7MB/s memory: 2.0MB [junit] 1048576b speed: 17.2MB/s memory: 2.9MB [junit] ------------- ---------------- --------------- Ran on Sun JRE 1.5.0_06, on Intel Core Duo. I sure would like to find an explanation for the drop. > Still open questions: > - is there any better way to obtain the result set field information in > QueryExecutor? > * could the result set type information be obtained during the > prepare? It could. If the driver issued a Sync message after Describe, and waited for the response before sending the Bind, the driver could use the information in the RowDesription response. But that would require an extra round-trip to the server, so it's probably not worth it in most queries. I wish we could set a default format for each data type in a separate message. Like: SetFormat(oid = 17, format="b") The server would then send all bytea fields as binary unless otherwise stated in the Bind message. It would require a backend-change, of course. > - in which conditions should a prepared statement drop the cached fields > * do the result set fields ever change? > * any other conditions? > - if caching of Field information is allowed (at least in some > circumstances), could the driver stop requesting a Describe from the > backend? I can't imagine how the fields could change. Seems safe to me. > @@ -126,6 +134,8 @@ > private byte[] encodedStatementName; > private PhantomReference cleanupRef; > private int[] preparedTypes; > + private static final byte[] EMPTY_STATEMENTNAME = new byte[] { 0 }; > + private Field[] fields; > > final static SimpleParameterList NO_PARAMETERS = new SimpleParameterList(0); > } This is just some leftover crud, right? Couldn't find a reference to EMPTY_STATEMENTNAME anywhere. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Charlie Kelly wrote: > Is it possible to use the driver to create a new database (inside a java > program), > or is it necessary to first create the database using the createdb utility? > > If it is possible to use the driver, what is the correct syntax? > There is a SQL-command for that, look here: http://www.postgresql.org/docs/8.1/interactive/sql-createdatabase.html First connect to the postgres or template1 schema of the cluster, that you created with initdb. Execute the "CREATE DATABASE". Reconnect to the database you created. Regards, Roland. - -- Dipl.-Phys. Roland Walter mailto: roland (dot)walter (dot) rwa (at) gmx (dot) net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFFYM+vxoOEAkary1ERAmASAJ0YfU9L9hepz68I0A2/yzV3ZjSAjwCeOnqU LqefeHLpkkc2Np7cRYVmawY= =4SOY -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Roland Walter wrote: > Charlie Kelly wrote: >> Is it possible to use the driver to create a new database (inside a java >> program), >> or is it necessary to first create the database using the createdb utility? > >> If it is possible to use the driver, what is the correct syntax? > > > There is a SQL-command for that, look here: > > > http://www.postgresql.org/docs/8.1/interactive/sql-createdatabase.html > > First connect to the postgres or template1 schema of the cluster, that Oops, the word schema is wrong, postgres and template1 are default databases. - -- Dipl.-Phys. Roland Walter mailto: roland (dot) walter (dot) rwa (at) gmx (dot) net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFFYNC9xoOEAkary1ERApjXAJ4tGByZErjSwabxWlH5P+dq95cpNQCdFhP4 S/b2XDCr+I2MWKH5kxEcYGw= =Ddbc -----END PGP SIGNATURE-----
Roland Walter wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Charlie Kelly wrote: >> Is it possible to use the driver to create a new database (inside a java >> program), >> or is it necessary to first create the database using the createdb utility? >> >> If it is possible to use the driver, what is the correct syntax? >> > > There is a SQL-command for that, look here: > > > http://www.postgresql.org/docs/8.1/interactive/sql-createdatabase.html > > First connect to the postgres or template1 schema of the cluster, that > you created with initdb. Execute the "CREATE DATABASE". Reconnect to the > database you created. You don't want to connect to template1, though, if the program is going to be used concurrently by more than one use. Template1 is used as template (hence the name) to make the new database, and it will throw an error if there's any other users connected to it beside the one issuing the CREATE DATABASE command. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
>> First connect to the postgres or template1 schema of the cluster, that >> you created with initdb. Execute the "CREATE DATABASE". Reconnect to the >> database you created. > > You don't want to connect to template1, though, if the program is > going to be used concurrently by more than one use. Template1 is used > as template (hence the name) to make the new database, and it will > throw an error if there's any other users connected to it beside the > one issuing the CREATE DATABASE command. > indeed. In fact you can connect to ANY existing database prior to issusing the CREATE DATABASE command. It is important, however, that you be logged on with credentials that have the CREATEDB privilege