Thread: ResultSet.getBytes() of bytea speedup batch

ResultSet.getBytes() of bytea speedup batch

From

Mikko Tiihonen

Date:

16 November 2006, 19:11:22

Hi,

I spent two lonely nights trying to optimise the retrieving of binary
data from bytea column when using ResultSet.getBytes().

The attached patch gives 20-400% speedup depending on the bytea column
size. The speedup comes from switching to the binary transfer mode
instead of the text transfer mode for bytea columns.

The patch also contains the test code which I used to test the
performance (ByteaBenchTest.java). The benchmark results are:

cvs:
column size:    transfer speed:
    512b speed:   3,8MB/s
   2048b speed:  10,0MB/s
   4096b speed:  13,1MB/s
  16384b speed:  22,0MB/s
  65536b speed:  22,4MB/s
1048576b speed:  21,2MB/s

patch:
column size:    transfer speed:
    512b speed:   4,6MB/s -  21% speedup
   2048b speed:  16,8MB/s -  68% speedup
   4096b speed:  27,9MB/s - 112% speedup
  16384b speed:  79,2MB/s - 260% speedup
  65536b speed: 111,1MB/s - 395% speedup (optimal for my L2 cache?)
1048576b speed:  74,8MB/s - 252% speedup

The benchmark was run on Java6rc build 104 with postgresql 8.1.5 running
on localhost with Athlon64 2x2GHz, 64bit mode.

---

What the patch does:

After executing a prepared statement the first time the result set field
information (Field[]) is cached into the prepared query object.
Subsequent executions of the prepared statements use the field
types to decide when to request binary encoding from the server.
Currently only bytea types request binary encoding. The caching seems
safe because the Field objects are immutable.

Still open questions:
- is there any better way to obtain the result set field information in
QueryExecutor?
  * could the result set type information be obtained during the
prepare?
- in which conditions should a prepared statement drop the cached fields
  * do the result set fields ever change?
  * any other conditions?
- if caching of Field information is allowed (at least in some
circumstances), could the driver stop requesting a Describe from the
backend?

-Mikko

Attachment

getBinary-speedup.patch

Use Driver to Create database?

From

Charlie Kelly

Date:

19 November 2006, 16:17:58

Is it possible to use the driver to create a new database (inside a java
program),
or is it necessary to first create the database using the createdb utility?

If it is possible to use the driver, what is the correct syntax?

Thanks

Charlie Kelly

Re: ResultSet.getBytes() of bytea speedup batch

From

"Heikki Linnakangas"

Date:

19 November 2006, 16:31:45

Mikko Tiihonen wrote:
> The attached patch gives 20-400% speedup depending on the bytea column
> size. The speedup comes from switching to the binary transfer mode
> instead of the text transfer mode for bytea columns.

Impressive results!

> The patch also contains the test code which I used to test the
> performance (ByteaBenchTest.java). The benchmark results are:
>
> cvs:
> column size:    transfer speed:
>     512b speed:   3,8MB/s
>    2048b speed:  10,0MB/s
>    4096b speed:  13,1MB/s
>   16384b speed:  22,0MB/s
>   65536b speed:  22,4MB/s
> 1048576b speed:  21,2MB/s
>
> patch:
> column size:    transfer speed:
>     512b speed:   4,6MB/s -  21% speedup
>    2048b speed:  16,8MB/s -  68% speedup
>    4096b speed:  27,9MB/s - 112% speedup
>   16384b speed:  79,2MB/s - 260% speedup
>   65536b speed: 111,1MB/s - 395% speedup (optimal for my L2 cache?)
> 1048576b speed:  74,8MB/s - 252% speedup

I'm getting similar results, though the drop after 65536b is even steeper:

cvs:
     [junit]     512b speed:   1.8MB/s memory:   2.5MB
     [junit]    2048b speed:   5.0MB/s memory:   2.0MB
     [junit]    4096b speed:   7.5MB/s memory:   2.0MB
     [junit]   16384b speed:  11.8MB/s memory:   2.4MB
     [junit]   65536b speed:  12.6MB/s memory:   2.4MB
     [junit] 1048576b speed:   5.3MB/s memory:   5.8MB

patch:
     [junit]     512b speed:   2.1MB/s memory:   2.0MB
     [junit]    2048b speed:   7.4MB/s memory:   2.2MB
     [junit]    4096b speed:  13.0MB/s memory:   2.1MB
     [junit]   16384b speed:  35.9MB/s memory:   2.3MB
     [junit]   65536b speed:  56.7MB/s memory:   2.0MB
     [junit] 1048576b speed:  17.2MB/s memory:   2.9MB
     [junit] ------------- ---------------- ---------------

Ran on Sun JRE 1.5.0_06, on Intel Core Duo. I sure would like to find an
explanation for the drop.

> Still open questions:
> - is there any better way to obtain the result set field information in
> QueryExecutor?
>   * could the result set type information be obtained during the
> prepare?

It could. If the driver issued a Sync message after Describe, and waited
for the response before sending the Bind, the driver could use the
information in the RowDesription response. But that would require an
extra round-trip to the server, so it's probably not worth it in most
queries.

I wish we could set a default format for each data type in a separate
message. Like:

SetFormat(oid = 17, format="b")

The server would then send all bytea fields as binary unless otherwise
stated in the Bind message. It would require a backend-change, of course.

> - in which conditions should a prepared statement drop the cached fields
>   * do the result set fields ever change?
>   * any other conditions?
> - if caching of Field information is allowed (at least in some
> circumstances), could the driver stop requesting a Describe from the
> backend?

I can't imagine how the fields could change. Seems safe to me.

> @@ -126,6 +134,8 @@
>      private byte[] encodedStatementName;
>      private PhantomReference cleanupRef;
>      private int[] preparedTypes;
> +    private static final byte[] EMPTY_STATEMENTNAME = new byte[] { 0 };
> +    private Field[] fields;
>
>      final static SimpleParameterList NO_PARAMETERS = new SimpleParameterList(0);
>  }

This is just some leftover crud, right? Couldn't find a reference to
EMPTY_STATEMENTNAME anywhere.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Use Driver to Create database?

From

Roland Walter

Date:

19 November 2006, 17:42:17

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Charlie Kelly wrote:
> Is it possible to use the driver to create a new database (inside a java
> program),
> or is it necessary to first create the database using the createdb utility?
>
> If it is possible to use the driver, what is the correct syntax?
>

There is a SQL-command for that, look here:


http://www.postgresql.org/docs/8.1/interactive/sql-createdatabase.html

First connect to the postgres or template1 schema of the cluster, that
you created with initdb. Execute the "CREATE DATABASE". Reconnect to the
database you created.

Regards,
Roland.
- --
Dipl.-Phys. Roland Walter
mailto: roland (dot)walter (dot) rwa (at) gmx (dot) net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFFYM+vxoOEAkary1ERAmASAJ0YfU9L9hepz68I0A2/yzV3ZjSAjwCeOnqU
LqefeHLpkkc2Np7cRYVmawY=
=4SOY
-----END PGP SIGNATURE-----

Re: Use Driver to Create database?

From

Roland Walter

Date:

19 November 2006, 17:46:45

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Roland Walter wrote:
> Charlie Kelly wrote:
>> Is it possible to use the driver to create a new database (inside a java
>> program),
>> or is it necessary to first create the database using the createdb utility?
>
>> If it is possible to use the driver, what is the correct syntax?
>
>
> There is a SQL-command for that, look here:
>
>
> http://www.postgresql.org/docs/8.1/interactive/sql-createdatabase.html
>
> First connect to the postgres or template1 schema of the cluster, that

Oops, the word schema is wrong, postgres and template1 are default
databases.

- --
Dipl.-Phys. Roland Walter
mailto: roland (dot) walter (dot) rwa (at) gmx (dot) net
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFFYNC9xoOEAkary1ERApjXAJ4tGByZErjSwabxWlH5P+dq95cpNQCdFhP4
S/b2XDCr+I2MWKH5kxEcYGw=
=Ddbc
-----END PGP SIGNATURE-----

Re: Use Driver to Create database?

From

"Heikki Linnakangas"

Date:

19 November 2006, 18:04:28

Roland Walter wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Charlie Kelly wrote:
>> Is it possible to use the driver to create a new database (inside a java
>> program),
>> or is it necessary to first create the database using the createdb utility?
>>
>> If it is possible to use the driver, what is the correct syntax?
>>
>
> There is a SQL-command for that, look here:
>
>
> http://www.postgresql.org/docs/8.1/interactive/sql-createdatabase.html
>
> First connect to the postgres or template1 schema of the cluster, that
> you created with initdb. Execute the "CREATE DATABASE". Reconnect to the
> database you created.

You don't want to connect to template1, though, if the program is going
to be used concurrently by more than one use. Template1 is used as
template (hence the name) to make the new database, and it will throw an
error if there's any other users connected to it beside the one issuing
the CREATE DATABASE command.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Use Driver to Create database?

From

John R Pierce

Date:

20 November 2006, 02:01:36

>> First connect to the postgres or template1 schema of the cluster, that
>> you created with initdb. Execute the "CREATE DATABASE". Reconnect to the
>> database you created.
>
> You don't want to connect to template1, though, if the program is
> going to be used concurrently by more than one use. Template1 is used
> as template (hence the name) to make the new database, and it will
> throw an error if there's any other users connected to it beside the
> one issuing the CREATE DATABASE command.
>

indeed.  In fact you can connect to ANY existing database prior to
issusing the CREATE DATABASE command.  It is important, however, that
you be logged on with credentials that have the CREATEDB privilege