Why does prepareThreshold=0 (to cover pgbouncer transaction mode) disable protocol binary transfers? (w/ PoC patch and measurements) - Mailing list pgsql-jdbc
From: Jakub Wartak
Subject: Why does prepareThreshold=0 (to cover pgbouncer transaction mode) disable protocol binary transfers? (w/ PoC patch and measurements)
Msg-id: AM8PR07MB8248700488FA4B51F435FCFDF6F79@AM8PR07MB8248.eurprd07.prod.outlook.com
Responses: Re: Why does prepareThreshold=0 (to cover pgbouncer transaction mode) disable protocol binary transfers? (w/ PoC patch and measurements)
List: pgsql-jdbc
Hi,

Given our usage of pgjdbc with ?prepareThreshold=0, connecting over pgbouncer with pool_mode=transaction, we (with my co-worker Lukasz Pierzan) have spotted in app/pgjdbc Java profilers that pgjdbc is spending a lot of time in >> converting the UUIDs from text to binary << when receiving a lot of UUIDs from the DB during resultSet fetching. Upon further inspection we've concluded that even for a basic simple SQL query from pgjdbc with ?prepareThreshold=0, like "select uuidtype from tab1mlnrows":

a) PostgreSQL will convert the binary ::uuid datatype to String for the TCP wire (CPU waste)
b) the network will transfer the string form of the UUIDs (network bandwidth waste, plus additional CPU waste in the TCP/IP stacks on both sides)
c) java/pgjdbc will read it from the socket and convert it back from text to the native UUID binary form (big CPU waste)

The important thing here is that pgjdbc is capable of using binary transfers for certain datatypes over sockets and the Postgres protocol (and UUID is one of them), and thus of reducing CPU usage on both the sending and receiving sides; HOWEVER, with prepareThreshold=0 this option is disabled, and transfers happen to be much less efficient than they could be. This means that anybody using the pgjdbc + PostgreSQL + pgbouncer/pool_mode=transaction stack (which seems to be the most popular combo?) is affected by this.

We've hacked a custom PoC pgjdbc patch that tries to enable binary transfers even with prepared statements disabled. Binary transfer seemed to be disabled in two places:

(1) org.postgresql.jdbc.PgStatement.isOneShotQuery() - for some reason getForceBinaryTransfer() was checked
(2) org.postgresql.core.v3.QueryExecutorImpl.sendParse() - this seems to be more tricky; fields[] was cleared

We only performed this change plus enabled the not-fully-documented "org.postgresql.forceBinary=true", for the sake of measuring the potential performance impact of enabling binary transfer, not to fully understand the various usage cases of pgjdbc/protocols/methods, so the patch may simply be wrong as it stands. It is probably even buggy, as we do not have full insight.
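To give a feel for the client-side decoding cost the profiler pointed at: with text transfer pgjdbc must parse every UUID from its 36-character string form, while binary transfer delivers the 16 raw bytes directly. A standalone micro-comparison of just the decoding step (independent of pgjdbc; row count and timing method are illustrative, not part of our measurements) might look like this:

import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidDecodeCost {
    public static void main(String[] args) {
        int n = 1_000_000;
        // Pre-generate both wire representations of the same UUIDs.
        String[] asText = new String[n];
        byte[][] asBinary = new byte[n][];
        for (int i = 0; i < n; i++) {
            UUID u = UUID.randomUUID();
            asText[i] = u.toString();                // 36-char text form
            ByteBuffer bb = ByteBuffer.allocate(16); // 16-byte binary form
            bb.putLong(u.getMostSignificantBits());
            bb.putLong(u.getLeastSignificantBits());
            asBinary[i] = bb.array();
        }

        long acc = 0;
        long t0 = System.nanoTime();
        for (String s : asText) {
            // What text transfer forces on every row:
            acc += UUID.fromString(s).getLeastSignificantBits();
        }
        long t1 = System.nanoTime();
        for (byte[] b : asBinary) {
            // What binary transfer allows instead:
            ByteBuffer bb = ByteBuffer.wrap(b);
            acc += new UUID(bb.getLong(), bb.getLong()).getLeastSignificantBits();
        }
        long t2 = System.nanoTime();

        System.out.printf("text   -> UUID: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("binary -> UUID: %d ms%n", (t2 - t1) / 1_000_000);
        System.out.println("checksum: " + acc); // defeat dead-code elimination
    }
}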
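The measurements below were driven by a small standalone client. The actual PostgresTest.java is not included in this message, but a minimal stand-in (host, credentials and the crude timing are assumptions) could look roughly like:

import java.sql.*;
import java.util.Properties;
import java.util.UUID;

public class PostgresTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "postgres");   // assumption
        props.setProperty("password", "secret"); // assumption
        // prepareThreshold=0 disables server-side prepared statements,
        // which (as described above) currently also disables binary transfers.
        String url = "jdbc:postgresql://dbserver/postgres?prepareThreshold=0";

        try (Connection con = DriverManager.getConnection(url, props);
             Statement st = con.createStatement()) {
            long t0 = System.nanoTime();
            long rows = 0;
            try (ResultSet rs = st.executeQuery("select u from t1")) {
                while (rs.next()) {
                    // With text transfer pgjdbc parses a 36-char string here;
                    // with binary transfer it just reads 16 bytes.
                    UUID u = rs.getObject(1, UUID.class);
                    rows++;
                }
            }
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println(rows + " UUIDs fetched end-to-end in " + ms + " ms");
        }
    }
}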
Performance impact of the above bugfix/enhancement for "select uuidtype from tab1mlnrows":

env specs:
client: 8 VCPU r5.2xlarge (1s4c8t), OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)
DBserver: 4 VCPU i3.2xlarge (1s2c4t), PostgreSQL 13.5, Amazon Linux 2
network: same AWS zone, ping RTT between client and server 0.360ms

env preparation/DB:
create extension "uuid-ossp";
create table t1 as select uuid_generate_v4() u from generate_series(1, 1000000) t;

client crude preparation:
wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.4/postgresql-42.3.4.jar
javac -cp postgresql-42.3.4.jar PostgresTest.java
export CLASSPATH=.:postgresql-42.3.4.jar # or the patched pgjdbc
java PostgresTest # adjust as required

Default behavior / no optimization @ most recent pgjdbc (postgresql-42.3.4.jar):
avg time of execution on the PostgreSQL side: ~348ms +/- 50ms according to pg_stat_statements.mean_exec_time/stddev_exec_time
avg end-to-end time of exec+fetching on the app side to receive 1 mln UUIDs: ~1000..1100ms
1 call result size on the network: ~45MB (btw, no SSL used)
1 app @ client (1 javaapp/1 connection) -> ~460kB/s receive, ~20% CPU of a single PostgreSQL backend on the DBserver, ~91% javaapp thread (single-threaded pgjdbc conversion bottleneck)
4 apps @ client -> ~30% OS DBserver utilization --> therefore 4/0.3 = ~13 VCPUs would theoretically be necessary to fully utilize the 4 VCPU PostgreSQL DB

With the attached PoC patch enabling binary transfer (postgresql-42.3.5-SNAPSHOT.jar):
avg time of execution on the PostgreSQL side: ~216ms according to pg_stat_statements.mean_exec_time
avg end-to-end time of exec+fetching on the app side to receive 1 mln UUIDs: ~240ms
1 call result size on the network: ~28MB
1 app @ client (1 javaapp/1 connection) -> ~1.2MB/s receive stream, ~90% CPU of a single PostgreSQL backend on the DBserver, ~65% javaapp single thread
4 apps @ client -> ~90% OS DBserver utilization

Therefore the overhead of text relative to binary transfer in this simplistic testcase seems to be:
- 1.61x CPU time on the DBserver side
- 4.58x end-to-end fetching time (with much more CPU processing time on the client too)
- 1.60x network bandwidth

Therefore we would ask someone more familiar with these matters than us to check whether it isn't simply an omission in the code, and whether binary transfers could potentially be enabled even without prepared statements (isOneShot=true / prepareThreshold=0). Is there any reason why disabling prepared statements also disables binary transfers? The setting as it stands couples two features into one variable.

Or is it that we should be more aware of the prepareThreshold=-1 option (that's negative 1), which is pretty rare / not well known, and which should actually be hinted at on https://www.pgbouncer.org/faq.html#how-to-use-prepared-statements-with-transaction-pooling in favor of 0? Apparently there's even a testPreparedStatement_negative1() testcase... however, real testing reveals that using -1 with pool_mode=transaction actually hits "prepared statement "S_1" already exists", so it enables the two things together again. Any chance to split those two features?

- Lukasz Pierzan & Jakub Wartak