char() overhead on read-only workloads not so insignificant as the docs claim it is... - Mailing list pgsql-hackers
From | Stefan Kaltenbrunner |
---|---|
Subject | char() overhead on read-only workloads not so insignificant as the docs claim it is... |
Date | |
Msg-id | 4A337C05.7000008@kaltenbrunner.cc |
Responses | Re: char() overhead on read-only workloads not so insignificant as the docs claim it is... |
List | pgsql-hackers |
I'm currently doing some benchmarking on a Nehalem box (http://www.kaltenbrunner.cc/blog/index.php?/archives/26-Benchmarking-8.4-Chapter-1Read-Only-workloads.html) with 8.4, and while investigating what looks like issues in pgbench I also noticed that using char() has more than a negligible overhead on some (very special) read-only(!) workloads.

For example, running sysbench in read-only mode against 8.4 results in a profile (for the full run) that looks similar to:

samples   %        symbol name
981690    11.0656  bcTruelen
359183     4.0487  index_getnext
311128     3.5070  AllocSetAlloc
272330     3.0697  hash_search_with_hash_value
258157     2.9099  LWLockAcquire
195673     2.2056  _bt_compare
190303     2.1451  slot_deform_tuple
168101     1.8948  PostgresMain
164191     1.8508  _bt_checkkeys
126110     1.4215  FunctionCall2
123965     1.3973  SearchCatCache
120629     1.3597  LWLockRelease

The default sysbench mode actually uses a number of different queries, and the ones dealing with char() are only a small part of the full set of queries sent. The specific query that causes bcTruelen to show up in the profile is:

SELECT c from sbtest where id between $1 and $2 order by c

where the parameters are for example $1 = '5009559', $2 = '5009658', i.e. ranges of 100. Benchmarking only that query results in:

samples   %        symbol name
2148182   23.5861  bcTruelen
369463     4.0565  index_getnext
362784     3.9832  AllocSetAlloc
284198     3.1204  slot_deform_tuple
185279     2.0343  _bt_checkkeys
180119     1.9776  LWLockAcquire
172733     1.8965  appendBinaryStringInfo
144158     1.5828  internal_putbytes
141040     1.5486  AllocSetFree
138093     1.5162  printtup
124255     1.3643  hash_search_with_hash_value
117054     1.2852  heap_form_minimal_tuple

at around 46000 queries/s. Changing the default sysbench schema from:

                            Table "public.sbtest"
 Column |      Type      |                      Modifiers
--------+----------------+------------------------------------------------------
 id     | integer        | not null default nextval('sbtest_id_seq'::regclass)
 k      | integer        | not null default 0
 c      | character(120) | not null default ''::bpchar
 pad    | character(60)  | not null default ''::bpchar
Indexes:
    "sbtest_pkey" PRIMARY KEY, btree (id)
    "k" btree (k)

to

                              Table "public.sbtest"
 Column |       Type        |                      Modifiers
--------+-------------------+------------------------------------------------------
 id     | integer           | not null default nextval('sbtest_id_seq'::regclass)
 k      | integer           | not null default 0
 c      | character varying | not null default ''::character varying
 pad    | character(60)     | not null default ''::bpchar
Indexes:
    "sbtest_pkey" PRIMARY KEY, btree (id)
    "k" btree (k)

results in a near 50%(!) speedup in terms of tps, to around 67000 queries/s. This is however an extreme case, because the c column actually contains no data at all (except for an empty string). The profile for the changed test case looks like:

samples   %        symbol name
430797     5.2222  index_getnext
396750     4.8095  AllocSetAlloc
345508     4.1883  slot_deform_tuple
228222     2.7666  appendBinaryStringInfo
227766     2.7610  _bt_checkkeys
193818     2.3495  LWLockAcquire
179925     2.1811  internal_putbytes
168871     2.0471  printtup
152026     1.8429  AllocSetFree
146333     1.7739  heap_form_minimal_tuple
144305     1.7493  FunctionCall2
128320     1.5555  hash_search_with_hash_value

At the very least we should reconsider this part of our docs:

"There is no performance difference between these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column."

from http://www.postgresql.org/docs/8.4/static/datatype-character.html


regards

Stefan
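PS: in case somebody wants to poke at the single-query case without setting up sysbench, a pgbench custom script along the following lines should hit the same code path. This is only a sketch: the file name, the assumed ~10M-row table and the fixed 100-row range are illustrative, and pgbench sends simple queries rather than the prepared statements sysbench uses, so the absolute numbers will differ (the bcTruelen overhead should still be visible for the char() variant though):

-- range_query.sql: fetch and sort the c column for a random 100-row id range
-- assumes sbtest is populated with roughly 10 million rows
\setrandom start 1 9999900
SELECT c FROM sbtest WHERE id BETWEEN :start AND :start + 99 ORDER BY c;

run for example as: pgbench -n -f range_query.sql -c 16 -T 60 sbtest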
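Similarly, an already-loaded sbtest table can be switched between the two layouts in place instead of being reloaded. This is just one way to do it (not necessarily how the table behind the numbers above was produced), and note that both ALTERs rewrite the whole table:

-- char(120) -> unconstrained varchar, keeping the empty-string default
ALTER TABLE sbtest
    ALTER COLUMN c TYPE character varying,
    ALTER COLUMN c SET DEFAULT ''::character varying;

-- and back again, to re-run the char() variant
ALTER TABLE sbtest
    ALTER COLUMN c TYPE character(120),
    ALTER COLUMN c SET DEFAULT ''::bpchar;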