Thread: [COMMITTERS] pgsql: Use radix tree for character encoding conversions.
From: Heikki Linnakangas
Date:
Use radix tree for character encoding conversions.

Replace the mapping tables used to convert between UTF-8 and other
character encodings with new radix tree-based maps. Looking up an entry in
a radix tree is much faster than a binary search in the old maps. As a
bonus, the radix tree representation is also more compact, making the
binaries slightly smaller.

The "combined" maps work the same as before, with binary search. They are
much smaller than the main tables, so it doesn't matter so much. However,
the "combined" maps are now stored in the same .map files as the main
tables. This seems clearer, since they're always used together, and
generated from the same source files.

Patch by Kyotaro Horiguchi, with a lot of hacking by me at various stages.
Reviewed by Michael Paquier and Daniel Gustafsson.

Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/aeed17d00037950a16cc5ebad5b5592e5fa1ad0f

Modified Files
--------------
src/backend/utils/mb/Unicode/Makefile | 10 +-
src/backend/utils/mb/Unicode/UCS_to_BIG5.pl | 12 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl | 10 +-
.../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl | 22 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl | 189 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl | 14 +-
src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl | 10 +-
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl | 10 +-
src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl | 12 +-
.../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl | 21 +-
src/backend/utils/mb/Unicode/UCS_to_SJIS.pl | 32 +-
src/backend/utils/mb/Unicode/UCS_to_UHC.pl | 12 +-
src/backend/utils/mb/Unicode/UCS_to_most.pl | 6 +-
src/backend/utils/mb/Unicode/big5_to_utf8.map | 18321 ++------
src/backend/utils/mb/Unicode/convutils.pm | 806 +-
src/backend/utils/mb/Unicode/euc_cn_to_utf8.map | 9723 +----
.../utils/mb/Unicode/euc_jis_2004_to_utf8.map | 14744 ++-----
.../mb/Unicode/euc_jis_2004_to_utf8_combined.map | 29 -
src/backend/utils/mb/Unicode/euc_jp_to_utf8.map | 17337 ++------
src/backend/utils/mb/Unicode/euc_kr_to_utf8.map | 10723 ++---
src/backend/utils/mb/Unicode/euc_tw_to_utf8.map | 31407 ++++---------
src/backend/utils/mb/Unicode/gb18030_to_utf8.map | 41882 +++++-------------
src/backend/utils/mb/Unicode/gbk_to_utf8.map | 28344 +++---------
.../utils/mb/Unicode/iso8859_10_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_13_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_14_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_15_to_utf8.map | 237 +-
.../utils/mb/Unicode/iso8859_16_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/iso8859_2_to_utf8.map | 205 +-
src/backend/utils/mb/Unicode/iso8859_3_to_utf8.map | 198 +-
src/backend/utils/mb/Unicode/iso8859_4_to_utf8.map | 205 +-
src/backend/utils/mb/Unicode/iso8859_5_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/iso8859_6_to_utf8.map | 158 +-
src/backend/utils/mb/Unicode/iso8859_7_to_utf8.map | 234 +-
src/backend/utils/mb/Unicode/iso8859_8_to_utf8.map | 201 +-
src/backend/utils/mb/Unicode/iso8859_9_to_utf8.map | 205 +-
src/backend/utils/mb/Unicode/johab_to_utf8.map | 23327 +++-------
src/backend/utils/mb/Unicode/koi8r_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/koi8u_to_utf8.map | 237 +-
.../utils/mb/Unicode/shift_jis_2004_to_utf8.map | 14503 ++-----
.../mb/Unicode/shift_jis_2004_to_utf8_combined.map | 29 -
src/backend/utils/mb/Unicode/sjis_to_utf8.map | 10202 ++---
src/backend/utils/mb/Unicode/uhc_to_utf8.map | 23788 +++-------
src/backend/utils/mb/Unicode/utf8_to_big5.map | 17809 ++------
src/backend/utils/mb/Unicode/utf8_to_euc_cn.map | 11487 ++---
.../utils/mb/Unicode/utf8_to_euc_jis_2004.map | 23868 ++++++-----
.../mb/Unicode/utf8_to_euc_jis_2004_combined.map | 29 -
src/backend/utils/mb/Unicode/utf8_to_euc_jp.map | 20314 ++++-----
src/backend/utils/mb/Unicode/utf8_to_euc_kr.map | 14617 +++----
src/backend/utils/mb/Unicode/utf8_to_euc_tw.map | 24574 +++-------
src/backend/utils/mb/Unicode/utf8_to_gb18030.map | 40292 +++++------------
src/backend/utils/mb/Unicode/utf8_to_gbk.map | 26061 ++----------
.../utils/mb/Unicode/utf8_to_iso8859_10.map | 240 +-
.../utils/mb/Unicode/utf8_to_iso8859_13.map | 239 +-
.../utils/mb/Unicode/utf8_to_iso8859_14.map | 272 +-
.../utils/mb/Unicode/utf8_to_iso8859_15.map | 227 +-
.../utils/mb/Unicode/utf8_to_iso8859_16.map | 257 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_2.map | 240 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_3.map | 232 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_4.map | 240 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_5.map | 229 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_6.map | 171 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_7.map | 248 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_8.map | 194 +-
src/backend/utils/mb/Unicode/utf8_to_iso8859_9.map | 226 +-
src/backend/utils/mb/Unicode/utf8_to_johab.map | 23380 +++-------
src/backend/utils/mb/Unicode/utf8_to_koi8r.map | 301 +-
src/backend/utils/mb/Unicode/utf8_to_koi8u.map | 312 +-
.../utils/mb/Unicode/utf8_to_shift_jis_2004.map | 18954 ++++-----
.../mb/Unicode/utf8_to_shift_jis_2004_combined.map | 29 -
src/backend/utils/mb/Unicode/utf8_to_sjis.map | 11648 ++----
src/backend/utils/mb/Unicode/utf8_to_uhc.map | 23612 +++-------
src/backend/utils/mb/Unicode/utf8_to_win1250.map | 266 +-
src/backend/utils/mb/Unicode/utf8_to_win1251.map | 259 +-
src/backend/utils/mb/Unicode/utf8_to_win1252.map | 267 +-
src/backend/utils/mb/Unicode/utf8_to_win1253.map | 244 +-
src/backend/utils/mb/Unicode/utf8_to_win1254.map | 276 +-
src/backend/utils/mb/Unicode/utf8_to_win1255.map | 260 +-
src/backend/utils/mb/Unicode/utf8_to_win1256.map | 320 +-
src/backend/utils/mb/Unicode/utf8_to_win1257.map | 259 +-
src/backend/utils/mb/Unicode/utf8_to_win1258.map | 284 +-
src/backend/utils/mb/Unicode/utf8_to_win866.map | 280 +-
src/backend/utils/mb/Unicode/utf8_to_win874.map | 225 +-
src/backend/utils/mb/Unicode/win1250_to_utf8.map | 232 +-
src/backend/utils/mb/Unicode/win1251_to_utf8.map | 236 +-
src/backend/utils/mb/Unicode/win1252_to_utf8.map | 232 +-
src/backend/utils/mb/Unicode/win1253_to_utf8.map | 220 +-
src/backend/utils/mb/Unicode/win1254_to_utf8.map | 230 +-
src/backend/utils/mb/Unicode/win1255_to_utf8.map | 214 +-
src/backend/utils/mb/Unicode/win1256_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/win1257_to_utf8.map | 225 +-
src/backend/utils/mb/Unicode/win1258_to_utf8.map | 228 +-
src/backend/utils/mb/Unicode/win866_to_utf8.map | 237 +-
src/backend/utils/mb/Unicode/win874_to_utf8.map | 204 +-
src/backend/utils/mb/conv.c | 251 +-
.../conversion_procs/utf8_and_big5/utf8_and_big5.c | 4 +-
.../utf8_and_cyrillic/utf8_and_cyrillic.c | 8 +-
.../utf8_and_euc2004/utf8_and_euc2004.c | 6 +-
.../utf8_and_euc_cn/utf8_and_euc_cn.c | 4 +-
.../utf8_and_euc_jp/utf8_and_euc_jp.c | 4 +-
.../utf8_and_euc_kr/utf8_and_euc_kr.c | 4 +-
.../utf8_and_euc_tw/utf8_and_euc_tw.c | 4 +-
.../utf8_and_gb18030/utf8_and_gb18030.c | 4 +-
.../conversion_procs/utf8_and_gbk/utf8_and_gbk.c | 4 +-
.../utf8_and_iso8859/utf8_and_iso8859.c | 75 +-
.../utf8_and_johab/utf8_and_johab.c | 4 +-
.../conversion_procs/utf8_and_sjis/utf8_and_sjis.c | 4 +-
.../utf8_and_sjis2004/utf8_and_sjis2004.c | 6 +-
.../conversion_procs/utf8_and_uhc/utf8_and_uhc.c | 4 +-
.../conversion_procs/utf8_and_win/utf8_and_win.c | 54 +-
src/include/mb/pg_wchar.h | 84 +-

111 files changed, 147742 insertions(+), 367346 deletions(-)
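
Illustration of the lookup difference described in the commit message. This is only a minimal sketch, not the data layout or code from the patch: every name here (demo_pair, demo_radix_lookup, and so on) is hypothetical, the table holds a handful of illustrative two-byte entries, and the tree has just two levels (one per input byte). A binary search needs about log2(N) comparisons per character, while the radix tree needs one array index per input byte regardless of table size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat entry, as a binary-search table would store it. */
typedef struct
{
	uint16_t	code;			/* two-byte source code */
	uint16_t	value;			/* converted value; 0 means "no mapping" */
} demo_pair;

/* Toy table, sorted by code. The entries are for illustration only. */
static const demo_pair demo_pairs[] = {
	{0x8140, 0x3000},
	{0x8141, 0x3001},
	{0x8142, 0x3002},
	{0x889F, 0x4E9C},
	{0x88A0, 0x5516},
};

#define DEMO_NPAIRS (sizeof(demo_pairs) / sizeof(demo_pairs[0]))

/* Old style: binary search over the sorted flat table. */
static uint16_t
demo_bsearch_lookup(uint16_t code)
{
	size_t		lo = 0;
	size_t		hi = DEMO_NPAIRS;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (demo_pairs[mid].code == code)
			return demo_pairs[mid].value;
		if (demo_pairs[mid].code < code)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}

/*
 * New style (sketch): a two-level radix tree. The root is indexed by the
 * first byte and yields a leaf number; the leaf is indexed by the second
 * byte and yields the value. Leaf 0 is reserved and stays all zeros, so
 * an unmapped first byte falls through to "no mapping" without a branch.
 */
#define DEMO_MAX_LEAVES 4

static uint8_t demo_root[256];
static uint16_t demo_leaves[DEMO_MAX_LEAVES][256];
static int	demo_nleaves = 1;

/* Build the tree from the flat table; safe to call more than once. */
static void
demo_build_radix(void)
{
	size_t		i;

	for (i = 0; i < DEMO_NPAIRS; i++)
	{
		uint8_t		b1 = (uint8_t) (demo_pairs[i].code >> 8);
		uint8_t		b2 = (uint8_t) (demo_pairs[i].code & 0xFF);

		if (demo_root[b1] == 0)
		{
			assert(demo_nleaves < DEMO_MAX_LEAVES);
			demo_root[b1] = (uint8_t) demo_nleaves++;
		}
		demo_leaves[demo_root[b1]][b2] = demo_pairs[i].value;
	}
}

/* Two array indexings, no comparisons against table entries. */
static uint16_t
demo_radix_lookup(uint16_t code)
{
	return demo_leaves[demo_root[code >> 8]][code & 0xFF];
}
```

After calling demo_build_radix() once, demo_radix_lookup() and demo_bsearch_lookup() return the same value for every code, including 0 for codes that are not in the table. A real multi-byte encoding would need more levels, and sharing or trimming sparse leaves is what would make such a structure compact; the sketch does neither.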