Re: CopyReadLineText optimization - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: CopyReadLineText optimization |
Date | |
Msg-id | 47CEA6AC.9090106@enterprisedb.com Whole thread Raw |
In response to | Re: CopyReadLineText optimization ("Heikki Linnakangas" <heikki@enterprisedb.com>) |
Responses |
Re: CopyReadLineText optimization
|
List | pgsql-patches |
Heikki Linnakangas wrote: > I still need to test the worst-case performance, with input that has a > lot of escapes. Ok, I've done some more performance testing with this. I tested COPY FROM with a table with a single "text" column. There was a million rows in the table, with a 1000 character long string: postgres=# CREATE TABLE narrowtable2 (id text); CREATE TABLE postgres=# INSERT INTO narrowtable2 SELECT repeat(E'\\', 1000) FROM generate_series(1, 1000000); INSERT 0 1000000 After that, I dumped that to a file, and loaded it back using COPY FROM: time ~/installations/cvshead/bin/psql postgres -c "BEGIN; TRUNCATE narrowtable2; COPY narrowtable2 FROM '/home/perftester/narrowtable3.tbl'; ROLLBACK;" I repeated the test with different frequencies of backslashes in the string, with and without the patch, and the took the smallest number of each test case: backslashes with without patch all 24.9 15.6 every 4th 12.7 11.4 every 8th 10.4 10.7 every 16th 8.7 10.3 none 6.8 9.8 So the overhead of using memchr slows us down if there's a lot of escape or quote characters. The breakeven point seems to be about 1 in 8 characters. I'm not sure if that's a good tradeoff or not... I also tested a table with single integer column, and found no meaningful difference (10.5 without patch vs 10.6 with patch). oprofile shows that in this test case, only ~5% of the CPU time is spent in CopyReadLineText, and the patch doesn't change that. Without patch: samples % image name app name symbol name 7563 12.7220 no-vmlinux postgres (no symbols) 4050 6.8127 postgres postgres DoCopy 3334 5.6083 postgres postgres LWLockAcquire 3238 5.4468 postgres postgres CopyReadLine 2900 4.8782 postgres postgres LWLockRelease 2781 4.6780 libc-2.7.so postgres __GI_____strtoll_l_internal 2778 4.6730 postgres postgres heap_formtuple 2636 4.4341 postgres postgres hash_any 2087 3.5106 no-vmlinux no-vmlinux (no symbols) 1748 2.9404 libc-2.7.so postgres memset 1724 2.9000 postgres postgres PinBuffer 1670 2.8092 postgres postgres PageAddItem 1645 2.7671 postgres postgres heap_insert 1459 2.4542 postgres postgres UnpinBuffer 1457 2.4509 postgres postgres ReadBuffer_common 1321 2.2221 postgres postgres hash_search_with_hash_value 1278 2.1498 postgres postgres MarkBufferDirty 1219 2.0505 oprofiled oprofiled (no symbols) 972 1.6350 postgres postgres pg_verify_mbstr_len 756 1.2717 postgres postgres RelationPutHeapTuple 665 1.1186 postgres postgres pg_atoi 631 1.0614 postgres postgres RelationGetBufferForTuple 613 1.0312 postgres postgres AllocSetReset ... With patch: samples % image name app name symbol name 42720 18.1450 no-vmlinux postgres (no symbols) 15367 6.5270 postgres postgres DoCopy 11831 5.0251 postgres postgres LWLockAcquire 11500 4.8845 no-vmlinux no-vmlinux (no symbols) 10182 4.3247 postgres postgres LWLockRelease 9912 4.2100 libc-2.7.so postgres __GI_____strtoll_l_internal 9811 4.1671 postgres postgres hash_any 8824 3.7479 postgres postgres heap_formtuple 7459 3.1682 postgres postgres CopyReadLine 7187 3.0526 postgres postgres PageAddItem 6313 2.6814 libc-2.7.so postgres memset 5842 2.4813 postgres postgres PinBuffer 5230 2.2214 postgres postgres UnpinBuffer 5160 2.1917 postgres postgres heap_insert 4838 2.0549 postgres postgres ReadBuffer_common 4819 2.0468 postgres postgres hash_search_with_hash_value 4691 1.9925 postgres postgres MarkBufferDirty 3675 1.5609 libc-2.7.so postgres memchr 3617 1.5363 postgres postgres AllocSetAlloc 3585 1.5227 postgres postgres pg_verify_mbstr_len 3326 1.4127 postgres postgres AllocSetReset ... These tests were on a test server with a dual-core 64-bit Intel Xeons. I'd still like to hear reports from other platforms. Another thing that seems like an obvious win is to merge CopyReadLine and CopyReadAttributesText/CSV so that we do just one pass over the input. But that seems suspiciously obvious, I wonder if I'm missing something. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-patches by date: