Re: Faster StrNCpy - Mailing list pgsql-hackers
From | Strong, David |
---|---|
Subject | Re: Faster StrNCpy |
Date | |
Msg-id | B6419AF36AC8524082E1BC17DA2506E802579E2C@USMV-EXCH2.na.uis.unisys.com |
In response to | Re: Faster StrNCpy ("Strong, David" <david.strong@unisys.com>) |
Responses |
Re: Faster StrNCpy
|
List | pgsql-hackers |
Mark,

Thanks for attaching the C code for your test. I ran a few tests on a 3 GHz Intel Xeon Paxville (dual core) system. I hope the formatting of this table survives:

Method    Size    N=1024*1024    N=1
MEMCPY    63       6964927 us      582494 us
MEMCPY    32       7102497 us      582467 us
MEMCPY    16       7116358 us      582538 us
MEMCPY     8       6965239 us      582796 us
MEMCPY     4       6964722 us      583183 us
STRNCPY   63      10131174 us     8843010 us
STRNCPY   32      10648202 us     9563868 us
STRNCPY   16       9187398 us     7969947 us
STRNCPY    8       9275353 us     8042777 us
STRNCPY    4       9067570 us     8058532 us
STRLCPY   63      15045507 us    14379702 us
STRLCPY   32       8960303 us     8120471 us
STRLCPY   16       7393607 us     4915457 us
STRLCPY    8       7222983 us     3211931 us
STRLCPY    4       7181267 us     1725546 us
LENCPY    63       7608932 us     4416602 us
LENCPY    32       7252849 us     3807535 us
LENCPY    16      11680927 us    10331487 us
LENCPY     8      10409685 us     9660616 us
LENCPY     4      10824632 us     9525082 us

The first column is the copy method, the second column is the source string size (size of -DSTRING), and the 3rd and 4th columns are the different -DN settings.

The memcpy () call is the clear winner, at all source string sizes. The strncpy () call is better than strlcpy () at size 63, but strlcpy () pulls ahead from size 32 downwards. This is probably due to the zero padding effect of strncpy: it always writes the full destination size, however short the source string is. The lencpy () call starts out strong, but degrades as the size of the string decreases. This was a little surprising and I don't have an explanation for this behavior at this time.

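The zero padding effect mentioned above can be observed directly with standard C. The following is only an illustrative sketch; the helper name padded_bytes and the 64-byte buffer are assumptions made for this example and are not taken from the attached test code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: report how many of the
 * first n destination bytes strncpy() set to '\0'.  The buffer is
 * pre-filled with a marker byte so that only bytes written by
 * strncpy() can be zero.  src is assumed to contain no embedded NULs.
 */
static size_t padded_bytes(const char *src, size_t n)
{
    char buf[64];
    size_t zeros = 0;

    assert(n <= sizeof buf);
    memset(buf, 'X', sizeof buf);   /* marker fill */
    strncpy(buf, src, n);           /* copies src, then pads with '\0' up to n */

    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

With a 3-character source and n = 64, strncpy () writes 3 characters and then 61 zero bytes, so the amount of memory written does not shrink with the source string. That is consistent with the nearly flat STRNCPY timings in the table.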
The AMD optimization manuals have some interesting examples for optimizations for memcpy, along the lines of cache line copies and prefetching:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF#search=%22amd%20optimization%20manual%22

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf#search=%22amd%20optimization%20manual%22

There also used to be an interesting article on the SGI web site called "Optimizing CPU to Memory Accesses on the SGI Visual Workstations 320 and 540", but this seems to have been pulled. I did find a copy of the article here:

http://eunchul.com/database/board/cat.php?data=Win32_API&board_group=D42a8ff5c3a9b9

Obviously, different copy mechanisms suit different data sizes. So, I added a little debugging code to the strlcpy () function that was added to Postgres the other day. I ran a test against Postgres for ~15 minutes that used 2 client backends and the BG writer - 8330804 calls to strlcpy () were generated by the test.

Out of the 8330804 calls, 6226616 calls used a maximum copy size of 2213 bytes, e.g. strlcpy (dest, src, 2213), and 2104074 calls used a maximum copy size of 64 bytes.

I know the 2213 size calls come from the set_ps_display () function. I don't know where the 64 size calls come from, yet.

In the 64 size case, with the exception of 35 calls, calls for size 64 are only copying 1 byte - I would assume this is the terminating NUL.

In the 2213 size case, 1933027 calls copy 20 bytes; 2189415 calls copy 5 bytes; 85550 calls copy 6 bytes and 2018482 calls copy 7 bytes.

Based on this data, it would seem that either memcpy () or strlcpy () calls would be better due to the source string size.

Calls originating from the set_ps_display () function might be able to use the memcpy () call as the size of the source string should be known.

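As a rough sketch of what using memcpy () with a known source length could look like: the helper below is hypothetical, it is not the Postgres set_ps_display () code, and its name and signature are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Postgres code): copy a string whose length
 * the caller already knows, so no byte-wise scan for the terminator is
 * needed.  Truncates to fit destsize and always NUL-terminates when
 * destsize > 0.  Returns the number of bytes copied, excluding the
 * terminator.
 */
static size_t copy_known_len(char *dest, size_t destsize,
                             const char *src, size_t srclen)
{
    size_t n;

    if (destsize == 0)
        return 0;                    /* no room even for the terminator */

    n = (srclen < destsize - 1) ? srclen : destsize - 1;
    memcpy(dest, src, n);            /* single block copy */
    dest[n] = '\0';
    return n;
}
```

Unlike strlcpy (), this returns the number of bytes actually copied rather than the length of the source, so a caller that needs to detect truncation would compare the return value against srclen.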
The other calls probably need something like strlcpy () as the source string size might not be known, although using memcpy () to copy in XX byte blocks might be interesting.

David

________________________________

From: pgsql-hackers-owner@postgresql.org on behalf of mark@mark.mielke.cc
Sent: Fri 9/29/2006 2:59 PM
To: Tom Lane
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Faster StrNCpy

On Fri, Sep 29, 2006 at 05:34:30PM -0400, Tom Lane wrote:
> mark@mark.mielke.cc writes:
> > If anybody is curious, here are my numbers for an AMD X2 3800+:
>
> You did not show your C code, so no one else can reproduce the test on
> other hardware. However, it looks like your compiler has unrolled the
> memcpy into straight-line 8-byte moves, which makes it pretty hard for
> anything operating byte-wise to compete, and is a bit dubious for the
> general case anyway (since it requires assuming that the size and
> alignment are known at compile time).

I did show the .s code. I call into x_memcpy(a, b), meaning that the compiler can't assume anything. It may happen to be aligned.

Here are results over 64 Mbytes of memory, to ensure that every call is a cache miss:

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN="(1024*1024)" -o x x.c y.c strlcpy.c ; ./x
NONE:        767243 us
MEMCPY:     6044137 us
STRNCPY:   10741759 us
STRLCPY:   12061630 us
LENCPY:     9459099 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -DN="(1024*1024)" -o x x.c y.c strlcpy.c ; ./x
NONE:        712193 us
MEMCPY:     6072312 us
STRNCPY:    9982983 us
STRLCPY:    6605052 us
LENCPY:     7128258 us

$ gcc -O3 -std=c99 -DSTRING='""' -DN="(1024*1024)" -o x x.c y.c strlcpy.c ; ./x
NONE:        708164 us
MEMCPY:     6042817 us
STRNCPY:    8885791 us
STRLCPY:    5592477 us
LENCPY:     6135550 us

At least on my machine, memcpy() still comes out on top. Yes, assuming that it is aligned correctly for the machine.

Here is unaligned (all arrays are stored +1 offset in memory):

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN="(1024*1024)" -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:        790932 us
MEMCPY:     6591559 us
STRNCPY:   10622291 us
STRLCPY:   12070007 us
LENCPY:    10322541 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -DN="(1024*1024)" -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:        764577 us
MEMCPY:     6631731 us
STRNCPY:    9513540 us
STRLCPY:    6615345 us
LENCPY:     7263392 us

$ gcc -O3 -std=c99 -DSTRING='""' -DN="(1024*1024)" -DALIGN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:        825689 us
MEMCPY:     6607777 us
STRNCPY:    8976487 us
STRLCPY:    5878088 us
LENCPY:     6180358 us

Alignment looks like it does impact the results for memcpy(). memcpy() changes from around 6.0 seconds to 6.6 seconds. Overall, though, it is still the winner in all cases except for strlcpy(), which beats it on very short strings ("").

Here is the cache hit case including your strlen+memcpy as 'LENCPY':

$ gcc -O3 -std=c99 -DSTRING='"This is a very long sentence that is expected to be very slow."' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:        696157 us
MEMCPY:      825118 us
STRNCPY:    7983159 us
STRLCPY:   10787462 us
LENCPY:     6048339 us

$ gcc -O3 -std=c99 -DSTRING='"Short sentence."' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:        700201 us
MEMCPY:      593701 us
STRNCPY:    7577380 us
STRLCPY:    3727801 us
LENCPY:     3169783 us

$ gcc -O3 -std=c99 -DSTRING='""' -DN=1 -o x x.c y.c strlcpy.c ; ./x
NONE:        706283 us
MEMCPY:      792719 us
STRNCPY:    7870425 us
STRLCPY:     681334 us
LENCPY:     2062983 us

First call was every call being a cache hit. With this one, every one is a cache miss, and the 64-byte blocks are spread equally over 64 Mbytes of memory.

I've attached the code for your consideration. x.c is the routines I used to perform the tests. y.c is the main program. strlcpy.c is copied from the online reference as is without change. The compilation steps are described above. STRING is the string to try out.
N is the number of 64-byte blocks to allocate. ALIGN is the number of bytes to offset the array by when storing / reading / writing. ALIGN should be >= 0.

At N=1, it's all in cache. At N=1024*1024 it is taking up 64 Mbytes of RAM.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Neighbourhood Coder
Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them...

http://mark.mielke.cc/