Re: [HACKERS] Some platform-specific MemSet research - Mailing list pgsql-patches
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] Some platform-specific MemSet research |
Date | |
Msg-id | 200602031351.k13Dp6501540@candle.pha.pa.us |
List | pgsql-patches |
Rocco Altier wrote:
> I wanted to chime in that I also see this speedup from using XLC 6.0
> (IBM's cc), even in 32bit mode.  I have tested on AIX 5.2 and 5.1.
>
> I think this would be good to include in the regular release.
>
> Not sure how many people are running older versions of AIX that would
> want a new version of postgres.
>

OK, attached patch applied that turns off MemSet on AIX.  If we need to
tweak it for AIX versions, let us know.

I added a constant test in the macro that should allow the optimizer to
call memset() directly rather than doing our MemSet comparisons, and the
optimizer should eliminate the constant test for non-AIX builds.

---------------------------------------------------------------------------

>     -rocco
>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org
> > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Bruce Momjian
> > Sent: Wednesday, February 01, 2006 12:11 PM
> > To: Seneca Cunningham
> > Cc: Martijn van Oosterhout; pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] Some platform-specific MemSet research
> >
> >
> >
> > My guess is that there is some really fast assembler for memory copy on
> > AIX, and only libc memset() has it.  If you want, we can make
> > MEMSET_LOOP_LIMIT in c.h a configure value, and allow template/aix to
> > set it to zero, causing memset() to be always used.
> >
> > Are you prepared to make this optimization decision for all AIX users
> > using gcc, or only for certain versions?
> >
> > ---------------------------------------------------------------------------
> >
> > Seneca Cunningham wrote:
> > > Martijn van Oosterhout wrote:
> > > > On Tue, Jan 24, 2006 at 05:24:28PM -0500, Seneca Cunningham wrote:
> > > >
> > > >>After reading the post on -patches proposing that MemSet be changed to
> > > >>use long instead of int32 on the grounds that a pair of x86-64 linux
> > > >>boxes took less time to execute the long code 64*10^6 times[1], I took a
> > > >>look at how the testcode performed on AIX with gcc.  While the switch to
> > > >>long did result in a minor performance improvement, dropping the
> > > >>MemSetLoop in favour of the native memset resulted in the tests taking
> > > >>~25% the time as the MemSetLoop-like int loop. The 32-bit linux system I
> > > >>ran the expanded tests on showed that for the buffer size range that
> > > >>postgres can use the looping MemSet instead of memset (size <= 1024
> > > >>bytes), MemSet generally had better performance.
> > > >
> > > >
> > > > Could you please check the asm output to see what's going on. We've had
> > > > tests like these produce odd results in the past because the compiler
> > > > optimised away stuff that didn't have any effect. Since every memset
> > > > after the first is a no-op, you want to make sure it's still actually
> > > > doing the work...
> > >
> > > Well, on both linux and AIX, all 30 of the 64000000 iterations loops
> > > from the source exist (10 int, 10 long, 10 memset).  According to my
> > > understanding of the assembler, memset itself is only called for values
> > > >= 64 bytes on both platforms and the memset is called in each iteration.
> > >
> > > The assembler for the 64 byte loops, with prepended line number, first
> > > loop MemSetLoop int-variant, second loop memset, third loop MemSetLoop
> > > long-variant:
> > >
> > > 64-bit AIX:
> > >
> > > 419     addi 3,1,112
> > > 420     li 4,0
> > > 421     bl .gettimeofday
> > > 422     nop
> > > 423     lis 10,0x3d0
> > > 424     cmpld 6,26,16
> > > 425     li 11,0
> > > 426     ori 10,10,36864
> > > 427 L..41:
> > > 428     bge 6,L..42
> > > 429     mr 9,26
> > > 430     li 0,0
> > > 431 L..44:
> > > 432     stw 0,0(9)
> > > 433     addi 9,9,4
> > > 434     cmpld 7,16,9
> > > 435     bgt 7,L..44
> > > 436 L..42:
> > > 437     addi 0,11,1
> > > 438     extsw 11,0
> > > 439     cmpw 7,11,10
> > > 440     bne+ 7,L..41
> > > 441     li 4,0
> > > 442     mr 3,22
> > > 443     lis 25,0x3d0
> > > 444     li 28,0
> > > 445     bl .gettimeofday
> > > 446     nop
> > > 447     li 4,64
> > > 448     addi 5,1,112
> > > 449     ld 3,LC..9(2)
> > > 450     mr 6,22
> > > 451     ori 25,25,36864
> > > 452     bl .print_time
> > > 453     addi 3,1,112
> > > 454     li 4,0
> > > 455     bl .gettimeofday
> > > 456     nop
> > > 457 L..46:
> > > 458     mr 3,26
> > > 459     li 4,0
> > > 460     li 5,64
> > > 461     bl .memset
> > > 462     nop
> > > 463     addi 0,28,1
> > > 464     extsw 28,0
> > > 465     cmpw 7,28,25
> > > 466     bne+ 7,L..46
> > > 467     li 4,0
> > > 468     mr 3,22
> > > 469     bl .gettimeofday
> > > 470     nop
> > > 471     li 4,64
> > > 472     addi 5,1,112
> > > 473     ld 3,LC..11(2)
> > > 474     mr 6,22
> > > 475     bl .print_time
> > > 476     addi 3,1,112
> > > 477     li 4,0
> > > 478     bl .gettimeofday
> > > 479     nop
> > > 480     lis 10,0x3d0
> > > 481     cmpld 6,26,16
> > > 482     li 11,0
> > > 483     ori 10,10,36864
> > > 484 L..48:
> > > 485     bge 6,L..49
> > > 486     mr 9,26
> > > 487     li 0,0
> > > 488 L..51:
> > > 489     std 0,0(9)
> > > 490     addi 9,9,8
> > > 491     cmpld 7,9,16
> > > 492     blt 7,L..51
> > > 493 L..49:
> > > 494     addi 0,11,1
> > > 495     extsw 11,0
> > > 496     cmpw 7,11,10
> > > 497     bne+ 7,L..48
> > > 498     li 4,0
> > > 499     mr 3,22
> > > 500     bl .gettimeofday
> > > 501     nop
> > > 502     li 4,64
> > > 503     addi 5,1,112
> > > 504     ld 3,LC..13(2)
> > > 505     mr 6,22
> > > 506     bl .print_time
> > >
> > >
> > > 32-bit Linux:
> > >
> > > 387     popl    %ecx
> > > 388     popl    %edi
> > > 389     pushl   $0
> > > 390     leal    -20(%ebp), %edx
> > > 391     pushl   %edx
> > > 392     call    gettimeofday
> > > 393     xorl    %edx, %edx
> > > 394     addl    $16, %esp
> > > 395 .L41:
> > > 396     movl    -4160(%ebp), %eax
> > > 397     cmpl    %eax, -4144(%ebp)
> > > 398     jae     .L42
> > > 399     movl    -4144(%ebp), %eax
> > > 400 .L44:
> > > 401     movl    $0, (%eax)
> > > 402     addl    $4, %eax
> > > 403     cmpl    %eax, -4160(%ebp)
> > > 404     ja      .L44
> > > 405 .L42:
> > > 406     incl    %edx
> > > 407     cmpl    $64000000, %edx
> > > 408     jne     .L41
> > > 409     subl    $8, %esp
> > > 410     pushl   $0
> > > 411     leal    -28(%ebp), %edx
> > > 412     pushl   %edx
> > > 413     call    gettimeofday
> > > 414     leal    -28(%ebp), %eax
> > > 415     movl    %eax, (%esp)
> > > 416     leal    -20(%ebp), %ecx
> > > 417     movl    $64, %edx
> > > 418     movl    $.LC5, %eax
> > > 419     call    print_time
> > > 420     popl    %eax
> > > 421     popl    %edx
> > > 422     pushl   $0
> > > 423     leal    -20(%ebp), %edx
> > > 424     pushl   %edx
> > > 425     call    gettimeofday
> > > 426     xorl    %edi, %edi
> > > 427     addl    $16, %esp
> > > 428 .L46:
> > > 429     pushl   %eax
> > > 430     pushl   $64
> > > 431     pushl   $0
> > > 432     movl    -4144(%ebp), %ecx
> > > 433     pushl   %ecx
> > > 434     call    memset
> > > 435     incl    %edi
> > > 436     addl    $16, %esp
> > > 437     cmpl    $64000000, %edi
> > > 438     jne     .L46
> > > 439     subl    $8, %esp
> > > 440     pushl   $0
> > > 441     leal    -28(%ebp), %eax
> > > 442     pushl   %eax
> > > 443     call    gettimeofday
> > > 444     leal    -28(%ebp), %edx
> > > 445     movl    %edx, (%esp)
> > > 446     leal    -20(%ebp), %ecx
> > > 447     movl    $64, %edx
> > > 448     movl    $.LC6, %eax
> > > 449     call    print_time
> > > 450     popl    %eax
> > > 451     popl    %edx
> > > 452     pushl   $0
> > > 453     leal    -20(%ebp), %eax
> > > 454     pushl   %eax
> > > 455     call    gettimeofday
> > > 456     xorl    %edx, %edx
> > > 457     addl    $16, %esp
> > > 458 .L48:
> > > 459     movl    -4160(%ebp), %eax
> > > 460     cmpl    %eax, -4144(%ebp)
> > > 461     jae     .L49
> > > 462     movl    -4144(%ebp), %eax
> > > 463 .L51:
> > > 464     movl    $0, (%eax)
> > > 465     addl    $4, %eax
> > > 466     cmpl    -4160(%ebp), %eax
> > > 467     jb      .L51
> > > 468 .L49:
> > > 469     incl    %edx
> > > 470     cmpl    $64000000, %edx
> > > 471     jne     .L48
> > > 472     subl    $8, %esp
> > > 473     pushl   $0
> > > 474     leal    -28(%ebp), %edx
> > > 475     pushl   %edx
> > > 476     call    gettimeofday
> > > 477     leal    -28(%ebp), %eax
> > > 478     movl    %eax, (%esp)
> > > 479     leal    -20(%ebp), %ecx
> > > 480     movl    $64, %edx
> > > 481     movl    $.LC7, %eax
> > > 482     call    print_time
> > >
> > > --
> > > Seneca Cunningham
> > > scunning@ca.afilias.info
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 5: don't forget to increase your free space map settings
> > >
> >
> > --
> >   Bruce Momjian                        |  http://candle.pha.pa.us
> >   pgman@candle.pha.pa.us               |  (610) 359-1001
> >   +  If your life is a hard drive,     |  13 Roberts Road
> >   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> >        choose an index scan if your joining column's datatypes do not
> >        match
> >
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.
                                       |  Newtown Square, Pennsylvania 19073

Index: configure
===================================================================
RCS file: /cvsroot/pgsql/configure,v
retrieving revision 1.473
diff -c -c -r1.473 configure
*** configure	17 Jan 2006 23:52:27 -0000	1.473
--- configure	3 Feb 2006 13:48:02 -0000
***************
*** 21516,21521 ****
--- 21516,21532 ----
  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
  
+ # If not set in template file, set bytes to use libc memset()
+ if test x"$MEMSET_LOOP_LIMIT" = x"" ; then
+   MEMSET_LOOP_LIMIT=1024
+ fi
+ 
+ cat >>confdefs.h <<_ACEOF
+ #define MEMSET_LOOP_LIMIT ${MEMSET_LOOP_LIMIT}
+ _ACEOF
+ 
+ 
+ 
  if test "$enable_nls" = yes ; then
  
    echo "$as_me:$LINENO: checking for library containing gettext" >&5
Index: configure.in
===================================================================
RCS file: /cvsroot/pgsql/configure.in,v
retrieving revision 1.443
diff -c -c -r1.443 configure.in
*** configure.in	17 Jan 2006 23:52:30 -0000	1.443
--- configure.in	3 Feb 2006 13:48:06 -0000
***************
*** 1249,1254 ****
--- 1249,1261 ----
  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
  
+ # If not set in template file, set bytes to use libc memset()
+ if test x"$MEMSET_LOOP_LIMIT" = x"" ; then
+   MEMSET_LOOP_LIMIT=1024
+ fi
+ AC_DEFINE_UNQUOTED(MEMSET_LOOP_LIMIT, ${MEMSET_LOOP_LIMIT}, [Define bytes to use libc memset().])
+ 
+ 
  if test "$enable_nls" = yes ; then
    PGAC_CHECK_GETTEXT
  fi
Index: src/include/c.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/c.h,v
retrieving revision 1.194
diff -c -c -r1.194 c.h
*** src/include/c.h	5 Jan 2006 03:01:37 -0000	1.194
--- src/include/c.h	3 Feb 2006 13:48:09 -0000
***************
*** 614,622 ****
   *	overhead.  However, we have also found that the loop is faster than
   *	native libc memset() on some platforms, even those with assembler
   *	memset() functions.  More research needs to be done, perhaps with
!  *	platform-specific MEMSET_LOOP_LIMIT values or tests in configure.
!  *
!  *	bjm 2002-10-08
   */
  #define MemSet(start, val, len) \
  	do \
--- 614,620 ----
   *	overhead.  However, we have also found that the loop is faster than
   *	native libc memset() on some platforms, even those with assembler
   *	memset() functions.  More research needs to be done, perhaps with
!  *	MEMSET_LOOP_LIMIT tests in configure.
   */
  #define MemSet(start, val, len) \
  	do \
***************
*** 629,635 ****
  		if ((((long) _vstart) & INT_ALIGN_MASK) == 0 && \
  			(_len & INT_ALIGN_MASK) == 0 && \
  			_val == 0 && \
! 			_len <= MEMSET_LOOP_LIMIT) \
  		{ \
  			int32 *_start = (int32 *) _vstart; \
  			int32 *_stop = (int32 *) ((char *) _start + _len); \
--- 627,638 ----
  		if ((((long) _vstart) & INT_ALIGN_MASK) == 0 && \
  			(_len & INT_ALIGN_MASK) == 0 && \
  			_val == 0 && \
! 			_len <= MEMSET_LOOP_LIMIT && \
! 			/* \
! 			 *	If MEMSET_LOOP_LIMIT == 0, optimizer should find \
! 			 *	the whole "if" false at compile time. \
! 			 */ \
! 			MEMSET_LOOP_LIMIT != 0) \
  		{ \
  			int32 *_start = (int32 *) _vstart; \
  			int32 *_stop = (int32 *) ((char *) _start + _len); \
***************
*** 640,647 ****
  			memset(_vstart, _val, _len); \
  	} while (0)
  
- #define MEMSET_LOOP_LIMIT  1024
- 
  /*
   * MemSetAligned is the same as MemSet except it omits the test to see if
   * "start" is word-aligned.  This is okay to use if the caller knows a-priori
--- 643,648 ----
***************
*** 657,663 ****
  		\
  		if ((_len & INT_ALIGN_MASK) == 0 && \
  			_val == 0 && \
! 			_len <= MEMSET_LOOP_LIMIT) \
  		{ \
  			int32 *_stop = (int32 *) ((char *) _start + _len); \
  			while (_start < _stop) \
--- 658,665 ----
  		\
  		if ((_len & INT_ALIGN_MASK) == 0 && \
  			_val == 0 && \
! 			_len <= MEMSET_LOOP_LIMIT && \
! 			MEMSET_LOOP_LIMIT != 0) \
  		{ \
  			int32 *_stop = (int32 *) ((char *) _start + _len); \
  			while (_start < _stop) \
***************
*** 679,684 ****
--- 681,687 ----
  #define MemSetTest(val, len) \
  	( ((len) & INT_ALIGN_MASK) == 0 && \
  	(len) <= MEMSET_LOOP_LIMIT && \
+ 	MEMSET_LOOP_LIMIT != 0 && \
  	(val) == 0 )
  
  #define MemSetLoop(start, val, len) \
Index: src/include/pg_config.h.in
===================================================================
RCS file: /cvsroot/pgsql/src/include/pg_config.h.in,v
retrieving revision 1.90
diff -c -c -r1.90 pg_config.h.in
*** src/include/pg_config.h.in	17 Jan 2006 23:52:31 -0000	1.90
--- src/include/pg_config.h.in	3 Feb 2006 13:48:12 -0000
***************
*** 576,581 ****
--- 576,584 ----
  /* Define as the maximum alignment requirement of any C data type. */
  #undef MAXIMUM_ALIGNOF
  
+ /* Define bytes to use libc memset(). */
+ #undef MEMSET_LOOP_LIMIT
+ 
  /* Define to the address where bug reports for this package should be sent. */
  #undef PACKAGE_BUGREPORT
  
Index: src/template/aix
===================================================================
RCS file: /cvsroot/pgsql/src/template/aix,v
retrieving revision 1.17
diff -c -c -r1.17 aix
*** src/template/aix	2 Dec 2004 18:11:40 -0000	1.17
--- src/template/aix	3 Feb 2006 13:48:13 -0000
***************
*** 8,10 ****
--- 8,14 ----
      ;;
    esac
  fi
+ 
+ # native memset() is faster, 2006-02-03
+ # XLC 6.0, (IBM's cc), tested on AIX 5.2 and 5.1
+ MEMSET_LOOP_LIMIT=0
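
For anyone who wants to try the constant test outside the tree, here is a minimal standalone sketch of the patched macro. It is not the code from c.h itself. Assumptions: int32_t and uintptr_t from <stdint.h> stand in for PostgreSQL's int32 and the (long) cast, MEMSET_LOOP_LIMIT gets a fallback default here instead of coming from configure, and memset_demo() is a hypothetical helper that exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumption: in the real tree configure supplies this value through
 * pg_config.h.  0 means "never use the loop, always call libc memset()".
 */
#ifndef MEMSET_LOOP_LIMIT
#define MEMSET_LOOP_LIMIT 1024
#endif

#define INT_ALIGN_MASK (sizeof(int32_t) - 1)

/*
 * Sketch of the patched MemSet.  The final test compares two compile-time
 * constants, so with MEMSET_LOOP_LIMIT == 0 the whole "if" is constant
 * false and only the memset() branch can ever run.
 */
#define MemSet(start, val, len) \
	do \
	{ \
		void	   *_vstart = (void *) (start); \
		int			_val = (val); \
		size_t		_len = (len); \
		\
		if ((((uintptr_t) _vstart) & INT_ALIGN_MASK) == 0 && \
			(_len & INT_ALIGN_MASK) == 0 && \
			_val == 0 && \
			_len <= MEMSET_LOOP_LIMIT && \
			MEMSET_LOOP_LIMIT != 0) \
		{ \
			int32_t    *_start = (int32_t *) _vstart; \
			int32_t    *_stop = (int32_t *) ((char *) _start + _len); \
			\
			while (_start < _stop) \
				*_start++ = 0; \
		} \
		else \
			memset(_vstart, _val, _len); \
	} while (0)

/*
 * Hypothetical helper: fill a buffer with 0xFF, clear the first len bytes
 * with MemSet, and return 1 if exactly those bytes were zeroed.
 */
int
memset_demo(size_t len)
{
	int32_t		storage[512];	/* int32_t array gives the needed alignment */
	unsigned char *buf = (unsigned char *) storage;
	size_t		i;

	assert(len <= sizeof(storage));
	memset(storage, 0xFF, sizeof(storage));

	MemSet(buf, 0, len);

	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	/* bytes past len must be untouched */
	for (i = len; i < sizeof(storage); i++)
		if (buf[i] != 0xFF)
			return 0;
	return 1;
}
```

Building this with -DMEMSET_LOOP_LIMIT=0 mimics what template/aix now sets. An optimizing compiler should then be able to drop the loop branch and leave only the memset() call, but whether a particular compiler really does so is best confirmed in the assembler output, as was done earlier in this thread.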