* Warren D Smith <warren.wds@gmail.com> [Apr 08. 2015 17:38]:
[...]
Another paper by Klimov+Shamir remarks that x --> x + 1 + 4*x*x (mod 2^W) yields a single cycle and may be computed using only 2 machine instructions, one to compute y=x*x, the other to compute x+1+4*y using the x86 "lea" (load effective address) instruction. Bet your compiler won't realize that.
cat foo.cc unsigned foo(unsigned x) { unsigned y = x*x; return x + 1 + 4*y; } g++ -fverbose-asm -S -g -O2 -Wall -Werror -c -o asm.s foo.cc as -alhnd asm.s > asm.lst rm asm.s less asm.lst [...] 9:foo.cc **** unsigned foo(unsigned x) 10:foo.cc **** { 64 .loc 1 10 0 65 .cfi_startproc 66 .LVL0: 67 .LBB2: 11:foo.cc **** unsigned y = x*x; 68 .loc 1 11 0 69 0000 89F8 movl %edi, %eax # x, y 70 0002 0FAFC7 imull %edi, %eax # x, y 71 .LVL1: 12:foo.cc **** return x + 1 + 4*y; 72 .loc 1 12 0 73 0005 8D448701 leal 1(%rdi,%rax,4), %eax #, D.2236 74 .LVL2: 75 .LBE2: 13:foo.cc **** } [...] And compilers do MUCH more sophisticated thinks than this. Best regards, jj
[...]