* Warren D Smith <warren.wds@gmail.com> [Mar 29. 2014 06:08]:
Let w be a 64-bit word ("unsigned long long," in C language).
unsigned long, unless you are using a platform whose makers are known imbeciles.
The following C instruction sequence
w ^= w>>32; w ^= w<<32; w ^= w>>32; w ^= w>>16; w ^= w<<16; w ^= w>>16; w ^= w>>8; w ^= w<<8; w ^= w>>8; w ^= w>>4; w ^= w<<4; w ^= w>>4; w ^= w>>2; w ^= w<<2; w ^= w>>2; w ^= w>>1; w ^= w<<1; w ^= w>>1;
will overwrite w with its bit-order-reversal.
Nope.
In general for a (2^k)-bit long word this will use 3*k shifts and 3*k XOR operations and with zero(?) or one(?) extra registers consumed for temp storage and no funny masking constants needed.
You can generate the masks needed dynamically. static inline ulong revbin(ulong x) { ulong s = 64 >> 1; ulong m = ~0UL >> s; while ( s ) { x = ( (x & m) << s ) ^ ( (x & (~m)) >> s ); s >>= 1; m ^= (m<<s); } return x; } The masks (m) are nothing mysterious. In binary (16 bits, dots for zeros): ........11111111 ....1111....1111 ..11..11..11..11 .1.1.1.1.1.1.1.1 Looks like binary counting to me.
-- Warren D. Smith [...]
As for machine instructions, assume we have a byte-reversing instruction (as in host<-->network ordering). On AMD64 (and intel clones) that would be (GCC inline asm) static inline ulong asm_bswap(ulong x) { asm ("bswap %0" : "=r" (x) : "0" (x)); return x; } Then revbin can be done quite cheaply as static inline ulong revbin(ulong x) { x = bit_swap_1(x); x = bit_swap_2(x); x = bit_swap_4(x); x = bswap(x); return x; } where static inline ulong bit_swap_1(ulong x) // Return x with neighbor bits swapped. { ulong m = 0x5555555555555555UL; return ((x & m) << 1) | ((x & (~m)) >> 1); } static inline ulong bit_swap_2(ulong x) // Return x with groups of 2 bits swapped. { ulong m = 0x3333333333333333UL; return ((x & m) << 2) | ((x & (~m)) >> 2); } static inline ulong bit_swap_4(ulong x) // Return x with groups of 4 bits swapped. { ulong m = 0x0f0f0f0f0f0f0f0fUL; return ((x & m) << 4) | ((x & (~m)) >> 4); } This is very fast in practice. Beware of table-lookup (byte-wise), it excels in benchmarks but will be a dog as soon as the table fails to be in first level cache. Best regards, jj (everything above is from the fxtbook) P.S.: something for the weekend: RLL word (lex) Fib word (Gray) 0: .1.1.1.1. ......... 1: .1.1.1.11 ........1 2: .1.1.11.. ......1.1 3: .1.1.11.1 ......1.. 4: .1.11..1. ....1.1.. 5: .1.11..11 ....1.1.1 6: .1.11.1.. ....1...1 7: .1.11.1.1 ....1.... 8: .1.11.11. ....1..1. 9: .11..1..1 ..1.1..1. 10: .11..1.1. ..1.1.... 11: .11..1.11 ..1.1...1 12: .11..11.. ..1.1.1.1 13: .11..11.1 ..1.1.1.. 14: .11.1..1. ..1...1.. 15: .11.1..11 ..1...1.1 16: .11.1.1.. ..1.....1 17: .11.1.1.1 ..1...... 18: .11.1.11. ..1....1. 19: .11.11..1 ..1..1.1. 20: .11.11.1. ..1..1... 21: .11.11.11 ..1..1..1 22: 1..1..1.. 1.1..1..1 23: 1..1..1.1 1.1..1... 24: 1..1..11. 1.1..1.1. 25: 1..1.1..1 1.1....1. 26: 1..1.1.1. 1.1...... 27: 1..1.1.11 1.1.....1 28: 1..1.11.. 1.1...1.1 29: 1..1.11.1 1.1...1.. 30: 1..11..1. 1.1.1.1.. 31: 1..11..11 1.1.1.1.1 http://jjj.de/fxt/demo/bits/#bit-rll2