* Joerg Arndt <arndt@jjj.de> [Apr 07. 2014 07:22]:
[...]
I didn't manage to find anything for 32 bits (neither 31 bits) better than this.
I offer 100 Euro for a routine for 32 bits (or 31, your choice) that uses one op less than my routine for 64 (or 63 when you pick 31 above).
It is indeed possible! For 32 bits: uint jj3_revbin32_4a(uint x) { uint z; #if 1 z = (x ^ (x >> 1)) & 0x4a52a529U; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0x18c60c63U; x ^= z ^ (z << 3); z = (x ^ (x >> 10)) & 0x003e001fU; x ^= z ^ (z << 10); z = (x ^ (x >> 17)) & 0x00007fffU; x ^= z ^ (z << 17); #else // same code using GCC's binary literals z = (x ^ (x >> 1)) & 0b01001010010100101010010100101001U; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0b00011000110001100000110001100011U; x ^= z ^ (z << 3); z = (x ^ (x >> 10)) & 0b00000000001111100000000000011111U; x ^= z ^ (z << 10); z = (x ^ (x >> 17)) & 0b00000000000000000111111111111111U; x ^= z ^ (z << 17); #endif return x; } Just 4 steps! And so we can do 64 in 5 steps: static inline ulong jj3_revbin64_5(ulong x) { ulong z; z = (x ^ (x >> 1)) & 0x4a52a5294a52a529UL; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0x18c60c6318c60c63UL; x ^= z ^ (z << 3); z = (x ^ (x >> 10)) & 0x003e001f003e001fUL; x ^= z ^ (z << 10); z = (x ^ (x >> 17)) & 0x00007fff00007fffUL; x ^= z ^ (z << 17); z = (x ^ (x >> 32)) & 0x00000000ffffffffUL; x ^= z ^ (z << 32); return x; } Variants for 32 bits are (from here on binary literals only) uint jj3_revbin32_4b(uint x) { uint z; z = (x ^ (x >> 1)) & 0b01010000101000010100001010000101U; x ^= z ^ (z << 1); z = (x ^ (x >> 2)) & 0b00110010011001001100100110010011U; x ^= z ^ (z << 2); z = (x ^ (x >> 7)) & 0b00000001111000000011100000001111U; x ^= z ^ (z << 7); z = (x ^ (x >> 21)) & 0b00000000000000000000011111111111U; x ^= z ^ (z << 21); return x; } uint jj3_revbin32_4c(uint x) { uint z; z = (x ^ (x >> 1)) & 0b01001001001001001001001001001001U; x ^= z ^ (z << 1); z = (x ^ (x >> 3)) & 0b00011000100011000100011000100011U; x ^= z ^ (z << 3); z = (x ^ (x >> 9)) & 0b00000000011111000000000000011111U; x ^= z ^ (z << 9); z = (x ^ (x >> 18)) & 0b00000000000000000011111111111111U; x ^= z ^ (z << 18); return x; } uint jj3_revbin32_4d(uint x) { uint z; z = (x ^ (x >> 2)) & 0b00100100010010010010010010001001U; x ^= z ^ (z << 2); z = (x ^ (x >> 3)) & 0b00011100001110001110001110000111U; x ^= z ^ (z << 3); z = (x ^ (x >> 7)) & 0b00000001111110000000000000111111U; x ^= z ^ (z << 7); z = (x ^ (x >> 19)) & 0b00000000000000000001111111111111U; x ^= z ^ (z << 19); return x; } Now we have seen 4 possible variants (different sets of shifts), there are at most 29 variants, with possible shift sets 1: [ 1 2 7 21 ] 2: [ 1 2 8 20 ] 3: [ 1 3 6 21 ] 4: [ 1 3 7 20 ] 5: [ 1 3 8 19 ] 6: [ 1 3 9 18 ] 7: [ 1 3 10 17 ] 8: [ 1 4 6 20 ] 9: [ 1 4 7 19 ] 10: [ 1 4 8 18 ] 11: [ 1 4 10 16 ] 12: [ 1 4 12 14 ] 13: [ 2 3 6 20 ] 14: [ 2 3 7 19 ] 15: [ 2 3 8 18 ] 16: [ 2 3 9 17 ] 17: [ 2 3 10 16 ] 18: [ 2 3 12 14 ] 19: [ 2 4 5 20 ] 20: [ 2 4 7 18 ] 21: [ 2 4 10 15 ] 22: [ 2 4 11 14 ] 23: [ 2 4 12 13 ] 24: [ 2 5 6 18 ] 25: [ 2 6 7 16 ] 26: [ 2 6 8 15 ] 27: [ 2 6 9 14 ] 28: [ 2 6 10 13 ] 29: [ 2 6 11 12 ] (I have checked that some of them cannot work). No such routine reversing 64 bits in 4 steps exists. For 5 steps there are at most 481 variants. For 16 bits in 3 steps the possible shift sets are 1: [ 1 4 10 ] 2: [ 2 3 10 ] 3: [ 2 6 7 ] But (modulo mistake) none of these work, so we apparently need 4 steps. Now I see that http://oeis.org/A125173 list all my (polished) values for the minumum number of steps. Note the comment there. If anyone can access D. E. Knuth, Problem 11264, Amer. Math. Monthly 114(2007)77. I'd be grateful for a copy. The JSTOR link is http://www.jstor.org/stable/i27642112 Best, jj