Re: [math-fun] MXOR

1 Apr 2016

      ...
Gareth wrote:
As Henry Baker says, Knuth has proposed something very much
along these lines: see
http://mmix.cs.hm.edu/doc/instructions-en.html#Bit_Operations
and look in the section headed "Mix and Match". His MXOR
operation is an 8x8 matrix multiplication over GF(2), and
his MOR operation is the same with OR instead of XOR for
the addition.
It's matrix * matrix rather than matrix * vector, and the
size is chosen so that the whole matrix (rather than just
one row or column) fits in a (64-bit) machine word.
Since MXOR doesn't exist, what's the most efficient way to emulate
it using existing instructions (representing matrices as uint64s)?

I tried the obvious place http://www.hackersdelight.org/hdcode.htm
to no avail. That contains a nice implementation to transpose such
a matrix, but not to multiply matrices.

A.B^T is easier to calculate than A.B, I believe. In particular,
you can take the eight uint64s, where >>> indicates cyclic shift:

C_0 = A & B
C_1 = A & (B >>> 8)
C_2 = A & (B >>> 16)
...
C_7 = A & (B >>> 56)

and apply the following operations in parallel (using SSE or AVX):

C_i ^= (C_i >> 1)
C_i ^= (C_i >> 2)
C_i ^= (C_i >> 4)
C_i &= 1

which will result in 64 bytes, each of which contains either 0 or
1. We want to pack these bits into a single uint64, like so:

C = C_0 | (C_1 << 1) | (C_2 << 2) | ... | (C_7 << 7)

and possibly do some annoying permutation on these bits.

Just out of interest, *why* isn't MXOR in the Intel instruction
set? It doesn't seem too complicated to implement, and it's useful
both in cryptography and data analysis (see 'persistent homology').

Best wishes,

Adam P. Goucher