W. F. Lunnon: The Number-Wall Algorithm: an LFSR Cookbook, Journal of Integer Sequences 4 (2001), Article 01.1.1, https://cs.uwaterloo.ca/journals/JIS/VOL4/LUNNON/numbwall10.html

J. H. Conway and R. K. Guy: The Book of Numbers, Springer 1996, see pages 85-89.

A "number wall" is a 2D square grid of numbers in which every stencil of 5 adjacent numbers

      .  N  .
      W  X  E
      .  S  .

obeys the relation X*X = N*S + E*W.

These authors are mainly interested in the "mainstream" case, where the wall contains a row of all 1s, the row immediately below is some (bi-infinite) integer sequence, and all wall entries below these can be completed using the stencil relation and will automatically be all-integer. (All entries above the all-1s row are automatically 0.) Except: sometimes the stencil relation tells you to divide 0/0, in which case it is not clear how to fill in that entry of the number wall.

Lunnon defines all entries of the number wall as certain Toeplitz determinants -- in fact the stencil relation is exactly Lewis Carroll's determinant condensation method (Dodgson condensation) applied to Toeplitz matrices. Lunnon then proves, he says for the first time, the fundamental theorems about how to fill in all number-wall entries efficiently even in the 0/0 cases. Remarkably, 0 entries in mainstream number walls always form gXg square "windows," surrounded by 4g+4 nonzero entries forming the "inner rim" of the window, then 4g+12 entries forming its "outer rim." Each of the 4 inner-rim sides automatically is a geometric sequence. Lunnon gives fundamental theorems on page 11 relating inner- and outer-rim entries. On page 16 he explains how to use them to calculate a mainstream number wall. On page 23 he is somewhat confusing to the poor reader about computational complexity. To unconfuse: the machinery on page 16 allows one to calculate mainstream number walls row by row going downward, where the algorithm stores the most recent 3 rows and also, for each block of 0s in a row, stores which row of its window that block is. The calculation then computes an NxN chunk of the number wall in O(N^2) arithmetic ops and time, using O(N) storage. This is optimal for any algorithm that computes the entire wall (not skipping any entries) and that has to store the input (top row).

Other constructs which are equivalent, modulo disguises, to Lunnon's number wall include Heinz Rutishauser's 1954 "quotient-difference algorithm," Peter Wynn's "epsilon algorithm" (and "rho algorithm"?), Shanks' sequence transformation, Aitken's delta-squared process, and values of "Pade tables" at specific x. Which of these is the "best" formulation is not immediately obvious, but Lunnon's formulation has the advantages of preserving integrality and of obeying the amazing "window" laws.

If the seed sequence obeys a linear recurrence of order k (e.g. the Fibonacci numbers obey one with k=2), then the number wall automatically will have exactly k+1 nonzero rows, and the bottommost of them will automatically be a geometric sequence. The recurrence-defining polynomial J(x) can be computed using number-wall techniques with polynomials, not numbers, as entries (Lunnon shows how); this is a rather inefficient way to find J(x), but it works. However, I point out that the constant term J(0) of the scaled-to-be-monic version of J(x) can be computed faster: it is exactly (-1)^(k+1) times the growth ratio of the geometric sequence in the bottommost nonzero row of the number wall.
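To make the stencil recursion concrete, here is a minimal Python sketch (my illustration, not Lunnon's algorithm -- in particular it simply stops rather than handling windows) that builds a mainstream wall downward from a finite chunk of the seed row, in exact rational arithmetic:

  from fractions import Fraction

  def number_wall(seed, depth):
      # Rows of a mainstream number wall below the all-1s row, via the
      # stencil relation solved for the South entry: S = (X*X - E*W)/N.
      # Minimal sketch: assumes no 0/0 "window" cases are hit; stops early
      # if an all-zero row (the bottom of the nonzero wall) is reached.
      L = len(seed)
      wall = [[Fraction(1)] * (L + 2),        # all-1s row, kept 2 entries wider
              [Fraction(s) for s in seed]]    # the seed-sequence row
      for _ in range(depth):
          north, here = wall[-2], wall[-1]
          if all(x == 0 for x in here):       # recurrence order already reached
              break
          south = []
          for j in range(1, len(here) - 1):   # each row shrinks by 1 per side
              X, N = here[j], north[j + 1]    # north row is 2 wider, offset 1
              W, E = here[j - 1], here[j + 1]
              south.append((X * X - E * W) / N)
          wall.append(south)
      return wall

  # Fibonacci seed, recurrence order k=2: exactly k+1 = 3 nonzero rows appear,
  # and the bottommost is the geometric sequence -1, +1, -1, ... (ratio -1).
  for row in number_wall([1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144], 3):
      print(" ".join(str(x) for x in row))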
Now if one wants to compute certain entries of the number wall but NOT all of them (skipping many entries is OK), then considerably faster algorithms than those Lunnon discusses become possible. If you want to compute one entry m rows down in the number wall, you do not need m*m operations; you can do it using "ultrafast Toeplitz determinant" algorithms in O(m*log(m)^2) ops. The entire J(x) polynomial (if degree=m) can also be computed within this time bound. See e.g.:

R. P. Brent, F. G. Gustavson, and D. Y. Y. Yun: Fast solution of Toeplitz systems of equations and computation of Pade approximants, J. Algorithms 1 (1980) 259-295.

R. R. Bitmead and B. D. O. Anderson: Asymptotically fast solution of Toeplitz and related systems of linear equations, Linear Algebra Appl. 34 (1980) 103-116.

Frank R. de Hoog: A new algorithm for solving Toeplitz systems of equations, Linear Algebra Appl. 88/89 (1987) 122-138.

Victor Y. Pan: Structured Matrices and Polynomials: Unified Superfast Algorithms, Birkhauser 2001.

These results of course are optimal except for log factors.

It also is of interest to "skip ahead" in number walls, not only in the South direction, but also in the East (or West) direction. That is because Lunnon (page 31) says the roots X[m] of the recurrence-defining polynomial J(x), if they are all real with distinct magnitudes (indexed in order of decreasing magnitude), are given by

   X[m] = lim(n-->infinity)  W[m,n+1] * W[m-1,n] / ( W[m-1,n+1] * W[m,n] )

where W[down, across] are the number-wall entries. If two roots X[m] and X[m+1] form a complex-conjugate pair, then they will be roots of the quadratic

   (W[m,n]/W[m-1,n]) * x^2  -  (W[m,n+1]/W[m-1,n+1]) * x  +  (W[m,n+2]/W[m-1,n+2])  =  0

in the n-->infinity limit.

An excellent application of this fact is a way to compute approximations to the roots of a given degree-k polynomial: use the polynomial to define a linear recurrence, use the recurrence to compute the seed sequence out far, far to the East, then finally compute number-wall entries going up to k rows South from there. The speed of convergence behaves roughly like |error| proportional to C^n, where C = max, over pairs i,j with |X[i]| < |X[j]|, of |X[i]/X[j]|, and n = how far East you walk. This is poor speed. However, if we were to make n = 2^t exponentially large, then the error would be like C^(2^t), which would be excellent speed ("quadratic convergence" as opposed to mere "linear convergence").

This can be accomplished via the following "fast skip ahead East" method. The linear recurrence defined by the input polynomial can be regarded as its kXk "companion matrix." Compute the (2^t)th power of that matrix using t consecutive matrix squarings, then use it to compute entries 2^t, 1+2^t, ..., 2*k+2^t of the sequence from its k starting entries. Now compute a chunk of size about kXk of the number wall going South from there using Lunnon's usual method. In this way we get approximations to all the roots (and/or quadratic factors) of the input polynomial with error of order C^(2^t), and the work required is O( t*MatMult(k) + k*k ) arithmetic operations, i.e. polynomial in k and t, and convergence is guaranteed; here MatMult(k) = O(k^2.376). This is not that bad a rootfinding method, but I think it can be improved further: e.g. I think the matrix powering can be replaced by modular polynomial powering (computing x^(2^t) mod the input polynomial), which would speed it up to O( t*PolyMult(k) + k*k ) operations, where PolyMult(k) = O(k*log(k)).
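For concreteness, here is a sketch of that skip-ahead rootfinder, reusing the number_wall routine above. The function names, the all-1s starting state (assumed generic, i.e. exciting every root), and the choice of sampling column are my illustrations, and schoolbook O(k^3) matrix multiplication stands in for fast MatMult:

  def mat_mul(A, B):
      k = len(A)
      return [[sum(A[i][r] * B[r][j] for r in range(k)) for j in range(k)]
              for i in range(k)]

  def skip_ahead_roots(coeffs, t):
      # coeffs = [a0, ..., a_{k-1}] of the monic input polynomial
      #   x^k + a_{k-1}*x^(k-1) + ... + a0,
      # whose roots are assumed real with distinct magnitudes.
      k = len(coeffs)
      # Companion matrix: advances the state (s(n), ..., s(n+k-1)) one step East.
      C = [[0] * k for _ in range(k)]
      for i in range(k - 1):
          C[i][i + 1] = 1
      for j in range(k):
          C[k - 1][j] = -coeffs[j]
      P = C
      for _ in range(t):                       # t squarings yield C^(2^t)
          P = mat_mul(P, P)
      start = [1] * k                          # s(0), ..., s(k-1)
      terms = [sum(P[i][j] * start[j] for j in range(k)) for i in range(k)]
      while len(terms) < 2 * k + 1:            # extend to s(2^t), ..., s(2^t + 2k)
          terms.append(-sum(coeffs[j] * terms[-k + j] for j in range(k)))
      wall = number_wall(terms, k - 1)         # wall rows 0..k over this chunk
      def W(m, col):                           # row-m entry at absolute column col
          return wall[0][col + 1] if m == 0 else wall[m][col - (m - 1)]
      c = k                                    # a column where rows 0..k all exist
      return [float(W(m, c + 1) * W(m - 1, c) / (W(m - 1, c + 1) * W(m, c)))
              for m in range(1, k + 1)]

  # (x-3)(x-2) = x^2 - 5x + 6; already after walking 2^4 = 16 East:
  print(skip_ahead_roots([6, -5], 4))          # roughly [3.0014, 1.9991]

Note the sequence entries at position 2^t are astronomically large integers, so in exact arithmetic their bit-lengths, not just the operation counts, matter; that is part of the inexact-arithmetic caveat at the end of this post.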
Wynn's epsilon algorithm and rho algorithm are intended to extrapolate sequences to infinity under certain assumptions about their behavior (thus enabling, e.g., acceleration of series). The whole acceleration process for N terms of the sequence as input could, I would now claim, be run in O(N*log(N)^2) operations (via fast Toeplitz), as opposed to Wynn's processes based on number-wall-like tableaus, which had required order N^2 operations. For example, http://en.wikipedia.org/wiki/Shanks_transformation shows that the "Shanks transformation" acceleration method arises from solving an NxN Toeplitz linear system, and E. J. Weniger in his review paper http://arxiv.org/abs/math.NA/0306302v1 discusses many other acceleration methods which, although Weniger never mentions it, can also be done "ultrafast" using structured-matrix and/or "fast wall" techniques. E.g. Levin's transform and the new transform in Weniger's section 8.2 can be done using fast high-order differencing via fast operator powering, and various other accelerators (Romberg integration extrapolation, Richardson extrapolation, ...) can be done fast using fast polynomial interpolation and multipoint evaluation algorithms. I'm not currently sure whether Brezinski's "theta algorithm" can be done ultrafast.

These realizations about fast series/sequence acceleration/extrapolation, which I have not seen mentioned before (there are several books by Claude Brezinski on this topic -- do they mention ultrafast algorithms?), may have wide computational-complexity implications: basically, slowly converging series and the like ARE fast-converging, in the sense that you can compute N terms and accelerate them in O(N*log(N)^2) steps. In all of these arithmetic-op counts I have ignored issues about inexact arithmetic.
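To make the tableau-versus-ultrafast comparison concrete, here is the classical O(N^2) epsilon tableau (a sketch in ordinary floating point; the claim above is that the same output could instead be obtained in O(N*log(N)^2) by structured-matrix methods):

  import math

  def wynn_epsilon(s):
      # Classical Wynn epsilon algorithm on the partial sums s[0..N-1]:
      #   eps_{k+1}(n) = eps_{k-1}(n+1) + 1/(eps_k(n+1) - eps_k(n)),
      # where the even-indexed columns carry the limit estimates.
      prev = [0.0] * (len(s) + 1)   # the eps_{-1} column of zeros
      cur = [float(x) for x in s]   # the eps_0 column: the partial sums
      k = 0
      while len(cur) > 1:
          nxt = [prev[i + 1] + 1.0 / (cur[i + 1] - cur[i])
                 for i in range(len(cur) - 1)]
          prev, cur, k = cur, nxt, k + 1
      return cur[0] if k % 2 == 0 else prev[0]   # top of the last even column

  # Accelerate the slowly converging Leibniz series 4*(1 - 1/3 + 1/5 - ...) = pi:
  sums, total = [], 0.0
  for i in range(11):
      total += 4.0 * (-1) ** i / (2 * i + 1)
      sums.append(total)
  print(sums[-1], wynn_epsilon(sums), math.pi)
  # the raw 11-term sum is off by about 0.09; the accelerated value is far closer

-- Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking "endorse" as 1st step)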