[math-fun] Smith's SMODA factorer self-destructs

6 Mar 2012

      My factoring algorithm SMODA has self-destructed to the extent it
now seems less attractive than the quadratic sieve as a factorer.

----More detailed explanation:
The problem was embarrassingly stupid and would have been noticed as soon as an
implementation was built but Andy JENNINGS saved time by spotting the
problem before that.

To review, SMODA depended on the first idea that, if there were a magic oracle
whom you could ask "O Oracle, here is B and N, now please generate for
me a set of B-smooth numbers
near to N or near to multiples of N... such that these numbers are (i)
otherwise pretty random,
(ii) come approximately as close (in terms of additive error)
to being a multiple of N as possible existencewise,
(iii) come at a small cost, e.g. polynomial in logN cost, per number found",
then factoring would be possible at speeds rivalling or exceeding the
Number Field Sieve.

The second idea was that instead of depending on magic, it was
actually possible to build
the oracle by combining a certain fairly small N-independent database
with some simple
greedy construction algorithms.  The database was hard to find, hence
still somewhat oracle-like,
but it seemed possible to find it fast enough that challenging world
records might be feasible.

Now various potential problems were pointed out, which actually did
not seem to be problems,
such as
  1) database too expensive to find?
but actually just today my computer run is ending, and has constructed
preliminary database up to 2^400 size.  Jim White & I were building
datasets of the "world's most extensive Stormer pair collection" and
it seems clear it with feasible number of computers
our current codes could reach 2^500 -- and likely a good deal further
with as-yet-unimplemented algorithms.
  2) Why not use LLL lattice reduction methods to construct database?
As far as I can tell they are an uncompetitively poor idea.

The real problem is this.  The greedoid which uses the database to
find smooth numbers
(call them S) near to N or near to multiples of N,  works in a manner
somewhat analogous to the way binary numbers can be constructed lying
arbitrarily near to any real.  One keeps
on adding smaller and smaller "correction vectors" in a certain vector
space, and proves that a certain natural vector logarithmic distance
measure keeps getting smaller, indeed
at least a factor 2 smaller each whack.

That works.  BUT, the problem is, when you make that natural
logarithmic distance smaller, that does NOT imply that the actual
difference  |S - multipleofN|   keeps getting smaller back in the
ordinary world of integer numbers, rather than the vector world.
Instead, this difference gets smaller for a while, but once  |S-N|  is
of order about squareroot(N), it almost always would start to grow
again.

Simple example:  if I had, say,
   10111 =approx= 10000   with  error=111
and I used multiplication by the rational 101/100 to try to improve
the approximation, I'd get
   1011100 =approx= 1010000   with  error=1100
which is the opposite of an improvement in the additive error
(although it does improve the *relative* error).

Improving the relative error is easy and was essentially what I was doing.
Although in principle it basically is always possible to improve the
additive error,
(e.g. if we use 271/274 not 100/101 then get 2740081 which is 81 mod 10000,
or if we use 991/1002 we get 10020001 which is 1 mod 10000)
I don't see how SMODA can gaurantee that without an unfeasibly
enormous database
far larger than I had in mind.
So the net effect is that SMODA should stop working approximately at
the point where you try to exceed the performance of the quadratic
sieve.  And it probably will be worse than QS since it is more
complicated.

This is a pity because SMODA and associated theory and data was quite big and
aesthetic looking, but now is seen to be a "castle in the air."

Is there any hope to salvage SMODA?  I'm not sure. You'd need to
invent a way to use
the database that would be far more powerful than the simple greedoids
I had in mind.
(But it would be permissible to consume a good deal more compute-time
than the greedoid
consumed.)  One way to try do that would be to find a rational number
(such as 271/274
in example above) that yields improvement (this is easy by Euclid
alg.), but doing so in
such a way its numerator was smooth (not so easy).  For example based
on the two good
rationals above  271/264 and 991/1002, and 91/92 which is also good,
we could look for smooth numbers  271a+991b+91c
in various ways such as sieving or lattice techniques.
   Another salvage attempt would be to have stronger guarantees about
the database
than those I'd had in mind.  For example, we believe based on both theoretical
models and computer evidence that very small databases based on very small prime
sets [of order only about (logN)^2  primes] should *exist* but I had given up
on trying to construct databases that small because they asymptotically seemed
much too hard to find.  If, however, we nevertheless assumed
SMODA had access to such a super-small super-good database, then SMODA
might be able to become a lot more powerful due to "cancellations"
arising from common prime factors re-occuring enough of the time.

So, salvaging SMODA may or may not be possible.  It is worth some thought.
If anybody wants to have my SMODA parts I/II paper draft/shipwrecks and/or our
the Smith+White datasets of Stormer pairs, let me know.

Warren D. Smith
http://rangevoting.org

Warren Smith

tags

participants (1)