Very interesting! Thanks! I would imagine that the "zip"-type programs would do better with larger dictionaries -- especially on larger files -- so perhaps this would help to improve these numbers. Perhaps there is an additional argument to these programs to allow for larger dictionaries? (Many of these compression programs were developed in an earlier era of PC's having main memories in the megabyte range instead of the gigabyte range; so perhaps their idea of a "large" dictionary needs to be revised.)) I wonder if "zip"-type programs would do even as well as your 2-level scheme; i.e., could these programs notice the patterns of 0's when the number is divisible by 2, 3, 5 ? At 11:35 AM 5/22/2012, Tom Rokicki wrote:
Well, I do have a sieve program that outputs bitmaps, but these bitmaps represent a sequence of 30 integers with a single byte (the eight values mod 30 that are not divisible by 2, 3, or 5.)
Running this to a 1GB data file (representing all primes less than 32,212,254,720) and compressing the resulting file, I see the following sizes:
1073741824 primes.dat 798639547 primes.dat.bz2 761814389 primes.dat.gz 761814523 primes.zip
I'm a bit surprised the compressors don't do better; just using arithmetic coding and the probability of a zero or one bit should give about a 0.639 compression ratio, but the best we see above is 0.709.
On Tue, May 22, 2012 at 11:00 AM, Henry Baker <hbaker1@pipeline.com> wrote:
Has anyone analyzed the expected size of bitmaps of the primes?
Specifically, consider the set of primes less than some number N, and then represent those primes as "1" bits in a bit vector of length N.
Then compress this bit vector using common techniques such as zip, gzip, 7zip, etc.
What is the growth rate of such compressed bitmaps as a function of N? Are these schemes any better than a simple run-length encoding?
Has anyone even performed any experiments?
Thanks in advance for any info.
P.S. Yes, I know, you can encode the entire infinite bitmap in a fixed/finite number of bits, where these bits encode a Turing Machine that spits out all of the primes. That's not what I'm asking here.
-- -- http://cube20.org/ -- http://golly.sf.net/ --