I'm amazed that Google is not using the "text behind image" feature of PDF files to handle this. Modern OCR programs produce text behind the page image, which is then searchable and selectable for cut and paste. I wonder why this obviously useful idea is not used in their scanning and OCR. I guess they want to be the only people who can search and index the pages.

On Nov 2, 2007, at 9:27 AM, Henry Baker wrote:
At 03:39 AM 11/2/2007, Joshua Zucker wrote:
http://mathworld.wolfram.com/CunninghamChain.html and http://hjem.get2net.dk/jka/math/Cunningham_Chain_records.htm
might be a good enough starting point for learning about these things.
Enjoy, --Joshua Zucker
In the upper right-hand corner of this page, there is a link called "Download PDF 9.5M". I clicked on it and downloaded what appears to be the entire book of 308 pages. You can also click on "View plain text", which will give you some idea of how well the character recognition program (used for indexing) is working. Since algebraic equation recognition isn't doing so well yet, the equations get trashed. It might be nice to be able to download both the PDF and the ASCII text, so that you can search the book yourself, as most books have completely useless indices (the major exception being Knuth's).
http://books.google.com/books?id=aC0PAAAAIAAJ&printsec=frontcover&dq=intitle:quaternions+inauthor:tait&as_brr=0
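For what it's worth, once the plain text has been saved to a local file, searching it yourself takes only a few lines. The following is a minimal sketch, not anything Google provides: the function name `search_text` and the idea of passing a file path and search term on the command line are assumptions made for illustration. It does a case-insensitive literal search and reports the line numbers of the hits.

```python
import re
import sys


def search_text(text, term):
    """Return (line_number, line) pairs for lines containing term.

    The match is case-insensitive and literal: re.escape() keeps
    characters such as '+' or '.' from being read as regex syntax.
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python search_book.py book.txt quaternion
    path, term = sys.argv[1], sys.argv[2]
    # errors="replace" tolerates stray bytes left behind by OCR output.
    with open(path, encoding="utf-8", errors="replace") as f:
        for number, line in search_text(f.read(), term):
            print(f"{number}: {line}")
```

Since the OCR trashes the equations, a search like this is only likely to be useful for words in the running text, not for formulas.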
Good luck!
_______________________________________________
math-fun mailing list
math-fun@mailman.xmission.com
http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun