Re: [math-fun] Prime question

2 Nov 2007

On 11/2/07, Tom Knight <tk@csail.mit.edu> wrote:
...
I'm amazed that Google is not using the "text behind image" feature of
PDF files to handle this.  Modern OCR programs produce text behind the
page image, which is then searchable and selectable for cut and paste.
I wonder why this obviously useful idea is not used in their scanning
and OCR.  I guess they want to be the only people who can search and
index the pages.

I applaud Google's efforts to digitize old books and have been downloading
some occasionally for quite a while now.  Unless my memory is seriously
flawed (quite possible :-( ), my early downloaded PDF files were searchable,
but in the last 6 months or so none have been.  This makes me think
Google has changed policy on this, perhaps for copyright liability reasons.
In any case, this greatly diminishes the usefulness for me of these books.
Since the online versions are searchable, it is clear that OCR has been
done on them, but the resulting text is not made available in the downloads.

I have been somewhat disappointed in quality as well, with missing pages
and illegible pages quite common.  This seems to be true across the board
both in old math books and others.

If Google employees or anyone else can shed more light on these matters
I would be interested to know what is going on.

Jim

James Buddenhagen