2 Nov
2007
11:04 a.m.
On Fri, 2 Nov 2007, James Buddenhagen wrote:
On 11/2/07, Tom Knight <tk@csail.mit.edu> wrote:
I'm amazed that Google is not using the "text behind image" feature of PDF files to handle this. Modern OCR programs produce text behind the page image, which is then searchable and selectable for cut and paste. I wonder why this obviously useful idea is not used in their scanning and OCR. I guess they want to be the only people who can search and index the pages.
A couple of book search guys work in my building. I'll ask what's up. -J