Re: [math-fun] Prime question
At 03:39 AM 11/2/2007, Joshua Zucker wrote:
http://mathworld.wolfram.com/CunninghamChain.html and http://hjem.get2net.dk/jka/math/Cunningham_Chain_records.htm
might be a good enough starting point for learning about these things.
Enjoy, --Joshua Zucker
In the upper right-hand corner of this page, there is a link called "Download PDF 9.5M". I clicked on it and downloaded what appears to be the entire book of 308 pages. You can also click on "View plain text", which will give you some idea of how well the character recognition program (used for indexing) is working. Since algebraic equation recognition isn't doing so well yet, the equations get trashed. It might be nice to be able to download both the PDF and the ASCII text, so that you can search the book yourself, as most books have completely useless indices (the major exception being Knuth's).
http://books.google.com/books?id=aC0PAAAAIAAJ&printsec=frontcover&dq=intitle...
Good luck!
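Henry's idea of downloading the plain text to "search the book yourself" can be made concrete by building a private word index. The following is a minimal sketch, not a Google feature: it assumes (hypothetically) that the plain-text download has already been split into a list of per-page strings, and `build_index` is an illustrative name.

```python
import re
from collections import defaultdict

# Words are runs of ASCII letters; digits, accents, and equation debris are ignored.
WORD_RE = re.compile(r"[a-z]+")


def build_index(pages):
    """Map each lowercased word to the sorted 1-based page numbers it occurs on.

    `pages` is assumed (hypothetically) to be a list of per-page OCR strings.
    """
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in WORD_RE.findall(text.lower()):
            index[word].add(page_no)
    return {word: sorted(nums) for word, nums in index.items()}


if __name__ == "__main__":
    pages = ["Quaternions and vectors", "The quaternion product", "Vectors again"]
    index = build_index(pages)
    print(index["vectors"])  # [1, 3]
```

Because OCR garbles equations, an index like this is only as good as the recognized prose, but that is usually where a back-of-book index fails anyway.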
I'm amazed that Google is not using the "text behind image" feature of PDF files to handle this. Modern OCR programs produce text behind the page image, which is then searchable and selectable for cut and paste. I wonder why this obviously useful idea is not used in their scanning and OCR. I guess they want to be the only people who can search and index the pages.
On Nov 2, 2007, at 9:27 AM, Henry Baker wrote:
At 03:39 AM 11/2/2007, Joshua Zucker wrote:
http://mathworld.wolfram.com/CunninghamChain.html and http://hjem.get2net.dk/jka/math/Cunningham_Chain_records.htm
might be a good enough starting point for learning about these things.
Enjoy, --Joshua Zucker
In the upper right-hand corner of this page, there is a link called "Download PDF 9.5M". I clicked on it and downloaded what appears to be the entire book of 308 pages. You can also click on "View plain text", which will give you some idea of how well the character recognition program (used for indexing) is working. Since algebraic equation recognition isn't doing so well yet, the equations get trashed. It might be nice to be able to download both the PDF and the ASCII text, so that you can search the book yourself, as most books have completely useless indices (the major exception being Knuth's).
http://books.google.com/books?id=aC0PAAAAIAAJ&printsec=frontcover&dq=intitle:quaternions+inauthor:tait&as_brr=0
Good luck!
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
I tried searching the downloaded PDF of Tait's Quaternions, but the search found nothing. So it appears that Google isn't using "text behind image", at least not for this book. However, since both the ASCII text and the PDF images are available, someone could easily automate the task of creating such combined PDF/text files, even someone outside of Google. Mike & Michael: are you listening?
At 07:52 AM 11/2/2007, Tom Knight wrote:
I'm amazed that Google is not using the "text behind image" feature of PDF files to handle this. Modern OCR programs produce text behind the page image, which is then searchable and selectable for cut and paste. I wonder why this obviously useful idea is not used in their scanning and OCR. I guess they want to be the only people who can search and index the pages.
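In PDF terms, the "text behind image" layer Tom describes is ordinary text drawn in text rendering mode 3 (neither filled nor stroked, i.e. invisible) over the page image, so viewers can search and select it while only the scan is visible. The following is a minimal sketch of one page's content stream for the automation Henry suggests. It assumes the OCR output is available as (word, x, y) positions in PDF points, and that the page's resource dictionary (not shown) defines an image `/Im0` and a font `/F1`; those resource names and the function names are hypothetical, and a real tool would also have to write the surrounding PDF objects.

```python
def pdf_escape(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_content_stream(page_w, page_h, words, font_size=10):
    """Return a content stream that paints the scan, then invisible OCR text over it.

    `words` is a list of (text, x, y) tuples in PDF points (origin at bottom left).
    `/Im0` and `/F1` are hypothetical resource names the page must define.
    """
    ops = [
        "q",                                # save graphics state
        f"{page_w} 0 0 {page_h} 0 0 cm",    # map the unit-square image onto the full page
        "/Im0 Do",                          # paint the scanned page image
        "Q",                                # restore graphics state
        "BT",                               # begin text object
        f"/F1 {font_size} Tf",              # select font and size
        "3 Tr",                             # rendering mode 3: invisible text
    ]
    for text, x, y in words:
        ops.append(f"1 0 0 1 {x} {y} Tm")       # position this word at its OCR location
        ops.append(f"({pdf_escape(text)}) Tj")  # show the recognized word (invisibly)
    ops.append("ET")                            # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    print(page_content_stream(612, 792, [("Quaternions", 72, 700), ("(Tait)", 72, 680)]))
```

A more careful version would also stretch each word horizontally (for example with the `Tz` operator) so that selection highlights line up with the printed words.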
On 11/2/07, Tom Knight <tk@csail.mit.edu> wrote:
I'm amazed that Google is not using the "text behind image" feature of PDF files to handle this. Modern OCR programs produce text behind the page image, which is then searchable and selectable for cut and paste. I wonder why this obviously useful idea is not used in their scanning and OCR. I guess they want to be the only people who can search and index the pages.
I applaud Google's efforts to digitize old books and have been downloading some occasionally for quite a while now. Unless my memory is seriously flawed (quite possible :-( ), my early downloaded PDF files were searchable, but in the last 6 months or so none have been. This makes me think Google has changed policy on this, perhaps for copyright liability reasons. In any case, this greatly diminishes the usefulness for me of these books. Since the online versions are searchable it is clear that OCR has been done on them, but is not made available in the downloads. I have been somewhat disappointed in quality as well, with missing and illegible pages quite common. This seems to be true across the board, in old math books and others alike. If Google employees or anyone can shed more light on these matters I would be interested to know what is going on.
Jim
On Fri, 2 Nov 2007, James Buddenhagen wrote:
On 11/2/07, Tom Knight <tk@csail.mit.edu> wrote:
I'm amazed that Google is not using the "text behind image" feature of PDF files to handle this. Modern OCR programs produce text behind the page image, which is then searchable and selectable for cut and paste. I wonder why this obviously useful idea is not used in their scanning and OCR. I guess they want to be the only people who can search and index the pages.
A couple of book search guys work in my building. I'll ask what's up.
-J
I checked some documents downloaded from ACM's Digital Library. The ones I checked were searchable, so they must have text behind them somewhere. Interestingly, I looked at the PDF document's "properties", and nothing was said about whether there was text behind or not. Which PDF programs produce searchable text? Does the standard dvips/gsview program produce searchable PDF? Is there a recommended way to go from TeX to searchable PDF?
At 10:04 AM 11/2/2007, Jason wrote:
A couple of book search guys work in my building. I'll ask what's up.
-J
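Since a PDF's "properties" don't say whether text is present, one way to check is to look for text-drawing operators directly. The following is a rough heuristic sketch, not a full PDF parser. It assumes content streams are either uncompressed or Flate-compressed (the common case) and looks for the text-showing operators `Tj`/`TJ`. It skips streams encoded with other filters, and finding a text operator does not guarantee the text maps back to searchable characters. `has_text_layer` is a hypothetical helper name.

```python
import re
import zlib

# A stream body sits between "stream<EOL>" and "<EOL>endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)
# Text-showing operators: "(...) Tj", "<...> Tj", or "[...] TJ".
TEXT_OP_RE = re.compile(rb"[)>\]]\s*T[jJ]\b")


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristically report whether any stream in the PDF draws text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        if TEXT_OP_RE.search(raw):
            return True  # uncompressed content stream that shows text
        try:
            # decompressobj tolerates trailing bytes after the Flate data.
            inflated = zlib.decompressobj().decompress(raw)
        except zlib.error:
            continue  # not Flate data (or another filter): skip this stream
        if TEXT_OP_RE.search(inflated):
            return True
    return False
```

For example, `has_text_layer(open("book.pdf", "rb").read())` (file name hypothetical) returning `False` suggests an image-only scan like the one Henry downloaded.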
I compile TeX directly to PDF using free software called TeXShop. I've found this system robust and reliable, if lacking fancy facilities such as the inline WYSIWYG available on some other (PC) systems. It runs on pretty much any available hardware/software, and the output produced is searchable by other PDF viewers (I tried).
WFL
On 11/2/07, Henry Baker <hbaker1@pipeline.com> wrote:
I checked some documents downloaded from ACM's Digital Library. The ones I checked were searchable, so they must have text behind them somewhere. Interestingly, I looked at the PDF document's "properties", and nothing was said about whether there was text behind or not.
Which PDF programs produce searchable text? Does the standard dvips/gsview program produce searchable PDF? Is there a recommended way to go from TeX to searchable PDF?
When I need a PDF for math, I write the paper in Word (with MS Equation) and use BullZip Printer to make a PDF file. I use TeX only as a very last resort, because I can't remember from one use to the next how to use it for equations, graphs, etc., and I dislike user-visible embedded codes, which should have disappeared 25 years ago. BullZip seems to work well for what I've used it for so far, and it's about $20. I am ready to be called an iconoclastic crank for this opinion.
Steve Gray
Fred Lunnon wrote:
I compile TeX directly to PDF using free software called TeXShop. I've found this system robust and reliable, if lacking fancy facilities such as the inline WYSIWYG available on some other (PC) systems. It runs on pretty much any available hardware/software, and the output produced is searchable by other PDF viewers (I tried).
WFL
I also do not use TeX. I convert MS Word files (including equations) to PDF using a free utility, PrimoPDF. It installs as a printer, so any program with a File/Print menu option can convert its file's contents to PDF: just select PrimoPDF as your "printer". You can download it from http://www.primopdf.com/
Bob
Steve Gray wrote:
When I need a PDF for math, I write the paper in Word (with MS Equation) and use BullZip Printer to make a PDF file. I use TeX only as a very last resort, because I can't remember from one use to the next how to use it for equations, graphs, etc., and I dislike user-visible embedded codes, which should have disappeared 25 years ago. BullZip seems to work well for what I've used it for so far, and it's about $20. I am ready to be called an iconoclastic crank for this opinion.
Steve Gray
James Buddenhagen wrote:
...This makes me think Google has changed policy on this, perhaps for copyright liability reasons. ... If Google employees or anyone can shed more light on these matters I would be interested to know what is going on.
I'm afraid we're into "I can neither confirm nor deny" territory here, so I'm going to keep my mouth shut. Maybe when I've been here longer I'll know enough to be allowed to say more.
In any case, this greatly diminishes the usefulness for me of these books. Since the online versions are searchable it is clear that OCR has been done on them, but is not made available in the downloads.
I'll just point out that the full text is often downloadable from the same page as the PDF of the images.
--Michael Kleber
--
It is very dark and after 2000. If you continue you are likely to be eaten by a bleen.
participants (8)
Fred Lunnon
Henry Baker
James Buddenhagen
Jason
Michael Kleber
Robert Baillie
Steve Gray
Tom Knight