Google has announced the open source release of an old OCR engine called Tesseract. “In a nutshell, we are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing.”
On the surface, this seems like a very nice contribution that could be useful in lots of other applications.
But SourceForge doesn’t have anything listed under “License”, so hopefully it’ll get sorted out.
It’s licensed under the Apache License 2.0. See http://tesseract-ocr.cvs.sourceforge.net/tesseract-ocr/tesseract/RE…
Google has stated before they want to organise the world’s information, and now they gain mindshare with people who would add substantive globs of information to the internet.
…I haven’t had a chance to look at the potential of the code here, but could this be leveraged to provide another string to the bow of desktop search? OCR of images (png, jpg, gif etc) by Beagle would be fantastic – picking out signs and all sorts of text would be a sweet feature!
Get people to OCR their own works, then Google can help you share it with the world 🙂
Nice to know there’s a big company that knows what FOSS is all about.
The Chinese like them too.
“Nice to know there’s a big company that knows what FOSS is all about.”
Really? I thought this article was about Google?
I’m not aware of any other.
I dunno…
Synaptic reveals: Clara, gocr, and ocrad
but I have all of the repositories enabled… so I’m not sure how free they are…
“The University of Nevada in Las Vegas”.
::headdesk::
It’s The University of Nevada, Las Vegas.
—
Apache 2.0 license … interesting. It’s nearly as flexible as the BSD license in terms of what it permits.
—
And in the meantime, I’m interested in seeing who grabs the technology and runs with it and what interesting projects it spawns.
Edited 2006-09-01 21:16