Linked by Thom Holwerda on Wed 23rd May 2007 23:45 UTC, submitted by Austin
Linux "The one 'hole' in my workflow has been OCR. For years, people have been able to scan a document and have it converted into real text. One of my old printers even came with OCR software included - for Windows of course. But when I've really needed OCR, I've just assumed that there were no high quality packages available for Linux. Recently I decided to find out for myself (a complete OCR virgin) what is available, how to use it, and what the results are like. I installed every free OCR package I could find, and systematically tested them. They all work very differently, so I tried to design a simple test for my specific needs."
RE: Octopus
by shadow303 on Fri 25th May 2007 15:44 UTC in reply to "Octopus"

99% accuracy would be fine for the postal service. Postal OCR systems have an advantage in that there is a lot that can be done with contextual analysis (addresses are generally in a known format and there are databases which contain all of the addresses). Postal systems are always trying to find a balance between speed and accuracy in order to correctly process the most mail per unit of time.
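As a rough illustration of the contextual analysis described above, a raw OCR read can be snapped to the closest entry in a list of known addresses. This is only a sketch using Python's standard `difflib`; the street list, the `correct_street` helper, and the 0.8 cutoff are all made up for the example and do not describe how any real postal system works.

```python
import difflib

# Hypothetical stand-in for an address database.
KNOWN_STREETS = ["Main Street", "Maple Street", "Market Street", "Mill Road"]

def correct_street(ocr_text, known=KNOWN_STREETS, cutoff=0.8):
    """Return the known street closest to the OCR read, or None if nothing is close enough."""
    matches = difflib.get_close_matches(ocr_text, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# A single misread character ("1" for "i") is recoverable from context.
print(correct_street("Ma1n Street"))
# A read that resembles nothing in the list is rejected rather than guessed.
print(correct_street("Qxzzy"))
```

Raising the cutoff rejects more reads (fewer wrong corrections, more mail sent to manual handling); lowering it does the opposite, which is one small version of the speed/accuracy trade-off.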

I can't help but wonder if some of the DPI-dependent performance is due to size constraints in the code. From the review, it appears that few (if any) of the OCR engines know what the DPI is, so to them a higher DPI would simply make the characters seem larger. Incidentally, many of the cameras used in postal applications spit out images at 212 DPI.
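To put numbers on the "characters seem larger" point: one point is 1/72 inch, so the nominal pixel height of text grows linearly with scan resolution. The sketch below assumes 12 pt text and uses the nominal point size as a stand-in for actual glyph height; the `glyph_height_px` helper and the list of resolutions are just for illustration.

```python
def glyph_height_px(point_size, dpi):
    """Nominal height in pixels of text at point_size scanned at dpi (1 pt = 1/72 inch)."""
    return point_size * dpi / 72.0

# The same 12 pt text at several scan resolutions, including the 212 DPI figure.
for dpi in (150, 212, 300, 600):
    print(dpi, "DPI:", round(glyph_height_px(12, dpi), 1), "px")
```

An engine that does not know the DPI only sees the pixel count, so if its code assumes characters fall within some fixed pixel range, the same page could succeed at one resolution and fail at another.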

Edited 2007-05-25 15:45
