
"The one 'hole' in my workflow has been OCR. For years, people have been able to scan a document and have it converted into real text. One of my old printers even came with OCR software included - for Windows of course. But when I've really needed OCR, I've just assumed that there were no high quality packages available for Linux. Recently I decided to find out for myself (a complete OCR virgin) what is available, how to use it, and what the results are like. I installed every free OCR package I could find, and systematically tested them. They all work very differently, so I tried to design a simple test for my specific needs."
But like someone said earlier, it's more of a Man vs. Machine fight than a Win vs. *Nix fight.
I've been looking for proper software for bulk conversion of images to text, and there simply isn't anything free that fits the bill.
Now, here's how this should be done, for those interested.
First of all, neural networks and similar techniques can "teach" your box to handle handwriting and the like, so anyone building OCR software has to be really good in that field. Secondly, a "training" step is obviously needed for this kind of software to get better.
There used to be a proprietary package called Eyes on Hands which had some really good features (cost: $10,000+).
For instance, when scanning lots of documents with numbers, you set up scan fields and say "here we'll have a number as input". It then matches each number or character against what it thinks it is and lists them all in columns, in a logical order, saying something like "We believe these are 9s, in descending order of likelihood". Then you just look at its interpretation and can easily correct whatever it got wrong. Because the software uses AI/neural nets, it can then actually get better at interpreting.
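To make that review step a bit more concrete, here is a rough Python sketch of the idea. It is not how that product actually worked; the names (`Guess`, `review_columns`, `apply_corrections`), the field IDs and the confidence numbers are all made up for illustration, and the recognizer itself is assumed to exist elsewhere.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Guess:
    field_id: str      # hypothetical ID of the scan field the glyph came from
    label: str         # character the recognizer thinks it is
    confidence: float  # recognizer's likelihood estimate, 0.0 to 1.0


def review_columns(guesses):
    """Group guesses by label; each column is sorted by descending confidence."""
    columns = defaultdict(list)
    for g in guesses:
        columns[g.label].append(g)
    return {
        label: sorted(col, key=lambda g: g.confidence, reverse=True)
        for label, col in sorted(columns.items())
    }


def apply_corrections(guesses, corrections):
    """Return (field_id, final_label) pairs.

    `corrections` maps field_id -> the label the human reviewer chose.
    Guesses the reviewer left alone keep the recognizer's label.
    """
    return [(g.field_id, corrections.get(g.field_id, g.label)) for g in guesses]


# Made-up sample data standing in for recognizer output.
guesses = [
    Guess("f1", "9", 0.98),
    Guess("f2", "9", 0.55),
    Guess("f3", "4", 0.91),
    Guess("f4", "9", 0.80),
]

columns = review_columns(guesses)
for label, col in columns.items():
    print(label, [(g.field_id, g.confidence) for g in col])

# The reviewer decides the shaky "9" in field f2 was really a "4".
labeled = apply_corrections(guesses, {"f2": "4"})
```

The doubtful guesses end up at the bottom of each column, so the reviewer only has to eyeball the tail. The corrected pairs in `labeled` are what you would feed back in as extra training examples so the recognizer improves over time.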
However, as with any software... In my experience, any "niche software" that isn't of great value to sysadmins has very few OSS counterparts, I'm afraid. Which is simply sad =(.