The first idea resembling OCR was developed in 1870 as a reading machine for the blind - the Optophone. It was the first step toward solving a problem that sounds pretty simple: how do we get writing on paper into a computer?
150 years of research, engineering breakthroughs, and hundreds of IDP products later, we were finally able to scan a receipt and have the fields filled out automatically - as long as it looked nice and friendly enough to the OCR model. Eureka.
Unfortunately for Tesseract, ABBYY, and co., they all suffer from the same complication: documents are written by humans. And humans love to do things like:
- Stamp over the most critical data because it feels like the right spot
- Organize data in tables with four nested levels of groupings
- Disagree on data standards, abandon any attempt at standardization, and simply send their very own format around
- Add handwritten comments in their own language
- Create documents that require at least a college degree in the relevant field to understand correctly.
This meant OCR models were basically just helpers for data scientists, who still had to handle cleanup, routing, and post-validation to get anything even vaguely close to real automation.
continue reading on cloudsquid.substack.com
If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can reach me at my personal email. To get new posts, subscribe via the RSS feed.