Optical Character Recognition (OCR) is about as old as computers are, and numerous techniques have been developed to extract text from digital images, ranging from old-school, predefined image processing to deep learning in the cloud. In this article, we look at the open source framework Tesseract and custom-trained models to see how we can perform OCR with high accuracy on non-English languages.
There is no one-size-fits-all solution when it comes to languages. With such a diverse set of languages, it is difficult to build a classifier that can identify and extract every different character. Some languages are written right to left, and some have similar-looking character sets.
Tesseract is an open source text recognition engine that lets you build models for any language using LSTM deep learning. A decent set of ready-to-use models is available. I think Tesseract is a good framework to get your hands dirty with.
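To make this concrete, here is a minimal sketch of calling Tesseract's LSTM engine from Python through its command-line interface. It assumes Tesseract 4 or later is installed and on the PATH, and that the trained data for the requested language (for example "hin" for Hindi) is available. The helper names build_tesseract_command and run_ocr and the file name scan.png are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=3):
    """Build the argument list for a Tesseract CLI call (hypothetical helper)."""
    return [
        "tesseract",
        image_path,
        "stdout",            # print the recognized text instead of writing a file
        "-l", lang,          # language model(s), e.g. "hin" or "hin+eng"
        "--oem", "1",        # OCR engine mode 1 = LSTM engine only
        "--psm", str(psm),   # page segmentation mode (3 = automatic)
    ]


def run_ocr(image_path, lang="eng", psm=3):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,          # raise CalledProcessError if Tesseract fails
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Only shows the command; scan.png is an assumed file name.
    print(build_tesseract_command("scan.png", lang="hin"))
```

Calling run_ocr("scan.png", lang="hin") would then return the extracted text as a string. Going through the command line keeps the sketch free of third-party Python packages. A wrapper library could be swapped in without changing the idea.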
I’ve come across a few services which use Tesseract to provide high-accuracy image-to-text conversion for Indic languages; check out https://www.ocrnow.com.
It’s a free service which supports a wide range of languages for converting images to text. It supports the following languages:
Haitian; Haitian Creole
Romanian; Moldavian; Moldovan
Slovak – Fraktur
Chinese – Simplified
Chinese – Traditional
The APIs are written in Go, and data processing is done using Python and TensorFlow.
Feel free to leave a comment in case you want me to focus on any specific area.
There are things people thought would not be possible, such as recognizing a person by the way they walk using standard surveillance cameras, or building a keylogger which can identify keystrokes using just a mic and pretrained keyboard sound data.
It doesn’t seem to be difficult to crack things like fingerprint scanners or mechanisms like FaceID with machine learning and data science.
NYU Tandon Researchers Create Synthetic Fingerprints Capable of Spoofing Smartphone Fingerprint Sensors Fingerprint authentication systems are a widely trusted, ubiquitous form of biometric authentication, deployed on billions of smartphones and other devices worldwide. Yet a new study from New York University Tandon School of Engineering […]
Google Cloud is launching two new tools to help customers design machine learning algorithms Google Cloud is launching two new tools to help customers design, launch and keep track of their machine learning algorithms. Following on from the release of its pre-packaged machine learning use […]
The choice of a Gaussian can be justified by the principle of maximum entropy, which suggests that if the true distribution is unknown, one should use the distribution with the greatest entropy among those which satisfy whatever constraints we wish to impose. This essentially makes the fewest possible […]
Qyresearchreports has added a new market research report, “Global Cloud Machine Learning Market Size, Status and Forecast 2018-2025”, to its huge collection of research reports. Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to “learn” (e.g., progressively improve performance on a […]
Data Science, Machine Learning, Deep Learning, and Artificial Intelligence are really hot at the moment and offer a lucrative career to programmers, with high pay and exciting work. It’s a great opportunity for programmers who are willing to learn these new skills and upgrade themselves. It’s also important from […]
More and more companies demand data scientists to apply statistical methods and technical computing tools in their design processes. Perhaps the solution for the shortage lies in the engineering community, writes Stéphane Marouani, Country Manager at MathWorks Australia. According to Deloitte, the number of data science workers in Australia […]
Say No to Terrorism For years, content that promotes terrorism has thrived on social media platforms like Facebook and Twitter. Fighting it is an uphill battle that has forced tech companies to open war rooms and hire new specialists. One solution that companies including Facebook are now betting […]
In 2012, HBR dubbed data scientist “the sexiest job of the 21st century”. It is also, arguably, the vaguest. To hire the right people for the right roles, it’s important to distinguish between different types of data scientist. There are plenty of different distinctions that one can draw, […]
As more and more automation comes into our lives, it has become important to address human bias in machine learning (a result of the data on which we train these systems). It may not seem significant for some use cases; however, for things like content filtering, bias in an ML system can have a huge impact.
At this moment in history it’s impossible not to see the problems that arise from human bias. Now magnify that by compute and you start to get a sense for just how dangerous human bias via machine learning can be. The damage can be twofold: Influence. If the […]