Using Machine Learning to Recognize and Extract Text from Images.

Playing with an image-to-text library

Optical Character Recognition (OCR) is nearly as old as computers themselves, and there have been numerous techniques for extracting text from digital images, ranging from old-school, predefined image processing to deep learning in the cloud. In this article, we look at the open source framework Tesseract and custom-trained models to see how we can perform OCR with high accuracy, including for non-English languages.

There is no one-size-fits-all solution when it comes to languages. Given how diverse languages are, it is difficult to build a single classifier that can identify and extract every character. Some languages are written right to left, and some have similar-looking character sets.

Tesseract is an open source text recognition engine that lets you build models for any language using LSTM-based deep learning. A decent set of ready-to-use models is available. I think Tesseract is a good framework for getting your hands dirty.
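To make this concrete, here is a minimal sketch of calling the Tesseract command-line tool from Python with one of the ready-made language models. It assumes the `tesseract` executable and the relevant language data (for example `hin` for Hindi) are already installed. The function names and the file name in the comments are hypothetical and only serve the illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, languages):
    """Build the argument list for the Tesseract CLI.

    `languages` holds Tesseract language codes such as "hin" or "eng".
    Several codes are joined with "+" so one pass can read mixed scripts.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Using "stdout" as the output base makes Tesseract print the text
    # instead of writing it to a file.
    return ["tesseract", str(image_path), "stdout", "-l", "+".join(languages)]


def extract_text(image_path, languages=("eng",)):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, list(languages)),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.strip()


# Example call (hypothetical file name, assumes the Hindi model is installed):
#   text = extract_text("scanned_page.png", languages=["hin", "eng"])
```

Keeping the command construction separate from the subprocess call makes it easy to check the arguments without having Tesseract installed.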

I’ve come across a few services that use Tesseract to provide high-accuracy image-to-text conversion for Indic languages; check out https://www.ocrnow.com

It’s a free service that converts images to text in a wide range of languages. It supports the following languages:

Afrikaans              Kannada                  Oriya
Amharic                Gujarati                 Panjabi; Punjabi
Arabic                 Haitian; Haitian Creole  Polish
Assamese               Hebrew                   Portuguese
Azerbaijani            Hindi                    Pushto; Pashto
Belarusian             Croatian                 Romanian; Moldavian; Moldovan
Bengali                Hungarian                Russian
Tibetan                Inuktitut                Sanskrit
Bosnian                Indonesian               Sinhala; Sinhalese
Bulgarian              Icelandic                Slovak
Catalan; Valencian     Italian                  Slovak – Fraktur
Cebuano                Javanese                 Slovenian
Czech                  Japanese                 Spanish; Castilian
Chinese – Simplified   Kazakh                   Yiddish
Chinese – Traditional  Central Khmer            Albanian
Cherokee               Kirghiz; Kyrgyz          Serbian
Welsh                  Korean                   Vietnamese
Danish                 Kurdish                  Swahili
German                 Lao                      Swedish
Dzongkha               Latin                    Syriac
English                Latvian                  Tamil
Esperanto              Lithuanian               Telugu
Estonian               Malayalam                Tajik
Basque                 Marathi                  Tagalog
Persian                Macedonian               Thai
Finnish                Maltese                  Tigrinya
French                 Malay                    Turkish
Frankish               Burmese                  Uighur; Uyghur
Georgian               Nepali                   Ukrainian
Irish                  Dutch; Flemish           Urdu
Galician               Norwegian                Uzbek

The APIs are written in Go, and data processing is done using Python and TensorFlow.

Feel free to leave a comment in case you want me to focus on any specific area.

Machine Learning Masters the Fingerprint to Fool Biometric Systems

There are things people thought would not be possible, such as recognizing a person by the way they walk using standard surveillance cameras, or building a keylogger that can identify keystrokes using just a microphone and pretrained keyboard sound data.

It doesn’t seem difficult to crack things like fingerprint scanners or mechanisms like FaceID with machine learning and data science.

NYU Tandon Researchers Create Synthetic Fingerprints Capable of Spoofing Smartphone Fingerprint Sensors. Fingerprint authentication systems are a widely trusted, ubiquitous form of biometric authentication, deployed on billions of smartphones and other devices worldwide. Yet a new study from New York University Tandon School of Engineering […]

Google Cloud aims to simplify machine learning deployment

Google Cloud is launching two new tools to help customers design, launch and keep track of their machine learning algorithms. Following on from the release of its pre-packaged machine learning use […]

Why is it reasonable in Machine Learning (e.g. Generative Models in Deep Learning) to describe unknown distributions as Gaussian?

The choice of a Gaussian can be justified by the principle of maximum entropy, which suggests that if the true distribution is unknown, one should use the distribution with the greatest entropy among those which satisfy whatever constraints we wish to impose. This essentially makes the fewest possible […]
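As a small numerical illustration of the maximum-entropy argument, the sketch below compares the differential entropy of three distributions on the real line that share the same variance. The choice of a fixed variance as the constraint, and of the uniform and Laplace distributions as comparison points, is an assumption made here for illustration. The closed-form entropies themselves are standard results.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy (in nats) of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def uniform_entropy(sigma):
    """Entropy of a uniform distribution whose variance equals sigma**2."""
    # A uniform distribution on an interval of width w has variance w**2 / 12.
    width = sigma * math.sqrt(12)
    return math.log(width)


def laplace_entropy(sigma):
    """Entropy of a Laplace distribution whose variance equals sigma**2."""
    # A Laplace distribution with scale b has variance 2 * b**2.
    scale = sigma / math.sqrt(2)
    return 1 + math.log(2 * scale)


def compare_entropies(sigma=1.0):
    """Return the three entropies for a shared standard deviation."""
    return {
        "gaussian": gaussian_entropy(sigma),
        "laplace": laplace_entropy(sigma),
        "uniform": uniform_entropy(sigma),
    }
```

For any shared variance, the Gaussian comes out with the largest entropy of the three, which is the sense in which it makes the fewest additional assumptions under that constraint.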

Global Cloud Machine Learning Market Expanding at an Average Growth Rate in Forecast 2018-2025

Qyresearchreports include new market research report “Global Cloud Machine Learning Market Size, Status and Forecast 2018-2025” to its huge collection of research reports. Machine learning is a field of artificial intelligence that uses statistical techniques to give computer systems the ability to “learn” (e.g., progressively improve performance on a […]

Top 10 Machine Learning, Deep Learning, and Data Science Courses for Beginners (Python and R)

Data Science, Machine Learning, Deep Learning, and Artificial Intelligence are really hot at this moment and offering a lucrative career to programmers with high pay and exciting work. It’s a great opportunity for programmers who are willing to learn these new skills and upgrade themselves. It’s also important from […]

Can engineers help fill the data scientist gap?

More and more companies demand data scientists to apply statistical methods and technical computing tools in their design processes. Perhaps the solution for the shortage lies in the engineering community, writes Stéphane Marouani, Country Manager at MathWorks Australia. According to Deloitte, the number of data science workers in Australia […]

How Facebook Flags Terrorist Content With Machine Learning

For years, content that promotes terrorism has thrived on social media platforms like Facebook and Twitter. Fighting it is an uphill battle that has forced tech companies to open war rooms and hire new specialists. One solution that companies including Facebook are now betting […]

Three ways to avoid bias in machine learning

As more and more automation comes into our lives, it has become important to address human bias in machine learning, which results from the data on which we train these systems. It may not seem significant for some use cases; however, for things like content filtering, bias in an ML system can have a huge impact.

At this moment in history it’s impossible not to see the problems that arise from human bias. Now magnify that by compute and you start to get a sense for just how dangerous human bias via machine learning can be. The damage can be twofold: Influence. If the […]
