Quality Analysis of Optical Character Recognition of Hindi Language Approaches

Chen Kim Lim; Omar Hamid Flayyih

doi:10.32595/iirjet.org/v5i2.2019.101

Published Jun 8, 2022

Vol. 5 No. 2 (2019)

Abstract

Optical Character Recognition (OCR) is the conversion of images of typed, handwritten, or printed text, such as a scanned document or a photograph, into machine-encoded text. OCR engines are typically designed to read typed (machine-printed) characters of widely used languages such as English, which commonly serve as a primary mode of communication. Although Hindi is a preferred medium of communication in many parts of India, comparatively little work has been carried out on Hindi character recognition. The aim of this paper is to investigate OCR principles and to compare the effectiveness of approaches to recognizing Hindi characters and converting them into electronic Hindi text. The work uses the MSER (Maximally Stable Extremal Regions) algorithm together with additional pre-processing techniques to improve the performance of OCR for Hindi in the MATLAB environment. The results are compared with those of the existing open-source Tesseract OCR engine.
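
A comparison of this kind needs some measure of recognition quality. The abstract does not state which measure the authors used, so the following Python sketch is only an illustration under an assumed metric: the character error rate (CER), computed as the edit distance between a ground-truth transcription and the OCR output, divided by the length of the ground truth. The function names `levenshtein` and `character_error_rate` are hypothetical and do not come from the paper.

```python
import unicodedata


def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Assumed metric: edit distance divided by the reference length.

    Both strings are NFC-normalized first so that equivalent Unicode
    encodings of the same Devanagari text compare equal. Lengths are
    counted in Unicode code points, so a vowel sign (matra) or virama
    counts as one character.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    ground_truth = "नमस्ते"
    ocr_output = "नमस्त"  # hypothetical OCR output with the final vowel sign missing
    print(character_error_rate(ground_truth, ocr_output))
```

Scoring the output of two engines against the same ground-truth text with a function like this gives one number per engine, and a lower value indicates fewer character-level errors. Counting code points rather than grapheme clusters is a simplification; a cluster-based count may be more appropriate for Devanagari.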

How to Cite

Chen Kim Lim, & Omar Hamid Flayyih. (2022). Quality Analysis of Optical Character Recognition of Hindi Language Approaches. IIRJET, 5(2). https://doi.org/10.32595/iirjet.org/v5i2.2019.101
