Quality Analysis of Optical Character Recognition of Hindi Language Approaches

Author : Chen Kim Lim, Omar Hamid Flayyih

Abstract

Optical Character Recognition (OCR) is the conversion of images of typed, handwritten, or printed text, whether from a scanned document or a photograph, into machine-encoded text. OCR engines are typically designed to read typed (machine-printed) characters in widely used languages such as English, which commonly serve as a primary mode of communication. Although Hindi is a preferred medium of communication in many parts of India, comparatively little work has been carried out on Hindi character recognition. The aim of this paper is to investigate the principles of OCR and to compare how effectively different approaches recognize Hindi characters and convert them into electronic Hindi text. This work uses the MSER (Maximally Stable Extremal Regions) algorithm together with additional pre-processing techniques to improve OCR performance for Hindi in the MATLAB environment. The results of this work are compared with those of the existing open-source Tesseract OCR engine.

Keywords : Optical character recognition (OCR), maximally stable extremal regions (MSER), character accuracy (CA), character error rate (CER).
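
The keywords name character accuracy (CA) and character error rate (CER) without defining them. The following is a minimal sketch of how these two measures are commonly computed, assuming the widely used definitions CER = (edit distance between OCR output and reference) / (reference length) and CA = 1 - CER; the paper itself may define them differently. It is written in Python for illustration rather than in the MATLAB environment used in this work, and the function names are hypothetical.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the first i-1 reference characters
    # and the first j hypothesis characters.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed definition: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition: CA = 1 - CER."""
    return 1.0 - character_error_rate(reference, hypothesis)


if __name__ == "__main__":
    ground_truth = "नमस्ते"   # six Unicode code points
    ocr_output = "नमस्त"      # the final vowel sign is missing
    print(character_error_rate(ground_truth, ocr_output))
    print(character_accuracy(ground_truth, ocr_output))
```

Note that Python strings are compared here code point by code point, so a Devanagari vowel sign (matra) or virama counts as its own character. Scoring by visual unit (grapheme cluster) instead would require segmenting the text before computing the distance.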

Volume 5 | Issue 2

DOI :
