ABSTRACT
In document analysis, optical character recognition (OCR) is used for exhaustive searching of documents stored on hard drives and on the web. In OCR-related applications, separating machine-printed text from handwritten text is an important task: the two types of text require different recognition methodologies, so separating them improves overall efficiency. This separation is challenging because of the complex structural layouts of various scripts. In this paper, a new algorithm is proposed for separating printed and handwritten text using features based on correlation coefficients and probability-based moments. These features are computed using a combination of the two-dimensional discrete wavelet transform and the semi-decimated discrete wavelet transform. The wavelet transform can extract information at different resolutions, which helps in forming significant features for characterizing a texture. Noise is also treated as a separate class because it is generally present in document images. Finally, a set of binary support vector machine classifiers is employed as a decision scheme to identify machine-printed text, handwritten text, and noise. Comprehensive experiments are performed on the Tobacco-800, IFN-ENIT, and proprietary databases, which contain texts in different scripts. Benchmarking analysis shows that the proposed algorithm outperforms other state-of-the-art approaches, achieving an average identification rate of 98.02%.
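As a rough illustration of the kind of wavelet-based correlation feature the abstract describes, the sketch below computes a one-level two-dimensional Haar decomposition of a small grayscale patch and then the Pearson correlation coefficient between the magnitudes of its detail sub-bands. The abstract does not state the wavelet family, the number of decomposition levels, or which sub-bands are correlated, so the Haar wavelet, the single level, and the pairing of sub-bands are all assumptions made here for illustration. The function names (haar_dwt2, pearson, subband_correlations) are hypothetical. The semi-decimated transform, the probability-based moments, and the support vector machine stage are omitted.

```python
import math


def haar_dwt2(patch):
    """One-level orthonormal 2D Haar transform of an even-sized patch.

    Returns four sub-bands, each half the size of the input:
    approximation, top-minus-bottom detail, left-minus-right detail,
    and diagonal detail. (Haar is an assumption; the abstract does not
    name the wavelet.)
    """
    rows, cols = len(patch), len(patch[0])
    if rows % 2 or cols % 2:
        raise ValueError("patch height and width must be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # 2x2 block: a b on the top row, d e on the bottom row
            a, b = patch[r][c], patch[r][c + 1]
            d, e = patch[r + 1][c], patch[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2.0)  # local average
            r_row.append((a + b - d - e) / 2.0)  # top row minus bottom row
            c_row.append((a - b + d - e) / 2.0)  # left column minus right column
            d_row.append((a - b - d + e) / 2.0)  # diagonal difference
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag.append(d_row)
    return approx, row_diff, col_diff, diag


def pearson(x, y):
    """Pearson correlation coefficient between two equally sized 2D arrays.

    Returns 0.0 when either array is constant (zero variance).
    """
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(xs, ys))
    var_x = sum((p - mean_x) ** 2 for p in xs)
    var_y = sum((q - mean_y) ** 2 for q in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def subband_correlations(patch):
    """Pairwise correlations between detail sub-band magnitudes.

    Which sub-bands are paired is an illustrative assumption.
    """
    _, row_diff, col_diff, diag = haar_dwt2(patch)
    mags = {
        "row": [[abs(v) for v in row] for row in row_diff],
        "col": [[abs(v) for v in row] for row in col_diff],
        "diag": [[abs(v) for v in row] for row in diag],
    }
    return {
        ("row", "col"): pearson(mags["row"], mags["col"]),
        ("row", "diag"): pearson(mags["row"], mags["diag"]),
        ("col", "diag"): pearson(mags["col"], mags["diag"]),
    }


# Example: a made-up 4x4 patch of gray levels
example_patch = [
    [0, 0, 9, 9],
    [0, 5, 9, 1],
    [3, 3, 0, 7],
    [8, 3, 2, 7],
]
features = subband_correlations(example_patch)
```

In a full pipeline, a feature vector of this kind would be computed for each text region and passed to the classifiers; the idea is that regular machine-printed strokes and irregular handwritten strokes produce different sub-band statistics.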
ORCID
Parul Sahare http://orcid.org/0000-0002-9342-1159
Additional information
Notes on contributors
Parul Sahare
Parul Sahare received his post-graduate degree (MTech) in VLSI Design from VNIT, Nagpur, Maharashtra, India in 2010. He is currently pursuing his PhD at VNIT, Nagpur, Maharashtra, India. He has a total of three years of academic experience. His areas of interest include signal processing, image processing, and pattern recognition.
Sanjay B. Dhok
Sanjay B. Dhok is an Associate Professor in the Center for VLSI & Nanotechnology at Visvesvaraya National Institute of Technology, Nagpur (India). He received his PhD in electronics engineering from VNIT Nagpur, India. He is a member of the IEEE. He has published many research papers in national and international journals and conferences. His areas of interest include signal processing, image processing, data compression, wireless sensor networks, and VLSI design.