Separation of Machine-Printed and Handwritten Texts in Noisy Documents using Wavelet Transform

Parul Sahare & Sanjay B. Dhok
Pages 341-361 | Published online: 13 Jun 2018
 

ABSTRACT

In document analysis, optical character recognition (OCR) is used to make documents stored on hard drives and on the web exhaustively searchable. In OCR-related applications, separating machine-printed text from handwritten text is an important task because the two types of text require different recognition methodologies; separating them first therefore improves overall efficiency. Text separation is challenging because of the complexity of the structural layouts of various scripts. In this paper, a new algorithm is proposed for separating printed and handwritten texts using features based on correlation coefficients and probability-based moments. These features are computed using a combination of the two-dimensional discrete wavelet transform and the semi-decimated discrete wavelet transform. The wavelet transform can extract information at different resolutions, which helps in forming significant features for characterizing a texture. Because noise is generally present in document images, it is treated here as a separate class. Finally, a set of binary support vector machine classifiers is employed as the decision scheme to identify machine-printed text, handwritten text, and noise. Comprehensive experiments are performed on the Tobacco-800, IFN-ENIT, and proprietary databases, which contain texts of different scripts. Benchmarking analysis shows that the proposed algorithm outperforms other state-of-the-art approaches, with an average identification rate of 98.02%.
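To make the feature idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single-level Haar wavelet, uses only the decimated two-dimensional transform (the semi-decimated transform is omitted), and uses stand-in definitions for the features: pairwise correlation coefficients between detail-subband magnitudes, and the mean and variance of a normalized histogram of those magnitudes. All function names, the bin count, and the feature layout are illustrative assumptions.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level 2-D Haar DWT; returns (approx, d_rows, d_cols, d_diag)."""
    x = np.asarray(image, dtype=float)
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]  # crop to even size
    s = np.sqrt(2.0)
    # Filter and downsample along each row (pairs of adjacent columns).
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    # Repeat along each column (pairs of adjacent rows).
    approx = (lo[0::2] + lo[1::2]) / s
    d_rows = (lo[0::2] - lo[1::2]) / s  # differences between adjacent rows
    d_cols = (hi[0::2] + hi[1::2]) / s  # differences between adjacent columns
    d_diag = (hi[0::2] - hi[1::2]) / s  # differences along both directions
    return approx, d_rows, d_cols, d_diag


def safe_corr(a, b):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    a, b = a.ravel(), b.ravel()
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


def histogram_moments(values, bins=16):
    """Mean and variance of the normalized histogram (assumed bin count)."""
    counts, edges = np.histogram(values, bins=bins)
    p = counts / counts.sum()  # empirical bin probabilities
    centers = (edges[:-1] + edges[1:]) / 2
    mean = float((p * centers).sum())
    var = float((p * (centers - mean) ** 2).sum())
    return mean, var


def wavelet_features(block):
    """9-element vector: 3 correlations, then (mean, variance) per subband."""
    _, d_rows, d_cols, d_diag = haar_dwt2(block)
    mags = [np.abs(d) for d in (d_rows, d_cols, d_diag)]
    feats = [
        safe_corr(mags[0], mags[1]),
        safe_corr(mags[0], mags[2]),
        safe_corr(mags[1], mags[2]),
    ]
    for m in mags:
        feats.extend(histogram_moments(m))
    return np.array(feats)
```

A vector like this, computed per text block, could then be passed to a classifier; the abstract names a set of binary support vector machines for that step, which this sketch does not include.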

Additional information

Notes on contributors

Parul Sahare

Parul Sahare received his post-graduate degree (MTech) in VLSI Design from VNIT, Nagpur, Maharashtra, India in 2010. Currently, he is pursuing his PhD at VNIT, Nagpur, Maharashtra, India. He has a total of three years of academic experience. His areas of interest include signal processing, image processing, and pattern recognition.

Sanjay B. Dhok

Sanjay B. Dhok is an Associate Professor in the Center for VLSI & Nanotechnology at Visvesvaraya National Institute of Technology (VNIT), Nagpur, India. He received his PhD in electronics engineering from VNIT, Nagpur, India. He is a member of the IEEE. He has published many research papers in national and international journals and conferences. His areas of interest include signal processing, image processing, data compression, wireless sensor networks, and VLSI design.
