Abstract
Multimedia data has increased rapidly in recent years. Text embedded in multimedia carries important information about the image or video content. The proposed method provides an efficient way to extract text from born-digital images. First, edges are extracted from a grayscale image. A new edge detection technique is introduced in this research, which gives better results for low-contrast web images. Morphological operators are then applied to the image to connect the broken edges of the objects it contains. Each object is classified as text or non-text using K-Means clustering on text features such as size, height-to-width ratio, and binary transitions. Two new features, horizontal fluctuation count and vertical fluctuation count, are introduced in the proposed work. The dataset of the International Conference on Document Analysis and Recognition 2011 Robust Reading Competition, Challenge 1: “Reading Text in Born-Digital Images (Web and Email)”, is used in this research. The proposed method performed best in the above-mentioned competition.
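The classification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each candidate object is already available as a small binary mask, it computes only three of the named features (size, height-to-width ratio, and binary transitions, taken here to mean the average number of 0/1 changes per row), and it uses a plain two-cluster K-Means seeded with the first two points. The function names are hypothetical, and the two fluctuation-count features are omitted because the abstract does not define them.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary object mask: 1 = object pixel, 0 = background


def count_transitions(line: Sequence[int]) -> int:
    """Count the 0->1 and 1->0 changes along one row or column."""
    return sum(1 for a, b in zip(line, line[1:]) if a != b)


def object_features(mask: Mask) -> Tuple[float, float, float]:
    """Return (size, height-to-width ratio, mean transitions per row).

    Assumption: 'binary transitions' is read as the average number of
    value changes per row; the abstract does not give the exact definition.
    """
    height, width = len(mask), len(mask[0])
    size = float(sum(sum(row) for row in mask))
    ratio = height / width
    transitions = sum(count_transitions(row) for row in mask) / height
    return size, ratio, transitions


def kmeans(points: Sequence[Sequence[float]], k: int = 2,
           iterations: int = 20) -> List[int]:
    """Plain Lloyd's K-Means; seeding with the first k points is an assumption."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((x - c) ** 2 for x, c in zip(p, cen))
                     for cen in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


if __name__ == "__main__":
    masks = [
        [[1, 0, 1, 0, 1], [1, 0, 1, 0, 1]],  # stroke-like, many transitions
        [[1, 1, 1, 1, 1], [1, 1, 1, 1, 1]],  # solid block, no transitions
    ]
    features = [object_features(m) for m in masks]
    print(kmeans(features, k=2))
```

The cluster labels returned by K-Means are arbitrary, so deciding which cluster corresponds to text needs a separate rule. In practice the features would also need to be scaled to comparable ranges before clustering, since object size would otherwise dominate the distance.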
Additional information
Notes on contributors
Samabia Tehsin
Samabia Tehsin is a PhD scholar at MCS, NUST. She received her MS in Software Engineering from NUST in 2007. Her areas of research are digital image processing, computer vision and document analysis.
Asif Masood
Asif Masood received his BE in Software Engineering from the Military College of Signals (MCS), NUST, in 1999. He completed his MS and PhD in Computer Science from the University of Engineering and Technology, Lahore, in 2007. He is currently working as head of the Information Security Department at MCS, NUST.
Sumaira Kausar
Sumaira Kausar is a PhD scholar at CEME, NUST. Her research interests are digital image processing, computer vision and machine learning.
Younus Javed
Younus Javed is Dean of Computer Engineering at CEME, NUST. His research areas are algorithms, adaptive and predictive modelling, digital image processing, operating systems, encoding/decoding systems and parallel processing.