Research Articles

A multiscale feature fusion method for cursive text detection in natural scene images

Asghar Ali Chandio, Mehwish Leghari, Muhammad Ali Soomro, Shah Zaman Nizamani & Saifullah Memon
Pages 302-318 | Received 07 May 2022, Accepted 08 Dec 2022, Published online: 06 Jan 2023
 

ABSTRACT

Text detection in natural images is a challenging problem due to variations in text size, aspect ratio, alignment and background complexity. This paper proposes a multiscale feature fusion convolutional neural network method to detect cursive and multi-language text in natural images. The proposed method combines VGG-16 features across multiple scales and layers to create a new convolutional feature map that fuses shallow and deep layers. On top of this fused feature map, a vertical text proposal generation method generates fixed-size text proposals. A recurrent layer takes the convolutional features within each 3×3 window as sequential input and updates its recurrent state internally in the hidden layers. The output of the recurrent layer is fed to two fully connected layers, which predict text/non-text region proposals and regress the bounding boxes. The model is evaluated on a custom-developed Urdu scene text dataset and on the Arabic text images of the ICDAR-MLT17 dataset.
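To make the fixed-size vertical proposal step more concrete, the following Python sketch places fixed-width, variable-height anchors at every cell of a feature map and decodes predicted vertical offsets into boxes. It assumes a CTPN-style design: a feature-map stride of 16 pixels, a fixed anchor width of 16 pixels, ten anchor heights, and CTPN's centre/height parameterisation of the vertical regression. None of these values, nor the helper names `Box`, `vertical_anchors` and `decode_vertical`, come from the paper; they are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical CTPN-style defaults; the paper's actual values are not stated here.
STRIDE = 16                 # assumed total stride of the fused VGG-16 feature map (pixels)
ANCHOR_WIDTH = 16           # assumed fixed proposal width (pixels)
ANCHOR_HEIGHTS = (11, 16, 23, 33, 48, 68, 97, 139, 198, 283)  # assumed anchor heights


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image coordinates, from (x1, y1) to (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def vertical_anchors(feat_h: int, feat_w: int,
                     stride: int = STRIDE,
                     width: int = ANCHOR_WIDTH,
                     heights: Sequence[int] = ANCHOR_HEIGHTS) -> List[Box]:
    """Place fixed-width, variable-height anchors at the centre of every feature-map cell."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Map the feature-map cell back to the centre of its region in the image.
            cx = col * stride + stride / 2.0
            cy = row * stride + stride / 2.0
            for h in heights:
                anchors.append(Box(cx - width / 2.0, cy - h / 2.0,
                                   cx + width / 2.0, cy + h / 2.0))
    return anchors


def decode_vertical(anchor: Box, v_c: float, v_h: float) -> Box:
    """Apply predicted vertical offsets to an anchor; the horizontal extent stays fixed.

    Assumes the CTPN parameterisation: v_c = (cy - cy_a) / h_a and v_h = log(h / h_a).
    """
    h_a = anchor.y2 - anchor.y1
    cy_a = (anchor.y1 + anchor.y2) / 2.0
    cy = v_c * h_a + cy_a
    h = math.exp(v_h) * h_a
    return Box(anchor.x1, cy - h / 2.0, anchor.x2, cy + h / 2.0)


if __name__ == "__main__":
    anchors = vertical_anchors(feat_h=2, feat_w=3)
    print(len(anchors))   # 60 = 2 rows x 3 columns x 10 heights
    print(anchors[0])     # Box(x1=0.0, y1=2.5, x2=16.0, y2=13.5)
    # Doubling the height of the first anchor while keeping its centre.
    print(decode_vertical(anchors[0], v_c=0.0, v_h=math.log(2.0)))
```

Because every proposal shares the same width, only the vertical centre and height need to be regressed; in a design like the one described above, the fully connected layers would score each anchor as text or non-text and predict its (v_c, v_h) offsets, and neighbouring positive proposals would later be linked into text lines.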

Acknowledgments

The first author is thankful to the University of New South Wales, Australia for supporting his PhD candidature with a scholarship. The first author is also thankful to the NVIDIA Corporation for donating a Quadro RTX 6000 GPU to accomplish this research work.

Disclosure statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability statement

The data and material used in this research paper are publicly available at https://data.mendeley.com/datasets/k5fz57zd9z/1.

Code availability

The custom code developed for this research paper is available from the corresponding author upon request.

Additional information

Notes on contributors

Asghar Ali Chandio

Asghar Ali Chandio received his B.S. degree in information technology from the University of Sindh, Pakistan, in 2008 and his M.S. degree in information technology from Quaid-e-Awam University of Engineering, Science and Technology (QUEST), Pakistan, in 2014. He received his Ph.D. degree from the School of Engineering and Information Technology, University of New South Wales Canberra, Australia, in 2020. He is currently an Associate Professor in the Department of Information Technology, QUEST, Pakistan, and an associate editor of the QUEST research journal. He has published more than 25 papers in national and international journals and conferences. His research interests include machine learning, deep learning, handwritten text recognition, text extraction in natural scene images, document analysis, and semantic text similarity matching.

Mehwish Leghari

Mehwish Leghari received her B.S. degree in information technology from the University of Sindh, Pakistan, and her M.S. degree from QUEST, Pakistan. She received her Ph.D. degree from the Faculty of Engineering and Technology, University of Sindh, Pakistan, in 2021. She is currently an Assistant Professor in the Department of Information Technology, QUEST, Pakistan, and also heads the Data Science Department. She has published more than 20 papers in national and international journals and conferences. Her research interests include biometrics security, multimodal biometrics systems, handwritten text recognition and language translation.

Muhammad Ali Soomro

Muhammad Ali Soomro received his B.E. degree in Computer Systems Engineering from Mehran University of Engineering and Technology, Pakistan. He received his Ph.D. degree from QUEST, Pakistan, in 2021. He is currently an Assistant Professor in the Department of Computer Systems Engineering, QUEST, Pakistan. He has published more than 15 papers in national and international journals and conferences. His research interests include ad hoc wireless networks, routing protocols and wireless sensor networks.

Shah Zaman Nizamani

Shah Zaman Nizamani received his B.S. degree in Computer Science from the University of Sindh, Pakistan, and his M.S. degree in Computer Science from Muhammad Ali Jinnah University, Pakistan. He received his Ph.D. degree from QUEST, Pakistan, in 2018. He is currently an Associate Professor in the Department of Information Technology, QUEST, Pakistan. He previously worked for more than 7 years at reputable software houses in Pakistan. He has published more than 25 papers in national and international journals and conferences. His research interests include cybersecurity and information security.

Saifullah Memon

Saifullah Memon received his M.S. degree from QUEST, Pakistan, and is currently pursuing his Ph.D. degree at the State Key Laboratory of Networking and Switching Technology, BUPT, Beijing, China. He has published more than 12 papers in national and international journals and conferences.
