Computers and Computing

Recurrent Neural Network-Based Model for Named Entity Recognition with Improved Word Embeddings

Archana Goyal, Vishal Gupta & Manish Kumar
Pages 6970-6976 | Published online: 16 Dec 2021
 

Abstract

Extracting meaningful information from the huge amount of data available on the web is a challenging task. An efficient named entity recognition (NER) system can help overcome the challenges of information extraction. Named entities are proper names that play an important role in searching for information of interest. In this study, an efficient deep learning-based NER technique is proposed that recognizes general-domain named entities in Hindi, Punjabi, and bilingual Hindi and Punjabi text. A model based on bidirectional long short-term memory, an important variant of the recurrent neural network, has been developed using improved word embeddings. Improved word embeddings combine character convolutional neural network embeddings with part-of-speech embeddings. The main findings of the study include the development of a NER system that can extract named entities not only from Hindi and Punjabi datasets individually but also from mixed Hindi and Punjabi text. In addition, improved word embeddings combine character-level and word-level features, which is novel to the best of our knowledge. Improved word embeddings are found to achieve better results than those obtained by earlier NER models with deep feature extraction.
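The way the improved word embeddings are composed can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single convolution layer with ReLU and max-over-time pooling for the character CNN, uses random vectors in place of trained embedding tables, and all names and sizes (`CHAR_DIM`, `NUM_FILTERS`, `KERNEL`, `POS_DIM`, `LookupTable`, `char_cnn`, `improved_embedding`) and the example POS tags are hypothetical. The concatenated vector per token is what a bidirectional LSTM tagger would then take as input.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# All sizes below are hypothetical, chosen only to keep the example small.
CHAR_DIM = 4      # character-embedding size
NUM_FILTERS = 6   # number of convolution filters
KERNEL = 3        # convolution window, in characters (code points)
POS_DIM = 5       # part-of-speech embedding size


def rand_vec(n):
    """Random vector standing in for trained parameters."""
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


class LookupTable:
    """Creates one random vector per key on first use (stand-in for a trained table)."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}

    def __getitem__(self, key):
        if key not in self.table:
            self.table[key] = rand_vec(self.dim)
        return self.table[key]


char_table = LookupTable(CHAR_DIM)
pos_table = LookupTable(POS_DIM)
# Each filter holds KERNEL * CHAR_DIM weights for one flattened character window.
filters = [rand_vec(KERNEL * CHAR_DIM) for _ in range(NUM_FILTERS)]


def char_cnn(word):
    """1-D convolution over character embeddings, then max-over-time pooling."""
    # Iterating a Python str yields code points, not grapheme clusters.
    chars = [char_table[c] for c in word]
    # Pad short words with zero vectors so at least one window exists.
    while len(chars) < KERNEL:
        chars.append([0.0] * CHAR_DIM)
    pooled = []
    for f in filters:
        best = float("-inf")
        for start in range(len(chars) - KERNEL + 1):
            window = [x for vec in chars[start:start + KERNEL] for x in vec]
            activation = max(0.0, sum(w * x for w, x in zip(f, window)))  # ReLU
            best = max(best, activation)
        pooled.append(best)
    return pooled


def improved_embedding(word, pos_tag):
    """Concatenate the character-CNN vector with the POS embedding."""
    return char_cnn(word) + pos_table[pos_tag]


# Mixed Hindi and Punjabi tokens with hypothetical POS tags.
sentence = [("पंजाब", "NNP"), ("ਯੂਨੀਵਰਸਿਟੀ", "NN")]
vectors = [improved_embedding(w, t) for w, t in sentence]
```

Each token ends up with a vector of length `NUM_FILTERS + POS_DIM`. Because the character CNN works on code points rather than a fixed vocabulary, the same component handles Devanagari and Gurmukhi tokens in one sequence, which is one plausible reason a character-level component suits mixed-script text.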

Additional information

Notes on contributors

Archana Goyal

Archana Goyal is a research scholar in the Department of Computer Science and Applications at Panjab University, Chandigarh, India. She is currently conducting research on named entity recognition for Hindi and Punjabi text. Her research interests include artificial intelligence, machine/deep learning, and natural language processing.

Vishal Gupta

Vishal Gupta is an associate professor at the University Institute of Engineering and Technology, Panjab University, Chandigarh, India. He received his BTech degree in computer science and engineering from SBS CET Ferozepur, Punjab, India, and his MTech in computer science and engineering from the Department of Computer Science, Punjabi University Patiala, Punjab, India. In 2013, he was awarded a PhD in the Faculty of Engineering and Technology for his research in the field of Automatic Text Summarization for Punjabi Language. His research interests include artificial intelligence, machine/deep learning, natural language processing, and automatic text summarization.

Manish Kumar

Manish Kumar is a professor of computer science and applications at Panjab University Regional Centre Muktsar, Punjab, India. He received his BSc degree from Punjabi University, Patiala, Punjab, India, and his MCA degree from the Department of Computer Science, Punjabi University Patiala, Punjab, India. In 2008, he was awarded a PhD by the Department of Computer Science and Engineering, Thapar University, Patiala, on the topic Degraded Text Recognition of Gurmukhi Script. His research interests include pattern recognition, optical character recognition, and natural language processing.
