Abstract
Extracting meaningful information from the huge volume of data available on the web is a challenging task. An efficient named entity recognition (NER) system can help overcome many of the challenges of information extraction. Named entities are proper names that play an important role in locating information of interest. This study proposes a deep learning-based NER technique that recognizes general-domain named entities in Hindi, Punjabi, and bilingual Hindi-Punjabi text. The model is based on bidirectional long short-term memory (BiLSTM), an important variant of the recurrent neural network, and uses improved word embeddings. These improved word embeddings combine character-level convolutional neural network embeddings with part-of-speech embeddings. The main findings of the study include the development of an NER system that can extract named entities not only from the Hindi and Punjabi datasets individually but also from mixed Hindi and Punjabi text. In addition, the improved word embeddings combine character-level and word-level features, which is novel to the best of our knowledge. The improved word embeddings are found to achieve better results than earlier NER models that use deep feature extraction.
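To make the idea of combining character-level and part-of-speech information concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "combination" means concatenation, and it uses fixed pseudo-random vectors in place of learned parameters. All names (`embed_char`, `char_cnn`, `improved_embedding`, `POS_TABLE`), the tag set, and the dimensions are hypothetical choices made for illustration.

```python
import random

# Illustrative sizes only; the abstract does not state the real dimensions.
CHAR_DIM, N_FILTERS, WIDTH, POS_DIM = 4, 5, 3, 3


def embed_char(ch):
    """Hypothetical stand-in for a learned character embedding table.

    Returns a fixed pseudo-random vector derived from the code point,
    so any Devanagari or Gurmukhi character gets a stable vector.
    """
    rng = random.Random(ord(ch))
    return [rng.uniform(-1, 1) for _ in range(CHAR_DIM)]


_rng = random.Random(0)
# Each filter spans WIDTH consecutive characters (untrained, random weights).
FILTERS = [
    [[_rng.uniform(-1, 1) for _ in range(CHAR_DIM)] for _ in range(WIDTH)]
    for _ in range(N_FILTERS)
]
# Hypothetical part-of-speech tag set with one random vector per tag.
POS_TABLE = {
    tag: [_rng.uniform(-1, 1) for _ in range(POS_DIM)]
    for tag in ("NNP", "NN", "VB", "OTHER")
}


def char_cnn(word):
    """1-D convolution over characters followed by max-over-time pooling."""
    chars = [embed_char(c) for c in word]
    # Zero-pad short words so that at least one window exists.
    while len(chars) < WIDTH:
        chars.append([0.0] * CHAR_DIM)
    pooled = []
    for filt in FILTERS:
        scores = [
            sum(
                filt[k][d] * chars[i + k][d]
                for k in range(WIDTH)
                for d in range(CHAR_DIM)
            )
            for i in range(len(chars) - WIDTH + 1)
        ]
        pooled.append(max(scores))
    return pooled


def improved_embedding(word, pos_tag):
    """Concatenate character-CNN features with a part-of-speech vector."""
    pos_vec = POS_TABLE.get(pos_tag, POS_TABLE["OTHER"])
    return char_cnn(word) + pos_vec


if __name__ == "__main__":
    # The same function handles Hindi (Devanagari) and Punjabi (Gurmukhi) input.
    for word, tag in [("दिल्ली", "NNP"), ("ਦਿੱਲੀ", "NNP")]:
        print(word, len(improved_embedding(word, tag)))
```

In a full model, one such vector per token would form the input sequence to the BiLSTM tagger; that stage, and the training of the filters and tables, is omitted here.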
Additional information
Notes on contributors
Archana Goyal
Archana Goyal is a research scholar in the Department of Computer Science and Applications at Panjab University, Chandigarh, India. She is currently conducting research on named entity recognition for Hindi and Punjabi text. Her research interests include artificial intelligence, machine/deep learning, and natural language processing.
Vishal Gupta
Vishal Gupta is an associate professor at the University Institute of Engineering and Technology, Panjab University, Chandigarh, India. He received his BTech degree in computer science and engineering from SBS CET Ferozepur, Punjab, India, and his MTech in computer science and engineering from the Department of Computer Science, Punjabi University Patiala, Punjab, India. In 2013, he was awarded a PhD in the Faculty of Engineering and Technology for his research in the field of Automatic Text Summarization for Punjabi Language. His research interests include artificial intelligence, machine/deep learning, natural language processing, and automatic text summarization.
Manish Kumar
Manish Kumar is a professor of computer science and applications at Panjab University Regional Centre Muktsar, Punjab, India. He received his BSc degree from Punjabi University, Patiala, Punjab, India, and his MCA degree from the Department of Computer Science, Punjabi University Patiala, Punjab, India. In 2008, he was awarded a PhD by the Department of Computer Science and Engineering, Thapar University, Patiala, on the topic Degraded Text Recognition of Gurmukhi Script. His research interests include pattern recognition, optical character recognition, and natural language processing.