Research Articles

Information extraction for different layouts of invoice images

Pages 417-429 | Received 12 Sep 2022, Accepted 05 Dec 2022, Published online: 03 Mar 2023
 

ABSTRACT

Organizations purchase goods and services from many suppliers and use invoice documents to confirm payment. These invoices contain information that can support business decisions, but extracting it requires considerable resources. Traditional approaches rely on template matching, which identifies the parts of an image that match a predefined template and requires new manual annotation for every new image layout. A system that robustly extracts entities from invoices with different layouts is therefore necessary. Existing research has applied deep learning and Named Entity Recognition (NER) to information extraction, but invoice information extraction has mostly been studied for English and Chinese. In this study, we constructed a deep learning model using BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Fields) with word and character embeddings for information extraction from different layouts of Thai invoice images. The model was evaluated with Semantic Evaluation (SemEval) metrics at the full named-entity level. Our experimental results showed that this method achieves a precision of 0.9557, recall of 0.9486, and F1-score of 0.9521 for the partial match, and a precision of 0.9329, recall of 0.9259, and F1-score of 0.9294 for the exact match. The F1-score was significantly influenced by the quality of the images and of the text produced by Optical Character Recognition (OCR).

Abbreviations: BERT: bidirectional encoder representations from transformers; BiLSTM: bidirectional long short-term memory; COR: correct; CRF: conditional random fields; CV: computer vision; ELMo: embeddings from language models; INC: incorrect; MIS: missing; MSE: mean squared error; MUC: message understanding conference; NER: named entity recognition; NLP: natural language processing; OCR: optical character recognition; PAR: partial; SemEval: semantic evaluation; SPU: spurious
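
The abstract reports precision, recall, and F1-score under partial and exact matching, and the abbreviation list names the MUC-style tallies (COR, INC, PAR, MIS, SPU) that such scores are usually built from. The sketch below shows one common way these quantities are combined in SemEval-style NER evaluation; it is an illustration under stated assumptions, not the authors' evaluation code. The class `MucCounts`, the functions `f1_score` and `semeval_scores`, and the half credit given to partial matches are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MucCounts:
    """Hypothetical container for MUC-style tallies of predicted entities."""
    cor: int  # COR: prediction matches the gold entity
    inc: int  # INC: prediction and gold entity disagree
    par: int  # PAR: prediction overlaps the gold entity only partially
    mis: int  # MIS: gold entity with no prediction
    spu: int  # SPU: prediction with no gold entity


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def semeval_scores(c: MucCounts, partial: bool) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one matching scheme.

    Assumption: partial matches earn half credit under the partial scheme
    and no credit under the exact scheme.
    """
    possible = c.cor + c.inc + c.par + c.mis  # entities in the gold annotation
    actual = c.cor + c.inc + c.par + c.spu    # entities produced by the model
    credit = c.cor + (0.5 * c.par if partial else 0.0)
    precision = credit / actual if actual else 0.0
    recall = credit / possible if possible else 0.0
    return precision, recall, f1_score(precision, recall)
```

Applied to the precision and recall values quoted in the abstract, `f1_score(0.9557, 0.9486)` and `f1_score(0.9329, 0.9259)` round to the reported F1-scores of 0.9521 and 0.9294, so the reported figures are consistent with the harmonic-mean definition.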

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Krittin Satirapiwong

Krittin Satirapiwong is currently studying for a master's degree in Business Analytics and Data Science at the Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand. His research interests are in Data Analytics, Data Science, and Applied Statistics.

Thitirat Siriborvornratanakul

Thitirat Siriborvornratanakul received the B.Eng. degree (first class honors) in computer engineering from Chulalongkorn University, Bangkok, Thailand, in 2005, and the master's degree in engineering and the Ph.D. degree in engineering from the University of Tokyo, Tokyo, Japan, in 2008 and 2011, respectively. She is currently an Assistant Professor of Computer Science in the Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand. She has served as a reviewer for many peer-reviewed journals and international conferences in Computer Science. Her research interests are in Artificial Intelligence, Deep Learning, Computer Vision, Augmented Reality, and Human-Computer Interaction.
