The Imaging Science Journal Volume 69, 2021 - Issue 5-8

Research Articles

Information extraction for different layouts of invoice images

Krittin Satirapiwong, Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand

Thitirat Siriborvornratanakul, Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand. Correspondence: [email protected]

ORCID: https://orcid.org/0000-0002-6530-5302

Pages 417-429 | Received 12 Sep 2022, Accepted 05 Dec 2022, Published online: 03 Mar 2023

https://doi.org/10.1080/13682199.2022.2157367

ABSTRACT

Organizations purchase goods and services from many suppliers and use invoice documents to confirm payment. Invoices contain information that can support business decisions, but extracting it requires considerable resources. Traditional approaches use template matching: they identify the parts of an image that match a predefined template, and they require new manual annotation whenever a new image layout is processed. A system that robustly extracts entities from different invoice layouts is therefore necessary. Existing research has applied deep learning and Named Entity Recognition (NER) to information extraction, but invoice information extraction has mostly been done for English and Chinese. In this study, we constructed a deep learning model using BiLSTM-CRF (Bidirectional Long Short-Term Memory-Conditional Random Fields) with word and character embeddings for information extraction from different layouts of Thai invoice images. The model was evaluated using Semantic Evaluation (SemEval) at the full named-entity level. Our experimental results showed that this method achieves a precision of 0.9557, recall of 0.9486, and F1-score of 0.9521 for the partial match, and a precision of 0.9329, recall of 0.9259, and F1-score of 0.9294 for the exact match. The F1-score was significantly influenced by the quality of the images and of the text produced by Optical Character Recognition (OCR).

Abbreviations: BERT: bidirectional encoder representations from transformers; BiLSTM: bidirectional long short-term memory; COR: correct; CRF: conditional random fields; CV: computer vision; ELMo: embeddings from language models; INC: incorrect; MIS: missing; MSE: mean squared error; MUC: message understanding conference; NER: named entity recognition; NLP: natural language processing; OCR: optical character recognition; PAR: partial; SemEval: semantic evaluation; SPU: spurious
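
The abstract reports precision, recall, and F1-score under both partial and exact matching, and the abbreviations list the entity-level outcome categories (COR, INC, PAR, MIS, SPU). The sketch below shows how such scores are commonly derived in MUC/SemEval-style evaluation. It assumes the usual convention in which a partial match earns half credit under the partial scheme and no credit under the exact scheme; the abstract does not state the exact formulas the authors used, and the class name, function names, and example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EntityCounts:
    """Entity-level outcome counts (hypothetical container for illustration)."""
    cor: int  # COR: predicted entity matches the gold entity
    inc: int  # INC: predicted entity and gold entity disagree
    par: int  # PAR: predicted entity partially overlaps the gold entity
    mis: int  # MIS: gold entity with no prediction
    spu: int  # SPU: prediction with no gold entity


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def scores(c: EntityCounts, partial: bool) -> tuple[float, float, float]:
    """Return (precision, recall, F1) under the assumed convention."""
    possible = c.cor + c.inc + c.par + c.mis  # entities in the gold standard
    actual = c.cor + c.inc + c.par + c.spu    # entities the system produced
    # Assumption: partial matches earn 0.5 credit only in the partial scheme.
    credit = c.cor + (0.5 * c.par if partial else 0.0)
    precision = credit / actual if actual else 0.0
    recall = credit / possible if possible else 0.0
    return precision, recall, f1(precision, recall)


# Made-up counts, not taken from the paper.
example = EntityCounts(cor=8, inc=0, par=2, mis=0, spu=0)
exact_p, exact_r, exact_f1 = scores(example, partial=False)
partial_p, partial_r, partial_f1 = scores(example, partial=True)
```

As a consistency check on the reported figures, `f1(0.9557, 0.9486)` and `f1(0.9329, 0.9259)` round to 0.9521 and 0.9294 respectively, matching the F1-scores given in the abstract.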

KEYWORDS:

Artificial intelligence
deep learning
named entity recognition
optical character recognition
bidirectional long short-term memory
conditional random fields
information extraction
invoice processing

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Krittin Satirapiwong

Krittin Satirapiwong is currently studying for a master's degree in Business Analytics and Data Science at the Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand. His research interests are in Data Analytics, Data Science, and Applied Statistics.

Thitirat Siriborvornratanakul

Thitirat Siriborvornratanakul received the B.Eng. degree (first class honors) in computer engineering from Chulalongkorn University, Bangkok, Thailand, in 2005, and the master's degree in engineering and the Ph.D. degree in engineering from the University of Tokyo, Tokyo, Japan, in 2008 and 2011, respectively. She is currently an Assistant Professor of Computer Science in the Graduate School of Applied Statistics, National Institute of Development Administration, Bangkok, Thailand. She has served as a reviewer for many peer-reviewed journals and international conferences in Computer Science. Her research interests are in Artificial Intelligence, Deep Learning, Computer Vision, Augmented Reality, and Human-Computer Interaction.
