388
Views
5
CrossRef citations to date
0
Altmetric
Original Articles

Identification of approximately duplicate material records in ERP systems

, , &
Pages 434-451 | Received 17 Apr 2014, Accepted 21 Jun 2015, Published online: 09 Jul 2015
 

ABSTRACT

The quality of master data is crucial for the accurate functioning of the various modules of an enterprise resource planning (ERP) system. This study addresses specific data problems arising from the generation of approximately duplicate material records in ERP databases. Such problems are mainly due to the firm’s lack of unique and global identifiers for the material records, and to the arbitrary assignment of alternative names for the same material by various users. Traditional duplicate detection methods are ineffective in identifying such approximately duplicate material records because these methods typically rely on string comparisons of each field. To address this problem, a machine learning-based framework is developed to recognise semantic similarity between strings and to further identify and reunify approximately duplicate material records – a process referred to as de-duplication in this article. First, the keywords of the material records are extracted to form vectors of discriminating words. Second, a machine learning method using a probabilistic neural network is applied to determine the semantic similarity between these material records. The approach was evaluated using data from a real case study. The test results indicate that the proposed method outperforms traditional algorithms in identifying approximately duplicate material records.

Disclosure statement

No potential conflict of interest was reported by the authors.

Additional information

Funding

This work was supported by the National Natural Science Foundation [grant number 71428003], [grant number 71471144], [grant number 71071126].

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 61.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 199.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.