ABSTRACT
Information Quality (IQ) is a critical factor for the success of many activities in the information age, including the development of data warehouses and the implementation of data mining. IQ risk is recognized during the process of data mining; however, there is no formal methodological approach to dealing with it.
Consequently, it is essential to measure IQ risk in a data warehouse to ensure success in implementing data mining. This article presents a methodology to determine three IQ risk characteristics: accuracy, comprehensiveness, and non-membership. The methodology provides a set of quantitative models to examine how the quality risks of source information affect the quality of information outputs produced using the relational algebra operations Restriction, Projection, and Cubic product. It can be used to determine how quality risks associated with diverse data sources affect the derived data. The study also develops a data cube model and an associated algebra to support IQ risk operations.
ACKNOWLEDGMENTS
This research was supported by the National Natural Science Foundation of China (Project Nos. 70772021 and 70372004) and the China Postdoctoral Science Foundation (20060400077). The authors also thank HERA Guest Editors Dash Wu and David Olson for their very helpful comments and suggestions.