Quality and Reliability Engineering

Monitoring the data quality of data streams using a two-step control scheme

Pages 985-998 | Received 19 Jan 2018, Accepted 23 Sep 2018, Published online: 08 Feb 2019

