Abstract
Data-rich environments provide unprecedented opportunities for monitoring data quality. This article focuses on the quality of data streams. We use indicator variables to measure the six dimensions of data quality and a glitch index to indicate how poor the quality is. We propose a two-step control scheme that accounts for two relationships: inter- and intra-correlation. In the first step, the Mahalanobis distance is applied to a χ²-type control chart to monitor the quality of a data stream. In the second step, a Shewhart control chart is built on a weighted-sum statistic that measures the quality of the whole process. The feasibility and effectiveness of the control scheme are illustrated through detailed simulation studies and a landslide example. The simulation results, covering the three cases of no correlation, low correlation, and high correlation, show that the proposed approach detects the mean shift in multi-attribute data sensitively and robustly. The example, in which sensors are used to collect data on accelerations in Taiwan, demonstrates the superiority of our design over four traditional control charts: it yields the type-I error closest to the given level and the highest power under the same type-I error.
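The two-step scheme summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weights, the control limits, or the exact form of the weighted-sum statistic, so the function names, the false-alarm rate `alpha`, the 3-sigma width `k`, and the choice to weight per-stream statistics are all assumptions made for illustration. The χ² quantile is computed with the Wilson-Hilferty approximation so that the sketch needs only NumPy and the standard library.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(prob, dof):
    """Approximate chi-square quantile (Wilson-Hilferty); not an exact value."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * np.sqrt(c)) ** 3


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of each row of x from the in-control mean."""
    diff = np.atleast_2d(np.asarray(x, dtype=float)) - np.asarray(mean, dtype=float)
    solved = np.linalg.solve(np.asarray(cov, dtype=float), diff.T).T
    return np.einsum("ij,ij->i", diff, solved)


def step_one(x, mean, cov, alpha=0.0027):
    """Step 1 (assumed form): chi-square-type chart for a single data stream.

    x holds one row per time point and one column per quality indicator.
    Returns the charted statistic and a boolean out-of-control flag per row.
    """
    d2 = mahalanobis_sq(x, mean, cov)
    ucl = chi2_quantile(1.0 - alpha, len(mean))  # upper control limit
    return d2, d2 > ucl


def step_two(stream_stats, weights, center, sigma, k=3.0):
    """Step 2 (assumed form): Shewhart chart on a weighted sum across streams.

    stream_stats has shape (n_streams, n_times); weights has shape (n_streams,).
    center and sigma are the in-control mean and standard deviation of the
    weighted sum, assumed to be estimated beforehand from in-control data.
    """
    w = np.asarray(weights, dtype=float) @ np.asarray(stream_stats, dtype=float)
    return w, np.abs(w - center) > k * sigma
```

In this sketch, the in-control mean vector and covariance matrix passed to `step_one` would be estimated from a reference sample, and the statistics returned for each stream would be stacked row-wise before being passed to `step_two`.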
Acknowledgements
The authors thank the editor, associate editor, and anonymous referees for their many helpful comments that have resulted in significant improvements in the article.
Notes on contributors
Miaomiao Yu is a Ph.D. candidate in the School of Statistics and Management at Shanghai University of Finance and Economics. She received both her B.Sc. and M.Sc. from Shanghai University of Finance and Economics. Her research interests include statistical process control and data mining.
Chunjie Wu is a professor and Vice Dean of the School of Statistics and Management at Shanghai University of Finance and Economics. He received both his B.Sc. and Ph.D. from Nankai University. His research fields include statistical process control and applied statistics.
Fugee Tsung is a professor in the Department of Industrial Engineering and Decision Analytics and Director of the Quality and Data Analytics Lab at the Hong Kong University of Science and Technology. He is a Fellow of the Institute of Industrial and Systems Engineers, Fellow of the American Society for Quality, Fellow of the American Statistical Association, Academician of the International Academy for Quality, and Fellow of the Hong Kong Institution of Engineers. He is Editor-in-Chief of the Journal of Quality Technology. He has authored over 100 refereed journal publications and is the winner of the Best Paper Award for IISE Transactions in 2003, 2009, and 2017. He received both his M.Sc. and Ph.D. from the University of Michigan, Ann Arbor, and his B.Sc. from National Taiwan University. His research interests include industrial big data and quality analytics.