Abstract
The age of the Internet of Things (IoT) has witnessed the rapid development of modern data acquisition devices and communicating-actuating networks, which enable the generation of big data streams shared across platforms for remote and efficient decision making in many critical systems. Monitoring big data streams remains challenging in many practical applications, mainly because of their complex interrelationships, large volume, and high velocity, which place prohibitive demands on monitoring methodologies and resources. To monitor non-exchangeable and correlated big data streams when only partial observations are available under resource constraints, we propose a method that combines spatial rank-based statistics with effective data augmentation techniques for the data streams that are unobservable online, so that monitoring and sampling decisions can be informed analytically from the partially observed data streams alone. By exploiting historical data, the proposed method retains strong descriptive power for general big data streams under partial observations and explicitly uses the correlation among data streams. It thus allows effective monitoring and equitable sampling over general heterogeneous and correlated big data streams without the simplifying assumptions (e.g., exchangeability) required by existing methods. Theoretical investigations evaluate the effectiveness of the augmentation statistics and of the sampling strategy, and guarantee the superiority of the sampling performance over existing methods. Simulations under various scenarios and two real case studies are also conducted to evaluate and validate the performance of the proposed method.
Supplementary Materials
The file “Supplementary sections.pdf” contains: (i) properties 1.1–2 as well as proofs of all properties in Section 3.2; (ii) the parameter settings of the proposed method; (iii) an additional simulation study to justify the estimation performance in Section 4; and (iv) another case study, on COVID-19 pandemic surveillance, for further performance evaluation. The file “codes&data.zip” contains: (i) the code for the proposed SRAS algorithm; and (ii) the aggregated data for the two real case studies.
Acknowledgments
The authors gratefully acknowledge the support provided by the funding agencies. The authors would also like to thank the editor, the associate editor, and three reviewers for their helpful comments.
Disclosure Statement
The authors report that there are no competing interests to declare.