
Spatial Rank-Based Augmentation for Nonparametric Online Monitoring and Adaptive Sampling of Big Data Streams

Pages 243-256 | Received 18 Mar 2021, Accepted 29 Oct 2022, Published online: 01 Dec 2022
 

Abstract

The age of the Internet of Things (IoT) has witnessed the rapid development of modern data acquisition devices and communicating-actuating networks, which enable the generation of big data streams that are shared across platforms for remote and efficient decision making in many critical systems. Monitoring big data streams remains challenging in many practical applications, mainly because of their complex interrelationships, large volume, and high velocity, which place prohibitive demands on monitoring methodologies and resources. To monitor nonexchangeable and correlated big data streams when only partial observations are available under resource constraints, we propose a method that combines spatial rank-based statistics with data augmentation techniques for the data streams that are unobservable online, so that monitoring and sampling decisions can be informed analytically from the partially observed data streams alone. By exploiting historical data, the proposed method retains strong descriptive power for general big data streams under partial observations and explicitly uses the correlation among data streams. It thus allows effective monitoring and equitable sampling over general heterogeneous and correlated big data streams without the simplifying assumptions (e.g., exchangeability) required by existing methods. Theoretical investigations evaluate the effectiveness of the augmentation statistics and of the sampling strategy, and guarantee that the sampling performance is superior to that of existing methods. Simulations under various scenarios and two real case studies are also conducted to evaluate and validate the performance of the proposed method.
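
For readers unfamiliar with the term, the following is a minimal sketch of the standard empirical spatial rank on which spatial rank-based statistics are built: the average of the unit vectors pointing from each reference observation to the query point. It is not the authors' SRAS algorithm and does not include the augmentation step for unobserved streams; the function name `spatial_rank` and the argument `reference` (standing in for fully observed historical data) are assumptions made for illustration only.

```python
import numpy as np


def spatial_rank(x, reference):
    """Empirical spatial rank of point x relative to the rows of `reference`.

    Illustrative only: assumes x is fully observed, which is NOT the
    partial-observation setting the article addresses.
    """
    x = np.asarray(x, dtype=float)
    reference = np.asarray(reference, dtype=float)

    diffs = x - reference                    # shape (n, p): x minus each reference row
    norms = np.linalg.norm(diffs, axis=1)    # Euclidean length of each difference

    # Spatial sign: unit vector in the direction of each difference.
    # By convention the spatial sign of the zero vector is zero.
    signs = np.zeros_like(diffs)
    nonzero = norms > 0
    signs[nonzero] = diffs[nonzero] / norms[nonzero, None]

    # The spatial rank is the average spatial sign; its norm never exceeds 1.
    return signs.mean(axis=0)
```

A point near the center of the reference cloud has a spatial rank close to the zero vector, while a point far outside it has a rank whose norm approaches 1, which is why the norm of the rank can serve as a distribution-free measure of outlyingness.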

Supplementary Materials

The file “Supplementary sections.pdf” contains: (i) properties 1.1–2 and the proofs of all properties in Section 3.2; (ii) the parameter settings of the proposed method; (iii) an additional simulation study that justifies the estimation performance in Section 4; and (iv) a further case study on COVID-19 pandemic surveillance for additional performance evaluation. The file “codes&data.zip” contains: (i) the code for the proposed SRAS algorithm; and (ii) the aggregated data for the two real case studies.

Acknowledgments

The authors gratefully acknowledge the support provided by the funding agencies. The authors would also like to thank the editor, the associate editor, and the three reviewers for their helpful comments.

Disclosure Statement

The authors report that there are no competing interests to declare.

Additional information

Funding

This work was supported in part by the National Science Foundation under grant 2032734, the National Science Foundation of China under grant 72101148, the Shanghai Sailing Program under grant 21YF1420100, and the National Science Foundation of Shanghai under grant 22ZR1433000.
