Abstract
Data access patterns have a notable characteristic: 80% of I/O requests access only 20% of the data. This feature gives rise to the concept of hotspot data, which refers to the data in the most frequently requested areas. Access to hotspot data directly influences the performance of the storage system's applications. Predicting hotspot data is therefore a critical research focus in storage system optimization. In this paper, we propose a hotspot data prediction model based on a Zipf-like distribution, which estimates and dynamically adjusts its parameters according to current I/O access statistics. We classify the hotspot data in every trace and analyse the prediction rate in terms of the characteristics of the classified hotspot data. We synthesize the analysis results across different time granularities and hotspot data prediction queue lengths. Finally, we use block I/O traces to discuss the effectiveness of the model. The discussion and analysis results indicate that the model can predict hotspot data efficiently.
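As an illustration only, the following minimal sketch shows one way a Zipf-like fit and a fixed-length prediction queue could be computed from a window of block accesses. It assumes the common Zipf-like form, in which the access frequency of the block of rank i is proportional to 1/i^alpha, and estimates alpha by a least-squares fit in log-log space. The function names (estimate_zipf_alpha, predict_hotspots, hit_rate) and the sliding-window setup are hypothetical and are not taken from the model proposed in this paper.

```python
import math
from collections import Counter


def estimate_zipf_alpha(counts):
    """Estimate alpha for a Zipf-like law (frequency ~ 1 / rank**alpha).

    Assumption: alpha is taken as the negated least-squares slope of
    log(count) against log(rank); the paper's estimator may differ.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two accessed blocks")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def predict_hotspots(window, queue_length):
    """Return the queue_length most frequently accessed blocks in a window."""
    counts = Counter(window)
    return [block for block, _ in counts.most_common(queue_length)]


def hit_rate(predicted, next_window):
    """Fraction of requests in the next window that fall on predicted blocks."""
    if not next_window:
        return 0.0
    hot = set(predicted)
    return sum(1 for block in next_window if block in hot) / len(next_window)
```

Re-estimating alpha and the prediction queue on each new time window is one simple way to follow changing access statistics; the window length and queue length correspond to the time granularity and prediction queue length varied in the analysis.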
Acknowledgements
The authors wish to thank the referees for their constructive comments and recommendations, which have significantly improved the presentation of this paper. This work is sponsored in part by the National Basic Research Program of China (973 Program) under Grant No. 2011CB302303, the National Natural Science Foundation of China under Grant No. 60933002, the HUST Fund under Grant Nos. 2011QN053 and 2011QN032, and the Fundamental Research Funds for the Central Universities.