ABSTRACT
As applications running on geographically distributed Cloud systems handle ever larger volumes of data, efficient data management has become a crucial performance factor. Alongside basic task scheduling, managing input data on distributed Cloud systems has become a genuine challenge, particularly for data-intensive applications. Ideally, each dataset should be stored in the same data center as its consumer tasks, so that all data accesses are local. However, when a given task does not need all items within one of its input datasets, transferring the entire dataset may incur a severe time overhead. To address this concern, a data fragmentation strategy can be used to partition the datasets and process them in fragmented form. Such a strategy should be flexible enough to support any user-defined partitioning, and well suited to minimizing the overhead of transferring the data in fragmented form. To simulate and estimate the basic statistics of both the fragmentation and migration mechanisms prior to an implementation in a real Cloud, we chose Cloudsim, a popular simulator for Cloud Computing investigations, with the goal of enhancing it with the corresponding extensions. Our proposed extension, named DFMCloudsim, aims to provide an efficient module for implementing fragmentation and data migration strategies. We validate our extension using various simulated scenarios. The results indicate that our extension effectively achieves its main objectives and can reduce data transfer overhead by 74.75% compared to our previous work.
Acknowledgments
L. B. prepared the manuscript and performed the analysis and experiments. M. Z. and C. T. helped with the initial solution design. All authors reviewed the paper and approved the final version of the manuscript.
Availability of data and materials
All of the material is owned by the authors and can be accessed by email request.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Laila Bouhouch
Laila Bouhouch received her engineering degree in Computer Science from ENSA (National School of Applied Sciences) at Ibn Zohr University, Agadir, Morocco, in 2017. She is currently a Ph.D. student in the Department of Computer Science, Laboratory CEDOC ST2I, ENSIAS, Rabat, Morocco. Her research interests include big data management in workflow systems, cloud computing and distributed systems.
Mostapha Zbakh
Mostapha Zbakh received his Ph.D. in computer science from the Polytechnic Faculty of Mons, Belgium, in 2001. He has been a Professor at ENSIAS (National School of Computer Science and System Analysis) at Mohammed V University, Rabat, Morocco, since 2002. His research interests include load balancing, parallel and distributed systems, HPC, Big data and Cloud computing.
Claude Tadonki
Claude Tadonki currently holds a research position at Mines ParisTech/CRI, working on HPC topics and automatic code transformations. His background is a combination of mathematics and computer science. Since his Ph.D. and throughout his subsequent positions, he has been involved in cutting-edge research related to high-performance computing and operations research, following the sequence of model, method, and implementation. He remains interested in fundamental questions about genuinely difficult problems, while striving to understand how advances in optimization, algorithms, programming, and supercomputers can be efficiently combined to provide the best answer.