International Journal of Computers and Applications Volume 42, 2020 - Issue 7


Fault tolerance for a scientific workflow system in a Cloud computing environment

Miloud Khaldi (corresponding author), Computer Science Department, Faculty of Exact Sciences, University of Mascara, Mascara, Algeria

Mohammed Rebbah, Computer Science Department, Faculty of Exact Sciences, University of Mascara, Mascara, Algeria

Boudjelal Meftah, Computer Science Department, Faculty of Exact Sciences, University of Mascara, Mascara, Algeria

Omar Smail, Computer Science Department, Faculty of Exact Sciences, University of Mascara, Mascara, Algeria

Pages 705-714 | Received 31 Mar 2019, Accepted 19 Jul 2019, Published online: 30 Jul 2019

https://doi.org/10.1080/1206212X.2019.1647651

References

Taylor IJ, Deelman E, et al. Workflows for e-Science. In: Taylor IJ, Deelman E, Gannon DB, Shields M, editors. Scientific workflows for grids. London: Springer-Verlag; 2007. p. 526.
Katz DS, Anagnostou N, Berriman GB, et al. Astronomical image mosaicking on a grid: initial experiences. In: Di Martino B, Dongarra J, Hoisie A, Yang LT, Zima H, editors. Engineering the Grid: Status and Perspective. American Scientific Publishers; 2006.
Maechling P, Deelman E, et al. SCEC CyberShake workflows: automating probabilistic seismic hazard analysis calculations. In: Workflows for e-Science: scientific workflows for grids. London, UK: Springer; 2007.
Deelman E, Ellisman MH. Enabling parallel scientific applications with workflow tools. Presented at: Challenges of Large Applications in Distributed Environments, IEEE; 2006.
Juve G, Chervenak A, Deelman E, et al. Characterizing and profiling scientific workflows. Future Gener Comput Syst. 2013;29(3):682–692.
Singh G, Su M, Vahi K, et al. Workflow task clustering for best effort systems with Pegasus. Proceedings of the 15th ACM Mardi Gras Conference; 2008. p. 9.
Chen W, Deelman E. Fault tolerant clustering in scientific workflows. Proceedings of the IEEE 8th World Congr. Services; 2012. pp. 9–16.
Ferreira da Silva R, Glatard T, Desprez F. On-line, non-clairvoyant optimization of workflow activity granularity on grids. Proc Euro-Par Parallel Process. 2013;8097:255–266.
Zhang Y, Squillante MS. Performance implications of failures in large-scale cluster scheduling. Proceedings of the 10th Workshop Job Scheduling Strategies Parallel Process; Jun. 2004. pp. 233–252.
Schroeder B, Gibson GA. A large-scale study of failures in high-performance computing systems. International Conference on Dependable Systems and Networks; 2006. pp. 249–258.
Deelman E, Blythe J, Gil Y, et al. Pegasus: Mapping scientific workflows onto the grid. Proceedings of the 2nd Eur. AcrossGrid Conference; 2004. pp. 11–20.
Duan R, Prodan R, Fahringer T. Run-time optimisation of grid workflow applications. Proceedings of the 7th IEEE/ACM International Conference on Grid Computing; 2006. pp. 33–40.
Ferreira da Silva R, Glatard T. A science-gateway workload archive to study pilot jobs, user activity, bag of tasks, task substeps, and workflow executions. Proceedings of the 18th International Conference on Parallel Processing Workshops; 2013. pp. 79–88.
Bresnahan J, Freeman T, La Bissoniere D, et al. Managing appliance launches in infrastructure clouds. Proceedings of the Teragrid Conference; 2011. p. 12.
Deelman E, Singh G, Livny M, et al. The cost of doing science on the cloud: The montage example. Proceedings of the ACM/IEEE Conference Supercomput; 2008. p. 50.
Berriman GB, Juve G, Deelman E, et al. The application of cloud computing to astronomy: A study of cost and performance. Proceedings of the Workshop e-Science Challenges Astron. Astrophys; 2010. pp. 1–7.
Vinay K, Dilip Kumar SM, Raghavendra S, et al. Multimed Tools Appl. 2018;77:10171. https://doi.org/10.1007/s11042-017-5304-7.
Maheshwari K, Espinosa A, Wilde M, et al. Job and data clustering for aggregate use of multiple production cyberinfrastructures. Proceedings of the 5th International Workshop Data-Intensive Distributed Computing; 2012. pp. 3–12.
Ferreira da Silva R, Glatard T, Desprez F. Controlling fairness and task granularity in distributed, online, non-clairvoyant workflow executions. Concurrency Comput, Practice Exp. 2014;26(14):2347–2366.
Chen W, Deelman E. Integration of workflow partitioning and resource provisioning. 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid); May 2012. pp. 764–768.
Chen W, da Silva RF, Deelman E, et al. Dynamic and fault-tolerant clustering for scientific workflows. IEEE Trans Cloud Comput. 2016;4(1):49–62.
Zhang Y, Mandal A, Koelbel C, et al. Combined fault tolerance and scheduling techniques for workflow applications on computational grids. Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing Grid; 2009. pp. 244–251.
Montagnat J, Glatard T, Reimert D, et al. Workflow-based comparison of two distributed computing infrastructures. Proceedings of the 5th Workshop Workflows Support Large-Scale Sci.; 2010. pp. 1–10.
Kandaswamy G, Mandal A, Reed D. Fault tolerance and recovery of scientific workflows on computational grids. Proceedings of the 8th IEEE International Symposium on Cluster Computing Grid; 2008. pp. 777–782.
Plankensteiner K, Prodan R, Fahringer T. A new fault tolerance heuristic for scientific workflows in highly distributed environments based on resubmission impact. Proceedings of the 5th IEEE International Conference e-Sci.; 2009. pp. 313–320.
Juve G, Deelman E, Vahi K, et al. Experiences with resource provisioning for scientific workflows using Corral. Sci Program. April 2010;18(2):77–92.
Amoon M, El-Bahnasawy N, Sadi S, et al. On the design of reactive approach with flexible checkpoint interval to tolerate faults in cloud computing systems. J Ambient Intell Human Comput. 2018. doi:10.1007/s12652-018-1139-y.
Ying C, Yu J, He J. Towards fault tolerance optimization based on checkpoints of in-memory framework spark. J Ambient Intell Human Comput. 2018. doi:10.1007/s12652-018-1018-6.
Zhao W, Melliar-Smith PM, Moser LE. Fault tolerance middleware for cloud computing. Proceedings of the 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD); 2010. pp. 67–74. https://doi.org/10.1109/CLOUD.2010.26.
Jhawar R, Piuri V, Santambrogio M. A comprehensive conceptual system-level approach to fault tolerance in cloud computing. Proceedings of the 2012 IEEE International Systems Conference (SysCon 2012); 2012. pp. 601–605. https://doi.org/10.1109/SysCon.2012.6189503.
Poola D, Ramamohanarao K, Buyya R. Fault-tolerant workflow scheduling using spot instances on clouds. Procedia Comput Sci. 2014;29:523–533. https://doi.org/10.1016/j.procs.2014.05.047.
Padmakumari P, Umamakeswari A. J Ambient Intell Human Comput. 2019. doi:10.1007/s12652-019-01174-9.
Deelman E, Vahi K, Juve G, et al. Pegasus, a workflow management system for science automation. Future Gener Comput Syst. 2015:17–35. doi:10.1016/j.future.2015.10.008.
Samak T, Gunter D, Goode M, et al. Failure prediction and localization in large scientific workflows. Proceedings of the 6th Workshop Workflows Supporting Large-Scale Sci.; Nov. 2011. pp. 107–116.
Plankensteiner K, Prodan R, Fahringer T, et al. Fault detection, prevention and recovery in current grid workflow systems. In: Grid and Services Evolution. New York, NY: Springer; 2009. p. 1–13.
Muthuvelu N, Liu J, Soe NL, et al. A dynamic job grouping-based scheduling for deploying applications with fine grained tasks on global grids. Proceedings of the Australasian Workshop Grid Comput. e-res; 2005. pp. 41–48.
Muthuvelu N, Chai I, Eswaran C. An adaptive and parameterized job grouping algorithm for scheduling grid jobs. Proceedings of the 10th International Conference on Advanced Communication Technology; 2008. pp. 975–980.
Muthuvelu N, Chai I, Chikkannan E, et al. On-line task granularity adaptation for dynamic grid applications. Proceedings of the 10th International Conference on Algorithms Archit. Parallel Process.; 2010. pp. 266–277.
Ng WK, Ang T, Ling T, et al. Scheduling framework for bandwidth-aware job grouping-based scheduling in grid computing. Malaysian J Comput Sci. 2006;19(2):117–126.
Ang T, Ng W, Ling T, et al. A bandwidth-aware job grouping-based scheduling on grid environment. Inf Technol J. 2009;8:372–377.
Liu Q, Liao Y. Grouping-based fine-grained job scheduling in grid computing. Proceedings of the 1st Int. Workshop Educ. Technol. Comput. Sci.; Mar. 2009. pp. 556–559.
Stergiou C, Psannis KE, Kim B-G, et al. Secure integration of IoT and cloud computing. Future Gener Comput Syst. 2018;78:964–975. doi:10.1016/j.future.2016.11.031.
Dharwadkar NV, Poojara SR, Kadam PM. Fault tolerant and Optimal task clustering for scientific workflow in Cloud. Int J Cloud Appl Comput. 2018;8(3):1–19. https://doi.org/10.4018/ijcac.2018070101.
Shojafar M, Javanmardi S, Abolfazli S, Cordeschi N. FUGE: A joint meta-heuristic approach to cloud job scheduling algorithm using fuzzy theory and a genetic method. Cluster Comput. June 2015;18(2):829–844.
Topcuoglu H, Hariri S, Wu M-Y. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans Parallel Distrib Syst. Mar. 2002;13(3):260–274.
Blythe J, Jain S, Deelman E, et al. Task scheduling strategies for workflow-based applications in grids. Proceedings of the 5th IEEE International Symposium on Cluster Computing Grid; 2005. pp. 759–767.
Kalayci S, Dasgupta G, Fong L, et al. Distributed and adaptive execution of condor DAGMan workflows. Proceedings of the 22nd International Conference on Software Engineering and Knowledge Engineering; 2010. pp. 587–590.
Fahringer T, Prodan R, Duan R, et al. ASKALON: A development and grid computing environment for scientific workflows. Proceedings of the Workflows e-Sci.; 2007. pp. 450–471.
Oinn T, Addis M, Ferris J, et al. Taverna: A tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004;20(17):3045–3054.
Topcuoglu H, Hariri S, Wu M-Y. Task scheduling algorithms for heterogeneous processors. Proceedings of the 8th Heterogeneous Computing Workshop; 1999.
Amazon EC2. 2019. http://aws.amazon.com/ec2.
Rodriguez MA, Buyya R. Deadline based resource provisioning and scheduling algorithm for scientific workflows on Clouds. IEEE Trans Cloud Comput. 2014;2(2):222–235.
Li Z, Ge J, Hu HY, et al. Cost and energy aware scheduling algorithm for scientific workflows with deadline constraint in Clouds. IEEE Trans Serv Comput. 2018;11(4):713–726.
Li Z, Ge J, Yang H, et al. A security and cost aware scheduling algorithm for heterogeneous tasks of scientific workflow in Clouds. Future Gener Comput Syst. 2016;65:140–152.
Yao G, Ding Y, Hao K. Using imbalance characteristic for fault-tolerant workflow scheduling in Cloud systems. IEEE Trans Parallel Distrib Syst. 2017;28(12):3671–3683.
Manimaran G, Murthy CSR. A fault-tolerant dynamic scheduling algorithm for multiprocessor real-time systems and its analysis. IEEE Trans Parallel Distrib Syst. 1998;9(11):1137–1152.
Zhu X, Wang J, Guo H, et al. Fault-Tolerant scheduling for real-time scientific workflows with elastic resource provisioning in virtualized Clouds. IEEE Trans Parallel Distrib Syst. 2016;27(12):3501–3517.
Bharathi S, Chervenak A, Deelman E, et al. Characterization of scientific workflows. 2008 Third Workshop on Workflows in Support of Large-Scale Science; 2008. pp. 1–10.
Chen W, Deelman E. WorkflowSim: A toolkit for simulating scientific workflows in distributed environments. IEEE 8th International Conference on E-Science (e-Science); 2012. pp. 1–8.
Calheiros RN, Ranjan R, Beloglazov A, et al. Cloudsim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Softw: Practice Experience. 2011;41(1):23–50.
Ferreira da Silva R, Chen W, Juve G, et al. Community resources for enabling and evaluating research on scientific workflows. Proceedings of the 10th IEEE International Conference on e-Science; 2014. pp. 177–184.
Workflow Archive. [Online]. Available: http://workflowarchive.org, 2019.
Baccarelli E, Naranjo PGV, Scarpiniti M, et al. Fog of everything: energy-efficient networked computing architectures research challenges and a case study. IEEE Access. 2017;5:9882–9910.
Baccarelli E, Naranjo PGV, Shojafar M, et al. Q*: energy and delay-efficient dynamic queue management in TCP/IP virtualized data centers. Comput Commun. Apr. 2017;102:89–106.
