Job failure prediction in Hadoop based on log file analysis

Ehsan Shirzad & Hamid Saadatfar
Pages 260-269 | Received 22 Mar 2019, Accepted 14 Feb 2020, Published online: 01 Mar 2020
 

Abstract

Hadoop is a popular framework based on the MapReduce programming model that enables distributed processing of large datasets across clusters with varying numbers of computer nodes. Like any dynamic computational environment, Hadoop has its problems, one of which is the unsuccessful execution of MapReduce jobs. Job failures can cause significant resource waste, performance deterioration, and user dissatisfaction. A proactive, predictive management approach could therefore be very useful in Hadoop systems. In this paper, we try to predict the outcome (success or failure) of MapReduce jobs in the OpenCloud Hadoop cluster by using its log files. OpenCloud is a research cluster managed by CMU’s Parallel Data Lab that uses Hadoop to process big data. We first study the log files and analyze how failures relate to job, resource, and workload characteristics in order to discover effective features for the prediction process. After recognizing the job failure patterns, we apply several popular machine learning algorithms to predict the success/failure status of jobs before they start to execute. Finally, we compare the learning methods and show that the C5.0 algorithm gives the best results, with an accuracy of 91.37%, a recall of 74.43%, and a precision of 80.31%.
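The three reported measures follow from the counts of a binary confusion matrix. The sketch below is only an illustration of how such figures are typically computed; it assumes that "failed" is treated as the positive class (the abstract does not state this), and the class name, function names, and counts are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary job-outcome classifier.

    Assumption: a failed job is the positive class.
    """
    tp: int  # failed jobs predicted as failed
    fp: int  # successful jobs predicted as failed
    tn: int  # successful jobs predicted as successful
    fn: int  # failed jobs predicted as successful


def accuracy(c: ConfusionCounts) -> float:
    """Share of all jobs whose outcome was predicted correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of predicted failures that really failed."""
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of real failures that the model caught."""
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


# Hypothetical counts, chosen only to exercise the formulas.
example = ConfusionCounts(tp=30, fp=10, tn=150, fn=10)
print(f"accuracy={accuracy(example):.2%}")
print(f"precision={precision(example):.2%}")
print(f"recall={recall(example):.2%}")
```

When failures are much rarer than successes, accuracy can be high while recall stays noticeably lower, which is one reason all three measures are usually read together.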

Acknowledgments

The log files used in this study were provided and shared by CMU’s Parallel Data Lab (hla. Available: ftp://ftp.pdl.cmu.edu/pub/datasets).

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Ehsan Shirzad

Ehsan Shirzad received his M.Sc. degree in Information Technology Engineering from the University of Birjand in 2018. His research interests include data mining, machine learning, big data, data analysis, the Internet of Things, and health informatics.

Hamid Saadatfar

Hamid Saadatfar is currently an assistant professor in the Computer Department at the University of Birjand. He received his B.Sc., M.Sc., and Ph.D. degrees from Ferdowsi University of Mashhad in 2007, 2009, and 2013, respectively. His research interests include cluster and grid computing, distributed data mining, big data analysis, and power-aware computing.
