Job failure prediction in Hadoop based on log file analysis

Pages 260-269 | Received 22 Mar 2019, Accepted 14 Feb 2020, Published online: 01 Mar 2020