International Journal of Computers and Applications Volume 44, 2022 - Issue 3


Job failure prediction in Hadoop based on log file analysis

Ehsan Shirzad (corresponding author), Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran

Hamid Saadatfar, Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
ORCID: https://orcid.org/0000-0002-6130-8450

Pages 260-269 | Received 22 Mar 2019, Accepted 14 Feb 2020, Published online: 01 Mar 2020

DOI: https://doi.org/10.1080/1206212X.2020.1732081


References

George G, Haas MR, Pentland A. Big data and management. Acad Manage J. 2014;57(2):321–326. doi: https://doi.org/10.5465/amj.2014.4002
Birkin M. Big data: big data for social science research. Ubiquity. 2018;2018(January):1–7. doi: https://doi.org/10.1145/3158339
Gubbi J, Buyya R, Marusic S, et al. Internet of things (IoT): a vision, architectural elements, and future directions. Future Gener Comput Syst. 2013;29(7):1645–1660. doi: https://doi.org/10.1016/j.future.2013.01.010
Baru C, Bhandarkar M, Nambiar R, et al., editors. Setting the direction for big data benchmark standards. Technology Conference on Performance Evaluation and Benchmarking. Berlin, Heidelberg: Springer; 2012.
Chen L, Gao S, Cao X. Research on real-time outlier detection over big data streams. Int J Comput Appl. 2017:1–9. doi: https://doi.org/10.1080/1206212X.2017.1397388
Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Commun ACM. 2008;51(1):107–113. doi: https://doi.org/10.1145/1327452.1327492
Lee C-W, Hsieh K-Y, Hsieh S-Y, et al. A dynamic data placement strategy for Hadoop in heterogeneous environments. Big Data Res. 2014;1:14–22. doi: https://doi.org/10.1016/j.bdr.2014.07.002
Vance A. Hadoop, a free software program, finds uses beyond search. New York Times. 2009 Mar.
Zhu J. Research on data mining of electric power system based on Hadoop cloud computing platform. Int J Comput Appl. 2017:1–7. doi: https://doi.org/10.1080/1206212X.2017.1402623
Hashem IAT, Anuar NB, Gani A, et al. MapReduce: review and open challenges. Scientometrics. 2016;109(1):389–422. doi: https://doi.org/10.1007/s11192-016-1945-y
Bakry ME, Safwat S, Hegazy O. Big data classification using fuzzy K-nearest neighbor. Int J Comput Appl. 2015;132(10):8–13. doi: https://doi.org/10.5120/ijca2015907591
Powered by Apache Hadoop: Apache; 2017. Available from: https://wiki.apache.org/hadoop/PoweredBy.
Mashayekhy L, Nejad MM, Grosu D, et al. Energy-aware scheduling of MapReduce jobs for big data applications. IEEE Trans Parallel Distrib Syst. 2015;26(10):2720–2733. doi: https://doi.org/10.1109/tpds.2014.2358556
Wang W, Zhu K, Ying L, et al. MapTask scheduling in MapReduce with data locality: throughput and heavy-traffic optimality. IEEE/ACM Trans Network. 2016;24(1):190–203. doi: https://doi.org/10.1109/tnet.2014.2362745
Kc K, Anyanwu K, editors. Scheduling Hadoop jobs to meet deadlines. IEEE Second International Conference on Cloud Computing Technology and Science; IEEE; Indianapolis, IN, USA; 2010.
Yong M, Garegrat N, Mohan S, editors. Towards a resource aware scheduler in Hadoop. Proceedings of ICWS; Los Angeles, CA; 2009.
Yang C, Lin W, Liu M, editors. A novel triple encryption scheme for Hadoop-based cloud data security. Fourth International Conference on Emerging Intelligent Data and Web Technologies; IEEE; Xi'an, China; 2013.
Sarvabhatla M, Reddy MCM, Vorugunti CS. A secure and light weight authentication service in Hadoop using one time pad. Procedia Comput Sci. 2015;50:81–86. doi: https://doi.org/10.1016/j.procs.2015.04.064
Xie J, Yin S, Ruan X, et al., editors. Improving MapReduce performance through data placement in heterogeneous Hadoop clusters. IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum; IEEE; 2010.
Chen Q, Liu C, Xiao Z. Improving MapReduce performance using smart speculative execution strategy. IEEE Trans Comput. 2014;63(4):954–967. doi: https://doi.org/10.1109/tc.2013.15
Gu R, Yang X, Yan J, et al. SHadoop: improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters. J Parallel Distrib Comput. 2014;74(3):2166–2179. doi: https://doi.org/10.1016/j.jpdc.2013.10.003
Li H, Groep D, Wolters L, editors. Workload characteristics of a multi-cluster supercomputer. Workshop on Job Scheduling Strategies for Parallel Processing; Springer; 2004.
Li H. Workload dynamics on clusters and grids. J Supercomput. 2009;47(1):1–20. doi: https://doi.org/10.1007/s11227-008-0189-x
Gunter D, Tierney BL, Brown A, et al., editors. Log summarization and anomaly detection for troubleshooting distributed systems. 8th IEEE/ACM International Conference on Grid Computing; IEEE; Austin, TX, USA; 2007.
Ren K, Kwon Y, Balazinska M, et al. Hadoop's adolescence: an analysis of Hadoop usage in scientific workloads. Proc VLDB Endowm. 2013;6(10):853–864. doi: https://doi.org/10.14778/2536206.2536213
Chen Y, Alspaugh S, Katz R. Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads. Proc VLDB Endowm. 2012;5(12):1802–1813. doi: https://doi.org/10.14778/2367502.2367519
Kumar R, Vadhiyar S, editors. Prediction of queue waiting times for metascheduling on parallel batch systems. Workshop on Job Scheduling Strategies for Parallel Processing; Springer; 2014.
Smith W, Wong P. Resource selection using execution and queue wait time predictions. NAS Technical Report Number: NAS-02-003; 2002.
Smith W, Taylor V, Foster I, editors. Using run-time predictions to estimate queue wait times and improve scheduler performance. Workshop on Job Scheduling Strategies for Parallel Processing. Springer; 1999.
Wu X, Zeng Y, Zhao C. Regression-based execution time prediction in Hadoop environment information. Proceedings of the International Conference on Information Technology and Computer Application Engineering; Hong Kong, China: Informa UK Limited; 2015. p. 623–627.
Kavulya S, Tan J, Gandhi R, et al., editors. An analysis of traces from a production mapreduce cluster. Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing; IEEE Computer Society; Washington, DC, USA; 2010.
Samak T, Gunter D, Goode M, et al., editors. Failure analysis of distributed scientific workflows executing in the cloud. Proceedings of the 8th International Conference on Network and Service Management. Las Vegas, Nevada: International Federation for Information Processing; 2012.
Sahoo RK, Squillante MS, Sivasubramaniam A, et al., editors. Failure data analysis of a large-scale heterogeneous server environment. International Conference on Dependable Systems and Networks; IEEE; Florence, Italy; 2004.
Schroeder B, Gibson GA. A large-scale study of failures in high-performance computing systems. International Conference on Dependable Systems and Networks; IEEE; 2006.
El-Sayed N, Schroeder B, editors. Reading between the lines of failure logs: understanding how HPC systems fail. 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks; IEEE; Budapest, Hungary; 2013.
Oppenheimer D, Ganapathi A, Patterson DA, editors. Why do Internet services fail, and what can be done about it? USENIX Symposium on Internet Technologies and Systems; USENIX; Seattle, WA; 2003.
Leangsuksun C, Liu T, Rao T, et al., editors. A failure predictive and policy-based high availability strategy for Linux high performance computing cluster. The 5th LCI International Conference on Linux Clusters: The HPC Revolution; Austin, TX; 2004.
Fadishei H, Saadatfar H, Deldari H, editors. Job failure prediction in grid environment based on workload characteristics. 14th International CSI Computer Conference; IEEE; Tehran, Iran; 2009.
Rosa A, Chen LY, Binder W, editors. Predicting and mitigating jobs failures in big data clusters. 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing; IEEE; Shenzhen, China; 2015.
Saadatfar H, Fadishei H, Deldari H. Predicting job failures in AuverGrid based on workload log analysis. New Gener Comput. 2012;30(1):73–94. doi: https://doi.org/10.1007/s00354-012-0105-z
Liu C, Han J, Shang Y, et al. Predicting of job failure in compute cloud based on online extreme learning machine: a comparative study. IEEE Access. 2017;5:9359–9368. doi: https://doi.org/10.1109/access.2017.2706740
Williams AW, Pertet SM, Narasimhan P, editors. Tiresias: Black-box failure prediction in distributed systems. IEEE International Parallel and Distributed Processing Symposium; IEEE; Rome, Italy; 2007.
Hongyan T, Ying L, Jia T, et al., editors. Hunting killer tasks for cloud system through machine learning: a Google cluster case study. IEEE International Conference on Software Quality, Reliability and Security; IEEE; Vienna, Austria; 2016.
CMU. Cloud computer cluster at the Parallel Data Laboratory. CMU; 2013. Available from: https://wiki.pdl.cmu.edu/OpenCloud.
Berman O, Ashrafi N. Optimization models for reliability of modular software systems. IEEE Trans Software Eng. 1993;19(11):1119–1123. doi: https://doi.org/10.1109/32.256858
Chen X, Lu C-D, Pattabiraman K, editors. Failure analysis of jobs in compute clouds: a Google cluster case study. IEEE 25th International Symposium on Software Reliability Engineering; IEEE; Naples, Italy; 2014.
Javadi B, Kondo D, Iosup A, et al. The failure trace archive: enabling the comparison of failure measurements and models of distributed systems. J Parallel Distrib Comput. 2013;73(8):1208–1223. doi: https://doi.org/10.1016/j.jpdc.2013.04.002
Rosa A, Chen LY, Binder W. Failure analysis and prediction for big-data systems. IEEE Trans Services Comput. 2017;10(6):984–998. doi: https://doi.org/10.1109/TSC.2016.2543718
Rakshith C, Kumar PJ, Thorenoor SG. Machine learning application for diagnosis of respiratory disease through pulmonary function test data. Imperial J Interdiscip Res. 2017;3(11):526–529.
Hastie T, Tibshirani R, Friedman J. Unsupervised learning. The elements of statistical learning. New York (NY): Springer; 2009. p. 485–585.
Hastie T, Friedman J, Tibshirani R. Overview of supervised learning. The elements of statistical learning. New York (NY): Springer; 2001. p. 9–40.
Pinto J, Jain P, Kumar T, editors. Hadoop distributed computing clusters for fault prediction. International Computer Science and Engineering Conference; IEEE; Chiang Mai, Thailand; 2016.
Ebenezer AS, Rajsingh EB, Kaliaperumal B. A novel proactive health aware fault tolerant (HAFT) scheduler for computational grid based on resource failure data analytics. Int J Comput Appl. 2019;41(5):367–377.
Wang Q, Vincent J, King G. Prediction based link state update. Int J Comput Appl. 2007;29(4):379–393.
El-Sayed N, Schroeder B. How reliable are large-scale jobs in parallel clusters? CSRG-627. Toronto: University of Toronto; 2015.
Islam T, Manivannan D, editors. Predicting application failure in cloud: A machine learning approach. IEEE International Conference on Cognitive Computing; IEEE; Honolulu, HI, USA; 2017.
Hongyan T, Ying L, Long W, et al., editors. Predicting misconfiguration-induced unsuccessful executions of jobs in big data system. IEEE 41st Annual Computer Software and Applications Conference; IEEE; Turin, Italy; 2017.
Xu X, Zhang Z, Chen Y, et al. HMM-based predictive model for enhancing data quality in WSN. Int J Comput Appl. 2017;39:1–9.
Liang Y, Zhang Y, Xiong H, et al., editors. Failure prediction in IBM BlueGene/L event logs. Seventh IEEE International Conference on Data Mining; IEEE; Omaha, NE, USA; 2007.
Hacker TJ, Romero F, Carothers CD. An analysis of clustered failures on large supercomputing systems. J Parallel Distrib Comput. 2009;69(7):652–665. doi: https://doi.org/10.1016/j.jpdc.2009.03.007
White T. Hadoop: The definitive guide. Sebastopol (CA): Yahoo Press, O'Reilly Media, Inc.; 2012.
Sintay B. Unix timestamp; 2013. Available from: http://www.unixtimestamp.com.
hla 2013. Available from: ftp://ftp.pdl.cmu.edu/pub/datasets/.
Wang N, Yang J, Lu Z, et al. Comparison and improvement of Hadoop MapReduce performance prediction models in the private cloud. Lecture notes in computer science: advances in services computing. Zhangjiajie: Springer International Publishing; 2016. p. 77–91.
Lama P, Zhou X, editors. AROMA: automated resource allocation and configuration of MapReduce environment in the cloud. Proceedings of the 9th International Conference on Autonomic Computing; ACM; San Jose, CA, USA; 2012.
Babu S, editor. Towards automatic optimization of MapReduce programs. Proceedings of the 1st ACM Symposium on Cloud Computing; ACM; Indianapolis, IN, USA; 2010.
Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers. Mach Learn. 1997;29(2-3):131–163. doi: https://doi.org/10.1023/A:1007465528199
Agresti A. Logistic regression. Categorical data analysis. Vol. 482. Hoboken (NJ): John Wiley & Sons; 2003. p. 165–210.
Shifei D, Bingjuan Q, Hongyan T. An overview on theory and algorithm of support vector machines. J Univ Electron Sci Technol China. 2011;40(1):2–10.
Steinberg D, Colla P. CART: classification and regression trees. The top ten algorithms in data mining. Vol. 9. Shan: CRC Press; 2009. p. 179.
Quinlan R. Data mining tools See5 and C5.0. 2003.
Bishop CM. Neural networks for pattern recognition. Oxford: Oxford University Press; 1995.
Powers DM. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J Mach Learn Technol. 2011. CiteSeerX: 10.1.1.214.9232
Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189–1232.
