Nested cross-validation with ensemble feature selection and classification model for high-dimensional biological data

Pages 110-125 | Received 10 May 2020, Accepted 09 Nov 2020, Published online: 29 Nov 2020
