Research Article

A classification-based fuzzy-rules proxy model to assist in the full model selection problem in high volume datasets

Pages 815-844 | Received 11 Oct 2019, Accepted 13 Feb 2021, Published online: 18 Jun 2021
 

ABSTRACT

Improving classifier accuracy is a central topic in machine learning. The problem has been addressed both by designing new algorithms and by selecting the fittest classifier for a given dataset. The latter approach, combined with feature selection and pre-processing, forms a paradigm known as Full Model Selection. This paradigm works like a black box: the input is a dataset and the output is an accurate classification model. Even so, full model selection is rarely the first choice for today's larger datasets. We propose combining MapReduce to handle huge datasets, a bio-inspired optimisation algorithm, and a novel algorithm based on fuzzy classification rules that serves as a proxy model to guide the optimisation process. To the best of our knowledge, this work is the first to propose a classification algorithm based on fuzzy rules as a proxy model. The results showed improved accuracy and a considerable reduction in computing time on datasets spanning a wide range of sizes.
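The general idea of letting a cheap proxy decide which candidate models deserve a full evaluation can be illustrated with a minimal sketch. It is not the algorithm from this article: the proxy here is a simple nearest-neighbour stand-in rather than a fuzzy-rule classifier, the search is plain random mutation rather than a bio-inspired optimiser, and MapReduce is omitted entirely. The function names, the two-value configuration and the toy scoring function are all hypothetical.

```python
import random


def expensive_score(cfg):
    """Hypothetical stand-in for training and validating a full model."""
    x, y = cfg
    return 1.0 - ((x - 0.3) ** 2 + (y - 0.7) ** 2)


def proxy_is_promising(cfg, history, threshold):
    """Cheap proxy: classify a candidate by its nearest evaluated neighbour.

    (Assumption: a 1-nearest-neighbour rule replaces the fuzzy-rule
    classifier described in the abstract.)
    """
    _, nearest_score = min(
        history,
        key=lambda item: (item[0][0] - cfg[0]) ** 2 + (item[0][1] - cfg[1]) ** 2,
    )
    return nearest_score >= threshold


def proxy_assisted_search(n_initial=10, n_rounds=20, n_candidates=10, seed=0):
    rng = random.Random(seed)

    # Seed the history with a few fully evaluated random configurations.
    history = []
    for _ in range(n_initial):
        cfg = (rng.random(), rng.random())
        history.append((cfg, expensive_score(cfg)))
    expensive_calls = n_initial

    for _ in range(n_rounds):
        # Candidates count as "promising" if the proxy predicts a score
        # at or above the (upper) median of what has been seen so far.
        scores = sorted(score for _, score in history)
        threshold = scores[len(scores) // 2]
        best_cfg = max(history, key=lambda item: item[1])[0]

        for _ in range(n_candidates):
            # Mutate the best configuration, clamped to [0, 1].
            cfg = tuple(
                min(1.0, max(0.0, v + rng.gauss(0.0, 0.1))) for v in best_cfg
            )
            # Only candidates accepted by the proxy get the expensive evaluation.
            if proxy_is_promising(cfg, history, threshold):
                history.append((cfg, expensive_score(cfg)))
                expensive_calls += 1

    best_cfg, best_score = max(history, key=lambda item: item[1])
    return best_cfg, best_score, expensive_calls
```

Calling `proxy_assisted_search()` returns the best configuration found, its score, and the number of expensive evaluations actually performed. That count can be compared with the `n_initial + n_rounds * n_candidates` evaluations a search without a proxy would need.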

Acknowledgments

The authors are grateful to Mrs. Lynn Morales and Mr. Blas Morales for their invaluable help in reviewing the manuscript.

Disclosure of potential conflicts of interest

No potential conflict of interest was reported by the author(s).

Notes

1 Proxy models are a computationally inexpensive alternative to a full numerical simulation and can be defined as mathematical, statistical, or data-driven models that replicate the simulation model output for selected input parameters (Alenezi & Mohaghegh, 2016).

2 A Gaussian process (GP) is a probabilistic, non-parametric model that provides uncertainty estimates with its predictions. It can be used to model complex, non-linear systems. The output of the GP is a normal distribution expressed in terms of mean and variance (Borgelt et al., 2012).

Additional information

Funding

The first author is grateful for the support from CONACyT scholarship number 391898.

