Original Articles

Mixed integer linear programming and heuristic methods for feature selection in clustering

Pages 1379-1395 | Received 04 Jul 2016, Accepted 20 Oct 2017, Published online: 05 Jan 2018
