ABSTRACT
In this article, we propose a new framework for matrix factorization based on principal component analysis (PCA) in which sparsity is imposed. The sparsity structure is defined in terms of groups of correlated variables found in correlation matrices or maps. The framework rests on three new contributions: an algorithm to identify the groups of variables in correlation maps, a visualization of the resulting groups, and a matrix factorization. Together with a method to compute correlation maps with minimum noise level, referred to as missing-data for exploratory data analysis (MEDA), these three contributions constitute a complete matrix factorization framework. Two real examples are used to illustrate the approach and compare it with PCA, sparse PCA, and structured sparse PCA. Supplementary materials for this article are available online.
Supplementary Materials
The supplementary materials contain a description of the Group Identification Algorithm (GIA), additional information on the experiments in the paper, and the code needed to reproduce the results.
Acknowledgments
José Camacho designed and programmed the GIA and GPCA algorithms, and performed and discussed the analysis of the Network Security dataset. Rafael A. Rodríguez-Gómez designed and programmed the Treemap Visualization. Edoardo Saccenti performed and discussed the analysis of the Metabolomic dataset. The authors thank Alejandro Pérez-Villegas for his help in the development of some of the figures. Anonymous reviewers are acknowledged for their useful comments. This work is partly supported by the Spanish Ministry of Economy and Competitiveness and FEDER funds through project TIN2014-60346-R and by the European Commission-funded FP7 project INFECT (contract no. 305340).