Statistical Computing and Graphics

Improved Approximation and Visualization of the Correlation Matrix

Pages 432-442 | Received 20 Sep 2022, Accepted 19 Feb 2023, Published online: 11 Apr 2023
 

Abstract

The graphical representation of the correlation matrix by means of different multivariate statistical methods is reviewed, the different procedures are compared using an example dataset, and an improved representation with better fit is proposed. Principal component analysis is widely used to depict correlation structure. However, as shown here, a weighted alternating least squares approach that avoids fitting the diagonal of the correlation matrix outperforms both principal component analysis and principal factor analysis in approximating a correlation matrix. Weighted alternating least squares is a very strong competitor for principal component analysis, in particular if the correlation matrix is the focus of the study: it improves the representation of the correlation matrix, often at the expense of only a minor percentage of explained variance for the original data matrix when the latter is mapped onto the correlation biplot by regression. In this article, we propose combining weighted alternating least squares with an additive adjustment of the correlation matrix, which is seen to further improve the approximation of the correlation matrix.
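
To make the idea of not fitting the diagonal concrete, the following is a minimal NumPy sketch, not the algorithm implemented in the Correlplot package. It assumes zero weights on the diagonal and unit weights elsewhere, uses a simple iterative scheme that re-imputes the diagonal with its current fitted values before each eigendecomposition, and assumes that the additive adjustment is a single scalar `delta` added to every fitted entry. The function names, the simulated data, and the choice of two dimensions are all illustrative assumptions.

```python
import numpy as np

def rank_k_psd(S, k):
    """Best rank-k positive semidefinite least squares fit to a symmetric S."""
    vals, vecs = np.linalg.eigh(S)
    idx = np.argsort(vals)[::-1][:k]        # indices of the k largest eigenvalues
    lam = np.clip(vals[idx], 0.0, None)     # discard negative eigenvalues
    return (vecs[:, idx] * lam) @ vecs[:, idx].T

def offdiag_rmse(R, fitted):
    """Root mean squared error over the off-diagonal entries only."""
    mask = ~np.eye(R.shape[0], dtype=bool)
    return float(np.sqrt(np.mean((R[mask] - fitted[mask]) ** 2)))

def fit_ignoring_diagonal(R, k=2, adjust=False, n_iter=500):
    """Hypothetical helper: low-rank fit of R with zero weight on the diagonal.

    Returns (X, delta); the approximation of R is X + delta.
    With adjust=False, delta stays at 0.
    """
    mask = ~np.eye(R.shape[0], dtype=bool)
    delta = 0.0
    X = rank_k_psd(R, k)                    # start from the PCA solution
    for _ in range(n_iter):
        if adjust:
            # Least squares update of the scalar adjustment (assumed form).
            delta = float(np.mean((R - X)[mask]))
        target = R - delta
        # Replace the diagonal by the current fit so it carries no error.
        np.fill_diagonal(target, np.diag(X))
        X = rank_k_psd(target, k)
    return X, delta

# Simulated data with a two-factor structure (illustrative only).
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
data = factors @ loadings + rng.normal(size=(200, 6))
R_demo = np.corrcoef(data, rowvar=False)

X_pca = rank_k_psd(R_demo, 2)
X_wals, _ = fit_ignoring_diagonal(R_demo, k=2)
X_adj, delta_adj = fit_ignoring_diagonal(R_demo, k=2, adjust=True)

rmse_pca = offdiag_rmse(R_demo, X_pca)
rmse_wals = offdiag_rmse(R_demo, X_wals)
rmse_adj = offdiag_rmse(R_demo, X_adj + delta_adj)
print(rmse_pca, rmse_wals, rmse_adj)
```

Because each iteration starts from the previous fit and never increases the off-diagonal loss, both diagonal-free fits in this sketch are at least as close to the off-diagonal correlations as the principal component solution they start from. How large the gain is depends on the data.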

4 Supplementary Materials

R-package Correlplot: R-package Correlplot (version 1.0.8) contains code to calculate the different approximations to the correlation matrix and to create the graphics shown in the article. The package contains all datasets used in the article. R-package Correlplot has a vignette containing a detailed example showing how to generate all graphical representations of the correlation matrix (GNU zipped tar file).

Approximations: The file approximations.pdf contains the approximations to the correlation matrix of the Heart attack data. Each table in the supplement gives the sample correlations above the diagonal, and the approximations obtained with a particular method on and/or below the diagonal (PDF file).

Disclosure Statement

The authors report there are no competing interests to declare.

Acknowledgments

Part of this work (Graffelman 2022) was presented at the 17th Conference of the International Federation of Classification Societies (IFCS 2022) in Porto, Portugal, at the "Fifty years of biplots" session organized by Professor Niël le Roux (Stellenbosch University). We thank two anonymous reviewers whose comments on the manuscript have helped to improve it.

Additional information

Funding

This work was supported by the Spanish Ministry of Science and Innovation and the European Regional Development Fund under grant PID2021-125380OB-I00 (MCIN/AEI/FEDER), and by the National Institutes of Health under grant GM075091.