Abstract
The article compares the numerical and statistical perspectives on the problem of near-collinearity with a view to investigating whether assigning a statistical interpretation to numerical measures changes the original problem in ways that call into question certain aspects of the current conventional wisdom. The numerical perspective views the problem as stemming from the ill-conditioning of the data matrix, irrespective of whether the numbers denote data or not. The statistical perspective frames the problem in terms of sample correlations among regressors (simple and partial). It is argued that this reframing changes the nature of the numerical problem into a problem relating to the probabilistic structure of the Linear Regression model. The disparity between the two perspectives arises because high correlations among regressors are neither necessary nor sufficient for the data matrix to be ill-conditioned. Moreover, the sample correlations are highly vulnerable to statistical misspecification. For instance, the presence of mean t-heterogeneity will render all statistical measures of near-collinearity untrustworthy. It is argued that many confusions in the near-collinearity literature arise from erroneously attributing symptoms of statistical misspecification to the presence of near-collinearity, when the latter is misdiagnosed using unreliable statistical measures.
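The abstract's central claim, that high correlation among regressors is neither necessary nor sufficient for ill-conditioning, can be illustrated with a small numerical sketch. This is not from the article itself; the simulated data, noise levels, and the conventional condition-index threshold of 30 used below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# (a) Not necessary: near-zero sample correlation, yet the data matrix
#     is ill-conditioned. With an intercept included, a regressor with a
#     large mean and small variance is nearly proportional to the
#     constant column, so X is close to singular even though the
#     (centered) correlation between x1 and x2 is essentially zero.
x1 = rng.normal(size=n)                     # regressor independent of x2
x2 = 1000 + 0.1 * rng.normal(size=n)        # large mean, tiny variation
X = np.column_stack([np.ones(n), x1, x2])
corr_a = np.corrcoef(x1, x2)[0, 1]          # near zero
cond_a = np.linalg.cond(X)                  # very large

# (b) Not sufficient: sample correlation about 0.95, yet after scaling
#     the columns to unit length the condition number of W is modest,
#     below the conventional rule-of-thumb threshold of 30.
z = rng.normal(size=n)
w1 = z
w2 = 0.95 * z + np.sqrt(1 - 0.95**2) * rng.normal(size=n)
W = np.column_stack([w1, w2])
W = W / np.linalg.norm(W, axis=0)           # column-equilibrated
corr_b = np.corrcoef(w1, w2)[0, 1]          # about 0.95
cond_b = np.linalg.cond(W)                  # modest, roughly sqrt(39)

print(corr_a, cond_a)
print(corr_b, cond_b)
```

The design choice in (a) exploits the fact that sample correlations are computed from centered data, while the conditioning of the data matrix (with a constant term) depends on the uncentered columns, which is one way the two perspectives can disagree.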
Acknowledgments
Thanks are due to Dennis Cook and two anonymous referees for valuable comments and suggestions.