Constructing Common Factors from Continuous and Categorical Data: Econometric Reviews: Vol 34, No 6-10


Abstract

The method of principal components is widely used to estimate common factors in large panels of continuous data. This article first reviews alternative methods that obtain the common factors by solving a Procrustes problem. While these matrix decomposition methods do not specify the probabilistic structure of the data and hence do not permit statistical evaluations of the estimates, they can be extended to analyze categorical data. This involves the additional step of quantifying the ordinal and nominal variables. The article then reviews and explores the numerical properties of these methods. An interesting finding is that the factor space can be quite precisely estimated directly from categorical data without quantification. This may require using a larger number of estimated factors to compensate for the information loss in categorical variables. Separate treatment of categorical and continuous variables may not be necessary if structural interpretation of the factors is not required, such as in forecasting exercises.

Keywords:

Alternating least squares
Factor models
Ordinal data
Principal components

JEL Classification:

ACKNOWLEDGMENTS

I thank Aman Ullah for teaching me econometrics and am especially grateful for his guidance and support over the years. Comments from two anonymous referees are greatly appreciated. I also thank Nickolay Trendafilov for helpful comments and discussions.

Notes

¹A 1983 issue of the Journal of Econometrics (de Leeuw and Wansbeek, editors) was devoted to these methods.

²The focus is rather different from the structural factor analysis considered in Cunha and Heckman (2008) and Almlund et al. (2011).

³It is important that the lower bound is attainable and does not depend on x. If the lower bound were 1 instead of 2, no meaning could be attached to x = 3 because the lower bound of 1 is not attainable.

⁴A matrix is sub-orthonormal if it can be made orthonormal by appending rows or columns.

⁵The orthogonal Procrustes problem was solved in Schönemann (1966). See Gower and Dijksterhuis (2004) for a review of subsequent work.

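⁶Suppose that two continuous variables X₁ and X₂ are jointly normal with correlation coefficient ρ. With both variables standardized to zero mean and unit variance, the probability that (X₁ > τ₁, X₂ > τ₂) is given by

$$p_{12}(\rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{\tau_1}^{\infty}\int_{\tau_2}^{\infty} \exp\left(-\frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{2(1-\rho^2)}\right) dx_2\, dx_1.$$

The tetrachoric correlation proposed by Pearson (1900) is the ρ such that p₁₂(ρ) equals the sample proportion of observations with X₁ > τ₁ and X₂ > τ₂. Polychoric correlations are then generalizations of tetrachoric correlations from two dichotomous indicators to multiple ordered classes.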
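As a rough illustration of this definition, the sketch below solves p₁₂(ρ) = sample proportion for ρ by bisection. It is not the procedure used in the article: the function names `orthant_prob` and `tetrachoric` are hypothetical, the variables are assumed to be standard normal, the thresholds are backed out from the marginal proportions, and the double integral is reduced to a single integral by conditioning on X₁ and evaluated with Simpson's rule.

```python
import math
from statistics import NormalDist


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Standard normal upper-tail probability P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def orthant_prob(rho, tau1, tau2, n=1000):
    """P(X1 > tau1, X2 > tau2) for a standard bivariate normal with correlation rho.

    Uses X2 | X1 = x ~ N(rho * x, 1 - rho^2), so the probability is the integral
    over x > tau1 of pdf(x) * P(X2 > tau2 | x). Simpson's rule, n must be even.
    """
    s = math.sqrt(1.0 - rho * rho)
    a = tau1
    b = max(tau1, 0.0) + 9.0  # the normal density is negligible beyond this point
    h = (b - a) / n

    def f(x):
        return norm_pdf(x) * norm_sf((tau2 - rho * x) / s)

    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def tetrachoric(n11, n10, n01, n00):
    """Tetrachoric correlation from a 2x2 table of counts.

    n11 counts observations with both indicators equal to 1 (X1 > tau1 and
    X2 > tau2), n10 those with only the first indicator equal to 1, and so on.
    Assumes every marginal proportion lies strictly between 0 and 1.
    """
    n = n11 + n10 + n01 + n00
    p1 = (n11 + n10) / n   # sample P(X1 > tau1)
    p2 = (n11 + n01) / n   # sample P(X2 > tau2)
    p12 = n11 / n          # sample P(X1 > tau1, X2 > tau2)
    tau1 = NormalDist().inv_cdf(1.0 - p1)
    tau2 = NormalDist().inv_cdf(1.0 - p2)
    lo, hi = -0.999, 0.999  # the orthant probability is increasing in rho
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if orthant_prob(mid, tau1, tau2) < p12:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When both thresholds are zero the orthant probability has the closed form 1/4 + arcsin(ρ)/(2π), so a table with counts (2, 1, 1, 2), whose joint proportion is 1/3, should give a value close to 0.5. That makes a convenient sanity check on the numerical integration.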

⁷The initial procedure proposed by Takane et al. (1979) and the refinement by Nevels (1989) both have shortcomings. FACTALS fixes those bugs. Special thanks to H. Kiers for sharing the MATLAB code.

⁸The method has been discovered and rediscovered under different names, including quantification, multiple correspondence analysis, dual or optimal scaling, and homogeneity analysis. See Tenenhaus and Young (1985) for a synthesis of these procedures. However, none of these methods are familiar to economists.

⁹Olsson et al. (1982) show that ρ_Yz is downward biased for ρ_YZ if Y and Z are jointly normal. The greatest attenuation occurs when there are few categories and the data are opposite skewed. In the special case when consecutive integers are assigned to categories of Y, it can be shown that ρ_Yz = q · ρ_YZ, where q is the categorization attenuation factor, which depends on the standard normal density φ(·) evaluated at the category thresholds.

For y = X (latent continuous data), x (categorical data), and G (adjacency matrix of indicators), IC_y denotes the number of factors selected by the Bai and Ng (2002) criterion with penalty when the principal components are constructed from data y. AO_y denotes the number of factors determined using the criterion of Onatski (2010). The columns denote the average R² when each of the factors estimated from y is regressed on the true factors.

¹⁰In an earlier version of the article, when x and G were not demeaned, PCA estimated one more factor in both x and G.

For y = X, x, G, Z where Z denotes quantified data, denotes the number of factors estimated by the IC criterion of Bai and Ng (2002) with penalty g₂. is the average R² when each of the factors estimated from y is regressed on all the true factors.

¹¹FACTALS has convergence problems when N is 100 and the dimension of G is large.

¹²Given weights w₁,…,w_T and real numbers x₁,…,x_T, the monotone (isotonic) regression problem finds y₁,…,y_T to minimize $\sum_{t=1}^{T} w_t (x_t - y_t)^2$ subject to the monotonicity condition that t ⪯ k implies y_t ≤ y_k, where ⪯ is a partial ordering on the index set {1,…,T}. An up-and-down-block algorithm is given in Kruskal (1964). See also de Leeuw (2005).
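For the special case where the ordering is the usual total order 1 ≤ 2 ≤ … ≤ T, the weighted least squares problem in this note can be solved by pooling adjacent violators. The sketch below uses that simpler technique rather than Kruskal's up-and-down-block algorithm; the function name `isotonic_regression` is hypothetical and strictly positive weights are assumed.

```python
def isotonic_regression(x, w=None):
    """Weighted isotonic regression under the total order 1 <= 2 <= ... <= T.

    Returns the nondecreasing sequence y that minimizes sum_t w[t] * (x[t] - y[t])**2,
    computed by pooling adjacent violators. Weights are assumed strictly positive.
    """
    if w is None:
        w = [1.0] * len(x)
    # Each block stores [weighted mean, total weight, number of points pooled].
    blocks = []
    for xi, wi in zip(x, w):
        blocks.append([float(xi), float(wi), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    # Expand each block back to one fitted value per original point.
    y = []
    for mean, _, count in blocks:
        y.extend([mean] * count)
    return y
```

For instance, with unit weights the input [1, 3, 2, 4] has a single violation (3 followed by 2), and the fit replaces that pair by its mean, giving [1.0, 2.5, 2.5, 4.0].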

References

Cunha, F., Heckman, J. (2008). Formulating, identifying and estimating the technology of cognitive and noncognitive skill formation. Journal of Human Resources 43(4): 738–782.

Almlund, M., Duckworth, A., Heckman, J., Kautz, T. (2011). Personality psychology and economics. NBER working paper 16822.

Schönemann, P. (1966). A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1): 1–10.

Gower, J., Dijksterhuis, G. (2004). Procrustes Problems. Oxford: Oxford University Press.

Pearson, K. (1900). On the correlation of characters not quantitatively measurable. In: Mathematical Contributions to the Theory of Evolution: Philosophical Transactions of the Royal Society of London, Series A. Vol. 195, pp. 1–46.

Takane, Y., Young, F., de Leeuw, J. (1979). Nonmetric common factor analysis: An alternating least squares method with optimal scaling features. Behaviormetrika 6: 45–56.

Nevels, K. (1989). An improved solution for FACTALS: A nonmetric common factor analysis. Psychometrika 54: 339–343.

Tenenhaus, M., Young, F. (1985). An analysis and synthesis of multiple correspondence analysis, optimal scaling, dual scaling, homogeneity analysis and other methods for quantifying categorical multivariate data. Psychometrika 50(1): 91–119.

Olsson, U., Drasgow, F., Dorans, N. (1982). The polyserial correlation coefficient. Psychometrika 47(3): 337–347.

Bai, J., Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70(1): 191–221.

Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics 92(4): 1004–1016.

Kruskal, J. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika 29: 115–129.

de Leeuw, J. (2005). Monotonic regression. In: Everitt, B., Howell, D., eds. Encyclopedia of Statistics in Behavioral Science. Vol. 3. New York: John Wiley and Sons, pp. 1260–1261.
