Econometric Reviews Volume 34, 2015 - Issue 6-10: Special Issue in Honor of Aman Ullah


Original Articles

Constructing Common Factors from Continuous and Categorical Data

Serena Ng, Department of Economics, Columbia University, New York, New York, USA (corresponding author)

Pages 1141-1171 | Published online: 03 Sep 2014

https://doi.org/10.1080/07474938.2014.956625

Abstract

The method of principal components is widely used to estimate common factors in large panels of continuous data. This article first reviews alternative methods that obtain the common factors by solving a Procrustes problem. While these matrix decomposition methods do not specify the probabilistic structure of the data and hence do not permit statistical evaluations of the estimates, they can be extended to analyze categorical data. This involves the additional step of quantifying the ordinal and nominal variables. The article then reviews and explores the numerical properties of these methods. An interesting finding is that the factor space can be quite precisely estimated directly from categorical data without quantification. This may require using a larger number of estimated factors to compensate for the information loss in categorical variables. Separate treatment of categorical and continuous variables may not be necessary if structural interpretation of the factors is not required, such as in forecasting exercises.

Keywords:

Alternating least squares
Factor models
Ordinal data
Principal components

JEL Classification:

ACKNOWLEDGMENTS

I thank Aman Ullah for teaching me econometrics and am especially grateful for his guidance and support over the years. Comments from two anonymous referees are greatly appreciated. I also thank Nickolay Trendafilov for helpful comments and discussions.

Notes

¹A 1983 issue of the Journal of Econometrics (edited by de Leeuw and Wansbeek) was devoted to these methods.

²The focus is rather different from that of the structural factor analysis considered in Cunha and Heckman (2008) and Almlund et al. (2011).

³It is important that the lower bound be attainable and not depend on x. If the lower bound were 1 instead of 2, no meaning could be attached to x = 3, because the lower bound of 1 is not attainable.

⁴A matrix is sub-orthonormal if it can be made orthonormal by appending rows or columns.
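As a concrete instance of this definition, a matrix A with orthonormal columns (A′A = I) is sub-orthonormal. Appending an orthonormal basis for the complement of its column space produces a square orthonormal matrix. The NumPy sketch below covers only this column-appending case. The function name `complete_to_orthonormal` is hypothetical and is not taken from the article.

```python
import numpy as np


def complete_to_orthonormal(A, atol=1e-10):
    """Append N - r columns to an N x r matrix A with A'A = I_r, returning an N x N orthonormal matrix."""
    N, r = A.shape
    if not np.allclose(A.T @ A, np.eye(r), atol=atol):
        raise ValueError("A must have orthonormal columns")
    # In the full SVD of A, the last N - r left singular vectors span the orthogonal complement of col(A)
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    return np.hstack([A, U[:, r:]])


# Usage: a 5 x 2 matrix with orthonormal columns becomes a 5 x 5 orthonormal matrix
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((5, 2)))
Q = complete_to_orthonormal(A)
```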

⁵The orthogonal Procrustes problem was solved in Schönemann (1966). See Gower and Dijksterhuis (2004) for a review of subsequent work.
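For reference, the classical solution works as follows. To minimize ‖A − BQ‖_F over Q subject to Q′Q = I, compute the singular value decomposition B′A = USV′ and set Q = UV′. The minimal NumPy sketch below assumes A and B are conformable matrices, for example two sets of estimated factors or loadings to be aligned. The function name is illustrative and is not the article's code.

```python
import numpy as np


def procrustes_rotation(A, B):
    """Q with orthonormal columns minimizing ||A - B Q||_F.

    With the (thin) SVD B'A = U S V', the minimizer is Q = U V'."""
    U, _, Vt = np.linalg.svd(B.T @ A, full_matrices=False)
    return U @ Vt


# Usage: recover a known orthogonal rotation Q0 from B and A = B Q0
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 3))
Q0, _ = np.linalg.qr(rng.standard_normal((3, 3)))
A = B @ Q0
Q = procrustes_rotation(A, B)
```

SciPy's `scipy.linalg.orthogonal_procrustes(A, B)` solves the same problem written as minimizing ‖AR − B‖_F, so the roles of the two matrices are swapped. It also returns a scale value.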

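⁶Suppose that two continuous variables X₁ and X₂ are jointly normal with correlation coefficient ρ. The probability that (X₁ > τ₁, X₂ > τ₂) is given by

$$p_{12}(\rho) = \int_{\tau_1}^{\infty}\int_{\tau_2}^{\infty} \phi_2(u_1, u_2; \rho)\, du_2\, du_1,$$

where φ₂(·, ·; ρ) is the bivariate normal density with zero means, unit variances, and correlation ρ. The tetrachoric correlation proposed by Pearson (1900) is the ρ such that p₁₂(ρ) equals the sample proportion p̂₁₂ of observations with X₁ > τ₁ and X₂ > τ₂. Polychoric correlations generalize tetrachoric correlations from two dichotomous indicators to indicators with multiple ordered classes.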
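The pure-Python sketch below implements this definition under several stated assumptions:

- The data are 0/1 indicators.
- The thresholds come from the marginal proportions of ones, τⱼ = Φ⁻¹(1 − p̂ⱼ).
- p₁₂(ρ) is evaluated by one-dimensional Simpson integration of P(X₁ > τ₁, X₂ > τ₂) = ∫_{τ₁}^∞ φ(u)[1 − Φ((τ₂ − ρu)/√(1 − ρ²))] du.

Because p₁₂(ρ) is increasing in ρ, bisection recovers the tetrachoric correlation. The helper names are hypothetical, and a dedicated bivariate normal routine would be preferable in practice.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf, cdf, inv_cdf


def upper_orthant_prob(tau1, tau2, rho, n_grid=1000, upper=8.0):
    """P(X1 > tau1, X2 > tau2) for a standard bivariate normal with correlation rho.

    Integrates phi(u) * [1 - Phi((tau2 - rho*u) / sqrt(1 - rho^2))] over
    [tau1, upper] with Simpson's rule; beyond `upper` the integrand is negligible.
    n_grid must be even."""
    s = math.sqrt(1.0 - rho * rho)
    a, b = tau1, max(upper, tau1 + 1.0)
    h = (b - a) / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        u = a + i * h
        f = _STD.pdf(u) * (1.0 - _STD.cdf((tau2 - rho * u) / s))
        weight = 1 if i in (0, n_grid) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


def tetrachoric(x1, x2, tol=1e-8):
    """Tetrachoric correlation of two 0/1 sequences (both classes must occur in each)."""
    n = len(x1)
    p1 = sum(x1) / n
    p2 = sum(x2) / n
    p12 = sum(1 for a, b in zip(x1, x2) if a == 1 and b == 1) / n
    # Thresholds implied by the marginal proportions of ones
    tau1 = _STD.inv_cdf(1.0 - p1)
    tau2 = _STD.inv_cdf(1.0 - p2)
    # p12(rho) is increasing in rho, so bisect on rho
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_orthant_prob(tau1, tau2, mid) < p12:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```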

⁷The initial procedure proposed by Takane et al. (1979) and its refinement by Nevels (1989) both have shortcomings. FACTALS fixes these problems. Special thanks to H. Kiers for sharing the MATLAB code.

⁸The method has been discovered and rediscovered under different names, including quantification, multiple correspondence analysis, dual or optimal scaling, and homogeneity analysis. See Tenenhaus and Young (1985) for a synthesis of these procedures. None of these methods, however, is familiar to economists.
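One of these formulations, homogeneity analysis, can be sketched as an eigenproblem. Take m variables with indicator matrices Gⱼ and category-count matrices Dⱼ = Gⱼ′Gⱼ, and let J = I − 11′/n. The object scores X are the leading eigenvectors of the centered average projector (1/m) Σⱼ J Gⱼ Dⱼ⁻¹ Gⱼ′ J, and each category quantification is the mean object score within that category. The NumPy sketch below assumes small n, since it forms an n × n matrix. It uses hypothetical function names and is not the implementation used in the article.

```python
import numpy as np


def indicator(column):
    """n x k 0/1 indicator matrix for one categorical variable (one column per observed category)."""
    categories = np.unique(column)
    return (column[:, None] == categories[None, :]).astype(float)


def homogeneity_analysis(data, p=1):
    """Object scores X (n x p), category quantifications Y_j, and eigenvalues.

    data: n x m array of categorical codes. Forms an n x n matrix, so small n only."""
    n, m = data.shape
    indicators = [indicator(data[:, j]) for j in range(m)]
    P = np.zeros((n, n))
    for G in indicators:
        counts = G.sum(axis=0)            # diagonal of D_j = G_j' G_j
        P += (G / counts) @ G.T           # G_j D_j^{-1} G_j'
    P /= m
    J = np.eye(n) - np.ones((n, n)) / n   # centering removes the trivial constant solution
    evals, evecs = np.linalg.eigh(J @ P @ J)
    top = np.argsort(evals)[::-1][:p]
    X = np.sqrt(n) * evecs[:, top]        # normalized so that X'X = n I
    # Each category quantification is the mean object score in that category
    Y = [(G.T @ X) / G.sum(axis=0)[:, None] for G in indicators]
    return X, Y, evals[top]
```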

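⁹Olsson et al. (1982) show that ρ_Yz is downward biased for ρ_YZ if Y and Z are jointly normal and z is the categorized version of Z. The attenuation is greatest when there are few categories and the data are skewed in opposite directions. In the special case when consecutive integers are assigned to the categories of z, it can be shown that

$$\rho_{Yz} = q\,\rho_{YZ}, \qquad q = \frac{1}{\sigma_z}\sum_{j=1}^{K-1}\phi(\tau_j),$$

where Z is standardized and cut at thresholds τ₁ < ⋯ < τ_{K−1} into K categories, σ_z is the standard deviation of z, φ(·) is the standard normal density, and q is the categorization attenuation factor.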
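For integer codes 1, …, K, the attenuation factor follows from two facts. First, Cov(Y, z) = ρ_YZ E[zZ] when Y and Z are standard normal. Second, E[zZ] = Σⱼ φ(τⱼ). Together they give ρ_Yz = q ρ_YZ with q = Σⱼ φ(τⱼ)/σ_z. The simulation sketch below compares the sample correlation of Y with the integer-coded z against q ρ_YZ. It assumes standard normal Y and Z, and the thresholds and function name are illustrative choices, not values from the article.

```python
import numpy as np
from statistics import NormalDist


def attenuation_factor(thresholds):
    """q = sum_j phi(tau_j) / sd(z) when Z ~ N(0, 1) is cut at the thresholds
    and the K resulting categories are coded 1, ..., K."""
    nd = NormalDist()
    taus = sorted(thresholds)
    cdf = [0.0] + [nd.cdf(t) for t in taus] + [1.0]
    probs = np.diff(cdf)                  # P(z = k), k = 1..K
    scores = np.arange(1, len(probs) + 1)
    mean = np.sum(probs * scores)
    sd = np.sqrt(np.sum(probs * (scores - mean) ** 2))
    return sum(nd.pdf(t) for t in taus) / sd


# Simulation: correlation of Y with the categorized z versus q * rho
rng = np.random.default_rng(0)
rho, n = 0.7, 200_000
taus = [-0.5, 1.0, 1.8]                   # a skewed categorization
Y = rng.standard_normal(n)
Z = rho * Y + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
z = np.searchsorted(taus, Z) + 1          # consecutive integer codes 1..K
q = attenuation_factor(taus)
r_Yz = np.corrcoef(Y, z)[0, 1]
```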

For y = X (latent continuous data), x (categorical data), and G (adjacency matrix of indicators), IC_y denotes the number of factors selected by the Bai and Ng (2002) criterion when the principal components are constructed from data y, and AO_y denotes the number of factors determined using the criterion of Onatski (2010). The columns report the average R² when each of the factors estimated from y is regressed on the true factors.
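The Bai and Ng (2002) criterion with the g₂ penalty is often labelled IC_p2. It chooses k to minimize ln V(k) + k((N + T)/(NT)) ln min(N, T), where V(k) is the mean squared residual after removing k principal components. The NumPy sketch below assumes a demeaned and standardized T × N panel and a user-chosen kmax. The article's own settings are not reproduced here.

```python
import numpy as np


def bai_ng_icp2(X, kmax=8):
    """Number of factors minimizing IC_p2(k) = ln V(k) + k * g2, k = 0, ..., kmax.

    X: T x N panel, assumed demeaned (and usually standardized) column by column.
    g2 = (N + T) / (N T) * ln(min(N, T))."""
    T, N = X.shape
    s = np.linalg.svd(X, compute_uv=False)          # singular values, descending
    g2 = (N + T) / (N * T) * np.log(min(N, T))
    total = np.sum(X ** 2)
    ic = []
    for k in range(kmax + 1):
        # V(k): residual sum of squares after k principal components, divided by N T
        V = (total - np.sum(s[:k] ** 2)) / (N * T)
        ic.append(np.log(V) + k * g2)
    return int(np.argmin(ic))
```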

¹⁰In an earlier version of the article, when x and G were not demeaned, PCA estimated one more factor in both x and G.

For y = X, x, G, Z, where Z denotes quantified data, the table reports the number of factors estimated by the IC criterion of Bai and Ng (2002) with penalty g₂ and the average R² when each of the factors estimated from y is regressed on all the true factors.
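The fit measure can be sketched as follows. The sketch assumes the estimated factors are the first r principal components of y, scaled as √T times the left singular vectors, and that each regression includes an intercept; the article may differ on such details. An R² near one indicates that the estimated factor lies in the space spanned by the true factors. That is the relevant notion here, because factors are identified only up to rotation. The function names are hypothetical.

```python
import numpy as np


def average_r2(F_hat, F):
    """Average over columns of F_hat of the R^2 from regressing that column on [1, F]."""
    T = F.shape[0]
    Z = np.column_stack([np.ones(T), F])
    r2 = []
    for j in range(F_hat.shape[1]):
        y = F_hat[:, j]
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2.append(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))
    return float(np.mean(r2))


def pc_factors(X, r):
    """First r principal-component factor estimates sqrt(T) * U_r from a demeaned T x N panel."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sqrt(X.shape[0]) * U[:, :r]
```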

¹¹FACTALS has convergence problems when N is 100 and the dimension of G is large.

¹²Given weights w₁, …, w_T and real numbers x₁, …, x_T, the monotone (isotonic) regression problem finds y₁, …, y_T to minimize

$$\sum_{t=1}^{T} w_t\,(x_t - y_t)^2$$

subject to the monotonicity condition that t ⪯ k implies y_t ≤ y_k, where ⪯ is a partial ordering on the index set {1, …, T}. An up-and-down-blocks algorithm is given in Kruskal (1964). See also de Leeuw (2005).
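When the ordering is total (y₁ ≤ y₂ ≤ … ≤ y_T), the weighted problem can be solved by the pool-adjacent-violators algorithm, a close relative of Kruskal's up-and-down-blocks method. The pure-Python sketch below implements that swapped-in algorithm. It does not handle general partial orders, and the function name is hypothetical.

```python
def isotonic_regression(x, w=None):
    """Weighted least-squares fit y_1 <= ... <= y_T to x via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(x)
    # Each block holds [weighted mean, total weight, number of points]
    blocks = []
    for xi, wi in zip(x, w):
        blocks.append([float(xi), float(wi), 1])
        # Pool backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    # Expand each block's mean back to its original points
    y = []
    for mean, _, count in blocks:
        y.extend([mean] * count)
    return y


print(isotonic_regression([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```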

Cunha, F., Heckman, J. (2008). Formulating, identifying and estimating the technology of cognitive and noncognitive skill formation. Journal of Human Resources 43(4): 738–782.

Almlund, M., Duckworth, A., Heckman, J., Kautz, T. (2011). Personality psychology and economics. NBER Working Paper 16822.

Schönemann, P. (1966). A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1): 1–10.

Gower, J., Dijksterhuis, G. (2004). Procrustes Problems. Oxford: Oxford University Press.

Pearson, K. (1900). On the correlation of characters not quantitatively measurable. In: Mathematical Contributions to the Theory of Evolution: Philosophical Transactions of the Royal Society of London, Series A. Vol. 195, pp. 1–46.

Takane, Y., Young, F., de Leeuw, J. (1979). Nonmetric common factor analysis: An alternating least squares method with optimal scaling features. Behaviormetrika 6: 45–56.

Nevels, K. (1989). An improved solution for FACTALS: A nonmetric common factor analysis. Psychometrika 54: 339–343.

Tenenhaus, M., Young, F. (1985). An analysis and synthesis of multiple correspondence analysis, optimal scaling, dual scaling, homogeneity analysis and other methods for quantifying categorical multivariate data. Psychometrika 50(1): 91–119.

Olsson, U., Drasgow, F., Dorans, N. (1982). The polyserial correlation coefficient. Psychometrika 47(3): 337–347.

Bai, J., Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70(1): 191–221.

Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Review of Economics and Statistics 92(4): 1004–1016.

Kruskal, J. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika 29: 115–129.

de Leeuw, J. (2005). Monotonic regression. In: Everitt, B., Howell, D., eds. Encyclopedia of Statistics in Behavioral Science. Vol. 3. New York: John Wiley and Sons, pp. 1260–1261.
