What Causes the Mean Bias of the Likelihood Ratio Statistic with Many Variables?

Abstract

Survey data often contain many variables. Structural equation modeling (SEM) is commonly used to analyze such data. However, conventional SEM methods are not designed for data with a large number of variables (p). A large p can cause Tml, the most widely used likelihood ratio statistic, to depart drastically from its assumed chi-square distribution, even with normally distributed data and a relatively large sample size N. A key element of this behavior is the mean bias of Tml, and the focus of this article is to determine its cause. To this end, empirical means of Tml obtained via Monte Carlo simulation are used to compute the empirical bias. The most effective predictors of the mean bias are then identified and their predictive utility examined. The results are further used to predict the Type I error rates of Tml. The article also illustrates how to use these results to determine the sample size required for Tml to behave reasonably well. A real data example shows the effect of the mean bias on model inference and how to correct the bias in practice.
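The Monte Carlo idea behind the empirical bias can be illustrated with a minimal Python sketch. It is not the authors' simulation design: it assumes the simplest possible model, the independence (diagonal-covariance) model fit to multivariate normal data. There the ML fit has a closed form, and with the assumed convention Tml = (N - 1)F_ml the statistic reduces to -(N - 1) log|R|, where R is the sample correlation matrix and df = p(p - 1)/2. The empirical bias is the average of Tml over replications minus df. The Type I error is the rejection rate at a nominal 5% level, with the chi-square critical value taken from the Wilson-Hilferty approximation (an assumption made to avoid a SciPy dependency). All function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def t_ml_independence(x):
    """Tml for the independence model (assumed convention Tml = (N - 1) * F_ml).

    With Sigma-hat = diag(S), F_ml = log|Sigma-hat| - log|S| + tr(S Sigma-hat^-1) - p
    simplifies to -log|R|, where R is the sample correlation matrix.
    """
    n_obs = x.shape[0]
    r = np.corrcoef(x, rowvar=False)
    _, logdet = np.linalg.slogdet(r)
    return -(n_obs - 1) * logdet


def chi2_upper_quantile(df, alpha=0.05):
    """Wilson-Hilferty approximation to the upper-alpha chi-square quantile."""
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1 - c + z * math.sqrt(c)) ** 3


def simulate_mean_bias(p, n_obs, reps=500, seed=0):
    """Empirical mean bias and Type I error of Tml under a correct model."""
    rng = np.random.default_rng(seed)
    df = p * (p - 1) // 2           # covariances fixed at zero by the model
    crit = chi2_upper_quantile(df)
    stats = np.array([
        t_ml_independence(rng.standard_normal((n_obs, p)))  # true Sigma = I
        for _ in range(reps)
    ])
    bias = stats.mean() - df        # empirical mean of Tml minus nominal df
    type1 = float(np.mean(stats > crit))  # rejection rate at nominal 5%
    return df, bias, type1


for p in (5, 30):
    df, bias, type1 = simulate_mean_bias(p, n_obs=200)
    print(f"p={p:2d}  df={df:3d}  bias={bias:6.2f}  Type I={type1:.3f}")
```

With N = 200 held fixed, the p = 30 case should show a clearly positive bias and an inflated rejection rate, while p = 5 stays close to nominal; exact figures depend on the seed and the number of replications.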

Article information

Conflict of Interest Disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.

Ethical Principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work was supported by Grant SES-1461355 from the National Science Foundation.

Role of the Funders/Sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Acknowledgments: The authors would like to thank Stephen West, Brenna Gomer and two reviewers for their comments on prior versions of this manuscript. The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors' institutions or the National Science Foundation is not intended and should not be inferred.

Notes

1 The function h1(x)=log(x) is regarded as a transformation of x with power 0 in the development of Box and Cox (1964).
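Note 1 refers to the Box-Cox (1964) power family. A minimal sketch, assuming the standard Box-Cox form (x^λ - 1)/λ for λ ≠ 0, shows why log(x) is treated as the power-0 member: the expression tends to log(x) as λ → 0, so the family defines its λ = 0 case by that limit. The function name box_cox is hypothetical, and the article's own transformations for nonzero powers are not shown in this excerpt, so they may be defined differently.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox power transformation of a positive value x.

    For lam != 0 this is (x**lam - 1) / lam; for lam == 0 the family is
    defined by its limit, log(x), which is why log counts as "power 0".
    """
    if x <= 0:
        raise ValueError("Box-Cox transformation requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


# As lam shrinks toward 0, the values approach log(10).
for lam in (1.0, 0.1, 0.01, 0.001, 0.0):
    print(lam, box_cox(10.0, lam))
```

The printed values for decreasing λ should approach log(10) ≈ 2.302585, matching the λ = 0 row.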

2 The number of best subsets selected may seem arbitrary, but in our experience the additional gain becomes minimal as more subsets are selected. Also, best-subset regression becomes less effective when too many variables are carried into the following step, which involves product terms.

3 The option “model y = v1-v10/selection = maxR; weight w;” under Proc Reg allows us to select the best predictors from v1 to v10 according to weighted least squares.
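For readers without SAS, the idea of ranking predictor subsets by weighted least squares can be sketched in Python. This is a swapped-in technique, not a reproduction of Proc Reg: SAS's MAXR option uses a stepwise swapping heuristic, whereas the sketch below exhaustively scores every subset of a given size, which is feasible for 10 candidates. The weighted R² is assumed here to be 1 - SSE_w/SST_w, with SST_w taken about the weighted mean. The data, the weights, and the function names weighted_r2 and best_subsets are hypothetical.

```python
from itertools import combinations

import numpy as np


def weighted_r2(X, y, w):
    """Weighted R^2 of a WLS fit of y on X plus an intercept (assumed definition)."""
    sw = np.sqrt(w)
    design = np.column_stack([np.ones(len(y)), X])
    # WLS = OLS on rows scaled by sqrt(weight)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    resid = y - design @ beta
    ybar = np.average(y, weights=w)
    sse = np.sum(w * resid ** 2)
    sst = np.sum(w * (y - ybar) ** 2)
    return 1.0 - sse / sst


def best_subsets(X, y, w, size, keep=3):
    """Exhaustively rank all predictor subsets of one size by weighted R^2."""
    scored = [
        (weighted_r2(X[:, list(cols)], y, w), cols)
        for cols in combinations(range(X.shape[1]), size)
    ]
    scored.sort(reverse=True)  # highest weighted R^2 first
    return scored[:keep]


# Hypothetical data: columns stand in for v1-v10; y depends on v1 and v4.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 10))
w = rng.uniform(0.5, 2.0, size=300)
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(300)
for r2, cols in best_subsets(X, y, w, size=2):
    print(f"R2_w={r2:.3f}  vars={[f'v{c + 1}' for c in cols]}")
```

Exhaustive search guarantees the best subset of each size under this criterion, at a cost that grows combinatorially with the number of candidates; a stepwise method such as MAXR trades that guarantee for speed.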

4 The variables in these subsets are reported in .

5 Note that the dots corresponding to rw1=61.69 and rw1=61.20 nearly overlap in , as do the two corresponding to rw1=51.73 and rw1=50.91.

6 We use λj,k to denote the factor loading of the jth variable on the kth factor. For example, λ13,3 is the loading of the 13th variable (Straight-Curved Capitals) on the 3rd factor (Speed); including λ13,1 makes variable 13 also load on the 1st factor (Spatial).

7 We report p-values to five decimal places so that changes across models and methods of evaluation are visible.
