Abstract
Regularization methods in linear regression models with manifest variables have been shown to be effective in selecting key predictors from a set of many variables, while improving predictions for novel observations. Regularization methods are particularly attractive for the analysis of complex multidimensional data when theory development is the primary goal, for example, when researchers attempt to predict general or specific factors in bifactor models using many potentially relevant predictors. However, applications of regularization methods in such models are still scarce. In a simulation study, we examined the performance of different regularization methods in bifactor-(S-1) models, varying the number of predictors, the correlations with the outcome (effect size), the underlying structure of multicollinearity, the sample size, the type of penalty, and the use of a single-step versus a two-step approach. We explore potential caveats in the use of regularization methods in bifactor-(S-1) models, provide practical recommendations, and discuss future directions.
Notes
1 Preliminary analyses were conducted with the RegSem package (Jacobucci et al., 2016); however, multiple problems occurred. First, the computational effort exceeded our expectations, especially in conditions with a high number of noise variables, rendering the simulation infeasible. Second, many converged models produced theoretically impossible values, indicating improper solutions (Rindskopf, 1984). Third, after we employed a rather simple and somewhat liberal indicator to detect these theoretically impossible values, close to one third of all models were flagged as improper solutions. This indicator marked all models in which the sum of the squared regression coefficients (ridge) or the sum of their absolute values (lasso) exceeded the respective value obtained with ML estimation.
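The indicator described in Note 1 can be illustrated with a minimal sketch. It is written in Python for illustration only and does not reproduce the original analysis code. The function names, the numerical tolerance `tol`, and the example coefficients are hypothetical assumptions. The idea is that a penalized solution should not have a larger penalty term than the unpenalized ML solution, so any model that does is flagged.

```python
def penalty_norm(coefs, penalty):
    """Return the penalty term for a list of regression coefficients.

    ridge: sum of squared coefficients
    lasso: sum of absolute coefficients
    """
    if penalty == "ridge":
        return sum(b * b for b in coefs)
    if penalty == "lasso":
        return sum(abs(b) for b in coefs)
    raise ValueError(f"unknown penalty: {penalty!r}")


def is_flagged(regularized_coefs, ml_coefs, penalty, tol=1e-8):
    """Flag a model whose penalty term exceeds the ML value.

    `tol` is a hypothetical safeguard against floating-point noise;
    the note itself does not mention a tolerance.
    """
    regularized = penalty_norm(regularized_coefs, penalty)
    baseline = penalty_norm(ml_coefs, penalty)
    return regularized > baseline + tol


# Made-up coefficients, for illustration only.
ml = [0.5, -0.3, 0.2]
shrunk = [0.4, -0.1, 0.0]      # smaller than the ML estimates
inflated = [0.9, -0.6, 0.4]    # larger than the ML estimates

flag_shrunk_ridge = is_flagged(shrunk, ml, "ridge")
flag_inflated_ridge = is_flagged(inflated, ml, "ridge")
flag_shrunk_lasso = is_flagged(shrunk, ml, "lasso")
flag_inflated_lasso = is_flagged(inflated, ml, "lasso")
```

In this sketch the shrunken solution passes the check under both penalties, whereas the inflated solution is flagged. The check is liberal in the sense of the note: it catches only solutions whose overall penalty term grows, not every improper value.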
2 We thank an anonymous reviewer for their remark regarding this topic.
3 Calculations were performed under SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux on a machine with an AMD Ryzen Threadripper 3970X 32-Core Processor at 2195 MHz, with 1/16/128 MB of L1/L2/L3 cache, respectively, and a maximum of 126 GiB of RAM.