ABSTRACT
Mixed effects regression models are widely used by language researchers. However, these regressions are implemented with an algorithm which may not converge on a solution. While convergence issues in linear mixed effects models can often be addressed with careful experiment design and model building, logistic mixed effects models introduce the possibility of separation or quasi-separation, which can cause problems for model estimation that result in convergence errors or in unreasonable model estimates. These problems cannot be solved by experiment or model design. In this paper, we discuss (quasi-)separation with the language researcher in mind, explaining what it is, how it causes problems for model estimation, and why it can be expected in linguistic datasets. Using real linguistic datasets, we then show how Bayesian models can be used to overcome convergence issues introduced by quasi-separation, whereas frequentist approaches fail. On the basis of these demonstrations, we advocate for the adoption of Bayesian models as a practical solution for dealing with convergence issues when modeling binary linguistic data.
Acknowledgments
This work was supported by NSF grants BCS-1251343 to Jennifer Cole and Jose I. Hualde, BCS-1349110 and BCS-1431324 to Darren Tanner, and by a grant from the Illinois Campus Research Board (RB14158) to Darren Tanner. AK received financial support from an Illinois Distinguished Fellowship from the University of Illinois. KS received financial support from a Doctoral Fellowship from the Social Sciences and Humanities Research Council of Canada, and from an Illinois Distinguished Fellowship from the University of Illinois. This research was supported by equipment funded by the Office of the Vice-Chancellor of Research at the University of Illinois at Urbana-Champaign to JR. Thanks to Darren Tanner and Jennifer Cole for making their data available.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. Due to space constraints, we leave out a full technical discussion of what convergence means for these models (and Bayesian models).
2. All items referenced as ‘supplementary materials,’ including .rda files containing full code and summarized data, may be found at https://www.doi.org/10.17605/OSF.IO/ZHUJF.
3. This approach requires the analyst to focus on each estimated parameter, rather than following the traditional approach in some fields of omnibus tests followed by post-hoc tests. Due to space constraints, we do not fully address this here; we note only that Bayesian analogues of this approach exist (e.g. Bayes Factors), though they are themselves controversial. In any case, the approach outlined here moves the researcher away from a p-value-only statistical analysis toward examining more of the statistical model relevant to a research question.
4. The model failed to converge, with a maximum gradient of 0.006. This is close enough to the 0.002 threshold that allowing the algorithm to estimate for, say, ten times longer might have led to convergence. However, the model specified here took one day to run. From a practical standpoint, we believe that advocating for models which would take ten days to run would be unreasonable when there is a solution in the form of Bayesian approaches. We also tried many different optimizers, none of which achieved convergence.