Abstract
In binary regression, the predictor variables may be measured with error. The Berkson case of the errors-in-variables problem is considered, under which the values of the predictor variables are set by the experimenter but not achieved exactly. A particular model for this case is considered, with probit regression and normally distributed errors of observation. The regression parameters for intercept and slope are to be estimated. Two estimators are studied, the maximum likelihood estimator (MLE) and a modification of it. In a simulation study, the modified MLE is shown to improve on the MLE in a situation with substantial measurement error.

1. Introduction

Quantal bioassay is an important field of application of the Berkson case of the binary regression model. In quantal bioassay, there is a stimulus, perhaps a carcinogen or poison, with doses X to be determined by the experimenter. To address questions on carcinogenicity or toxicity of the stimulus, each experimental animal is assigned a dose of the stimulus, and a binary response such as death or survival is observed. A need for errors-in-variables models could arise in quantal bioassay if there is difficulty in achieving the desired dose. Or, the amount of the injected substance remaining in the bloodstream may be the variable that most directly affects the outcome. This amount could be modeled by a formula such as log(X) + error.

When analyzing a data set, it is important to remember that even if there is error in X, an errors-in-variables model may not be needed. Consider the situation in which one can never hope to observe X without error and is only interested in estimating the probability of success for a new observed X measured with error. In this situation, under the model considered in this article, the ordinary binary regression model, assuming no error in X, is all that is required (see Sec. 2 for details).
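The prediction claim can be made concrete with a small simulation (a hedged sketch; the parameter values and sample size are illustrative, not from the article). Under the Berkson probit model, the true dose is U = X + e with e ~ N(0, sigma^2), and the success probability given the nominal dose X works out to Phi((alpha + beta*X) / sqrt(1 + beta^2 sigma^2)). An ordinary probit fit in X therefore converges to these attenuated coefficients, which is adequate for predicting the response from a new nominal dose, but not for estimating alpha and beta themselves.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative values (not from the article)
alpha, beta, sigma = -1.0, 2.0, 0.5   # sigma is the Berkson error SD
n = 20000

x = rng.uniform(0.0, 2.0, n)               # nominal doses set by the experimenter
u = x + rng.normal(0.0, sigma, n)          # true doses: Berkson error added to x
y = (rng.normal(size=n) < alpha + beta * u).astype(float)  # probit response

def negloglik(theta):
    """Naive probit log-likelihood that ignores the error in x."""
    a, b = theta
    p = np.clip(norm.cdf(a + b * x), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))

naive = minimize(negloglik, x0=[0.0, 1.0], method="BFGS").x

# The naive MLE converges to the attenuated parameters, not (alpha, beta)
shrink = np.sqrt(1.0 + beta**2 * sigma**2)
print("naive fit:      ", naive)
print("attenuated limit:", [alpha / shrink, beta / shrink])
print("true parameters: ", [alpha, beta])
```

With these values the attenuation factor is sqrt(1 + 4 * 0.25) = sqrt(2), so the naive fit lands near (-0.71, 1.41) rather than (-1, 2), illustrating both points at once: the fitted probit model in X predicts correctly for a new nominal dose, yet the naive MLE is inconsistent for the regression parameters.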
On the other hand, if one is interested in modeling the relationship between the true dose and the outcome, or in predicting the outcome given a true dose, then the errors-in-variables model is needed. Madansky (1959) discussed these points for errors-in-variables in linear regression. This article shows that when an errors-in-variables model is appropriate and the measurement error is large, the usual probit regression MLE is a poor estimator of the intercept and slope parameters α and β. In particular, this naive MLE is inconsistent. To improve on the naive MLE, one must know the measurement-error variance or estimate it in a separate experiment; otherwise, the regression parameters are unidentifiable.
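One simple correction implied by the attenuation limit can be sketched as follows (an illustration of why knowing the error variance restores identifiability, not necessarily the modified MLE studied in this article). If the naive probit fit converges to a* = α / s and b* = β / s with s = sqrt(1 + β² σ²), then, for known σ, inverting these relations gives β = b* / sqrt(1 - b*² σ²) and α = a* / sqrt(1 - b*² σ²), provided b*² σ² < 1.

```python
import numpy as np

def correct_naive_probit(a_star, b_star, sigma):
    """Invert the Berkson attenuation a* = alpha/s, b* = beta/s,
    where s = sqrt(1 + beta^2 sigma^2), assuming sigma is known.
    (Illustrative correction; helper name is hypothetical.)"""
    denom = 1.0 - (b_star * sigma) ** 2
    if denom <= 0:
        raise ValueError("naive slope too large for this sigma; not invertible")
    s = 1.0 / np.sqrt(denom)   # equals sqrt(1 + beta^2 sigma^2)
    return a_star * s, b_star * s

# Example: the naive limits for alpha = -1, beta = 2, sigma = 0.5
# are (-1/sqrt(2), 2/sqrt(2)); the correction recovers (-1, 2).
a, b = correct_naive_probit(-1 / np.sqrt(2), 2 / np.sqrt(2), 0.5)
print(a, b)
```

The condition b*² σ² < 1 shows concretely how identifiability hinges on σ: without knowledge of the measurement-error variance, any pair (α, β) on the attenuation curve is consistent with the same naive limit.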