Abstract
Estimating the multivariate dispersion matrix from incomplete data is problematic, and the resulting estimate often fails to be positive semidefinite. Three procedures (the EM algorithm, smoothing, and use of the complete data vectors only) guarantee an estimator that is at least positive semidefinite. Monte Carlo simulations were used to compare the accuracy of these three procedures, as measured by the average scaled absolute deviation (SD) between the estimated and actual values of the elements of the dispersion matrix. In general, smoothing was more accurate than the EM algorithm when correlations were smaller, whereas the EM algorithm outperformed smoothing when correlations were larger. Using the complete data vectors only was, in general, less accurate than either of the other two methods.
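
To make the positive-semidefiniteness problem concrete, the following is a minimal sketch in Python with NumPy, not the procedures evaluated in the study. It assumes missing values are coded as NaN. The function names (`pairwise_cov`, `complete_case_cov`, `clip_to_psd`) are hypothetical, and the matrix `S_bad` is an invented illustration rather than data from the simulations. Eigenvalue clipping is used only as a stand-in for a repair step, since the abstract does not specify the smoothing procedure; the EM algorithm is not shown.

```python
import numpy as np

def pairwise_cov(X):
    """Covariance built element by element from pairwise-complete rows.

    Each entry may use a different subset of rows, so the assembled
    matrix is not guaranteed to be positive semidefinite.
    """
    p = X.shape[1]
    S = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            xi, xj = X[ok, i], X[ok, j]
            S[i, j] = S[j, i] = np.sum((xi - xi.mean()) * (xj - xj.mean())) / (ok.sum() - 1)
    return S

def complete_case_cov(X):
    """Covariance from rows with no missing values (always PSD)."""
    rows = ~np.isnan(X).any(axis=1)
    return np.cov(X[rows], rowvar=False)

def clip_to_psd(S):
    """Assumed repair: set negative eigenvalues to zero and rebuild S."""
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None)) @ V.T

# Hypothetical symmetric matrix with unit diagonal that is NOT PSD,
# of the kind an element-by-element estimate can produce.
S_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])

print(np.linalg.eigvalsh(S_bad).min())               # negative eigenvalue
print(np.linalg.eigvalsh(clip_to_psd(S_bad)).min())  # zero up to rounding
```

Clipping changes the diagonal as well as the off-diagonal elements, so a repaired matrix of this kind would generally need rescaling before being read as a correlation matrix.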