Abstract
The first step in statistical analysis is parameter estimation. In multivariate analysis, one of the parameters of interest is the mean vector, and the data are usually assumed to come from a multivariate normal distribution. Under this assumption, the maximum likelihood estimator (MLE), that is, the sample mean vector, is the best estimator. However, when outliers exist in the data, the sample mean vector gives poor estimates, so estimators that are robust to outliers should be used instead. The most popular robust multivariate estimator of the mean vector is the S-estimator, which has desirable properties. However, computing this estimator requires a robust estimate of the mean vector as a starting point, and the minimum volume ellipsoid (MVE) estimator is usually used for this purpose. For high-dimensional data, computing the MVE takes too much time; in some cases, the time required is so large that existing computers cannot perform the computation. In addition, the MVE method is not precise for high-dimensional data sets. In this paper, a robust starting point for the S-estimator based on robust clustering is proposed, which can be used for estimating the mean vector of high-dimensional data. The performance of the proposed estimator in the presence of outliers is studied, and the results indicate that it is precise and performs much better than some of the existing robust estimators for high-dimensional data.
Acknowledgements
The authors are grateful to the referees and the editor for helpful comments that led to substantial improvement in the paper.
Disclosure statement
No potential conflict of interest was reported by the authors.