ABSTRACT
This article proposes a new procedure, named Max-Eigen difference (MED), for identifying outliers in multivariate data sets. Theoretical aspects of the procedure are briefly discussed. The proposed procedure is compared with the Mahalanobis distance (MD) and the robust distance (RD) via two examples, which indicate that MED performs better than MD and is comparable to RD. Finally, the procedure is applied in constructing a quadratic discriminant analysis used for splice site prediction in DNA sequences. Results on rice and human genome data sets show that the robustified discriminant provides higher prediction accuracy than the usual discriminant method.
Acknowledgments
We thank an anonymous referee for useful comments on an earlier version of the article. This work is partly supported by NSFC, grant 10371126.