Abstract
We study the problem of merging homogeneous groups of pre-classified observations from a robust perspective motivated by the anti-fraud analysis of international trade data. This problem may be seen as a clustering task which exploits preliminary information on the potential clusters, available in the form of group-wise linear regressions. Robustness is then needed because of the sensitivity of likelihood-based regression methods to deviations from the postulated model. Through simulations run under different contamination scenarios, we assess the impact of outliers both on group-wise regression fitting and on the quality of the final clusters. We also compare alternative robust methods that can be adopted to detect the outliers and thus to clean the data. One major conclusion of our study is that the use of robust procedures for preliminary outlier detection is generally recommended, except perhaps when contamination is weak and the identification of cluster labels is more important than the estimation of group-specific population parameters. We also apply the methodology to find homogeneous groups of transactions in one empirical example that illustrates our motivating anti-fraud framework.
Acknowledgements
The authors are grateful to Domenico Perrotta and Marco Riani both for their specific comments and for broader discussion of the topic addressed in this work. They also thank one referee and Prof. Andrei Volodin for several helpful suggestions on a previous version of this manuscript.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. The complete description for this CN code is: slag, ash and residues (other than from the manufacture of iron or steel), containing mainly copper.
2. The average number of nominated outliers is given by .