ABSTRACT
Extreme outliers in the upper tail of an income distribution affect Pareto tail modeling. A simulation study is carried out to compare the performance of three types of boxplot in detecting extreme outliers in Pareto data: the standard boxplot, the adjusted boxplot and the generalized boxplot. It is found that the generalized boxplot is the best method for determining extreme outliers in Pareto distributed data. As an application, the generalized boxplot is used to determine the extreme outliers in the upper tail of the Malaysian income distribution. In addition, for this data set, the confidence interval method is applied to examine the presence of dragon-kings, extreme outliers that lie beyond the Pareto or power-law distribution.
Acknowledgements
The authors would like to thank the reviewers for their thoughtful comments and efforts towards improving this manuscript. The authors are indebted to the Department of Statistics Malaysia and Bank Data of UKM for providing the Household Income Data (HIS). In addition, special thanks are also given to Prof. Vincenzo Verardi and Christopher Bruffaerts for sharing their R commands.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. In the literature, the Pareto distribution is also often known as the power-law distribution and Zipf’s law [13,32,38].
2. The shape parameter of the Pareto distribution (α) is also known as the Pareto tail index, Pareto exponent or Pareto coefficient.
3. The shape parameters are α = 1, 2, and 3. According to Brzezinski [13], this range of α covers most Pareto exponents found in the literature.
4. The outliers are generated from normal distributions with μ = 10000, 100, and 21.5444, and σ = 1; these means are equivalent to the 99.99% quantiles of the Pareto distribution with shape parameters α = 1, 2, and 3, respectively.
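The correspondence in Note 4 can be checked with a minimal sketch, assuming a Pareto type I distribution with scale parameter x_min = 1 (an assumption; the notes do not state the scale). Under that assumption the quantile function is Q(p) = x_min (1 − p)^(−1/α). The function names `pareto_quantile` and `contaminated_sample`, and the sample sizes used below, are hypothetical and chosen for illustration only; they are not taken from the authors' simulation design.

```python
import random


def pareto_quantile(p, alpha, x_min=1.0):
    """Quantile of a Pareto type I distribution: x_min * (1 - p) ** (-1 / alpha).

    x_min = 1 is an assumption; the notes do not state the scale parameter.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if alpha <= 0 or x_min <= 0:
        raise ValueError("alpha and x_min must be positive")
    return x_min * (1.0 - p) ** (-1.0 / alpha)


def contaminated_sample(n_clean, n_outliers, alpha, p=0.9999, sigma=1.0, seed=0):
    """Draw Pareto(alpha) data and append normal outliers centred on the p-quantile.

    Hypothetical helper: the sample sizes and seed are illustrative only.
    """
    rng = random.Random(seed)
    mu = pareto_quantile(p, alpha)
    # random.paretovariate draws from a Pareto type I distribution with x_min = 1.
    clean = [rng.paretovariate(alpha) for _ in range(n_clean)]
    outliers = [rng.gauss(mu, sigma) for _ in range(n_outliers)]
    return clean, outliers


# 99.99% quantiles for the three shape parameters listed in Note 3.
quantiles = {alpha: pareto_quantile(0.9999, alpha) for alpha in (1, 2, 3)}
for alpha, q in quantiles.items():
    print(f"alpha = {alpha}: 99.99% quantile = {q:.4f}")

clean, outliers = contaminated_sample(n_clean=1000, n_outliers=5, alpha=2)
```

With x_min = 1, the computed quantiles agree with the means quoted in Note 4 (about 10000, 100, and 21.544) up to rounding.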