Abstract
Traditional quantile estimators that are based on one or two order statistics are a common way to estimate distribution quantiles based on the given samples. These estimators are robust, but their statistical efficiency is not always good enough. A more efficient alternative is the Harrell-Davis quantile estimator which uses a weighted sum of all order statistics. Whereas this approach provides more accurate estimations for the light-tailed distributions, it’s not robust. To be able to customize the tradeoff between statistical efficiency and robustness, we could consider a trimmed modification of the Harrell-Davis quantile estimator. In this approach, we discard order statistics with low weights according to the highest density interval of the beta distribution.
Acknowledgments
The author thanks Ivan Pashchenko for valuable discussions.
Disclosure statement
The author reports there are no competing interests to declare.
Reference implementation
Here is an R implementation of the suggested estimator:
getBetaHdi <- function(a, b, width) { eps <- 1e-9 if (a < 1 + eps & b < 1 + eps) # Degenerate case return(c(NA, NA)) if (a < 1 + eps & b > 1) # Left border case return(c(0, width)) if (a > 1 & b < 1 + eps) # Right border case return(c(1 - width, 1)) if (width > 1 - eps) return(c(0, 1)) # Middle case mode <- (a - 1)/(a + b - 2) pdf <- function(x) dbeta(x, a, b) l <- uniroot( f = function(x) pdf(x) - pdf(x + width), lower = max(0, mode - width), upper = min(mode, 1 - width), tol = 1e–9 )$root r <- l + width return(c(l, r))}thdquantile <- function(x, probs, width = 1/sqrt(length(x))) sapply(probs, function(p) { n <- length(x) if (n == 0) return(NA) if (n == 1) return(x) x <- sort(x) a <- (n + 1) * p b <- (n + 1) * (1 - p) hdi <- getBetaHdi(a, b, width) hdiCdf <- pbeta(hdi, a, b) cdf <- function(xs) { xs[xs < = hdi[1]] <- hdi[1] xs[xs > = hdi[2]] <- hdi[2] (pbeta(xs, a, b) - hdiCdf[1])/(hdiCdf[2] - hdiCdf[1]) } iL <- floor(hdi[1] * n) iR <- ceiling(hdi[2] * n) cdfs <- cdf(iL:iR/n) W <- tail(cdfs, -1) - head(cdfs, -1) sum(x[(iL + 1):iR] * W)})
An implementation of the original Harrell-Davis quantile estimator could be found in the Hmisc package (see (Harrell Citation2021)).
Notes
1 The source code of all simulations is available on GitHub: https://github.com/AndreyAkinshin/paper-thdqe