Robust oracle estimation and uncertainty quantification for possibly sparse quantiles

Abstract

A general many quantiles + noise model is studied in the robust formulation (allowing non-normal, non-independent observations), where the identifiability requirement for the noise is formulated in terms of quantiles rather than the traditional zero expectation assumption. We propose a penalisation method based on the quantile loss function, with an appropriately chosen penalty function, for making inference on a possibly sparse high-dimensional quantile vector. We apply a local approach to address optimality by comparing procedures to the oracle sparsity structure. We establish that the proposed procedure mimics the oracle in the problems of estimation and uncertainty quantification (under the so-called EBR condition). Adaptive minimax results over the sparsity scale follow from our local results.

1. Introduction

Since the emergence of statistical science, the classical ‘signal+noise’ paradigm for observed data has been the main testbed in theoretical statistics for a vast number of methods and techniques for various inference problems and in various optimality frameworks. Many other practically important situations can be reduced to or approximated by the ‘signal+noise’ setting, capturing the statistical essence of the original (typically, more complex) model and preserving its main features in a pure form. There is a huge literature on the ‘signal+noise’ setting with a big variety of combinations of the main ingredients in the study. We mention the following ingredients: assumptions on the observation model (moment conditions, independence, normality, etc.); applied methodology (LSE, penalisation, shrinkage, thresholding, (empirical) Bayesian approach, projection, FDR method); studied inference problem (estimation, detection, model selection, testing, posterior contraction, uncertainty quantification, structure recovery); structural assumptions (smoothness, sparsity, clustering, shape constraints such as monotonicity, unimodality and convexity); loss functions (expectations of powers of $ℓ_{2}$-norm, $ℓ_{q}$-norm, Hamming loss, etc.); optimality framework and pursued criteria (minimax rate, oracle rate, control of FDR, FNR, expectation of the loss function, exponential bounds for the large deviations of the loss functions, etc.).
A small sample of relevant literature includes (Donoho, Johnstone, Hoch, and Stern 1992; Donoho and Johnstone 1994; Benjamini and Hochberg 1995; Birgé and Massart 2001; Johnstone and Silverman 2004; Baraud 2004; Abramovich, Benjamini, Donoho, and Johnstone 2006; Efron 2008; Babenko and Belitser 2010; Belloni, Chernozhukov, and Wang 2011; Castillo and van der Vaart 2012; Martin and Walker 2014; Johnstone 2017; Belitser 2017; Bellec 2018; Butucea, Ndaoud, Stepanova, and Tsybakov 2018; Belitser and Nurushev 2020).

Suppose we observe $X = (X_{1}, \dots, X_{n}) \sim P_{θ}^{n}$, where $θ = (θ_{1}, \dots, θ_{n}) \in R^{n}$ is the parameter of interest. We shall suppress n from the notation and simply write $P_{θ}$ for $P_{θ}^{n}$. A natural example is given by $X_{i} \overset{ind}{\sim} N (θ_{i}, 1)$, $i = 1, \dots, n$, but this specific distributional assumption on the data generating process will not be imposed. The main modelling assumption in this note is that, for some fixed level $τ \in (0, 1)$, $P (X_{i} - θ_{i} \leq 0) = τ, i = 1, \dots, n .$ In other words, we deal with the setting many quantiles + noise, as the $θ_{i}$'s are the τ-quantiles of the observed $X_{i}$'s and the goal is to make inference on the high-dimensional parameter θ. One can also study and make inference on several quantiles simultaneously across different levels $τ \in (0, 1)$. While the mean is a commonly used measure of centrality, it is heavily influenced by outliers or extreme values in the dataset. In contrast, quantiles are robust to outliers and provide information about the spread of the data as well as the location of the central values; cf. Zou and Yuan (2008), Jiang, Wang, and Bondell (2013) and Jiang, Bondell, and Wang (2014). The setting ‘many quantiles + noise’ with sparse quantile vectors arises in statistical and machine learning applications where the goal is to estimate a large number of quantiles for a high-dimensional dataset. The number of quantiles to be estimated can be very large, while the data may be sparse, meaning that only a small fraction of the entries in the data are non-zero. In such applications, quantiles may be a more sensible object of study, especially when the distribution is skewed or contains outliers. In this setting, the interest focuses on the effect of the parameter values on the tails of the distribution of the observations, in addition to, or instead of, the centre.
For example, in finance, quantile-based risk measures such as the value at risk (which is itself a quantile of the loss distribution) or expected shortfall are commonly used to estimate the potential losses of a portfolio of investments, rather than relying solely on the mean return. Estimating a large number of quantiles is important for risk management and portfolio optimisation, as it provides a more complete picture of the potential losses or gains of a portfolio. In genomics, estimating many quantiles can help identify genes or markers that are associated with disease. In image processing, estimating quantiles can help identify regions of interest or anomalies in images. Quantiles as objects of interest occur in diverse areas, including economics, biology and meteorology; cf. Santosa and Symes (1986), Abrevaya (2002), Koenker (2005), Belloni and Chernozhukov (2011), Hulmán et al. (2015), Ciuperca (2018) and Belloni, Chernozhukov, and Kato (2019).

Besides the quantile formulation, the next distinctive feature of our study is the robust formulation, in the sense that we do not assume any particular form of the distribution of the observed data. Only a mild condition (Condition C1) is imposed. The observations do not need to be normal; the distribution of the ‘noise’ terms $X_{i} - θ_{i}$, $i = 1, \dots, n$, may depend on θ, and the noise terms do not even have to be independent. This makes the scope of applicability of our results rather broad.

The third aspect of our study is the local approach. It is in general impossible to make sensible inference on a high-dimensional parameter without any additional structure, either in terms of assumptions on the parameter, or on the observation model, or both. In this paper, we are concerned with sparsity structure, various versions of which have been predominantly studied in the recent literature. Sparsity structure means that a relative majority of the parameter entries are all equal to some fixed value (typically zero). In the local approach, we address optimality by comparing procedures to the oracle sparsity structure (think of an ‘oracle observer’ that knows the best sparsity pattern of the actual θ). When applying the local approach, the idea is not to rely on sparsity as such, but rather to extract all the sparsity structure present in the data and utilise it in the inference problems at hand. The claim that a procedure performs optimally in the local sense means that it attains the oracle quality, i.e. mimics the best sparsity structure pertinent to the true parameter, whichever it is. The local results imply minimax optimality over all scales (including the traditional sparsity class $ℓ_{0} [s]$) that satisfy a certain condition.

The final feature of our study concerns the problem of uncertainty quantification. In recent years, focus in nonparametric statistics has shifted from point estimation to uncertainty quantification, a more challenging problem on which far fewer results are available in the literature; cf. Szabó, van der Vaart, and van Zanten (2015), Belitser (2017), van der Pas, Szabó, and van der Vaart (2017) and Belitser and Nurushev (2020). Certain negative results (cf. Li 1989; Baraud 2004; Belitser and Nurushev 2019 and the references therein) show that in general a confidence set cannot simultaneously have coverage and the optimal size uniformly over all parameter values. This makes the construction of the so-called ‘honest’ confidence sets impossible, and a strategy recently pursued in the literature is to discard a set of ‘deceptive parameters’ to ensure coverage at the remaining parameter values, while maintaining the optimal size uniformly over the whole set. In an increasing order of generality, removing deceptive parameters is expressed by imposing the conditions of self-similarity, polished tail and excessive bias restriction (EBR); see Szabó et al. (2015), Belitser (2017), van der Pas et al. (2017) and Belitser and Nurushev (2020). All the above papers deal with the $ℓ_{2}$-loss framework for Gaussian observations in the ‘many means + noise’ setting (a more general case is considered in Belitser and Nurushev 2020).

In this paper, we work with a new setting ‘many quantiles + general noise’, and use the quantile loss rather than the traditional $ℓ_{2}$-loss. The quantile loss function has certain properties of usual loss functions, but it is not symmetric, a peculiarity that essentially characterises the notion of quantile. We propose and exploit a new version of the EBR condition expressed in terms of the quantile loss. To the best of our knowledge, there are no results on uncertainty quantification for the quantile loss function or for the $ℓ_{1}$-loss, in either the local or the global (minimax) formulation, in this general robust setting. In this respect, our study presents the first step in this direction.

We finally summarise the main contributions of this paper.

For our new (robust) setting ‘many quantiles + general noise’ we propose a penalisation procedure for selecting the sparsity pattern, based on the quantile loss (as counterpart of the $ℓ_{2}$-loss) and the corresponding penalty term.
The estimated sparsity pattern is next used for the construction of an estimator of θ and a confidence set for θ, again in terms of the quantile loss function.
Two theorems establish the local (oracle) optimality of the estimator and confidence set (under the EBR condition), respectively.
We provide elegant and relatively short proofs of the main results; compare with the relatively laborious proofs of related results in Szabó et al. (2015), Belitser (2017), van der Pas et al. (2017) and Belitser and Nurushev (2020).
The obtained results are robust and local in the sense explained above.
The obtained local results imply adaptive minimax optimality for scales of classes that satisfy a certain relation (in particular, for the traditional sparsity scale).

The rest of the paper is organised as follows. In Section 2, we give a robust formulation of the observation model in the ‘many quantiles + general noise’ setting, and introduce some notation and preliminaries. In Section 3, we present the main results of the paper and discuss their consequences for optimality in the minimax sense for the traditional sparsity scale. Section 4 contains a small simulation study. The proofs of the theorems are provided in Section 5.

2. Robust model formulation and preliminaries

We can formally rewrite the model stated in the Introduction in the familiar ‘signal+noise’ form: $X_{i} = θ_{i} + ξ_{i}, P_{θ} (ξ_{i} \leq 0) = τ, i \in [n] = {1, \dots, n},$ (1) where $ξ_{i} = X_{i} - θ_{i}$ can be thought of as ‘noise’. Note that all the quantities depend on τ. Precisely, for $τ \in (0, 1)$, in the model (1) we have $θ = θ (τ)$, $ξ = ξ (τ)$ with $P_{θ} (ξ_{i} (τ) \leq 0) = τ$, $i = 1, \dots, n$. To avoid overloading the notation, we suppress the dependence on τ in the sequel. As we mentioned in the Introduction, we study the robust setting in the sense that we do not assume any particular distribution of $ξ = (ξ_{1}, \dots, ξ_{n})$. In fact, the $ξ_{i}$'s do not have to be identically distributed, their distribution may depend on θ, and they do not even have to be independent.

Introduce the following asymmetric absolute deviation function: $ρ (x) = ρ_{τ} (x) = ρ_{τ, m} (x) = \sum_{i = 1}^{m} x_{i} (τ - 1 {x_{i} \leq 0}), x \in R^{m}, m \in N,$ which we will also call the quantile loss function; here $1 {E}$ denotes the indicator of the event E.

Slightly abusing notation, we use the same notation $ρ (x)$ for any $m \in N$ , for example, most of the time it will be either m = n or m = 1, depending on the dimensionality of the argument of the function. This will always be clear from the context. Often, we will suppress the dependence of ρ on a fixed $τ \in (0, 1)$ , unless emphasising this dependence where it is relevant.

Remark 2.1

The origin of the quantile loss function $ρ_{τ} (x)$, $x \in R$, is well explained in the book by Koenker (2005). The function $ρ_{τ} (x)$ characterises the τ-quantile of a random variable Z in the sense that the τ-quantile $θ_{τ}$ of Z is known to minimise the criterion function $E ρ_{τ} (Z - ϑ)$ with respect to ϑ, i.e. $\arg min_{ϑ \in R} E ρ_{τ} (Z - ϑ) = θ_{τ}$. Basically, this function plays the same role in characterising the quantiles as the quadratic function plays in characterising the expectation of a random variable.
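The characterisation in this remark can be checked numerically on a sample. The following is only an illustrative sketch (the helper names `rho` and `empirical_risk`, the Gaussian sample, the seed and the level τ = 0.8 are assumptions made for the example, not part of the paper): it minimises the empirical version of the criterion over the sample points and compares the minimiser with the order statistic of rank ⌈τn⌉.

```python
import math
import random

def rho(x, tau):
    """Scalar quantile loss rho_tau(x) = x * (tau - 1{x <= 0})."""
    return x * (tau - (1.0 if x <= 0 else 0.0))

def empirical_risk(sample, v, tau):
    """Sample average of rho_tau(Z - v), the empirical analogue of E rho_tau(Z - v)."""
    return sum(rho(z - v, tau) for z in sample) / len(sample)

random.seed(0)
tau = 0.8
sample = [random.gauss(0.0, 1.0) for _ in range(501)]

# The empirical risk is convex and piecewise linear in v, so a minimiser
# can be found among the sample points themselves.
minimiser = min(sample, key=lambda v: empirical_risk(sample, v, tau))

# Order statistic of rank ceil(tau * n), i.e. an empirical tau-quantile.
k = math.ceil(tau * len(sample))
empirical_quantile = sorted(sample)[k - 1]

print(minimiser, empirical_quantile)
```

The quadratic search over sample points is chosen for transparency rather than speed; sorting alone already yields the minimiser.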

The quantile loss function ρ can be related to the $ℓ_{1}$-criterion $| x | = \sum_{i = 1}^{m} | x_{i} |$, $x = (x_{1}, \dots, x_{m}) \in R^{m}$: for any $τ \in (0, 1)$, $(1 - c_{τ}) | x | \leq ρ_{τ} (x) \leq c_{τ} | x |, with c_{τ} = max {τ, 1 - τ} .$ (2) In particular, it follows that $ρ_{1 / 2} (x) = | x | / 2$. Another useful property of the quantile loss function ρ to be used later on is that, for any $x \in R^{m}$, $ρ_{τ} (- x) \leq C_{τ} ρ_{τ} (x), with C_{τ} = max {\frac{1 - τ}{τ}, \frac{τ}{1 - τ}} = \frac{c_{τ}}{1 - c_{τ}} .$ (3)
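Relations (2) and (3) are easy to sanity-check numerically. The sketch below is an illustration only (function names, the random test vectors and the rounding tolerance `eps` are assumptions): it evaluates both inequalities on random vectors and random levels τ, and checks the special case $ρ_{1 / 2} (x) = | x | / 2$.

```python
import random

def rho(x, tau):
    """Quantile loss of a vector: sum_i x_i * (tau - 1{x_i <= 0})."""
    return sum(xi * (tau - (1.0 if xi <= 0 else 0.0)) for xi in x)

def l1(x):
    """l1-criterion |x| = sum_i |x_i|."""
    return sum(abs(xi) for xi in x)

random.seed(1)
eps = 1e-12  # tolerance for floating-point rounding only
all_hold = True
for _ in range(1000):
    tau = random.uniform(0.05, 0.95)
    x = [random.gauss(0.0, 2.0) for _ in range(5)]
    c = max(tau, 1.0 - tau)   # c_tau in (2)
    C = c / (1.0 - c)         # C_tau in (3)
    minus_x = [-xi for xi in x]
    two_sided = (1.0 - c) * l1(x) - eps <= rho(x, tau) <= c * l1(x) + eps  # (2)
    reflection = rho(minus_x, tau) <= C * rho(x, tau) + eps                # (3)
    all_hold = all_hold and two_sided and reflection

x = [1.5, -2.0, 0.25]
print(all_hold, rho(x, 0.5), l1(x) / 2)
```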

Remark 2.2

The above quantile loss function evaluated at the difference $ρ (x - y)$, $x, y \in R^{m}$, possesses all the properties of a metric $d (x, y)$ except for the symmetry. In particular, $ρ (0) = 0$ (zero at zero); if $x \neq y$, $ρ (x - y) > 0$ (positivity); and finally the triangle inequality holds: $ρ (x + y) \leq ρ (x) + ρ (y), x, y \in R^{m} .$ (4)

Since we have as many parameters as observations, it is typically impossible to make inference on θ even in a weak sense unless the data possesses some structure. Here, we work with the structural assumption that θ is (possibly) a sparse vector. Specifically, we assume that $θ \in L_{I}$ , where $I \subseteq [n] = {1, \dots, n}$ and $L_{I} = {x \in R^{n} : x_{i} = 0, i \in I^{c}}$ , where $I^{c} = [n] ∖ I$ . The structure studied here is the unknown ‘true’ sparsity pattern $I_{0} = I_{0} (θ) \subseteq [n]$ , that is, $θ \in L_{I_{0}}$ and $I_{0}$ is the ‘minimal’ sparsity structure in the sense that $I_{0} = \arg min_{I \subseteq [n] : L_{I} ∋ θ} \sum_{i \in I} i$ .

Introduce some further notation: let $I$ denote the family of all subsets of $[n]$; for $x \in R^{n}$, denote its $ℓ_{1}$-norm by $| x | = \sum_{i = 1}^{n} | x_{i} |$; let $| S |$ denote the cardinality of a set S. Further, let us introduce the so-called ‘quantile projection’ $P_{I}$ onto $L_{I}$ (called just projection in what follows), with respect to the quantile loss function $ρ (x - y)$. For $x \in R^{n}$, define $P_{I} x$ as follows: $inf_{y \in L_{I}} ρ (x - y) = ρ (x - P_{I} x) .$ In view of the properties of the quantile loss function ρ given by Remark 2.2, the projection $P_{I}$ is readily found as the following linear operator: $P_{I} x = (x_{i} 1 {i \in I}, i = 1, \dots, n) \in R^{n} .$ We will later use a certain monotonicity property of the quantile projection operator: if $I_{1} \subseteq I_{2}$, then $ρ (P_{I_{1}} x) \leq ρ (P_{I_{2}} x), x \in R^{n} .$ (5) For $s \geq 0$ and $I \in I$, denote (with the convention $0 \log (a / 0) = 0$ for a>0) $λ (s) = s \log (e n / s), p (I) = {(| I | λ (| I |))}^{1 / 2} = | I | {[\log (e n / | I |)]}^{1 / 2} .$ We have that $λ (s) \geq s$ for $s \in [0, n]$ and, since $λ (s)$ is increasing in $s \in (0, n]$, $λ (s) \geq λ (1) = 1 + \log (n)$ for all $s \in [n]$. The function $p (I)$ has the meaning of the complexity of structure I.
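The objects just introduced translate directly into a few lines of code. This is a minimal sketch under assumed names (`project`, `lam`, `complexity`) with index sets stored as Python sets of 1-based indices; the example vector and level τ = 0.3 are arbitrary.

```python
import math

def rho(x, tau):
    """Quantile loss of a vector: sum_i x_i * (tau - 1{x_i <= 0})."""
    return sum(xi * (tau - (1.0 if xi <= 0 else 0.0)) for xi in x)

def project(x, I):
    """P_I x: keep the coordinates whose (1-based) index lies in I, zero out the rest."""
    return [xi if i in I else 0.0 for i, xi in enumerate(x, start=1)]

def lam(s, n):
    """lambda(s) = s * log(e*n/s), with the convention lambda(0) = 0."""
    return 0.0 if s == 0 else s * math.log(math.e * n / s)

def complexity(I, n):
    """p(I) = sqrt(|I| * lambda(|I|)) = |I| * sqrt(log(e*n/|I|))."""
    return math.sqrt(len(I) * lam(len(I), n))

n, tau = 6, 0.3
x = [2.0, -1.0, 0.5, -3.0, 0.0, 1.0]
I1, I2 = {1, 4}, {1, 2, 4}  # I1 is a subset of I2

print(project(x, I1))
# Monotonicity (5): the loss of the projection grows with the index set.
print(rho(project(x, I1), tau), rho(project(x, I2), tau))
print(lam(1, n), complexity(I2, n))
```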

The following relation will be used later: for any $ν > 1$, $\sum_{I \in I} e^{- ν λ (| I |)} = 1 + \sum_{s \in [n]} \sum_{I : | I | = s} e^{- ν λ (s)} \leq 1 + \sum_{s \in [n]} e^{- (ν - 1) s} \leq C_{ν},$ (6) with $C_{ν} = 1 + (e^{ν - 1} - 1)^{- 1} < \infty$.
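The chain of inequalities in (6) rests on the standard bound that the number of subsets of size s is at most $(e n / s)^{s} = e^{λ (s)}$. A small numerical check, offered as a sketch only (the function name `sums_in_6` and the chosen values of n and ν are assumptions), groups the subsets by cardinality and compares the three quantities:

```python
import math

def lam(s, n):
    """lambda(s) = s * log(e*n/s), with lambda(0) = 0."""
    return 0.0 if s == 0 else s * math.log(math.e * n / s)

def sums_in_6(n, nu):
    """Return the left-hand side, the middle bound and C_nu from relation (6)."""
    # Subsets of equal size s contribute equally, and there are comb(n, s) of them.
    lhs = sum(math.comb(n, s) * math.exp(-nu * lam(s, n)) for s in range(n + 1))
    middle = 1.0 + sum(math.exp(-(nu - 1.0) * s) for s in range(1, n + 1))
    c_nu = 1.0 + 1.0 / (math.exp(nu - 1.0) - 1.0)
    return lhs, middle, c_nu

for n in (10, 50, 200):
    for nu in (1.5, 2.0, 3.0):
        print(n, nu, sums_in_6(n, nu))
```

For large n the middle bound and $C_{ν}$ agree to machine precision, so any comparison between them should allow for rounding.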

The following condition on $ξ = X - θ$ is assumed throughout.

Condition C1. For some $H_{0}, M_{0} > 0$ and some positive function $ψ (u)$ monotonically increasing to infinity as $u \to \infty$, and all $I \in I$, $I \neq \emptyset$, all $M \geq 0$, $sup_{θ \in R^{n}} P_{θ} (ρ (P_{I} ξ) > M p (I)) \leq H_{0} e^{- (ψ (M) - M_{0}) λ (| I |)} .$ (7) Notice that Condition C1 is trivially fulfilled for $I = \emptyset$ with zero right-hand side. Let $P = P (H_{0}, M_{0}, ψ)$ stand for the family of distributions $P_{θ}$ satisfying Condition C1.

As $λ (s) \geq 1 + \log (n)$ for all $s \in [n]$, the right-hand side of (7) is further bounded by $H_{0} e^{- (ψ (M) - M_{0}) λ (1)} \leq H_{0} n^{- (ψ (M) - M_{0})}$ whenever $ψ (M) \geq M_{0}$, which we will use in the proof.

Remark 2.3

Condition C1 is mild; for instance, it holds for independent sub-gaussian $ξ_{i}$'s. Recall one of the equivalent definitions of sub-gaussianity (see Vershynin 2018): a random variable W is called σ-sub-gaussian for $σ > 0$ if, for some $c_{0} > 0$, $E e^{c_{0} W^{2} / σ^{2}} \leq 2$. In our case, $σ = 1$. For example, for $ξ_{i} \overset{ind}{\sim} N (0, 1)$, $c_{0} = 3 / 8$ (see Section 4). Recall that $‖ x ‖^{2} = \sum_{i = 1}^{n} x_{i}^{2}$ denotes the usual squared $ℓ_{2}$-norm of $x \in R^{n}$. Now, since $P_{I} ξ$ has at most $| I |$ non-zero entries, by (2) and the Cauchy–Schwarz inequality we have $ρ (P_{I} ξ) \leq c_{τ} | P_{I} ξ | \leq c_{τ} | I |^{1 / 2} ‖ P_{I} ξ ‖$ (this also follows from the following inequality between two different $ℓ_{q}$-norms: for $x \in R^{n}$ and $0 < q_{1} < q_{2} \leq \infty$, $‖ x ‖_{q_{2}} \leq ‖ x ‖_{q_{1}} \leq n^{1 / q_{1} - 1 / q_{2}} ‖ x ‖_{q_{2}}$). Recall also that $λ (s) \geq s$, $s \in [0, n]$. Using these and the Markov inequality, we obtain that, for $H_{0} = 1$, $ψ (M) = c_{0} c_{τ}^{- 2} M^{2}$ and $M_{0} = \log 2$, $\begin{aligned} P (ρ (P_{I} ξ) \geq M p (I)) & \leq P (c_{τ} | I |^{1 / 2} ‖ P_{I} ξ ‖_{2} \geq M p (I)) \\ & = P (‖ P_{I} ξ ‖_{2} \geq \frac{M}{c_{τ}} [λ (| I |)]^{1 / 2}) \\ & \leq P (c_{0} ‖ P_{I} ξ ‖_{2}^{2} \geq c_{0} c_{τ}^{- 2} M^{2} λ (| I |)) \\ & \leq e^{| I | \log 2 - c_{0} c_{τ}^{- 2} M^{2} λ (| I |)} \leq e^{- (ψ (M) - M_{0}) λ (| I |)} . \end{aligned}$ One can extend the results to the case of the so-called sub-exponential errors, with an adjusted complexity function $p (I)$; see Remark 3.3.

Remark 2.4

Condition C1 is clearly satisfied for bounded, arbitrarily dependent, $ξ_{i}$'s. This condition allows some interesting cases of dependent $ξ_{i}$'s. By arguing in the same way as in Belitser and Nurushev (2020), we can establish that Condition C1 holds also for $ξ_{k}$'s which follow an autoregressive model AR(1) with sub-gaussian white noise $ϵ_{k}$'s (for appropriately chosen model parameters): $ξ_{k} = α ξ_{k - 1} + ϵ_{k}, k \in [n]; ξ_{0} = 0, | α | < 1.$ In Belitser and Nurushev (2020), normal white noise was used in the above AR(1) model, but this can easily be extended to the sub-gaussian case by adjusting the constants involved.

Remark 2.5

We can extend our setting by also including a parameter $σ > 0$, considering $σ ξ_{i}$ instead of just $ξ_{i}$ (and $(X_{i} - θ_{i}) / σ$ instead of $X_{i} - θ_{i}$) in (1) and in Condition C1. In that case, the parameter $σ > 0$ is assumed to be known and fixed throughout. Together with n, it reflects the amount of information in the model, in the sense that $σ \to 0$ corresponds to an influx of information.

We propose a penalised projection estimator. For $κ > 0$, define $\hat{I} = \hat{I} (X, κ)$ to be a minimiser of the criterion $C (I) = C (I, κ) = C (I, κ, X)$: $\begin{aligned} C (I) & = ρ (X - P_{I} X) + κ p (I) = ρ (P_{I^{c}} X) + κ p (I), \\ min_{I \in I} C (I) & = C (\hat{I}) . \end{aligned}$ (8)

Remark 2.6

Since the proposed procedure $\hat{I}$ for selecting the sparsity pattern is based on the non-symmetric quantile loss function ρ, deviations from below and from above are treated differently. Nevertheless, the computation of the estimator $\hat{I}$ of the sparsity pattern is not difficult. Indeed, it can be reduced to a search over n options: with $b_{k} = ρ (X_{k}) \geq 0$, $k \in [n]$, $\begin{aligned} min_{I \in I} C (I) & = min_{I \in I} {ρ (P_{I^{c}} X) + κ p (I)} = min_{I \in I} {\sum_{k \in I^{c}} ρ (X_{k}) + κ p (I)} \\ & = min_{i \in [n]} {min_{I \in I : | I | = i} \sum_{k \in I^{c}} ρ (X_{k}) + κ i \log^{1 / 2} (\frac{e n}{i})} \\ & = min_{i \in [n]} {\sum_{k = 1}^{n - i} b_{(k)} + κ i \log^{1 / 2} (\frac{e n}{i})} = min_{i \in [n]} {B_{i} + κ i \log^{1 / 2} (\frac{e n}{i})}, \end{aligned}$ where $b_{(1)} \leq \dots \leq b_{(n)}$ is the ordered sequence of the $b_{k}$'s and $B_{i} = \sum_{k = 1}^{n - i} b_{(k)}$.
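The reduction in this remark can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `select_pattern` and `brute_force_min` are hypothetical, index sets are 1-based Python sets, and the search also includes the empty pattern (i = 0), which belongs to the family of all subsets although the display ranges over $i \in [n]$. A brute-force minimisation over all subsets is included only to confirm the shortcut on a small example.

```python
import math
from itertools import combinations

def rho1(x, tau):
    """Scalar quantile loss x * (tau - 1{x <= 0})."""
    return x * (tau - (1.0 if x <= 0 else 0.0))

def penalty(i, n):
    """i * sqrt(log(e*n/i)), i.e. p(I) for |I| = i; zero for i = 0."""
    return 0.0 if i == 0 else i * math.sqrt(math.log(math.e * n / i))

def select_pattern(X, tau, kappa):
    """Return (I_hat, C(I_hat)) using one sort instead of a search over 2^n subsets."""
    n = len(X)
    b = [rho1(x, tau) for x in X]
    order = sorted(range(n), key=lambda k: b[k])  # positions sorted by increasing b_k
    prefix = [0.0]                                # prefix[m] = b_(1) + ... + b_(m)
    for k in order:
        prefix.append(prefix[-1] + b[k])
    costs = [prefix[n - i] + kappa * penalty(i, n) for i in range(n + 1)]  # B_i + penalty
    best_i = min(range(n + 1), key=costs.__getitem__)
    I_hat = {k + 1 for k in order[n - best_i:]}   # indices of the best_i largest b_k
    return I_hat, costs[best_i]

def brute_force_min(X, tau, kappa):
    """Minimise C(I) over all subsets; feasible only for small n."""
    n = len(X)
    best = float("inf")
    for size in range(n + 1):
        for I in combinations(range(n), size):
            kept = set(I)
            loss = sum(rho1(X[k], tau) for k in range(n) if k not in kept)
            best = min(best, loss + kappa * penalty(size, n))
    return best

X = [0.1, -0.2, 5.0, 0.05, -4.0, 0.3, 0.0, 6.5]
I_hat, value = select_pattern(X, tau=0.5, kappa=1.0)
print(sorted(I_hat), value, brute_force_min(X, 0.5, 1.0))
```

The sort dominates the cost, so the selector runs in O(n log n) time.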

Now, using the estimated sparsity structure $\hat{I}$, define the estimator $\hat{θ} = \hat{θ} (X, κ) = P_{\hat{I}} X .$ (9) From the triangle inequality (4), it follows that $ρ (x - y) \geq ρ (x) - ρ (y)$. Using this, (3) and the definition (8), we derive that, for any $I^{'} \in I$, $\begin{aligned} κ (p (\hat{I}) - p (I^{'})) & \leq ρ (P_{I^{' c}} X) - ρ (P_{{\hat{I}}^{c}} X) = ρ (P_{\hat{I} ∖ I^{'}} X) - ρ (P_{I^{'} ∖ \hat{I}} X) \\ & \leq ρ (P_{\hat{I} ∖ I^{'}} θ) + ρ (P_{\hat{I} ∖ I^{'}} ξ) - ρ (P_{I^{'} ∖ \hat{I}} θ) + ρ (P_{I^{'} ∖ \hat{I}} (- ξ)) \\ & \leq ρ (P_{I^{' c}} θ) - ρ (P_{{\hat{I}}^{c}} θ) + ρ (P_{\hat{I}} ξ) + C_{τ} ρ (P_{I^{'}} ξ) . \end{aligned}$ (10) For $ϰ > 0$, $θ \in R^{n}$ and $I \in I$, introduce the quantity $r (θ, I) = r_{ϰ} (θ, I) = ρ (θ - P_{I} θ) + ϰ p (I) = ρ (P_{I^{c}} θ) + ϰ p (I),$ which we call the quantile rate of the sparsity structure I. The oracle sparsity structure $I_{o} = I_{o} (θ) = I_{o} (θ, ϰ)$ is the one minimising $r_{ϰ} (θ, I)$: $min_{I \in I} r_{ϰ} (θ, I) = r_{ϰ} (θ, I_{o}) = r_{ϰ} (θ) = r (θ),$ (11) where the minimal value $r (θ)$ is called the oracle quantile rate, or just the oracle rate.
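Because the oracle rate has the same form as the selection criterion, with θ in place of X and ϰ in place of κ, it can be evaluated by the same sorting argument. The sketch below is illustrative only (the function name `oracle`, the example vector and the values τ = 0.5, ϰ = 1 are assumptions); it shows an oracle structure that drops a very small non-zero coordinate, so that it is strictly smaller than the support of θ.

```python
import math

def rho1(x, tau):
    """Scalar quantile loss x * (tau - 1{x <= 0})."""
    return x * (tau - (1.0 if x <= 0 else 0.0))

def penalty(i, n):
    """i * sqrt(log(e*n/i)), i.e. p(I) for |I| = i; zero for i = 0."""
    return 0.0 if i == 0 else i * math.sqrt(math.log(math.e * n / i))

def oracle(theta, tau, kappa_o):
    """Return (I_o, r(theta)) minimising rho(P_{I^c} theta) + kappa_o * p(I) over I."""
    n = len(theta)
    b = [rho1(t, tau) for t in theta]
    order = sorted(range(n), key=lambda k: b[k])
    prefix = [0.0]
    for k in order:
        prefix.append(prefix[-1] + b[k])
    costs = [prefix[n - i] + kappa_o * penalty(i, n) for i in range(n + 1)]
    i_o = min(range(n + 1), key=costs.__getitem__)
    return {k + 1 for k in order[n - i_o:]}, costs[i_o]

theta = [10.0, -8.0, 0.01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
I_o, r = oracle(theta, tau=0.5, kappa_o=1.0)
support = {i for i, t in enumerate(theta, start=1) if t != 0.0}

print(sorted(I_o), r)                             # oracle structure and oracle rate
print(sorted(support), penalty(len(support), len(theta)))  # rate of the full support
```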

3. Main results

In this section, we give the main results. All the constants in the assertions below depend on the fixed constants $M_{0}, H_{0}$ and function ψ appearing in Condition C1, the constant κ appearing in the oracle structure definition (8), and the quantile level τ.

The next result concerns the estimation problem.

Theorem 3.1

Estimation

Let Condition C1 be fulfilled. Then for any $ϰ > 0$ and sufficiently large κ, there exist positive $M_{1}, H_{1}, m_{1}$ such that $P_{θ} (ρ (θ - \hat{θ}) \geq M_{1} r (θ)) \leq H_{1} e^{- m_{1} λ (1)} \leq H_{1} n^{- m_{1}}, θ \in R^{n},$ (12) where $\hat{θ}$ is defined by (9).

Next we address the new problem of uncertainty quantification (UQ) for the parameter θ. A confidence set is defined in terms of the quantile loss function ρ as follows: $B (\tilde{θ}, \tilde{r}) = {θ \in R^{n} : ρ (θ - \tilde{θ}) \leq \tilde{r}},$ (13) where the ‘centre’ $\tilde{θ} = \tilde{θ} (X) : R^{n} \mapsto R^{n}$ and ‘radius’ $\tilde{r} = \tilde{r} (X) : R^{n} \mapsto R_{+} = [0, + \infty]$ are measurable functions of the data X. The goal is to construct a confidence set $B (\tilde{θ}, C \tilde{r})$ such that for any $α_{1}, α_{2} \in (0, 1]$ and some function $R (θ)$, $R : R^{n} \to R_{+}$, there exist C, c>0 such that $sup_{θ \in Θ_{cov}} P_{θ} (θ \notin B (\tilde{θ}, C \tilde{r})) \leq α_{1}, sup_{θ \in Θ_{size}} P_{θ} (\tilde{r} \geq c R (θ)) \leq α_{2},$ (14) for some $Θ_{cov}, Θ_{size} \subseteq R^{n}$. The function $R (θ)$, called the radial rate, is a benchmark for the effective radius of the confidence set $B (\tilde{θ}, C \tilde{r})$. The first expression in (14) is called the coverage relation and the second the size relation. It is desirable to find the smallest $R (θ)$ and the biggest $Θ_{cov}$ and $Θ_{size}$ such that (14) holds and $R (θ) ≍ r (Θ_{size})$, where $r (Θ_{size})$ is the optimal rate in the estimation problem for θ. In our local approach, we pursue an even more ambitious goal $R (θ) ≍ r (θ)$, where $r (θ)$ is the oracle rate from the (local) estimation problem.

Typically, the so-called deceptiveness issue arises for the UQ problem in that a confidence set of the optimal size and high coverage can only be constructed for non-deceptive parameters (in particular, $Θ_{cov}$ cannot be the whole set $R^{n}$). That is, coverage with an optimal sized radius is only possible if a certain deceptive set of parameters is excluded from consideration, which is expressed by imposing some condition on the parameter. For example, the EBR (excessive bias restriction) condition in the case of the $ℓ_{2}$-norm is proposed in Belitser and Nurushev (2019), Belitser (2017) and Belitser and Ghosal (2020). Here we need an EBR-like condition (we keep the same term EBR), but now in terms of the quantile loss function ρ.

Condition EBR. We say that a parameter $θ \in R^{n}$ satisfies the excessive bias restriction (EBR) condition with structural parameter $t \geq 0$ if $θ \in Θ_{eb} (t) = Θ_{eb} (t, ϰ)$, where $Θ_{eb} (t) = {θ \in R^{n} : \frac{ρ (θ - P_{I_{o}} θ)}{p (I_{o})} \leq t},$ (15) and the oracle structure $I_{o} = I_{o} (θ)$ is defined by (11) (with the convention $0 / 0 = 0$).
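To make the EBR condition concrete, the ratio in (15) can be computed for a given θ once the oracle structure is found. The sketch below is an illustration only: the function name `ebr_ratio` and the example vectors are assumptions, and returning infinity when $p (I_{o}) = 0$ but the bias is positive is an assumed extension of the stated convention $0 / 0 = 0$. A parameter belongs to $Θ_{eb} (t)$ exactly when the returned ratio is at most t.

```python
import math

def rho1(x, tau):
    """Scalar quantile loss x * (tau - 1{x <= 0})."""
    return x * (tau - (1.0 if x <= 0 else 0.0))

def penalty(i, n):
    """i * sqrt(log(e*n/i)), i.e. p(I) for |I| = i; zero for i = 0."""
    return 0.0 if i == 0 else i * math.sqrt(math.log(math.e * n / i))

def ebr_ratio(theta, tau, kappa_o):
    """Return rho(theta - P_{I_o} theta) / p(I_o) for the oracle structure I_o."""
    n = len(theta)
    b = sorted(rho1(t, tau) for t in theta)
    prefix = [0.0]
    for v in b:
        prefix.append(prefix[-1] + v)
    costs = [prefix[n - i] + kappa_o * penalty(i, n) for i in range(n + 1)]
    i_o = min(range(n + 1), key=costs.__getitem__)  # cardinality of the oracle structure
    bias, complexity = prefix[n - i_o], penalty(i_o, n)
    if complexity == 0.0:
        # 0/0 = 0 as in the text; positive bias over zero complexity is treated
        # as infinite (an assumption made for this sketch).
        return 0.0 if bias == 0.0 else float("inf")
    return bias / complexity

exactly_sparse = [10.0, -8.0] + [0.0] * 8
nearly_sparse = [10.0, -8.0, 0.01] + [0.0] * 7
many_small = [0.5] * 10  # every coordinate is too small to be worth selecting

for theta in (exactly_sparse, nearly_sparse, many_small):
    print(ebr_ratio(theta, tau=0.5, kappa_o=1.0))
```

In this toy example the last vector has an empty oracle structure with positive bias, so under the assumed convention it lies outside $Θ_{eb} (t)$ for every t, which is the kind of parameter the condition is meant to exclude.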

The extent of the restriction $θ \in Θ_{eb} (t)$ varies over different choices of the constant t, becoming more lenient as t increases, eventually covering the entire parameter space. In general, for any t, a sequence of θ's can be found such that $θ \notin Θ_{eb} (t)$, when n varies. On the other hand, the set $Θ_{eb} (0)$ is not empty and consists of those θ's for which the oracle $I_{o}$ coincides with the true sparsity structure $I_{0}$. For a general discussion on EBR, we refer the reader to Belitser and Nurushev (2019). For sparsity structure in the $ℓ_{1}$-sense, the EBR condition is satisfied if the minimal absolute value of the non-zero coordinates of θ is larger than a certain lower bound, but this bound depends also on the number of the non-zero coordinates of θ. For example, for some sufficiently large C>0, $\cup_{I \in I} {θ \in L_{I} : | θ_{i} | \geq C \log^{1 / 2} (\frac{e n}{| I |}), i \in I} \subseteq Θ_{eb} (0) .$

Theorem 3.2

Confidence ball

Let the confidence set $B (\cdot, \cdot)$ be defined by (13), $\hat{r} = p (\hat{I})$, and let $\hat{I}$ and $\hat{θ}$ be given by (8) and (9), respectively. Then for sufficiently large $κ, ϰ$ there exist constants $M_{2} = M_{2} (t), H_{2}, m_{2}, M_{3}, H_{3}, m_{3} > 0$ such that for any $t \geq 0$, $sup_{θ \in Θ_{eb} (t)} P_{θ} (θ \notin B (\hat{θ}, M_{2} (t) \hat{r})) \leq H_{2} n^{- m_{2}},$ (16) $sup_{θ \in R^{n}} P_{θ} (\hat{r} \geq M_{3} r (θ)) \leq H_{3} n^{- m_{3}} .$ (17)

The constants $H_{i}$, $M_{i}$, $m_{i}$, i = 1, 2, 3, are all evaluated in the proofs of Theorems 3.1 and 3.2 in the form of several bounds, which depend on the function ψ, the constants $H_{0}, M_{0}$ from Condition C1, the quantile level τ and ϰ. There is no ‘optimal’ choice; in fact, many choices are possible, and an improvement of one constant typically leads to worsening another one. It should also be noted that the constants (any choice satisfying the bounds in the proofs) are uniform over the whole family $P$ of distributions satisfying Condition C1, and can therefore be significantly improved for specific error distributions.
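The construction of the confidence set in Theorem 3.2 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure with the constants from the proofs: the multiplier `M2 = 1.0`, the choice κ = 1, the fixed data vector and all function names are hypothetical, and the selector again allows the empty pattern.

```python
import math

def rho1(x, tau):
    """Scalar quantile loss x * (tau - 1{x <= 0})."""
    return x * (tau - (1.0 if x <= 0 else 0.0))

def rho(x, tau):
    """Quantile loss of a vector."""
    return sum(rho1(xi, tau) for xi in x)

def penalty(i, n):
    """i * sqrt(log(e*n/i)), i.e. p(I) for |I| = i; zero for i = 0."""
    return 0.0 if i == 0 else i * math.sqrt(math.log(math.e * n / i))

def select_pattern(X, tau, kappa):
    """Minimise rho(P_{I^c} X) + kappa * p(I) over I by sorting the coordinate losses."""
    n = len(X)
    b = [rho1(x, tau) for x in X]
    order = sorted(range(n), key=lambda k: b[k])
    prefix = [0.0]
    for k in order:
        prefix.append(prefix[-1] + b[k])
    best_i = min(range(n + 1), key=lambda i: prefix[n - i] + kappa * penalty(i, n))
    return {k + 1 for k in order[n - best_i:]}

def confidence_ball(X, tau, kappa):
    """Return the centre theta_hat = P_{I_hat} X and the radius r_hat = p(I_hat)."""
    I_hat = select_pattern(X, tau, kappa)
    theta_hat = [x if i in I_hat else 0.0 for i, x in enumerate(X, start=1)]
    return theta_hat, penalty(len(I_hat), len(X)), I_hat

def in_ball(theta, centre, radius, tau):
    """Membership in B(centre, radius) = {theta : rho(theta - centre) <= radius}."""
    return rho([t - c for t, c in zip(theta, centre)], tau) <= radius

tau, kappa, M2 = 0.5, 1.0, 1.0  # M2 is a hypothetical multiplier for illustration
theta = [10.0, -8.0] + [0.0] * 8
X = [10.3, -8.4, 0.2, -0.1, 0.05, 0.0, -0.3, 0.1, 0.15, -0.05]

theta_hat, r_hat, I_hat = confidence_ball(X, tau, kappa)
print(sorted(I_hat), r_hat, in_ball(theta, theta_hat, M2 * r_hat, tau))
```

Since ρ is not symmetric for τ ≠ 1/2, the order of the arguments in `in_ball` matters: the loss is evaluated at θ minus the centre, as in (13).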

So far, all results are formulated in terms of the oracle. For estimation and the construction of a confidence set, this local approach delivers the most general results, as the convergence rate of the proposed estimator is directly linked with the oracle sparsity rather than the true structure, which may not be sparse (but very close to a sparse structure). However, if the parameter has a true sparse structure, the method will attain the quality pertinent to that true sparsity as well.

To illustrate this, consider any $θ \in R^{n}$. The true structure $I_{0} (θ)$ and the oracle structure $I_{o} (θ)$ in general do not coincide, but they are related by $r (θ) = r (θ, I_{o}) \leq r (θ, I_{0}) = ϰ p (I_{0}) = ϰ | I_{0} | \log^{1 / 2} (e n / | I_{0} |) .$ (18) The first two relations hold by the definition of the oracle and the third because $P_{I_{0}} θ = θ$ for the true structure. Since $r (θ) = ρ (θ - P_{I_{o}} θ) + ϰ p (I_{o})$, (18) implies that $p (I_{o}) \leq p (I_{0})$ and hence $| I_{o} | \leq | I_{0} |$. Further, as $| I_{0} (θ) | \leq s$ for $θ \in ℓ_{0} [s]$, (18) also yields $sup_{θ \in ℓ_{0} [s]} r (θ) \leq ϰ s \log^{1 / 2} (e n / s) .$ Besides, all the constants in Theorems 3.1 and 3.2 are uniform over $θ \in ℓ_{0} [s]$ and $P_{θ} \in P$. The next result follows from these facts and Theorems 3.1 and 3.2.

Corollary 3.1

Under the conditions of Theorems 3.1 and 3.2, with the same choice of the constants, $\begin{aligned} sup_{θ \in ℓ_{0} [s]} sup_{P_{θ} \in P} P_{θ} (ρ (θ - \hat{θ}) \geq M_{1} ϰ s \log^{1 / 2} (e n / s)) & \leq H_{1} n^{- m_{1}}, \\ sup_{θ \in ℓ_{0} [s]} sup_{P_{θ} \in P} P_{θ} (\hat{r} \geq M_{3} ϰ s \log^{1 / 2} (e n / s)) & \leq H_{3} n^{- m_{3}}, \end{aligned}$ and (16) holds.

We claim that the obtained convergence rate $s \log^{1 / 2} (e n / s)$ in terms of the quantile loss function is optimal over the class $ℓ_{0} [s]$ in the minimax sense. More precisely, the results of Johnstone and Silverman (2004, relations (17) and (18) with p = 0 and q = 1) and (2) imply that for the normal model $P_{θ} = ⨂_{i \in [n]} N (θ_{i}, 1)$ , there exists an absolute constant $C_{1} > 0$ such that $inf_{\tilde{θ}} sup_{θ \in ℓ_{0} [s]} E_{θ} | θ - \tilde{θ} | \geq C_{1} s \log^{1 / 2} (e n / s) .$ This lower bound is not quite what we need to match our upper bound because it is formulated in terms of the expectation and the $ℓ_{1}$ -loss. However, it is not difficult to establish the probability version of the lower bound: there exist absolute constants $C_{1}, C_{2} > 0$ such that $inf_{\tilde{θ}} sup_{θ \in ℓ_{0} [s]} P_{θ} (| θ - \tilde{θ} | \geq C_{1} s \log^{1 / 2} (e n / s)) \geq C_{2} .$ The relation (2) connects the $ℓ_{1}$ -loss with the quantile loss, so that the above relation implies that $inf_{\tilde{θ}} sup_{θ \in ℓ_{0} [s]} P_{θ} (ρ (θ - \tilde{θ}) \geq C_{1} c_{τ}^{- 1} s \log^{1 / 2} (e n / s)) \geq C_{2} .$ In this form, the lower bound matches our upper bound given by Corollary 3.1, showing that our procedure attains the optimal rate $s \log^{1 / 2} (e n / s)$ for the sparsity class $ℓ_{0} [s]$ . This also means that the size of the constructed confidence ball $B (\hat{θ}, M_{2} (t) \hat{r})$ is optimal over the sparsity class $ℓ_{0} [s]$ in the minimax sense. However, the unavoidable price for the optimality in the size relation is that the coverage relation holds uniformly over $Θ_{eb} (t)$ , not over $ℓ_{0} [s]$ .
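The two-sided relation (2) between the $ℓ_{1}$ -loss and the quantile loss can be checked numerically. The sketch below assumes the standard check-loss form $ρ_{τ} (x) = x (τ - 1 {x < 0})$ applied coordinatewise; the helper names and the random test vector are illustrative assumptions, not part of the paper.

```python
import random

def rho_tau(x, tau):
    # Assumed check loss: tau * x for x >= 0 and (1 - tau) * |x| for x < 0.
    return x * (tau - (1.0 if x < 0 else 0.0))

def rho(v, tau):
    # Quantile loss of a vector: sum of the coordinatewise check losses.
    return sum(rho_tau(x, tau) for x in v)

def l1_norm(v):
    return sum(abs(x) for x in v)

def sandwich(v, tau):
    """Return ((1 - c_tau) |v|_1, rho(v), c_tau |v|_1) with c_tau = max(tau, 1 - tau)."""
    c_tau = max(tau, 1.0 - tau)
    return (1.0 - c_tau) * l1_norm(v), rho(v, tau), c_tau * l1_norm(v)

rng = random.Random(0)
v = [rng.gauss(0.0, 1.0) for _ in range(1000)]
for tau in (0.1, 0.5, 0.9):
    lower, middle, upper = sandwich(v, tau)
    print(tau, lower <= middle + 1e-9 and middle <= upper + 1e-9)
```

For $τ = 1 / 2$ both bounds coincide with the quantile loss, which is then exactly half of the $ℓ_{1}$ -norm.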

Remark 3.1

Interestingly, although there are some ‘deceptive’ θ in $ℓ_{0} [s]$ that are not covered by $Θ_{eb} (t)$ , there are also some θ's in $Θ_{eb} (t)$ which do not belong to the sparsity class $ℓ_{0} [s]$ , but for which the coverage relation holds.

Remark 3.2

We derived the adaptive (the sparsity s is also unknown) minimax results over the traditional sparsity scale ${ℓ_{0} [s], s \in [n]}$ as a consequence of our local oracle results. The scope of our local result is even broader: the minimax results can be derived over any scale of classes ${Θ_{s}, s \in S}$ with the corresponding minimax rates $R (Θ_{s})$ as long as, for some c>0, $sup_{θ \in Θ_{s}} r (θ) \leq c R (Θ_{s}), s \in S .$

Indeed, if the above relation is fulfilled, we immediately obtain all the claims of Corollary 3.1 with $Θ_{s}$ instead of $ℓ_{0} [s]$ , as consequences of Theorems 3.1 and 3.2. For example, it seems possible to derive the minimax results also for the scale of the $ℓ_{s}$ -balls: $ℓ_{s} [η] = {θ \in R^{n} : \frac{1}{n} \sum_{i = 1}^{n} | θ_{i} |^{s} \leq η^{s}}$ , $s \in [0, 2]$ , with $η = η_{n} \to 0$ as $n \to \infty$ .

Remark 3.3

Interestingly, in relation to Remark 2.3, Condition C1 also holds for independent sub-exponential $ξ_{i}$ 's, but with the adjusted function $p (I) = p_{\exp} (I) = λ (| I |)$ . Recall one of the equivalent definitions of sub-exponentiality (see Vershynin 2018): a random variable W is called sub-exponential if $E e^{c_{0} | W |} \leq 2$ for some $c_{0} > 0$ . For example, for the Laplace distribution $ξ_{i} \overset{ind}{\sim} f_{λ} (x) = \frac{λ}{2} e^{- λ | x |}$ , we have $E e^{c_{0} | ξ_{1} |} = \frac{λ}{λ - c_{0}}$ for $c_{0} < λ$ , which is at most 2 whenever $c_{0} \leq \frac{λ}{2}$ , so we can take $c_{0} = \frac{λ}{2}$ . In particular, if $λ = 1$ , $c_{0} = \frac{1}{2}$ .
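The Laplace computation above can be verified by a simple quadrature. The following sketch uses a midpoint rule on a truncated range; the function name, the truncation point and the number of steps are our own choices. A plain Monte Carlo average is deliberately avoided here, since at $c_{0} = λ / 2$ the variable $e^{c_{0} | ξ_{1} |}$ has infinite variance.

```python
import math

def laplace_exp_moment(c0, lam, steps=200_000):
    """Midpoint-rule value of E exp(c0 |xi|) for the density (lam / 2) exp(-lam |x|).

    |xi| is exponential with rate lam, so the integrand on [0, inf) is
    lam * exp(-(lam - c0) x); the integral is truncated at 40 / (lam - c0).
    """
    if not 0 < c0 < lam:
        raise ValueError("need 0 < c0 < lam")
    upper = 40.0 / (lam - c0)
    h = upper / steps
    total = sum(lam * math.exp(-(lam - c0) * (j + 0.5) * h) for j in range(steps))
    return total * h

# Closed form lam / (lam - c0); with c0 = lam / 2 the moment equals 2.
print(laplace_exp_moment(0.5, 1.0), 1.0 / (1.0 - 0.5))
```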

Now, since $P_{I} ξ$ has at most $| I |$ non-zero entries, by (2) we have $ρ (P_{I} ξ) \leq c_{τ} | P_{I} ξ | = c_{τ} \sum_{i \in I} | ξ_{i} |$ . Recall also that $λ (s) \geq s$ , $s \in [0, n]$ . Using these and the Markov inequality, we obtain that $\begin{aligned} P (ρ (P_{I} ξ) \geq M p_{\exp} (I)) & \leq P (c_{τ} \sum_{i \in I} | ξ_{i} | \geq M p_{\exp} (I)) \\ & = P (c_{0} \sum_{i \in I} | ξ_{i} | \geq \frac{M c_{0}}{c_{τ}} p_{\exp} (I)) \\ & \leq e^{| I | \log 2 - c_{0} c_{τ}^{- 1} M p_{\exp} (I)} \leq e^{- (ψ (M) - M_{0}) λ (| I |)}, \end{aligned}$ for $ψ (M) = c_{0} c_{τ}^{- 1} M$ and $M_{0} = \log 2$ . Similarly to the setting discussed in Remark 2.4, it is possible to generalise the case of independent sub-exponential errors $ξ_{i}$ 's to the case of errors driven by an AR(1) model with sub-exponential white noise.
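A small Monte Carlo experiment can illustrate how the Markov-type bound in the last display compares with the actual tail for Laplace errors. This is only a sketch under assumptions: $λ = 1$ , $c_{0} = 1 / 2$ , $τ = 1 / 2$ , and the values of M, $| I |$ and n are arbitrary illustrative choices; it simulates the intermediate quantity $c_{τ} \sum_{i \in I} | ξ_{i} |$ , which dominates $ρ (P_{I} ξ)$ .

```python
import math
import random

def p_exp(k, n):
    # Sub-exponential complexity p_exp(I) = |I| * log(e n / |I|), with value 0 for the empty set.
    return 0.0 if k == 0 else k * math.log(math.e * n / k)

def tail_bound(M, k, n, tau, c0):
    # exp(|I| log 2 - c0 * M * p_exp(I) / c_tau), the bound from the display above.
    c_tau = max(tau, 1.0 - tau)
    return math.exp(k * math.log(2.0) - c0 * M * p_exp(k, n) / c_tau)

def empirical_tail(M, k, n, tau, lam=1.0, reps=20_000, seed=0):
    """Monte Carlo frequency of {c_tau * sum_{i in I} |xi_i| >= M * p_exp(I)} with |I| = k."""
    rng = random.Random(seed)
    c_tau = max(tau, 1.0 - tau)
    threshold = M * p_exp(k, n)
    hits = 0
    for _ in range(reps):
        # |xi_i| is exponential with rate lam when xi_i is Laplace.
        s = sum(rng.expovariate(lam) for _ in range(k))
        if c_tau * s >= threshold:
            hits += 1
    return hits / reps

M, k, n, tau, c0 = 0.3, 5, 100, 0.5, 0.5
print(empirical_tail(M, k, n, tau), tail_bound(M, k, n, tau, c0))
```

The bound is typically far from tight, which is expected for a Markov-type argument; only its exponential form in $λ (| I |)$ matters for Condition C1.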

Thus, for sub-exponential $ξ_{i}$ 's, Theorems 3.1 and 3.2 hold with the oracle rate $r_{\exp} (θ) = min_{I \in I} (ρ (P_{I^{c}} θ) + ϰ p_{\exp} (I))$ . Notice a slight deterioration of the oracle rate $r_{\exp} (θ)$ as compared to the sub-gaussian case, as we now have the complexity $p_{\exp} (I) = | I | \log (e n / | I |)$ instead of $p (I) = | I | [\log (e n / | I |)]^{1 / 2}$ . This also leads to the correspondingly slightly deteriorated global rate $s \log (e n / s)$ (instead of $s \log^{1 / 2} (e n / s)$ ) in Corollary 3.1. Intuitively, a worse rate for the sub-exponential case is not surprising and should be considered as the price for the heavier tails of the errors $ξ_{i}$ 's.

The question remains whether the resulting rate in this case, $λ (s) = s \log (e n / s)$ , is minimax over the sparsity scale ${ℓ_{0} [s], s \in [n]}$ for the $ℓ_{1}$ -norm and sub-exponential errors. We did not find relevant results in the literature on this, but we conjecture that the minimax rate over the sparsity scale for the $ℓ_{q}$ -norm with errors with density $f_{ξ} (x) ≍ e^{- | x |^{γ}}$ , $γ \in (0, 2]$ , is $s [\log (e n / s)]^{q / γ}$ .
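To see how the conjectured family of rates interpolates between the two cases treated above, one can tabulate it for a few values. The snippet below is a minimal sketch; the function name `rate` and the sample values of s and n are hypothetical.

```python
import math

def rate(s, n, q=1.0, gamma=2.0):
    """Conjectured rate s * [log(e n / s)]^(q / gamma)."""
    return s * math.log(math.e * n / s) ** (q / gamma)

n = 1000
for s in (1, 10, 100):
    sub_gaussian = rate(s, n, q=1.0, gamma=2.0)     # s * log^{1/2}(e n / s)
    sub_exponential = rate(s, n, q=1.0, gamma=1.0)  # s * log(e n / s)
    print(s, round(sub_gaussian, 2), round(sub_exponential, 2))
```

With q = 1, the choice $γ = 2$ recovers the sub-gaussian rate of Corollary 3.1 and $γ = 1$ recovers the sub-exponential rate $λ (s)$ .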

4. A simulation study

In this section, we present a small simulation study. The main goal is to demonstrate the deceptiveness phenomenon for the UQ problem, which concerns the coverage relation of Theorem 3.2. Theorem 3.1 and the size relation of Theorem 3.2 will also be demonstrated in passing. Recall that the coverage relation in Theorem 3.2 holds only for the so-called non-deceptive parameters which are described by the EBR condition: $θ \in Θ_{eb} (t)$ . Below we provide an example illustrating the failure of the coverage relation for a deceptive parameter θ. More precisely, we construct a sequence of ‘deceptive’ $θ_{n} \in R^{n}$ , $n \in N$ , such that, for any $M_{2}$ , $P_{θ_{n}} (θ_{n} \notin B (\hat{θ}, M_{2} \hat{r})) \to 1$ as $n \to \infty$ .

Consider the simplest setting in the model (1): $ξ_{i} \overset{ind}{\sim} N (0, 1)$ , $i = 1, \dots, n$ , and $τ = 1 / 2$ . In this case, according to Remark 3, Condition C1 is satisfied with $H_{0} = 1$ , $ψ (M) = c_{0} c_{0.5}^{- 2} M^{2}$ and $M_{0} = \log 2$ , where $c_{0.5} = \frac{1}{2}$ and $c_{0}$ is such that $2 \geq E e^{c_{0} W^{2}} = (1 - 2 c_{0})^{- 1 / 2}$ for $W \sim N (0, 1)$ , leading to the choice $c_{0} = \frac{3}{8}$ . Consider a parameter $θ_{n} = (d_{n}, 1, \dots, 1) \in R^{n}$ with any $d_{n} > 2 ϰ p ({1}) = 2 ϰ \log^{1 / 2} (e n)$ , so that the oracle sparsity structure is $I_{o} = I_{o} (θ_{n}) = {1}$ for all $n \in N$ and hence the ratio in the definition (15) of the EBR condition is $\frac{ρ (θ_{n} - P_{I_{o}} θ_{n})}{p (I_{o})} = \frac{n - 1}{2 \log^{1 / 2} (e n)} = t_{n}$ , $n \in N$ . Since $t_{n} \to \infty$ as $n \to \infty$ , this means that the sequence ${θ_{n}}$ can be seen as ‘deceptive’ in that for any $t \geq 0$ there exists $n_{0}$ such that $θ_{n} \notin Θ_{eb} (t)$ for all $n \geq n_{0}$ . In the simulations below, we took $κ = 1.1$ , $ϰ = 1.4$ and $d_{n} = 10$ , which ensures the condition $d_{n} > 2 ϰ \log^{1 / 2} (e n)$ for all considered n's.
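The claims about the oracle structure and the EBR ratio of this deceptive sequence can be checked directly. The sketch below assumes the check loss $ρ_{τ} (x) = x (τ - 1 {x < 0})$ with $τ = 1 / 2$ , uses 0-based indexing (so the structure ${1}$ appears as `[0]`), and writes `kappa_bar` for ϰ; all function names are ours.

```python
import math

def rho_tau(x, tau=0.5):
    # Assumed check loss; for tau = 1/2 this is |x| / 2.
    return x * (tau - (1.0 if x < 0 else 0.0))

def p(k, n):
    # Complexity p(I) = |I| * log^{1/2}(e n / |I|), with p(empty set) = 0.
    return 0.0 if k == 0 else k * math.sqrt(math.log(math.e * n / k))

def oracle(theta, kappa_bar, tau=0.5):
    """Return (r(theta), I_o); for each size k the k largest losses are kept."""
    n = len(theta)
    losses = [rho_tau(t, tau) for t in theta]
    order = sorted(range(n), key=lambda i: -losses[i])
    total = sum(losses)
    best_r, best_k, kept = total, 0, 0.0
    for k in range(1, n + 1):
        kept += losses[order[k - 1]]
        r_k = (total - kept) + kappa_bar * p(k, n)
        if r_k < best_r:
            best_r, best_k = r_k, k
    return best_r, sorted(order[:best_k])

def ebr_ratio(theta, kappa_bar, tau=0.5):
    """Return (rho(theta - P_{I_o} theta) / p(I_o), I_o), the ratio in the EBR condition."""
    _, I_o = oracle(theta, kappa_bar, tau)
    kept = set(I_o)
    bias = sum(rho_tau(t, tau) for i, t in enumerate(theta) if i not in kept)
    ratio = bias / p(len(I_o), len(theta)) if I_o else math.inf
    return ratio, I_o

for n in (50, 100, 200, 400, 800, 1600, 3200):
    theta_n = [10.0] + [1.0] * (n - 1)
    t_n, I_o = ebr_ratio(theta_n, kappa_bar=1.4)
    print(n, I_o, round(t_n, 2))
```

The printed ratio grows with n, in line with $t_{n} \to \infty$ , so no fixed t can accommodate the whole sequence.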

Introduce the quantities ${\tilde{D}}_{n} = \frac{ρ (θ_{n} - \hat{θ})}{r (θ_{n})}$ , ${\hat{D}}_{n} = \frac{ρ (θ_{n} - \hat{θ})}{\hat{r}}$ and ${\bar{D}}_{n} = \frac{\hat{r}}{r (θ_{n})}$ . Now we perform the following simulations: for a chosen $n \in N$ and the above-described $θ_{n} \in R^{n}$ , generate B = 1000 data samples, then compute B values of ${\hat{θ}}^{(k)}$ , ${\hat{r}}^{(k)}$ , and the corresponding ${\hat{D}}_{n}^{(k)}$ , ${\tilde{D}}_{n}^{(k)}$ and ${\bar{D}}_{n}^{(k)}$ , $k = 1, \dots, B$ . Since ${ρ (θ_{n} - \hat{θ}) \geq M_{1} r (θ_{n})} = {{\tilde{D}}_{n} \geq M_{1}}$ , ${θ_{n} \notin B (\hat{θ}, M_{2} \hat{r})} = {{\hat{D}}_{n} > M_{2}}$ and ${\hat{r} \geq M_{3} r (θ_{n})} = {{\bar{D}}_{n} \geq M_{3}}$ , we can approximate the probabilities of the events from Theorems 3.1 and 3.2 as follows: $\begin{aligned} P_{θ_{n}} (ρ (θ_{n} - \hat{θ}) \geq M_{1} r (θ_{n})) & \approx \frac{1}{B} \sum_{k = 1}^{B} 1 {{\tilde{D}}_{n}^{(k)} \geq M_{1}}, \\ P_{θ_{n}} (θ_{n} \notin B (\hat{θ}, M_{2} \hat{r})) & \approx \frac{1}{B} \sum_{k = 1}^{B} 1 {{\hat{D}}_{n}^{(k)} > M_{2}}, \\ P_{θ_{n}} (\hat{r} \geq M_{3} r (θ_{n})) & \approx \frac{1}{B} \sum_{k = 1}^{B} 1 {{\bar{D}}_{n}^{(k)} \geq M_{3}} . \end{aligned}$ In Figure 1, the boxplots of ${\tilde{D}}_{n}^{(k)}$ 's, ${\hat{D}}_{n}^{(k)}$ 's and ${\bar{D}}_{n}^{(k)}$ 's are plotted for several increasing n = 50, 100, 200, 400, 800, 1600, 3200. We see that the boxplots of ${\tilde{D}}_{n}^{(k)}$ 's (the plot on the left) stabilise around 1. This is because of the special choice of $θ_{n}$ , for which $ρ (θ_{n} - \hat{θ})$ is asymptotically equivalent to the oracle rate $r (θ_{n})$ . Thus, the first probability in the above display will go to zero for any $M_{1} > 1$ , which shows that the claim of Theorem 3.1 holds also for this deceptive ${θ_{n}}$ , as it should because the result of Theorem 3.1 is uniform over $θ \in R^{n}$ . The size relation of Theorem 3.2 holds as well (as it also should): the boxplots of ${\bar{D}}_{n}^{(k)}$ 's (the plot on the right) even move to zero, showing that the third probability in the above display will converge to zero for any $M_{3} > 0$ .
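The experiment described above can be sketched in a few lines of Python. This is a hedged reconstruction, not the authors' code: it reads off from the proofs that $\hat{I}$ minimises $ρ (P_{I^{c}} X) + κ p (I)$ , that $\hat{θ} = P_{\hat{I}} X$ and that $\hat{r} = p (\hat{I})$ , assumes the check loss $ρ_{τ} (x) = x (τ - 1 {x < 0})$ , and uses a smaller B and fewer values of n to keep the run short. The names `select`, `replicate` and `simulate` are hypothetical, and `kappa_bar` stands for ϰ.

```python
import math
import random
import statistics

TAU = 0.5  # quantile level used in this section

def rho_tau(x, tau=TAU):
    # Assumed check loss; for tau = 1/2 this is |x| / 2.
    return x * (tau - (1.0 if x < 0 else 0.0))

def p(k, n):
    # Complexity p(I) = |I| * log^{1/2}(e n / |I|), with p(empty set) = 0.
    return 0.0 if k == 0 else k * math.sqrt(math.log(math.e * n / k))

def select(v, pen):
    """Minimise sum_{i not in I} rho_tau(v_i) + pen * p(|I|) over I; return (value, I)."""
    n = len(v)
    losses = [rho_tau(x) for x in v]
    order = sorted(range(n), key=lambda i: -losses[i])
    total = sum(losses)
    best, best_k, kept = total, 0, 0.0
    for k in range(1, n + 1):
        kept += losses[order[k - 1]]
        val = (total - kept) + pen * p(k, n)
        if val < best:
            best, best_k = val, k
    return best, set(order[:best_k])

def replicate(theta, kappa, r_oracle, rng):
    """One data sample: return (D_tilde, D_hat, D_bar)."""
    n = len(theta)
    x = [t + rng.gauss(0.0, 1.0) for t in theta]
    _, i_hat = select(x, kappa)
    # theta_hat keeps X_i on I_hat and is zero elsewhere.
    loss = sum(rho_tau(t - xi) if i in i_hat else rho_tau(t)
               for i, (t, xi) in enumerate(zip(theta, x)))
    r_hat = p(len(i_hat), n)
    d_hat = loss / r_hat if r_hat > 0 else math.inf
    return loss / r_oracle, d_hat, r_hat / r_oracle

def simulate(n, B=200, d=10.0, kappa=1.1, kappa_bar=1.4, seed=0):
    """Medians of (D_tilde, D_hat, D_bar) over B samples for theta_n = (d, 1, ..., 1)."""
    rng = random.Random(seed)
    theta = [d] + [1.0] * (n - 1)
    r_oracle, _ = select(theta, kappa_bar)
    reps = [replicate(theta, kappa, r_oracle, rng) for _ in range(B)]
    return tuple(statistics.median(col) for col in zip(*reps))

for n in (50, 100, 200, 400):
    print(n, simulate(n))
```

Only medians are reported here instead of boxplots. The selection step relies on the fact that $p (I)$ depends on I through $| I |$ only, so sorting the individual losses is enough to find the minimiser.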

Figure 1. Sub-gaussian case: boxplots of ${\tilde{D}}_{n}^{(k)}$ 's, ${\hat{D}}_{n}^{(k)}$ 's and ${\bar{D}}_{n}^{(k)}$ 's for increasing n and deceptive ${θ_{n}}$ .


The most interesting plot is in the middle of Figure 1. It shows that the boxplots of ${\hat{D}}_{n}^{(k)}$ 's move up as n increases, so that the fraction of data points with ${\hat{D}}_{n}^{(k)} > M_{2}$ goes to 1 as $n \to \infty$ for any constant $M_{2}$ . Hence, $P_{θ_{n}} (θ_{n} \notin B (\hat{θ}, M_{2} \hat{r})) \to 1$ as $n \to \infty$ for the chosen deceptive parameter sequence ${θ_{n}}$ , which demonstrates the deceptiveness phenomenon.

Finally, we perform simulations for the sub-exponential case discussed in Remark 3.3 and a non-deceptive parameter $θ_{n}^{'} = (d_{n}, \dots, d_{n}, 0, \dots, 0) \in R^{n}$ . Precisely, we consider the Laplace errors $ξ_{i} \overset{ind}{\sim} Laplace (1, 0)$ , $i = 1, \dots, n$ (i.e. with density $\frac{1}{2} e^{- | x |}$ ), and we set the first 20 entries of the parameter $θ_{n}^{'}$ equal to $d_{n}$ and the remaining coordinates to zero. Accordingly, we used the complexity $p_{\exp} (I)$ instead of $p (I)$ . All the other parameters of this simulation experiment remain the same as before.

Figure 2 shows that the claims of Theorems 3.1 and 3.2 hold for non-deceptive ${θ_{n}^{'}}$ . We now see a downward trend in the boxplots of ${\hat{D}}_{n}^{(k)}$ 's as n increases, illustrating the good behaviour of the confidence set for non-deceptive ${θ_{n}^{'}}$ also in the sub-exponential case.

Figure 2. Sub-exponential case: boxplots of ${\tilde{D}}_{n}^{(k)}$ 's, ${\hat{D}}_{n}^{(k)}$ 's, ${\bar{D}}_{n}^{(k)}$ 's for increasing n and non-deceptive ${θ_{n}^{'}}$ .


5. Proofs

In this section, we present the proofs of the theorems.

Proof of Theorem 3.1.

First we introduce some notation. Define the events $E_{1} = {ρ (θ - \hat{θ}) \geq M_{1} r (θ)}, E_{2} = {r (θ, \hat{I}) \leq A_{1} r (θ)}, E_{3} = {ρ (P_{\hat{I}} ξ) \leq A_{2} p (\hat{I})},$ where the constants $M_{1}, A_{1}, A_{2} > 0$ are to be chosen later. We evaluate the probability of interest $P_{θ} (E_{1}) = P_{θ} (ρ (θ - \hat{θ}) \geq M_{1} r (θ))$ as follows: $\begin{aligned} P_{θ} (E_{1}) & = P_{θ} (E_{1} \cap E_{2} \cap E_{3}) + P_{θ} (E_{1} \cap E_{3}^{c}) + P_{θ} (E_{1} \cap E_{2}^{c} \cap E_{3}) \\ & = T_{1} + T_{2} + T_{3} . \end{aligned}$ (19) We bound these three probabilities separately.

First we bound $T_{1}$ . Using the properties (3) and (4) of the quantile loss function ρ, we derive that the event $E_{1} \cap E_{2} \cap E_{3}$ implies that $\begin{aligned} M_{1} r (θ) & \leq ρ (θ - \hat{θ}) = ρ (θ - P_{\hat{I}} X) \\ & \leq ρ (θ - P_{\hat{I}} θ) + ρ (- P_{\hat{I}} ξ) \leq ρ (θ - P_{\hat{I}} θ) + A_{2} C_{τ} p (\hat{I}) \\ & \leq max {1, \frac{A_{2} C_{τ}}{ϰ}} r (θ, \hat{I}) \leq max {1, \frac{A_{2} C_{τ}}{ϰ}} A_{1} r (θ) . \end{aligned}$ Hence, if $M_{1} > A_{1} max {1, \frac{A_{2} C_{τ}}{ϰ}}$ , then $T_{1} = P_{θ} (E_{1} \cap E_{2} \cap E_{3}) \leq P_{θ} (M_{1} r (θ) \leq max {1, \frac{A_{2} C_{τ}}{ϰ}} A_{1} r (θ)) = 0.$ (20)

To bound $T_{2}$ , write $T_{2} = P_{θ} (E_{1} \cap E_{3}^{c}) \leq P_{θ} (E_{3}^{c}) \leq \sum_{I \in I} P (ρ (P_{I} ξ) > A_{2} p (I)) .$ If $A_{2} > 0$ is such that $ψ (A_{2}) > M_{0} + 1$ , then using Condition C1 (7), we obtain that $P (ρ (P_{I} ξ) > A_{2} p (I)) \leq H_{0} e^{- A_{3} λ (| I |)}, I \in I, I \neq \emptyset,$ where we set $A_{3} = ψ (A_{2}) - M_{0}$ . Recall that $λ (s)$ is increasing for $s \in (0, n]$ . This, (6) and the two previous displays entail that, since $A_{3} > 1$ (so that $A_{4} = \frac{(A_{3} - 1)}{2} > 0$ ), $\begin{aligned} T_{2} & \leq P_{θ} (E_{3}^{c}) \leq \sum_{I \in I} P (ρ (P_{I} ξ) > A_{2} p (I)) \leq H_{0} \sum_{I \in I : | I | \geq 1} e^{- A_{3} λ (| I |)} \\ & \leq H_{0} e^{- A_{4} λ (1)} \sum_{I \in I : | I | \geq 1} e^{- (1 + A_{4}) λ (| I |)} \leq H_{1}^{'} e^{- A_{4} λ (1)} . \end{aligned}$ (21)

Finally, we bound $T_{3} = P_{θ} (E_{1} \cap E_{2}^{c} \cap E_{3})$ . Using (10) with $I^{'} = I_{o}$ and $κ > A_{2}$ , we obtain that under $E_{2}^{c} \cap E_{3}$ , $\begin{aligned} C_{τ} ρ (P_{I_{o}} ξ) & \geq ρ (P_{{\hat{I}}^{c}} θ) + κ p (\hat{I}) - ρ (P_{I_{o}^{c}} θ) - κ p (I_{o}) - ρ (P_{\hat{I}} ξ) \\ & \geq ρ (P_{{\hat{I}}^{c}} θ) + κ p (\hat{I}) - ρ (P_{I_{o}^{c}} θ) - κ p (I_{o}) - A_{2} p (\hat{I}) \\ & \geq min {\frac{κ - A_{2}}{ϰ}, 1} r (θ, \hat{I}) - max {\frac{κ}{ϰ}, 1} r (θ) \\ & \geq min {\frac{κ - A_{2}}{ϰ}, 1} A_{1} r (θ) - max {\frac{κ}{ϰ}, 1} r (θ) \\ & = A_{5} r (θ) \geq A_{5} ϰ p (I_{o}), \end{aligned}$ as long as $A_{5} = min {\frac{κ - A_{2}}{ϰ}, 1} A_{1} - max {\frac{κ}{ϰ}, 1} > 0$ . We conclude that the event $E_{2}^{c} \cap E_{3}$ implies the event ${ρ (P_{I_{o}} ξ) \geq \frac{A_{5} ϰ}{C_{τ}} p (I_{o})}$ , so that, by Condition C1 (7), $T_{3} \leq P_{θ} (E_{2}^{c} \cap E_{3}) \leq P_{θ} (ρ (P_{I_{o}} ξ) \geq \frac{A_{5} ϰ}{C_{τ}} p (I_{o})) \leq H_{0} e^{- A_{6} λ (1)},$ (22) where $A_{6} = ψ (\frac{A_{5} ϰ}{C_{τ}}) - M_{0} > 0$ if $A_{5} > 0$ is chosen so large that $ψ (\frac{A_{5} ϰ}{C_{τ}}) > M_{0}$ .

To summarise the choices of the constants, we need to take $κ, M_{1} > 0$ (in the claim of the theorem) and constants $A_{1}, A_{2} > 0$ (in the proof of the theorem) such that $M_{1} > A_{1} max {1, \frac{A_{2} C_{τ}}{ϰ}}$ , $ψ (A_{2}) > M_{0} + 1$ (so that $A_{3} = ψ (A_{2}) - M_{0} > 1$ ), $A_{5} = min {\frac{κ - A_{2}}{ϰ}, 1} A_{1} - max {\frac{κ}{ϰ}, 1} > 0$ (for example, we can fix $κ = A_{2} + 1$ ) and $ψ (\frac{A_{5} ϰ}{C_{τ}}) > M_{0}$ , which is always possible since the function $ψ (u) \to \infty$ monotonically as $u \to \infty$ . Combining (19)–(22), we obtain the claim of the theorem with the chosen κ, $M_{1}$ , $H_{1} = H_{1}^{'} + H_{0}$ and $m_{1} = min {A_{4}, A_{6}}$ .
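The constraints on the constants listed above can be checked mechanically for a candidate choice. The sketch below is illustrative only: it borrows the sub-gaussian form $ψ (M) = c_{0} c_{τ}^{- 2} M^{2}$ with $c_{0} = 3 / 8$ , $τ = 1 / 2$ and $M_{0} = \log 2$ from the simulation section, and the numerical values of $A_{1}$ , $A_{2}$ , $M_{1}$ , κ and ϰ (`kappa_bar`) are arbitrary assumptions of ours.

```python
import math

def check_constants(kappa, kappa_bar, A1, A2, M1, tau, psi, M0):
    """Return a dict mapping each constraint from the proof to True or False."""
    C_tau = max(tau / (1.0 - tau), (1.0 - tau) / tau)
    A5 = min((kappa - A2) / kappa_bar, 1.0) * A1 - max(kappa / kappa_bar, 1.0)
    return {
        "M1 > A1 * max(1, A2 * C_tau / kappa_bar)": M1 > A1 * max(1.0, A2 * C_tau / kappa_bar),
        "psi(A2) > M0 + 1": psi(A2) > M0 + 1.0,
        "kappa > A2": kappa > A2,
        "A5 > 0": A5 > 0,
        "psi(A5 * kappa_bar / C_tau) > M0": A5 > 0 and psi(A5 * kappa_bar / C_tau) > M0,
    }

def psi_sub_gaussian(M, c0=3.0 / 8.0, c_tau=0.5):
    # Assumed sub-gaussian form psi(M) = c0 * c_tau^{-2} * M^2.
    return c0 * M * M / (c_tau * c_tau)

A2 = 1.1
checks = check_constants(kappa=A2 + 1.0, kappa_bar=1.4, A1=3.0, A2=A2, M1=3.1,
                         tau=0.5, psi=psi_sub_gaussian, M0=math.log(2.0))
for name, ok in checks.items():
    print(name, ok)
```

Decreasing `A1` to 2.0 in this example makes $A_{5}$ negative, which shows that $A_{1}$ has to be taken large enough relative to κ and ϰ.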

Proof of Theorem 3.2.

For some fixed $δ \in (0, 1)$ (for example, take $δ = 1 / 2$ ), introduce the event $E_{4} = {p (\hat{I}) \leq δ p (I_{o})}$ , where $I_{o} = I_{o} (θ)$ is defined by (11).

First we evaluate $P_{θ} (p (\hat{I}) \leq δ p (I_{o})) = P_{θ} (E_{4})$ . We have $\begin{aligned} p (\hat{I} \cup I_{o}) & = | \hat{I} \cup I_{o} | \log^{1 / 2} (\frac{e n}{| \hat{I} \cup I_{o} |}) \\ & \leq | \hat{I} | \log^{1 / 2} (\frac{e n}{| \hat{I} \cup I_{o} |}) + | I_{o} | \log^{1 / 2} (\frac{e n}{| \hat{I} \cup I_{o} |}) \\ & \leq | \hat{I} | \log^{1 / 2} (\frac{e n}{| \hat{I} |}) + | I_{o} | \log^{1 / 2} (\frac{e n}{| I_{o} |}) \\ & = p (\hat{I}) + p (I_{o}) . \end{aligned}$ By using the above relation, we obtain that, under the event $E_{4}$ , $p (\hat{I} \cup I_{o}) \leq p (\hat{I}) + p (I_{o}) \leq (1 + δ) p (I_{o})$ . Hence, we have that, under $E_{4}$ , $\frac{1}{1 + δ} p (\hat{I} \cup I_{o}) \leq p (I_{o}) \leq p (\hat{I} \cup I_{o}) .$ The last relation, (5) with $I_{1} = (\hat{I} \cup I_{o})^{c} \subseteq I_{o}^{c} = I_{2}$ and the definition (11) of the oracle $I_{o}$ imply that, under $E_{4}$ , $\begin{aligned} ρ (P_{{\hat{I}}^{c}} θ) - ρ (P_{(\hat{I} \cup I_{o})^{c}} θ) & \geq ρ (P_{{\hat{I}}^{c}} θ) - ρ (P_{I_{o}^{c}} θ) \geq ϰ (p (I_{o}) - p (\hat{I})) \\ & \geq ϰ (1 - δ) p (I_{o}) \geq ϰ \frac{1 - δ}{1 + δ} p (\hat{I} \cup I_{o}) . \end{aligned}$ Recall the event $E_{3} = {ρ (P_{\hat{I}} ξ) \leq A_{2} p (\hat{I})}$ from the proof of the previous theorem and let κ be sufficiently large to satisfy $κ > A_{2}$ . Further, choose ϰ sufficiently large to satisfy $ϰ \frac{1 - δ}{1 + δ} - κ > C_{τ} A_{7}$ with such $A_{7} > 0$ that $A_{8} = ψ (A_{7}) - M_{0} > 0$ .
Then, under $E_{3} \cap E_{4}$ , from (10) with $I^{'} = \hat{I} \cup I_{o}$ , it follows that $\begin{aligned} C_{τ} ρ (P_{I_{o}} ξ) & \geq κ (p (\hat{I}) - p (I^{'})) - A_{2} p (\hat{I}) + ρ (P_{{\hat{I}}^{c}} θ) - ρ (P_{I^{' c}} θ) \\ & \geq (ϰ \frac{1 - δ}{1 + δ} - κ) p (\hat{I} \cup I_{o}) \\ & \geq (ϰ \frac{1 - δ}{1 + δ} - κ) p (I_{o}) > C_{τ} A_{7} p (I_{o}) . \end{aligned}$ (23) Using the last display, (21), Condition C1 (7) and $A_{8} = ψ (A_{7}) - M_{0} > 0$ , we bound $\begin{aligned} P_{θ} (p (\hat{I}) \leq δ p (I_{o})) & = P_{θ} (E_{4}) \leq P_{θ} (E_{3}^{c}) + P_{θ} (E_{3} \cap E_{4}) \\ & \leq H_{1}^{'} e^{- A_{4} λ (1)} + P_{θ} (ρ (P_{I_{o}} ξ) > A_{7} p (I_{o})) \\ & \leq H_{1}^{'} e^{- A_{4} λ (1)} + H_{0} e^{- A_{8} λ (1)} . \end{aligned}$ (24)

Now we establish the coverage property. The constants $M_{1}$ , $H_{1}$ and $m_{1}$ are defined in Theorem 3.1. Take $M_{2} = \frac{M_{1} (t + ϰ)}{δ}$ , where the fixed $δ \in (0, 1)$ is from the definition of the event $E_{4}$ . If $θ \in Θ_{eb} (t)$ , then, in view of (11), $r (θ) = ρ (P_{I_{o}^{c}} θ) + ϰ p (I_{o}) \leq (t + ϰ) p (I_{o})$ . So, $p (I_{o}) \geq (t + ϰ)^{- 1} r (θ)$ for all $θ \in Θ_{eb} (t)$ . Combining this with Theorem 3.1 and (24) yields that, uniformly in $θ \in Θ_{eb} (t)$ , $\begin{aligned} & P_{θ} (θ \notin B (\hat{θ}, M_{2} \hat{r})) \\ & \leq P_{θ} (ρ (θ - \hat{θ}) > M_{2} \hat{r}, \hat{r} \geq δ p (I_{o})) + P_{θ} (\hat{r} < δ p (I_{o})) \\ & \leq P_{θ} (ρ (θ - \hat{θ}) > M_{1} (t + ϰ) p (I_{o})) + P_{θ} (p (\hat{I}) < δ p (I_{o})) \\ & \leq P_{θ} (ρ (θ - \hat{θ}) > M_{1} r (θ)) + P_{θ} (p (\hat{I}) < δ p (I_{o})) \\ & \leq H_{1} e^{- m_{1} λ (1)} + H_{1}^{'} e^{- A_{4} λ (1)} + H_{0} e^{- A_{8} λ (1)} . \end{aligned}$ The coverage relation follows.
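The subadditivity step $p (\hat{I} \cup I_{o}) \leq p (\hat{I}) + p (I_{o})$ used at the start of this proof depends only on the cardinalities involved, so it can be checked exhaustively for small n. The sketch below does this by brute force; the function names and the choice n = 40 are ours.

```python
import math

def p(k, n):
    # Complexity p(I) = |I| * log^{1/2}(e n / |I|), with p(empty set) = 0.
    return 0.0 if k == 0 else k * math.sqrt(math.log(math.e * n / k))

def max_violation(n):
    """Largest value of p(u) - p(a) - p(b) over all admissible cardinalities.

    Here a = |I_hat|, b = |I_o| and u = |I_hat union I_o|, so that
    max(a, b) <= u <= min(a + b, n).
    """
    worst = -math.inf
    for a in range(n + 1):
        for b in range(n + 1):
            for u in range(max(a, b), min(a + b, n) + 1):
                worst = max(worst, p(u, n) - p(a, n) - p(b, n))
    return worst

# A non-positive value (up to rounding) means the inequality held in every case.
print(max_violation(40))
```

Equality is attained when one of the two sets is empty, which is why the maximum is zero rather than strictly negative.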

Let us show the size property. Recall again the event $E_{3} = {ρ (P_{\hat{I}} ξ) \leq A_{2} p (\hat{I})}$ from the proof of the previous theorem. By using (23) with $I^{'} = I_{o}$ , we derive that the event ${p (\hat{I}) \geq M_{3} r (θ)} \cap E_{3}$ implies the event ${C_{τ} ρ (P_{I_{o}} ξ) \geq (κ - A_{2}) p (\hat{I}) - r (θ) \geq A_{9} p (\hat{I}) \geq A_{9} M_{3} r (θ) \geq A_{9} M_{3} ϰ p (I_{o})}$ with $A_{9} = κ - A_{2} - M_{3}^{- 1} > 0$ , where $A_{2}$ is defined in the proof of Theorem 3.1. We thus have that, for any $θ \in R^{n}$ , $\begin{aligned} P_{θ} (\hat{r} \geq M_{3} r (θ)) & \leq P_{θ} (E_{3}^{c}) + P_{θ} (\hat{r} \geq M_{3} r (θ), E_{3}) \\ & = P_{θ} (E_{3}^{c}) + P_{θ} (p (\hat{I}) \geq M_{3} r (θ), E_{3}) \\ & \leq P_{θ} (E_{3}^{c}) + P_{θ} (p (\hat{I}) \geq M_{3} r (θ), C_{τ} ρ (P_{I_{o}} ξ) \geq A_{9} p (\hat{I})) \\ & \leq P_{θ} (E_{3}^{c}) + P_{θ} (C_{τ} ρ (P_{I_{o}} ξ) \geq A_{9} p (\hat{I}) \geq A_{9} M_{3} ϰ p (I_{o})) \\ & \leq H_{1}^{'} e^{- A_{4} λ (1)} + H_{0} e^{- A_{10} λ (1)}, \end{aligned}$ where the last inequality in the above display is obtained by (21) and Condition C1 (7), and $M_{3}$ is chosen to be so large that $A_{10} = ψ (A_{9} M_{3} ϰ / C_{τ}) - M_{0} > 0$ . The size relation follows.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Abramovich, F., Benjamini, Y., Donoho, D.L., and Johnstone, I.M. (2006), ‘Adapting to Unknown Sparsity by Controlling the False Discovery Rate’, The Annals of Statistics, 34, 584–653.
Abrevaya, J. (2002), ‘The Effects of Demographics and Maternal Behavior on the Distribution of Birth Outcomes’, in Economic Applications of Quantile Regression, eds. B. Fitzenberger, R. Koenker, and J. A. F. Machado, Berlin Heidelberg: Springer-Verlag, pp. 247–257.
Babenko, A., and Belitser, E. (2010), ‘Oracle Convergence Rate of Posterior Under Projection Prior and Bayesian Model Selection’, Mathematical Methods of Statistics, 19, 219–245.
Baraud, Y. (2004), ‘Confidence Balls in Gaussian Regression’, The Annals of Statistics, 32, 528–551.
Belitser, E. (2017), ‘On Coverage and Local Radial Rates of Credible Sets’, The Annals of Statistics, 45, 1124–1151.
Belitser, E., and Ghosal, S. (2020), ‘Empirical Bayes Oracle Uncertainty Quantification for Regression’, The Annals of Statistics, 31, 536–559.
Belitser, E., and Nurushev, N. (2019), General Framework for Projection Structures. ArXiv: 1904.01003.
Belitser, E., and Nurushev, N. (2020), ‘Needles and Straw in a Haystack: Robust Empirical Bayes Confidence for Possibly Sparse Sequences’, Bernoulli, 26, 191–225.
Bellec, P.C. (2018), ‘Sharp Oracle Inequalities for Least Squares Estimators in Shape Restricted Regression’, The Annals of Statistics, 46, 745–780.
Belloni, A., and Chernozhukov, V. (2011), ‘L1-Penalized Quantile Regression in High-Dimensional Sparse Models’, The Annals of Statistics, 39, 82–130.
Belloni, A., Chernozhukov, V., and Kato, K. (2019), ‘Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models’, Journal of the American Statistical Association, 114, 749–758.
Belloni, A., Chernozhukov, V., and Wang, L. (2011), ‘Square-root Lasso: Pivotal Recovery of Sparse Signals via Conic Programming’, Biometrika, 98, 791–806.
Benjamini, Y., and Hochberg, Y. (1995), ‘Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing’, Journal of the Royal Statistical Society: Series B, 57, 289–300.
Birgé, L., and Massart, P. (2001), ‘Gaussian Model Selection’, Journal of the European Mathematical Society, 3, 203–268.
Butucea, C., Ndaoud, M., Stepanova, N., and Tsybakov, A. (2018), ‘Variable Selection with Hamming Loss’, The Annals of Statistics, 46, 1837–1875.
Castillo, I., and van der Vaart, A. (2012), ‘Needles and Straw in a Haystack: Posterior Concentration for Possibly Sparse Sequences’, The Annals of Statistics, 40, 2069–2101.
Donoho, D.L., and Johnstone, I.M. (1994), ‘Minimax Risk Over ℓp-Balls for ℓq-Error’, Probability Theory and Related Fields, 99, 277–303.
Donoho, D.L., Johnstone, I.M., Hoch, J.C., and Stern, A.S. (1992), ‘Maximum Entropy and the Nearly Black Object (with Discussion)’, Journal of the Royal Statistical Society: Series B, 54, 41–81.
Efron, B. (2008), ‘Microarrays, Empirical Bayes and the Two-Groups Model’, Statistical Science, 23, 1–22.
Ciuperca, G. (2018), ‘Test by Adaptive Lasso Quantile Method for Real-Time Detection of a Change-Point’, Metrika, 81, 689–720.
Hulmán, A., Witte, D.R., Kerényi, Z., Madarász, E., Tánczer, T., Bosnyák, Z., Szabó, E., Ferencz, V., Péterfalvi, A., Tabák, A.G., and Nyári, T.A. (2015), ‘Heterogeneous Effect of Gestational Weight Gain on Birth Weight: Quantile Regression Analysis From a Population-Based Screening’, Annals of Epidemiology, 25, 133–137.
Jiang, L., Bondell, H.D., and Wang, H.J. (2014), ‘Interquantile Shrinkage and Variable Selection in Quantile Regression’, Computational Statistics & Data Analysis, 69, 208–219.
Jiang, L., Wang, H.J., and Bondell, H.D. (2013), ‘Interquantile Shrinkage in Regression Models’, Journal of Computational and Graphical Statistics, 22, 970–986.
Johnstone, I.M. (2017), Gaussian Estimation: Sequence and Wavelet Models, Book Draft.
Johnstone, I.M., and Silverman, B.W. (2004), ‘Needles and Straw in Haystacks: Empirical Bayes Estimates of Possibly Sparse Sequences’, The Annals of Statistics, 32, 1594–1649.
Koenker, R. (2005), Quantile Regression, Cambridge: Cambridge University Press.
Li, K.-C. (1989), ‘Honest Confidence Regions for Nonparametric Regression’, The Annals of Statistics, 17, 1001–1008.
Martin, R., and Walker, S.G. (2014), ‘Asymptotically Minimax Empirical Bayes Estimation of a Sparse Normal Mean Vector’, Electronic Journal of Statistics, 8, 2188–2206.
Santosa, F., and Symes, W.W. (1986), ‘Linear Inversion of Band-Limited Reflection Seismograms’, SIAM Journal on Scientific and Statistical Computing, 7, 1307–1330.
Szabó, B.T., van der Vaart, A.W., and van Zanten, J.H. (2015), ‘Frequentist Coverage of Adaptive Nonparametric Bayesian Credible Sets’, The Annals of Statistics, 43, 1391–1428.
van der Pas, S.L., Szabó, B.T., and van der Vaart, A.W. (2017), ‘Uncertainty Quantification for the Horseshoe (with Discussion)’, Bayesian Analysis, 12, 1221–1274.
Vershynin, R. (2018), High-Dimensional Probability. An Introduction with Applications in Data Science, Cambridge: Cambridge University Press.
Zou, H., and Yuan, M. (2008), ‘Composite Quantile Regression and the Oracle Model Selection Theory’, The Annals of Statistics, 36, 1108–1126.
