We congratulate the authors on a very interesting overview of sparse sufficient dimension reduction (SDR). Sparse SDR methods are discussed in both the classical $n>p$ setting and the high-dimensional $p>n$ setting. Related topics, such as model-free variable selection and variable screening, are also discussed in a most logical fashion. Last but not least, two new methodological contributions are made in this review paper: new variable screening methods are proposed as extensions of Yu et al. (Citation2016), and novel sparse SDR methods are discussed following the sparse SIR in Tan et al. (Citation2020).
While all the methods discussed in this review are in the frequentist domain, our comment will focus on sparse SDR through Bayesian methods. Reich et al. (Citation2011) proposed an SDR approach via Bayesian mixture modelling. Take the single-index model as an example. Let $(X_i, Y_i)$, $i = 1, \ldots, n$, be an i.i.d. sample from $(X, Y)$, where $X \in \mathbb{R}^p$. Let $\beta$ be a basis for the central subspace and let $\beta^\top X$ be the sufficient predictor. Then the conditional distribution of $Y$ given $X$ can be modelled as
(1) $f(y \mid x) = \sum_{k=1}^{K} w_k(\beta^\top x)\, \phi(y; \mu_k, \sigma_k^2),$
where there are $K$ normal mixture components and $w_k(\beta^\top x)$ denotes the weight of the $k$th component. By choosing the weights carefully, model (1) can be expressed as
$Y \mid z \in (c_{k-1}, c_k] \sim N(\mu_k, \sigma_k^2), \quad z = \beta^\top X + \varepsilon, \quad \varepsilon \sim N(0, 1),$
so that $w_k(\beta^\top x) = \Phi(c_k - \beta^\top x) - \Phi(c_{k-1} - \beta^\top x)$. Here $z$ is a latent continuous variable, and $-\infty = c_0 < c_1 < \cdots < c_K = \infty$ are cutpoints. By placing a prior on $\beta$ and the cutpoints, one can compute the conditional distributions and carry out the full Bayesian analysis. For SDR without sparsity, the prior for $\beta_j$ is set as $\beta_j \sim N(0, \sigma_\beta^2)$, $j = 1, \ldots, p$. To introduce sparsity, a two-component mixture prior is assumed as
$\beta_j \mid \gamma_j \sim \gamma_j\, N(0, \sigma_\beta^2) + (1 - \gamma_j)\, N(0, c\,\sigma_\beta^2), \quad \gamma_j \sim \mathrm{Bernoulli}(\pi),$
where $0 < c < 1$ is a fixed constant and $\pi$ is the prior inclusion probability. If $\gamma_j = 1$, then the $j$th predictor is included in the model; otherwise the $j$th predictor is removed from the model. Reich et al. (Citation2011) also discussed a similar Bayesian mixture model for the multiple-index model, and the details are omitted.
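To make the latent-variable construction concrete, the following sketch (ours, not Reich et al.'s code; all parameter values are hypothetical) computes the probit-partition weights $w_k(\beta^\top x) = \Phi(c_k - \beta^\top x) - \Phi(c_{k-1} - \beta^\top x)$ and the resulting conditional density of model (1):

```python
import numpy as np
from scipy.stats import norm

def mixture_weights(u, cuts):
    """Probit-partition weights w_k(u) = Phi(c_k - u) - Phi(c_{k-1} - u),
    with c_0 = -inf and c_K = +inf implied; u is the sufficient predictor."""
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))
    cdf = norm.cdf(edges - u)      # Phi(c_k - u) at each cutpoint
    return np.diff(cdf)            # one weight per mixture component

def conditional_density(y, x, beta, cuts, mus, sigmas):
    """f(y | x) = sum_k w_k(beta' x) * N(y; mu_k, sigma_k^2)."""
    u = x @ beta                   # sufficient predictor beta' x
    w = mixture_weights(u, cuts)
    return np.sum(w * norm.pdf(y, loc=mus, scale=sigmas))

# Hypothetical values for illustration (K = 3 components).
beta = np.array([1.0, -0.5])
cuts = np.array([-1.0, 1.0])       # interior cutpoints c_1 < c_2
mus = np.array([-2.0, 0.0, 2.0])
sigmas = np.array([1.0, 0.5, 1.0])
x = np.array([0.3, 0.8])

w = mixture_weights(x @ beta, cuts)
print(w.sum())                     # the K weights sum to 1
print(conditional_density(0.0, x, beta, cuts, mus, sigmas))
```

In the full Bayesian analysis, $\beta$, the cutpoints, and the component parameters would of course be sampled from their posterior rather than fixed as above.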
Motivated by a frequentist SDR method recently proposed by Fang and Yu (Citation2020), Power and Dong (Citation2020) proposed a new Bayesian approach for sparse sufficient dimension reduction. Let $\mu = E(X)$, $\Sigma = \mathrm{Var}(X)$, and denote $S_1, \ldots, S_H$ as a partition of the support of $Y$. The classical sliced inverse regression (SIR) (Li, Citation1991) uses the kernel matrix $M = \sum_{h=1}^{H} p_h m_h m_h^\top$, where $m_h = E(X \mid Y \in S_h) - \mu$ with $p_h = P(Y \in S_h)$ and $h = 1, \ldots, H$. Note that $b_h = p_h \Sigma^{-1} m_h$ can be solved as an optimisation problem
(2) $(a_h, b_h) = \operatorname{arg\,min}_{a,\, b}\, E\left\{ I(Y \in S_h) - a - b^\top (X - \mu) \right\}^2.$
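A quick numerical check of the population identity behind (2): with simulated data, the sample least-squares slope of the slice indicator on $X$ matches $p_h \Sigma^{-1} m_h$ (a minimal sketch of the formulation only, not Fang and Yu's or Power and Dong's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, H = 5000, 4, 5

# Toy single-index data: Y depends on X only through beta' X.
beta = np.array([1.0, -1.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
Y = X @ beta + 0.5 * rng.standard_normal(n)

# Slice Y into H equal-probability bins S_1, ..., S_H; pick slice h = 2.
edges = np.quantile(Y, np.linspace(0, 1, H + 1))
h = 2
ind = ((Y > edges[h - 1]) & (Y <= edges[h])).astype(float)

# OLS of the slice indicator I(Y in S_h) on X gives the slope b_h of (2).
Z = np.column_stack([np.ones(n), X])
b_h = np.linalg.lstsq(Z, ind, rcond=None)[0][1:]

# SIR ingredients: b_h should equal p_h * Sigma^{-1} m_h,
# with m_h = E(X | Y in S_h) - E(X) and p_h = P(Y in S_h).
Sigma = np.cov(X, rowvar=False)
m_h = X[ind == 1].mean(axis=0) - X.mean(axis=0)
p_h = ind.mean()
target = p_h * np.linalg.solve(Sigma, m_h)
print(np.allclose(b_h, target, atol=1e-3))   # → True
```

Each slice thus yields one least squares problem, which is exactly the form to which model averaging can be applied.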
Fang and Yu (Citation2020) then applied the Mallows model averaging (MMA) of Hansen (Citation2007) to solve the least squares problem (2). Power and Dong (Citation2020) utilised Bayesian model averaging (BMA) (Raftery et al., Citation1997) to solve (2) instead. Similar to MMA, BMA works well for sparse models and may also adapt to models with dense signals. Furthermore, instead of solving for $b_h$, $h = 1, \ldots, H$, individually, we may solve for them jointly. Let $W = \left( I(Y \in S_1), \ldots, I(Y \in S_H) \right)^\top$ and $B = (b_1, \ldots, b_H)$. Then we have
(3) $(a, B) = \operatorname{arg\,min}_{a,\, B}\, E\left\| W - a - B^\top (X - \mu) \right\|^2.$
To the best of our knowledge, there is no frequentist model averaging approach to solve (3). On the other hand, multi-response BMA (Brown et al., Citation1998) can be easily adapted to solve (3). As shown in Power and Dong (Citation2020), the multi-response BMA outperforms the frequentist MMA for SDR.
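The joint formulation (3) can be illustrated with a short sketch (ours; a full multi-response BMA with priors over predictor subsets and posterior model probabilities is beyond this snippet). It stacks the $H$ slice indicators into a multi-response regression and confirms that solving (3) jointly reproduces the slice-wise solutions of (2):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, H = 2000, 4, 5
beta = np.array([1.0, -1.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
Y = np.sin(X @ beta) + 0.3 * rng.standard_normal(n)

# Multi-response target: W_i = (I(Y_i in S_1), ..., I(Y_i in S_H)).
edges = np.quantile(Y, np.linspace(0, 1, H + 1))
edges[0], edges[-1] = -np.inf, np.inf     # make the bins cover all of R
W = np.column_stack([((Y > edges[h]) & (Y <= edges[h + 1])).astype(float)
                     for h in range(H)])

# Joint least squares (3): one regression of the H responses on X.
Z = np.column_stack([np.ones(n), X])
B_joint = np.linalg.lstsq(Z, W, rcond=None)[0][1:]      # p x H slopes

# Solving each single-response problem (2) separately gives the same columns.
B_sep = np.column_stack([np.linalg.lstsq(Z, W[:, h], rcond=None)[0][1:]
                         for h in range(H)])
print(np.allclose(B_joint, B_sep))
```

Multi-response BMA replaces the single joint least squares fit above with an average over submodels (subsets of predictors), weighted by their posterior probabilities, which is what induces sparsity.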
We congratulate the authors again for providing a stimulating review of existing sparse SDR techniques, which should motivate further development of new SDR methods. It is our belief that more Bayesian approaches can be brought to bear on this endeavour.
References
- Brown, P. J., Vannucci, M., & Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(3), 627–641. https://doi.org/10.1111/1467-9868.00144
- Fang, F., & Yu, Z. (2020). Model averaging assisted sufficient dimension reduction. Computational Statistics and Data Analysis, 152. https://doi.org/10.1016/j.csda.2020.106993
- Hansen, B. E. (2007). Least squares model averaging. Econometrica, 75(4), 1175–1189. https://doi.org/10.1111/j.1468-0262.2007.00785.x
- Li, K. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316–327. https://doi.org/10.1080/01621459.1991.10475035
- Power, M. D., & Dong, Y. (2020). Bayesian model averaging sufficient dimension reduction. Statistics and Probability Letters. Submitted.
- Raftery, A. E., Madigan, D., & Hoeting, J. A. (1997). Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92(437), 179–191. https://doi.org/10.1080/01621459.1997.10473615
- Reich, B. J., Bondell, H. D., & Li, L. (2011). Sufficient dimension reduction via Bayesian mixture modeling. Biometrics, 67(3), 886–895. https://doi.org/10.1111/j.1541-0420.2010.01501.x
- Tan, K., Shi, L., & Yu, Z. (2020). Sparse SIR: optimal rates and adaptive estimation. The Annals of Statistics, 48(1), 64–85. https://doi.org/10.1214/18-AOS1791
- Yu, Z., Dong, Y., & Shao, J. (2016). On marginal sliced inverse regression for ultrahigh dimensional model-free feature selection. The Annals of Statistics, 44(6), 2594–2623. https://doi.org/10.1214/15-AOS1424