Graphical Methods

Hole or Grain? A Section Pursuit Index for Finding Hidden Structure in Multiple Dimensions

Pages 739-752 | Received 05 May 2020, Accepted 22 Jan 2022, Published online: 24 Mar 2022
 

Abstract

Multivariate data is often visualized using linear projections, produced by techniques such as principal component analysis, linear discriminant analysis, and projection pursuit. A problem with projections is that they obscure low and high density regions near the center of the distribution. Sections, or slices, can help to reveal them. This article develops a section pursuit method, building on the extensive work in projection pursuit, to search for interesting slices of the data. Linear projections are used to define sections of the parameter space, and to calculate interestingness by comparing the distributions of observations inside and outside a section. By optimizing this index, it is possible to reveal features such as holes (low density) or grains (high density). The optimization is incorporated into a guided tour so that the search for structure can be dynamic. The approach can be useful for problems where data distributions depart from uniform or normal, as in visually exploring nonlinear manifolds and functions in multivariate space. Two applications of section pursuit are shown: exploring decision boundaries from classification models, and exploring subspaces induced by complex inequality conditions from a multiple parameter model. The new methods are available in R, in the tourr package. Supplementary materials for this article are available online.
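To make the idea of comparing observations inside and outside a section concrete, the following is a minimal sketch in Python. It is not the index defined in the article and not the tourr implementation: the function names, the square grid binning, and the choice of half the summed absolute difference between binned proportions are all assumptions made for illustration. The inputs are the two-dimensional projected coordinates of the points that fall inside and outside a slice.

```python
import math
from collections import Counter


def binned_proportions(points, n_bins, lo, hi):
    """Proportion of projected 2D points falling in each cell of a square grid.

    Hypothetical helper: the grid covers [lo, hi] on both axes and points
    outside that range are clamped into the nearest edge cell.
    """
    if not points:
        raise ValueError("need at least one point")
    width = (hi - lo) / n_bins
    counts = Counter()
    for u, v in points:
        i = min(max(math.floor((u - lo) / width), 0), n_bins - 1)
        j = min(max(math.floor((v - lo) / width), 0), n_bins - 1)
        counts[(i, j)] += 1
    return {cell: c / len(points) for cell, c in counts.items()}


def section_index(inside, outside, n_bins=4, lo=-1.0, hi=1.0):
    """Illustrative index: half the summed absolute difference between the
    binned proportions inside and outside the slice (0 = identical, 1 = disjoint).
    """
    p_in = binned_proportions(inside, n_bins, lo, hi)
    p_out = binned_proportions(outside, n_bins, lo, hi)
    cells = set(p_in) | set(p_out)
    return 0.5 * sum(abs(p_in.get(c, 0.0) - p_out.get(c, 0.0)) for c in cells)


# Same distribution inside and outside: nothing interesting in this slice.
same = section_index([(0.1, 0.1)], [(0.1, 0.1)])

# Points outside the slice sit near the center while points inside avoid it,
# as they would around a hole: the two binned distributions do not overlap.
hole = section_index([(-0.9, -0.9)], [(0.1, 0.1)])
```

In this toy setup a projection whose slice gives a larger value would be considered more interesting, which is the quantity an optimizer would try to increase.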

Supplementary Materials

  • Code and data are available at https://github.com/uschiLaa/paper-section-pursuit.

  • The Appendix contains the derivation of the radial cdf of a hypersphere projected onto a two-dimensional plane and the equations used to calculate the masses in the two-Higgs-doublet model, and an additional example from physics.

Acknowledgments

The authors gratefully acknowledge the support of the Australian Research Council. This article was created with knitr (Xie 2015) and R Markdown (Xie, Allaire, and Grolemund 2018) with embedded code, using the tidyverse (Wickham et al. 2019) packages. We thank the Wharton Statistics Department at the University of Pennsylvania for their hospitality while part of this work was conducted and Buja was on their faculty.

Notes

1 When a parameter does not contribute to the current projection, a point needs to have an observed value near the corresponding mean value to be inside the slice. The relation becomes much more complex for parameters that have a nonnegligible contribution to the current projection.
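The first case in this note can be illustrated with a small sketch in Python. It assumes that a point counts as inside the slice when its orthogonal distance to the projection plane through a chosen center is at most a thickness `h`; the function names, the value of `h`, and the example numbers are hypothetical and are not taken from the tourr package.

```python
import math


def orthogonal_distance(x, basis, center):
    """Distance from x to the plane through `center` spanned by `basis`.

    `basis` is a list of orthonormal direction vectors (here two of them),
    each of the same length as x. Assumed setup for illustration only.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    residual = d[:]
    for b in basis:
        coef = sum(di * bi for di, bi in zip(d, b))
        # Remove the component of d that lies along this basis direction.
        residual = [ri - coef * bi for ri, bi in zip(residual, b)]
    return math.sqrt(sum(r * r for r in residual))


def in_slice(x, basis, center, h):
    """True when x lies within thickness h of the projection plane."""
    return orthogonal_distance(x, basis, center) <= h


# Projection onto the first two of three parameters: the third parameter
# does not contribute to the projection at all.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
center = [0.0, 0.0, 0.0]  # e.g. the mean of each parameter
h = 0.1                   # hypothetical slice thickness

near = in_slice([5.0, -2.0, 0.05], basis, center, h)  # third value near its mean
far = in_slice([0.0, 0.0, 0.5], basis, center, h)     # third value far from its mean
```

Here the first two coordinates have no effect on membership, so only the distance of the third parameter from its center value matters. With a basis that mixes several parameters, the residual depends on all of them jointly, which is the more complex relation the note refers to.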

