Original Articles

Principal Cluster Axes: A Projection Pursuit Index for the Preservation of Cluster Structures in the Presence of Data Reduction

Pages 463-492 | Published online: 15 Jun 2012
 

Abstract

A measure of “clusterability” serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Just as principal component analysis finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability. The principal clustering approach falls into the class of projection pursuit techniques. Comparisons are made with existing methodologies in both a simulation study and analyses of real-world data sets. An analysis of Supreme Court voting data demonstrates how to interpret the results of the principal cluster axes and shows where that interpretation resembles the interpretation of competing procedures (e.g., factor analysis and principal component analysis). In addition to the Supreme Court analysis, we analyze several data sets often used to test cluster analysis procedures, including Fisher's Iris data, Agresti's Crab data, and a data set on glass fragments. Finally, discussion is provided to help determine when the proposed procedure will be most beneficial to the researcher.
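The contrast between maximal variance and maximal clusterability can be illustrated with a minimal projection pursuit sketch in two dimensions. The clusterability index used here (the between-group share of the total sum of squares for the best two-group split of the projected data) is a stand-in chosen for illustration; it is an assumption, not the measure defined in the article. The function names and the toy data are likewise hypothetical.

```python
import math


def two_means_index(z):
    """Stand-in clusterability index (an assumption, not the article's measure):
    between-group share of the total sum of squares for the best split of
    1-D data into two groups. For 1-D data the best split is contiguous
    in sorted order, so a single scan over split points is exact."""
    z = sorted(z)
    n = len(z)
    total = sum(z)
    mean = total / n
    tss = sum((v - mean) ** 2 for v in z)
    if tss == 0.0:
        return 0.0
    best = 0.0
    left_sum = 0.0
    for i in range(1, n):  # left group is z[:i], right group is z[i:]
        left_sum += z[i - 1]
        right_sum = total - left_sum
        bss = (i * (left_sum / i - mean) ** 2
               + (n - i) * (right_sum / (n - i) - mean) ** 2)
        best = max(best, bss / tss)
    return best


def variance(z):
    """Sample variance, the index that principal component analysis maximizes."""
    m = sum(z) / len(z)
    return sum((v - m) ** 2 for v in z) / (len(z) - 1)


def project(points, theta):
    """Project 2-D points onto the unit vector (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * x + s * y for x, y in points]


def best_direction(points, index, n_angles=180):
    """Grid search over directions in [0, pi) for the angle maximizing `index`."""
    angles = [math.pi * k / n_angles for k in range(n_angles)]
    return max(angles, key=lambda t: index(project(points, t)))


# Toy data: wide spread along x, two tight clusters separated along y.
points = [(-10 + 20 * i / 199, (2.0 if i % 2 else -2.0) + 0.3 * math.sin(i))
          for i in range(200)]

theta_var = best_direction(points, variance)         # variance-maximizing direction
theta_clu = best_direction(points, two_means_index)  # clusterability-maximizing direction
```

On this toy data the variance-maximizing direction lies along the x axis, where there is no cluster structure, while the index-maximizing direction lies along the y axis, where the two groups separate. A grid search is only practical in very low dimensions; higher-dimensional problems would need a numerical optimizer.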

Notes

1. Additionally, in the procedure described subsequently, we screen for outlying observations in the projected data.

2. Note that what is really of concern is that the linear combinations themselves are orthogonal (i.e., $\mathbf{c}_v^{\top}\mathbf{c}_k = 0$ for all $k < v$), not necessarily that the projections (i.e., $\mathbf{X}\mathbf{c}_v$) are orthogonal.
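The distinction in this note can be checked with a small numerical sketch. The data matrix and coefficient vectors below are hypothetical values chosen for illustration: the two coefficient vectors are orthogonal, yet the projected scores are not, because the columns of the data matrix are correlated.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(X, c):
    """Project each row of X onto the coefficient vector c."""
    return [dot(row, c) for row in X]


# Hypothetical column-centered data with correlated columns.
X = [[-1.0, -2.0],
     [0.0, 1.0],
     [1.0, 1.0]]

c1 = [1.0, 0.0]  # first linear combination
c2 = [0.0, 1.0]  # second linear combination, orthogonal to c1

coef_inner = dot(c1, c2)                          # 0.0: coefficients are orthogonal
score_inner = dot(matvec(X, c1), matvec(X, c2))   # 3.0: projections are not
```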

3. Note that this procedure is similar to the procedure denoted as parallel analysis by Horn (1965) for determining the number of factors in factor analysis.

*Indicates best performing method for that factor level.

*p ≤ .0001, two-tailed.

The effect size was computed assuming independent groups to protect against inflating the estimate, as recommended by Dunlap, Cortina, Vaslow, and Burke (1996).
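A minimal sketch of why the independent-groups computation is the more conservative choice for paired data follows. The two score lists are made-up numbers, not results from the article, and the function names are hypothetical. When the paired scores are highly correlated, standardizing by the standard deviation of the differences yields a much larger value than standardizing by the pooled standard deviation.

```python
import math
import statistics


def cohens_d_independent(a, b):
    """Standardized mean difference using the pooled standard deviation,
    i.e., treating the two sets of scores as independent groups."""
    pooled_var = (((len(a) - 1) * statistics.variance(a)
                   + (len(b) - 1) * statistics.variance(b))
                  / (len(a) + len(b) - 2))
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled_var)


def d_from_differences(a, b):
    """Mean paired difference divided by the standard deviation of the differences."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Made-up paired scores with a strong correlation between conditions.
scores_a = [10.0, 12.0, 14.0, 16.0, 18.0]
scores_b = [11.0, 14.0, 15.0, 18.0, 19.0]

d_independent = cohens_d_independent(scores_a, scores_b)  # about 0.44
d_paired = d_from_differences(scores_a, scores_b)         # about 2.56
```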

4. In the present situation, inspection of the graph would lead to the conclusion of a weak cluster structure, as the middle of the point cloud is fairly sparse; this corresponds to the moderate to low ARI for the principal cluster structure.
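For readers unfamiliar with the ARI (adjusted Rand index) mentioned in this note, the following is a minimal sketch of the standard Hubert and Arabie form of the index, which compares two partitions of the same observations. The label vectors in the usage lines are illustrative only and are not taken from the article.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same observations."""
    n = len(labels_a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)   # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:                   # degenerate case, e.g. one cluster in both
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


# Illustrative label vectors (not from the article).
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # relabeling does not matter
ari_part = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])   # partial agreement
```

A value of 1 indicates identical partitions, and values near 0 indicate agreement no better than chance.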

5. The data analyzed can be obtained from the Supreme Court Database (http://scdb.wustl.edu/data.php).
