Original Articles

Principal Cluster Axes: A Projection Pursuit Index for the Preservation of Cluster Structures in the Presence of Data Reduction

Pages 463-492 | Published online: 15 Jun 2012
 

Abstract

A measure of “clusterability” serves as the basis of a new methodology designed to preserve cluster structure in a reduced dimensional space. Just as principal component analysis finds the direction of maximal variance in multivariate space, principal cluster axes find the direction of maximum clusterability. The principal clustering approach falls into the class of projection pursuit techniques. Comparisons are made with existing methodologies in both a simulation study and analyses of real-world data sets. An analysis of Supreme Court voting data demonstrates how to interpret the results of the principal cluster axes, and similarities with the interpretation of competing procedures (e.g., factor analysis and principal component analysis) are noted. In addition to the Supreme Court analysis, we analyze several data sets often used to test cluster analysis procedures, including Fisher's Iris data, Agresti's Crab data, and a data set on glass fragments. Finally, discussion is provided to help determine when the proposed procedure will be most beneficial to the researcher.
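The contrast between a variance-seeking direction and a cluster-seeking direction can be illustrated with a small projection pursuit sketch. The sketch below is not the clusterability measure of the article, which is not reproduced on this page. It assumes a stand-in index (the share of the total sum of squares explained by the best two-group split of the projected values), made-up two-dimensional data, and a plain grid search over angles; the names `variance_index`, `two_group_index`, and `pursue` are hypothetical.

```python
import math
import random

def variance_index(z):
    """Sample variance of the projected values (the index maximized by PCA)."""
    m = sum(z) / len(z)
    return sum((v - m) ** 2 for v in z) / (len(z) - 1)

def two_group_index(z):
    """Stand-in cluster index (an assumption, not the article's measure):
    share of the total sum of squares explained by the best 2-group split."""
    s = sorted(z)
    n = len(s)
    total = sum(s)
    mean = total / n
    sst = sum((v - mean) ** 2 for v in s)
    best, left = 0.0, 0.0
    for i in range(1, n):
        left += s[i - 1]
        right = total - left
        # between-group sum of squares for the split s[:i] versus s[i:]
        ssb = i * (left / i - mean) ** 2 + (n - i) * (right / (n - i) - mean) ** 2
        best = max(best, ssb)
    return best / sst

def pursue(X, index, steps=180):
    """Grid search over unit directions in the plane for the largest index value."""
    best_dir, best_val = None, -math.inf
    for t in range(steps):
        a = math.pi * t / steps
        c = (math.cos(a), math.sin(a))
        val = index([c[0] * x + c[1] * y for x, y in X])
        if val > best_val:
            best_dir, best_val = c, val
    return best_dir, best_val

# Made-up data: two clusters separated along y, with large noise along x.
rng = random.Random(1)
X = [(rng.gauss(0, 5), rng.gauss(mu, 0.3)) for mu in (-1.5, 1.5) for _ in range(100)]

pca_dir, _ = pursue(X, variance_index)       # points roughly along the x axis
cluster_dir, _ = pursue(X, two_group_index)  # points roughly along the y axis
```

In this toy setting the variance index selects the noisy axis, while the stand-in cluster index selects the axis along which the two groups separate. That is the kind of situation in which a cluster-oriented projection would be expected to help.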

Notes

1Additionally, in the subsequently described procedure, we screen for outlying observations in the projected data.

2Note here that what is really of concern is that the linear combinations themselves are orthogonal (i.e., $\mathbf{c}_v^{\prime}\mathbf{c}_k = 0$ for all $k < v$), not necessarily that the projections (i.e., $\mathbf{X}\mathbf{c}_v$) are orthogonal.
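The distinction in this note can be checked numerically. The following is a minimal sketch with a made-up, column-centered data matrix: the two coefficient vectors are orthogonal, yet the projected scores are not, because the columns of the data are correlated. The helper names `dot` and `project` are illustrative only.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def project(X, c):
    """Scores of the rows of X on the coefficient vector c (that is, Xc)."""
    return [dot(row, c) for row in X]

# Made-up, column-centered data whose two columns are correlated.
X = [[-1.0, -1.0],
     [0.0, 1.0],
     [1.0, 0.0]]

c1 = [1.0, 0.0]
c2 = [0.0, 1.0]

coef_inner = dot(c1, c2)                          # orthogonal coefficient vectors
proj_inner = dot(project(X, c1), project(X, c2))  # nonzero: scores are not orthogonal
```

Here `coef_inner` is zero while `proj_inner` is not, so the constraint on the coefficient vectors does not by itself make the projected scores uncorrelated.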

3Note that this procedure is similar to the procedure denoted as parallel analysis by Horn (1965) for determining the number of factors in factor analysis.
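For readers unfamiliar with parallel analysis, the sketch below illustrates the general idea for the first eigenvalue only: the largest eigenvalue of the observed correlation matrix is compared with the average largest eigenvalue obtained from random normal data of the same size. It is an illustration of Horn's idea under simplifying assumptions (made-up one-factor data, power iteration for the eigenvalue, 200 replications), not the procedure used in the article; all function names are hypothetical.

```python
import math
import random

def correlation_matrix(X):
    """Pearson correlation matrix of the columns of X (list of rows)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    return [[sum(z[j] * z[k] for z in Z) / (n - 1) for k in range(p)]
            for j in range(p)]

def largest_eigenvalue(R, iters=200):
    """Approximate the largest eigenvalue of a symmetric matrix by power iteration."""
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(R[j][k] * v[k] for k in range(p)) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(R[j][k] * v[k] for k in range(p)) for j in range(p)]
    return sum(v[j] * Rv[j] for j in range(p))  # Rayleigh quotient

def random_reference(n, p, reps=200, seed=0):
    """Mean largest eigenvalue over correlation matrices of pure-noise data."""
    rng = random.Random(seed)
    sims = []
    for _ in range(reps):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        sims.append(largest_eigenvalue(correlation_matrix(noise)))
    return sum(sims) / reps

# Made-up data: four variables driven by one common factor plus noise.
rng = random.Random(42)
data = []
for _ in range(100):
    f = rng.gauss(0, 1)
    data.append([f + 0.5 * rng.gauss(0, 1) for _ in range(4)])

observed = largest_eigenvalue(correlation_matrix(data))
reference = random_reference(n=100, p=4)
retain_first = observed > reference  # keep the dimension only if it beats noise
```

The same logic extends to later eigenvalues, each compared with its counterpart from the random data.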

*Indicates best performing method for that factor level.

*p ≤ .0001, two-tailed.

The effect size was computed assuming independent groups to protect against overinflating the estimate, as recommended by Dunlap, Cortina, Vaslow, and Burke (1996).
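As a rough illustration of why the independent-groups computation is the more conservative choice, the sketch below contrasts a standardized mean difference based on the pooled standard deviation with one based on the standard deviation of the paired differences. The data are made up, equal group sizes are assumed, and the function names are hypothetical; this is not the article's computation.

```python
from statistics import mean, stdev, variance

def d_independent(x, y):
    """Standardized mean difference using the pooled SD (equal group sizes assumed)."""
    pooled_sd = ((variance(x) + variance(y)) / 2) ** 0.5
    return (mean(y) - mean(x)) / pooled_sd

def d_from_differences(x, y):
    """Standardized mean difference using the SD of the paired differences."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / stdev(diffs)

# Made-up, strongly correlated paired scores.
first = [1, 2, 3, 4, 5]
second = [2, 4, 4, 6, 6]

d_ind = d_independent(first, second)          # about 0.86
d_paired = d_from_differences(first, second)  # about 2.56, much larger
```

When the paired measurements are highly correlated, the difference-based value grows well beyond the independent-groups value, which is the inflation the note refers to.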

4In the present situation, inspection of the graph would lead to the conclusion of a weak cluster structure as the middle of the point cloud is fairly sparse—corresponding to the moderate to low ARI for the principal cluster structure.
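The ARI referred to in this note is the adjusted Rand index, which compares two partitions of the same objects and corrects for chance agreement. A minimal sketch of the standard Hubert and Arabie form follows, using only the Python standard library; the function name and the example labels are illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label sequences of equal length."""
    n = len(a)
    # Pairs of objects placed together in both partitions, in a, and in b.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:              # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, relabeled
ari_cross = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that cut across
```

A value of 1 indicates identical partitions regardless of label names, values near 0 indicate chance-level agreement, and negative values indicate less agreement than expected by chance.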

5The data being analyzed can be obtained from the Supreme Court Database (http://scdb.wustl.edu/data.php).
