Multivariate Behavioral Research Volume 43, 2008 - Issue 1


Original Articles

A New Variable Weighting and Selection Procedure for K-means Cluster Analysis

Douglas Steinley University of Missouri-Columbia

Michael J. Brusco Florida State University

Pages 77-108 | Published online: 19 Mar 2008

https://doi.org/10.1080/00273170701836695

Abstract

A variance-to-range ratio variable weighting procedure is proposed. We show how this weighting method is theoretically grounded in the inherent variability found in data exhibiting cluster structure. In addition, a variable selection procedure is proposed to operate in conjunction with the variable weighting technique. The performance of these procedures is demonstrated in a simulation study, where they compare favorably with existing standardization methods. A detailed demonstration of the weighting and selection procedures is provided for the well-known Fisher Iris data and several synthetic data sets.

Notes

¹Note that scaling the variables by RC_j precludes comparing RC_j values across data sets (each data set will always contain a variable with RC_j = 1). However, if so desired, variables can be compared across data sets using the M_j values. When employing this strategy, we recommend the standard cautions for comparing variables that were measured on different entities.
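The point about cross-data-set comparison can be illustrated with a short Python sketch. The defining equations for M_j and RC_j are not reproduced in this excerpt, so two assumptions are made here and flagged in the code: M_j is taken to be a variable's population variance divided by its squared range (one reading of the "variance-to-range ratio"), and the hypothetical helper rc_values rescales each M_j by the smallest M_k in the same data set. Any such within-data-set rescaling guarantees a value of exactly 1 in every data set, whereas the raw M_j values remain on a common scale.

```python
from statistics import pvariance


def m_ratio(values):
    """Assumed form of M_j: population variance divided by the squared range."""
    spread = max(values) - min(values)
    return pvariance(values) / spread ** 2


def rc_values(columns):
    """Hypothetical RC_j: each M_j divided by the smallest M_k in the same data set.

    This normalization is an illustrative assumption, not the paper's equation.
    """
    m = [m_ratio(col) for col in columns]
    smallest = min(m)
    return [mj / smallest for mj in m]


# Two toy data sets; each inner list is one variable (column).
data_a = [[1, 2, 3, 4, 5], [0, 0, 0, 10, 10]]
data_b = [[0, 0, 5, 5, 5], [2, 4, 6, 8, 10]]

print([m_ratio(col) for col in data_a])  # [0.125, 0.24]
print([m_ratio(col) for col in data_b])  # [0.24, 0.125]
print(rc_values(data_a))                 # [1.0, 1.92]
print(rc_values(data_b))                 # [1.92, 1.0]
```

Both toy data sets contain an RC_j of exactly 1, so RC_j only ranks variables within a data set; the M_j values, by contrast, show directly that the two two-group variables have the same ratio (0.24).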

* = Best method.

²The proof by Steinley and Henson (2005) is straightforward. Assuming the variables are independent, the marginal probability of overlap for each variable is defined as a specific integral over that variable. The independence assumption allows the overlap of the joint distribution to be calculated as the product of the marginal overlaps. Thus, higher marginal overlap leads to higher joint overlap. For example, three variables with a marginal probability of overlap of .10 would have a joint probability of overlap of (.10)³ = .001, whereas the same three variables with a marginal probability of overlap of .25 would have a joint probability of overlap of (.25)³ ≈ .016, about 16 times that of the previous condition.
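A few lines of Python make the independence argument concrete: under that assumption, the joint probability of overlap is simply the product of the marginal overlaps. The function name joint_overlap is our own, and the sketch takes the marginal overlaps as given rather than computing the integrals used in OCLUS.

```python
from math import prod


def joint_overlap(marginals):
    """Joint probability of overlap under independence:
    the product of the per-variable (marginal) overlap probabilities."""
    return prod(marginals)


low = joint_overlap([0.10, 0.10, 0.10])
high = joint_overlap([0.25, 0.25, 0.25])
print(f"{low:.4f} {high:.4f} {high / low:.1f}")  # 0.0010 0.0156 15.6
```

Because each factor is at most 1, adding independent variables can only shrink the joint overlap, and any increase in the marginal overlaps is compounded multiplicatively.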

* p ≤ .0001, two-tailed.

* = < .01.

³Before employing this standardization technique, the user should carefully consider the nature of each variable, as it can influence that variable's RC value. As can be seen from Equation 10, a binary variable with an equal number of 0's and 1's would have the greatest RC value possible; however, as the numbers of 0's and 1's become increasingly unequal, the RC value of a binary variable can fall below that of a continuous variable. Nonetheless, when analyzing data sets that mix continuous and discrete variables, the user should be aware of the potential overweighting of discrete variables, with discrete variables having the fewest categories carrying the greatest potential for overweighting. To protect against this problem, the variable selection procedure should be implemented (see Brusco, 2004, for a similar implementation that worked very effectively in the presence of binary variables).
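The effect described in this note can be sketched by comparing a balanced binary variable, two unbalanced ones, and evenly spaced continuous values. Equation 10 is not reproduced in this excerpt, so the sketch assumes the ratio is the population variance divided by the squared range; binary_variable is a hypothetical helper introduced only for illustration.

```python
from statistics import pvariance


def m_ratio(values):
    """Assumed ratio: population variance divided by the squared range."""
    spread = max(values) - min(values)
    return pvariance(values) / spread ** 2


def binary_variable(n_ones, n):
    """Hypothetical helper: a 0/1 variable with n_ones ones out of n cases."""
    return [1] * n_ones + [0] * (n - n_ones)


# 100 evenly spaced continuous values on [0, 1] (no cluster structure).
continuous = [i / 99 for i in range(100)]
print("continuous", round(m_ratio(continuous), 4))

# Binary variables from balanced (50 ones) to highly unbalanced (5 ones).
for n_ones in (50, 20, 5):
    print("binary", n_ones, round(m_ratio(binary_variable(n_ones, 100)), 4))
```

Under this assumption the balanced binary variable reaches 0.25, the largest value any variable can attain (a variance can be at most one quarter of the squared range), while a binary variable with only 5 ones in 100 cases drops to 0.0475, below the roughly 0.085 of the evenly spaced continuous values.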

ᵃThe best solution for each subset size.

References

Brusco, M. J. (2004). Clustering binary data in the presence of masking variables. Psychological Methods, 9, 510–523.

Steinley, D., & Henson, R. (2005). OCLUS: An analytic method for generating clusters with known overlap. Journal of Classification, 22, 221–250.
