Abstract
In science, multiple measures of the same constructs can be useful, but they are unlikely to all be equally valid indicators. In psychological assessment, the many popular personality inventories available in the marketplace also may be useful, but their comparative validity has long remained unassessed. This is the first comprehensive comparison of 11 such multiscale instruments against each of three types of criteria: clusters of behavioral acts, descriptions by knowledgeable informants, and clinical indicators potentially associated with various types of psychopathology. Using 1,000 bootstrap resampling analyses from a sample of roughly 700 adult research participants, we assess the relative predictability of each criterion and the comparative validity of each inventory. Although there was a wide range of criterion predictability, most inventories exhibited quite similar cross-validities when averaged across all three types of criteria. On the other hand, there were important differences between inventories in their predictive capabilities for particular criteria. We discuss the factors that lead to differential validity across predictors and criteria.
ACKNOWLEDGMENTS
Support for this project was provided by Grant MH-49227 (LRG) and by K01-DA-16618 (RAG) from the National Institutes of Health. A portion of our report on Study 1 in this article was based on a chapter that originally was prepared by Goldberg for a Handbook of Adult Personality Inventories, to have been edited by Briggs, Cheek, and Donahue, but which is no longer under development; some findings from that original chapter were later summarized by Wiggins (2003, pp. 146–148).
With the exception of the NEO-PI-R (for whose use we were charged), access to each of the other inventories was provided to the project free of charge by the test developer, the test publisher, or both (along with scoring services providing item responses and scale scores), in exchange for access to project data. The authors are extremely grateful to Harrison Gough, Heather Cattell, Robert Hogan, Robert Cloninger, Auke Tellegen, and Douglas N. Jackson for providing us with the CPI, 16PF, HPI, TCI, MPQ, JPI-R, and 6FPQ inventories.
In addition, the authors are grateful to each of the following colleagues for their thoughtful criticisms and suggestions: Jack Block, Shawn Boles, Michael H. Bond, Matthias Burisch, David Buss, David Campbell, Heather Cattell, William F. Chaplin, C. Robert Cloninger, Robyn M. Dawes, Eileen Donahue, Vincent Donnelly, Herbert Eber, Donald W. Fiske, David Funder, Robert Guion, Sarah E. Hampson, Paul J. Hoffman, Willem K. B. Hofstee, Robert Hogan, John A. Johnson, Edward Lichtenstein, Clarence McCormick, William McConochie, Alan Mead, Gregory J. Meyer, Dean Peabody, Robert Perloff, Steven Reise, William Revelle, Brent Roberts, Leonard G. Rorer, James A. Russell, Gerard Saucier, Oya Somer, Dennis Sweeney, Auke Tellegen, Thomas M. Vogt, Niels Waller, Erika Westling, Jerry S. Wiggins, and Richard Zinbarg.
Notes
1. This classic view of the bandwidth–fidelity issue, however, has been questioned by Burisch (1984b) and more recently by McGrath (2005).
2. Preliminary analyses were conducted in which a criterion was predicted from all of the inventories, using the bootstrap cross-validation procedure that we described. In general, no substantial improvement in the cross-validated multiple correlation coefficients occurred after five steps of variable selection, but five-variable models were generally more effective than less exhaustive models. The choice of five steps is not necessarily optimal for all inventories and all criteria, but improvements in the coefficients beyond five steps are likely to be very small, on the order of .01 or less. Hence, we chose five-step variable selection as a balance between more complex models, which tend to overfit the data, and simpler models that do not include enough of the predictive variance.
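The procedure sketched in this note can be illustrated in code. The following is a minimal, hypothetical sketch (not the authors' original analysis scripts): it pairs greedy forward selection of up to five predictors, fit by ordinary least squares on each bootstrap training sample, with validation of the resulting model on the out-of-bag cases, averaging the resulting correlations. All function and variable names (`forward_select`, `bootstrap_cross_validity`) are ours, and details such as the OLS fitting method and out-of-bag validation are assumptions about one reasonable implementation.

```python
import numpy as np

def forward_select(X, y, max_steps=5):
    """Greedy forward selection: at each step, add the predictor that
    most increases the multiple correlation R on the training data,
    stopping after max_steps predictors."""
    n, p = X.shape
    selected = []
    for _ in range(min(max_steps, p)):
        best_r, best_j = -np.inf, None
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            r = np.corrcoef(A @ beta, y)[0, 1]
            if r > best_r:
                best_r, best_j = r, j
        selected.append(best_j)
    return selected

def bootstrap_cross_validity(X, y, n_boot=1000, max_steps=5, seed=0):
    """Average out-of-bag correlation between predicted and observed
    criterion scores across bootstrap resamples: select and fit on each
    bootstrap sample, then validate on the cases it omitted."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # bootstrap training sample
        oob = np.setdiff1d(np.arange(n), idx)  # out-of-bag validation cases
        if len(oob) < 3:
            continue
        sel = forward_select(X[idx], y[idx], max_steps)
        # Refit the selected five-variable model on the training sample ...
        A = np.column_stack([np.ones(len(idx)), X[np.ix_(idx, sel)]])
        beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        # ... and correlate its predictions with y on the held-out cases.
        B = np.column_stack([np.ones(len(oob)), X[np.ix_(oob, sel)]])
        rs.append(np.corrcoef(B @ beta, y[oob])[0, 1])
    return float(np.mean(rs))
```

Because selection and fitting are repeated inside every resample, the out-of-bag correlations reflect the full model-building process rather than a single fitted model, which is what makes the averaged coefficient a cross-validity estimate rather than an in-sample multiple R.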
3. Obviously, in predicting these clinical criteria, instruments developed specifically to measure various aspects of psychopathology, such as the Minnesota Multiphasic Personality Inventory (MMPI), should have an advantage over the broad-bandwidth inventories here under comparison. On the other hand, we assume that the MMPI and other such inventories may not be particularly appropriate for use in many nonclinical settings.
4. These remarkably small differences in average cross-validity among the personality inventories are reminiscent of the findings from a related body of literature, namely, the average cross-validities of inventories developed from different scale-construction strategies, using the same item pool (e.g., Goldberg, 1972; Hase & Goldberg, 1967). Burisch (1984a), who provided an overview of 15 such comparative-validity studies, concluded, "A review of more than a dozen comparative studies revealed no consistent superiority of any strategy in terms of validity or predictive effectiveness" (p. 214). We can now say much the same thing for the 11 different inventories here under comparison.