Focus Article

Rethinking Traditional Methods of Survey Validation

 

ABSTRACT

It is commonly believed that self-report, survey-based instruments can be used to measure a wide range of psychological attributes, such as self-control, growth mindsets, and grit. Increasingly, such instruments are being used not only for basic research but also for supporting decisions regarding educational policy and accountability. The validity of such instruments is typically investigated using a classic set of methods, including the examination of reliability coefficients, factor or principal components analyses, and correlations between scores on the instrument and other variables. However, these techniques may fall short of providing the kinds of rigorous, potentially falsifying tests of relevant hypotheses commonly expected in scientific research. This point is illustrated via a series of studies in which respondents were presented with survey items deliberately constructed to be uninterpretable, but the application of the aforementioned validation procedures nonetheless returned favorable-appearing results. In part, this disconnect may be traceable to the way in which operationalist modes of thinking in the social sciences have reinforced the perception that attributes do not need to be defined independently of particular sets of testing operations. It is argued that affairs might be improved via greater attention to the manner in which definitions of psychological attributes are articulated and greater openness to treating beliefs about the existence and measurability of psychological attributes as hypotheses rather than assumptions—in other words, as beliefs potentially subject to revision.

Notes

2. In all analyses, the negatively worded items were reverse-coded (e.g., with strongly disagree receiving the maximum score rather than the minimum score).

4. All correlations reported in this paper are between raw scores; none have been disattenuated for measurement error. If a disattenuation formula were applied, the correlations reported in these three sections would appear larger.

5. Lorem ipsum text, which is commonly used as placeholder text in publishing and graphic design applications, is itself derived from sections 1.10.32 and 1.10.33 of “de Finibus Bonorum et Malorum” (The Extremes of Good and Evil) by Cicero, written in 45 BC. However, words are scrambled, added, removed, and altered, rendering the final text unintelligible even to someone well-versed in Latin. As a check, all eight items employed in this study were submitted to Google Translate, which failed to return any meaningful translations.

6. Clearly, these validation procedures do at least sometimes return results that are not entirely positive, and at least sometimes these results are in turn used to improve the quality of the instrument. The argument in this paper does not aim to establish that these procedures are without value, nor that they are categorically incapable of returning negative results in the presence of low-quality instrumentation; it aims to establish only that they cannot be relied upon to do so.

7. Seen this way, the fact that the correlation between the two scores is anything less than perfect could be explained by changes over time in real perseverance, method effects, specific variance attributable to the two different measures, or invalidity or unreliability of either measure (or, of course, some combination of these).

8. For a vivid illustration of this point, see Arnulf et al. (2014).

9. For a more thorough review of operationalism than is possible here, see Chang (2009); for general critiques, see Green (1992) and Bickhard (2001); for critiques of operationalism in the social sciences in particular, see Markus and Borsboom (2013) and Michell (1990).

10. There are of course many other definitions of self-control that could have been used here; this one is chosen purely for illustrative reasons.

11. For example, yet another possibility is that “self-control” refers neither to a disposition nor to a capacity of persons (perhaps not even to an attribute of persons at all) but, rather, is a shorthand way of referring to an inductive summary of a (potentially very large or possibly even infinite) set of (actual and possible) behaviors on the part of the individual. Such an approach effectively circumvents any form of cognitive theory concerning the explanation for (variation in) such behaviors and is thus broadly inconsistent with the aims of psychological science since the cognitive revolution, but may still be useful for purely descriptive purposes. Additionally, it would seem strained at best to refer to such an approach as measurement, especially if one understands measurement as a causal relation between (variation in) an attribute and (variation in) the observed outcomes of a procedure, à la Borsboom et al. (2004).

12. Arguably, this is exactly what happened in the literature on emotional intelligence; a considerable amount of energy was spent attempting to differentiate typical-behavior and maximum-performance models of EI (e.g., Maul, 2012).

13. One possible explanation for the pervasiveness of this fallacy in the social sciences is that accounts of psychological measurement often analogize from the physical sciences, and specifically from concatenable attributes such as height, in which it is demonstrably the case that the structures of interindividual and of intraindividual differences are identical.

14. Additional studies reported in the same manuscript demonstrate that scores on the grit scale can predict some portion (an average of 4%) of the variance in other outcomes, such as success in a spelling bee competition. It is not clear whether these studies are meant to provide further evidence of the validity of the instrument used to measure grit or to test hypotheses regarding the effects of grit (which would require presupposing that the instrument is validly measuring grit).

15. The belief that measurement is universally necessary is associated with a second, related belief, which is that measurement is universally possible. This second belief is reflected, for example, in the oft-repeated claims that “whatever exists at all exists in some amount” (Thorndike, 1918, p. 16) and “anything that exists in amount can be measured” (McCall, 1939, p. 15). It is highly debatable, and whether it can be considered credible depends on how one understands the concept of measurement and its requirements (cf. Michell, 2005; Markus & Borsboom, 2013).

16. Elsewhere, this belief is asserted even more directly as “you cannot study what you cannot measure” (http://angeladuckworth.com/qa/).
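
Note 4 mentions a disattenuation formula without stating it. A minimal sketch follows, assuming the formula meant is the standard Spearman correction for attenuation, in which an observed correlation is divided by the square root of the product of the two scores' reliabilities. The function name and all numeric values are hypothetical and are not taken from the article. Because a reliability cannot exceed 1, the corrected correlation can never be smaller in magnitude than the observed one, which is consistent with the note's statement that disattenuated correlations would appear larger.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation.

    r_xy  : observed correlation between raw scores on X and Y
    rel_x : reliability estimate for X (e.g., coefficient alpha)
    rel_y : reliability estimate for Y
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical illustration only; these values do not come from the article.
observed = 0.30
corrected = disattenuate(observed, rel_x=0.80, rel_y=0.70)
print(f"observed r = {observed:.2f}, disattenuated r = {corrected:.3f}")
```

Note that with low reliabilities the corrected value can exceed 1 in magnitude, which is one reason such corrections are usually interpreted with caution.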
