Abstract
The scientific study of accuracy in personality judgment typically relies on rating scales on which judges make absolute decisions about a target individual. Although this method has many merits, it restricts some experimental options and is less ecologically valid than one would desire. These studies represent an attempt to develop an alternative methodology for the study of personality judgment, specifically for use in explorations of the judgment process. A series of photo sets was created, each containing pictures of three individuals who represented different levels of a specific personality trait. Participants' task was to select the high and low scorers on a given dimension from the photos. Study 1 demonstrates that people can select targets with extreme scores from a photo lineup at better-than-chance rates across several personality dimensions. Study 2 shows that this ability has some degree of temporal consistency. Study 3 improves on the general method by refining the criteria for stimulus selection to incorporate both self- and peer reports.
Acknowledgments
I would like to thank David Watson and Evan Krauter for comments on early drafts of this article.
Notes
Because the distributions of the FFM traits differed from those of the single-item indicators used to assess thriftiness and conservatism, the criteria for selecting extreme scorers could not be applied evenly across all judgment categories. Thus, thriftiness and conservatism are represented by only two trials (as opposed to four for each FFM trait). Additionally, in Study 3 the extreme scorers for positive and negative affect overlapped substantially with those for extraversion and neuroticism, respectively, so I had to restrict the stimulus pool for those trials in the same way.
The original study design included an instructional manipulation: half of the participants were instructed to think carefully and provide notes about their choices, and the other half were told to choose quickly. Accuracy did not differ significantly between the two groups, nor was there a consistent pattern in the direction of the nonsignificant differences. Thus, the data were pooled.
To explain the baseline probability: there are six possible orders of the three photos (H = true high, M = true middle, L = true low), reading the first position as the judge's choice for high and the last as the choice for low: HML, HLM, MHL, MLH, LHM, and LMH. Only one of these (HML) puts the true high scorer first and the true low scorer last, so the probability of getting both choices correct on one set is 1/6. Two orders (HLM and MHL) get exactly one choice correct, so the probability of exactly one correct on one set is 2/6. The expected number correct on a single pair of questions is therefore E(X) = 2(1/6) + 1(2/6) = 4/6. Across all 24 pairs of questions, one would expect 24 × 4/6 = 16 correct, or 1/3 of the 48 choices. Thus, all tests concerning differences in proportions use .33 as the baseline comparison standard.
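The enumeration in this note can be checked mechanically. The short Python sketch below assumes, as the note does, that a judge guessing at random is equally likely to produce any of the six orders, and it reads the first position as the "high" pick and the last as the "low" pick. The function name baseline_expectation is hypothetical. Exact fractions are used so that 2/3 and 1/3 come out without rounding.

```python
from fractions import Fraction
from itertools import permutations


def baseline_expectation(n_sets: int = 24):
    """Expected chance performance when guessing on three-photo sets.

    Each order of the photos H, M, L is treated as equally likely.
    Position 0 is the judge's pick for "high", position 2 the pick for
    "low"; the middle photo is the one left unchosen.
    """
    orders = list(permutations("HML"))  # HML, HLM, MHL, MLH, LHM, LMH
    p_each = Fraction(1, len(orders))   # 1/6 per order
    # Number correct in a set: 1 if the high pick is H, plus 1 if the low pick is L.
    expected_per_set = sum(
        p_each * ((order[0] == "H") + (order[2] == "L")) for order in orders
    )
    expected_total = n_sets * expected_per_set
    proportion = expected_total / (2 * n_sets)  # two choices per set
    return expected_per_set, expected_total, proportion


per_set, total, prop = baseline_expectation()
print(per_set, total, prop)
```

Calling baseline_expectation() with the default of 24 sets returns 2/3 expected correct per set, 16 expected correct overall, and a chance proportion of 1/3, matching the figures in the note.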
The formula for the difference between an observed and an expected proportion is

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}},$$

where $\hat{p}$ is the observed proportion, $p$ is the expected proportion, $q = 1 - p$, and $n$ is the sample size.
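As a sketch of how this test might be computed, the Python below implements the formula directly and converts z to a two-tailed p value under the standard normal distribution using math.erfc. The function names and the illustrative inputs (40% correct over 500 choices) are hypothetical and are not data from these studies; the baseline is written as 1/3, which the notes round to .33.

```python
import math


def proportion_z(p_hat: float, p: float, n: int) -> float:
    """One-sample z for an observed proportion p_hat against an expected p."""
    q = 1.0 - p
    return (p_hat - p) / math.sqrt(p * q / n)


def two_tailed_p(z: float) -> float:
    """Two-tailed p under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical illustration only: 40% correct across 500 choices,
# tested against the chance baseline of 1/3.
z = proportion_z(0.40, 1 / 3, 500)
print(round(z, 3), round(two_tailed_p(z), 4))
```

With these inputs z equals the square root of 10, about 3.16. As a consistency check, two_tailed_p(2.61) is about .009, in line with the p < .01 reported for the female-judge comparison.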
A second possible explanation is that the two target populations were classified by measures with slightly different interpretations of openness to experience, although the descriptions provided to participants did not vary across the studies. To the extent that the description more closely tracked Goldberg's Intellect than the BFI's slightly broader Openness to Experience, it might have given an advantage to participants in Studies 1 and 2. This is unlikely, however, given that the description provided was more in line with the BFI's operational definition of the construct.
Because of the disparate sample sizes, I could not compare these proportions across judge gender in the same manner as in the previous analyses. However, the difference observed within female judges (women judged men more accurately than they judged other women) was statistically significant (z = 2.61, p < .01), whereas the difference within male judges was not.