ABSTRACT
Users of forced-choice questionnaires (FCQs) to measure personality commonly assume statement parameter invariance across contexts – between Likert and forced-choice (FC) items and between different FC items that share a common statement. In this paper, an empirical study was designed to check these two assumptions for an FCQ assessment measuring interpersonal and intrapersonal skills. We compared parameters of common statements between two Likert forms and two FCQ forms with a block size of two (statement pairs) and among five FCQ pair forms. In three of the five FCQ forms, statements were paired only with a statement they had appeared with in a triplet block. In the other two FCQ forms, statements were paired with statements they had not been paired with in a triplet block. This design allows us to evaluate statement parameter changes due to changes in context. The results do not support the statement parameter invariance assumption between Likert and FC items or the assumption between FC items when statements are recombined to form new items. However, the assumption between FC items generally held for pairs formed by dropping a statement from a triplet item. The analyses suggested some possible sources of context effects but were not definitive. Implications of the findings for test practice are discussed.
Acknowledgement
Zhitong Yang helped build the three subset pair forms from the two triplet forms. Daniel Fishtein and Yuan Wang managed the administration of all test forms. Nimmi Devasia and Steven Holtzman prepared the test data used in this paper. We are grateful for their contributions to this paper.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The data supporting this study’s findings are the property of ETS and are not publicly available. The data may be available on request from the corresponding author on a case-by-case basis.
Notes
1. For example, blocks can be assembled so that within-block statements match on social desirability, dimension information is maximized, and forms match on expected information by dimension.
2. We use the term statements here because that is the entity on which we conduct analyses; the wider choice literature typically refers to items, options, objects, or simply choices. We avoid items here because, for our context, an item in an FCQ refers to a block of statements.
3. Items were randomly ordered separately for each participant for all forms.
4. These data were also fit by the GGUM; generally, the 2PLM fit better. The results of a model comparison are documented elsewhere (Fu et al., 2023c).
5. Data from Pairs Forms 2C, 2D, and 2E were not included in the Likert-to-FC comparison because including all five pair forms (2A to 2E) would lead to a very large dataset with many items (1,150 forced-choice items plus 560 Likert statements), making model estimation slow and difficult due to the demand on computer memory.
6. Concurrent calibration is a planned missing design method to estimate item parameters from multiple forms in a single analysis (Kolen & Brennan, 2014).