
Who responds inconsistently to mixed-worded scales? Differences by achievement, age group, and gender

Pages 5–31 | Received 06 Oct 2023, Accepted 06 Feb 2024, Published online: 15 Feb 2024

References

  • Arias, V. B., Garrido, L. E., Jenaro, C., Martínez-Molina, A., & Arias, B. (2020). A little garbage in, lots of garbage out: Assessing the impact of careless responding in personality survey data. Behavior Research Methods, 52(6), 2489–2505. https://doi.org/10.3758/s13428-020-01401-8
  • Baumgartner, H., Weijters, B., & Pieters, R. (2018). Misresponse to survey questions: A conceptual framework and empirical test of the effects of reversals, negations, and polar opposite core concepts. Journal of Marketing Research, 55(6), 869–883. https://doi.org/10.1177/0022243718811848
  • Bedard, K., & Dhuey, E. (2006). The persistence of early childhood maturity: International evidence of long-run age effects. The Quarterly Journal of Economics, 121(4), 1437–1472. https://doi.org/10.1162/qjec.121.4.1437
  • Bolt, D., Wang, Y. C., Meyer, R. H., & Pier, L. (2020). An IRT mixture model for rating scale confusion associated with negatively worded items in measures of social-emotional learning. Applied Measurement in Education, 33(4), 331–348. https://doi.org/10.1080/08957347.2020.1789140
  • Brevik, L. M., Olsen, R. V., & Hellekjær, G. O. (2016). The complexity of second language reading: Investigating the L1-L2 relationship. Reading in a Foreign Language, 28(2), 161–182. https://hdl.handle.net/10125/66899
  • Bulut, H. C., & Bulut, O. (2022). Item wording effects in self-report measures and reading achievement: Does removing careless respondents help? Studies in Educational Evaluation, 72, 101126. https://doi.org/10.1016/j.stueduc.2022.101126
  • Chen, J., Steinmann, I., & Braeken, J. (in press). Competing explanations for inconsistent responding to a mixed-worded self-esteem scale: Cognitive abilities or personality? Personality and Individual Differences.
  • Cole, K. L., Turner, R. C., & Gitchel, W. D. (2019). A study of polytomous IRT methods and item wording directionality effects on perceived stress items. Personality and Individual Differences, 147, 63–72. https://doi.org/10.1016/j.paid.2019.03.046
  • Desa, D., van de Vijver, F. J. R., Carstens, R., & Schulz, W. (2018). Measurement invariance in international large-scale assessments: Integrating theory and method. In T. P. Johnson, B.-E. Pennell, I. A. L. Stoop, & B. Dorer (Eds.), Advances in comparative survey methods (pp. 879–910). John Wiley & Sons, Inc. https://doi.org/10.1002/9781118884997.ch40
  • DiStefano, C., & Motl, R. W. (2006). Further investigating method effects associated with negatively worded items on self-report surveys. Structural Equation Modeling: A Multidisciplinary Journal, 13(3), 440–464. https://doi.org/10.1207/s15328007sem1303_6
  • Ebbs, D., Wry, E., Wagner, J.-P., & Netten, A. (2020). Instrument translation and layout verification for TIMSS 2019. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and procedures: TIMSS 2019 technical report (pp. 5.1–5.23). TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA). https://timssandpirls.bc.edu/timss2019/methods/chapter-5.html
  • Foy, P., Fishbein, B., von Davier, M., & Yin, L. (2020). Implementing the TIMSS 2019 scaling methodology. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and procedures: TIMSS 2019 technical report (pp. 12.1–12.146). TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA). https://timssandpirls.bc.edu/timss2019/methods/chapter-12.html
  • García-Batista, Z. E., Guerra-Peña, K., Garrido, L. E., Cantisano-Guzmán, L. M., Moretti, L., Cano-Vindel, A., Arias, V. B., & Medrano, L. A. (2021). Using constrained factor mixture analysis to validate mixed-worded psychological scales: The case of the Rosenberg self-esteem scale in the Dominican Republic. Frontiers in Psychology, 12, 636693. https://doi.org/10.3389/fpsyg.2021.636693
  • Gnambs, T., & Schroeders, U. (2020). Cognitive abilities explain wording effects in the Rosenberg self-esteem scale. Assessment, 27(2), 404–418. https://doi.org/10.1177/1073191117746503
  • Hong, M., Steedle, J. T., & Cheng, Y. (2020). Methods of detecting insufficient effort responding: Comparisons and practical recommendations. Educational and Psychological Measurement, 80(2), 312–345. https://doi.org/10.1177/0013164419865316
  • Huang, F. L. (2016). Alternatives to multilevel modeling for the analysis of clustered data. The Journal of Experimental Education, 84(1), 175–196. https://doi.org/10.1080/00220973.2014.952397
  • Kam, C. C. S., & Chan, G. H. (2018). Examination of the validity of instructed response items in identifying careless respondents. Personality and Individual Differences, 129, 83–87. https://doi.org/10.1016/j.paid.2018.03.022
  • Kam, C. C. S., & Meyer, J. P. (2015). Implications of item keying and item valence for the investigation of construct dimensionality. Multivariate Behavioral Research, 50(4), 457–469. https://doi.org/10.1080/00273171.2015.1022640
  • Kazak, A. E. (2018). Editorial: Journal article reporting standards. American Psychologist, 73(1), 1–2. https://doi.org/10.1037/amp0000263
  • Koda, K. (2007). Reading and language learning: Crosslinguistic constraints on second language reading development. Language Learning, 57(s1), 1–44. https://doi.org/10.1111/0023-8333.101997010-i1
  • LaRoche, S., Joncas, M., & Foy, P. (2020). Sample design in TIMSS 2019. In M. O. Martin, M. von Davier, & I. V. S. Mullis (Eds.), Methods and procedures: TIMSS 2019 technical report (pp. 3.1–3.33). TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA). https://timssandpirls.bc.edu/timss2019/methods/chapter-3.html
  • Lenzner, T., & Menold, N. (2016). GESIS survey guidelines: Question wording (Version 2.0). GESIS Leibniz Institute for the Social Sciences. https://doi.org/10.15465/gesis-sg_en_017
  • Likert, R. (1974). The method of constructing an attitude scale. In G. M. Maranell (Ed.), Scaling: A sourcebook for behavioral scientists (pp. 233–243). Aldine Publishing.
  • Lindwall, M., Barkoukis, V., Grano, C., Lucidi, F., Raudsepp, L., Liukkonen, J., & Thøgersen-Ntoumani, C. (2012). Method effects: The problem with negatively versus positively keyed items. Journal of Personality Assessment, 94(2), 196–204. https://doi.org/10.1080/00223891.2011.645936
  • Lumley, T. (2019). mitools: Tools for multiple imputation of missing data (Version 2.4) [Computer software]. https://cran.r-project.org/web/packages/mitools/index.html
  • Lumley, T. (2023). survey: Analysis of complex survey samples (Version 4.2) [Computer software]. https://cran.r-project.org/web/packages/survey/index.html
  • Marsh, H. W. (1996). Positive and negative global self-esteem: A substantively meaningful distinction or artifactors? Journal of Personality and Social Psychology, 70(4), 810–819. https://doi.org/10.1037/0022-3514.70.4.810
  • Marsh, H. W., Abduljabbar, A. S., Abu-Hilal, M. M., Morin, A. J. S., Abdelfattah, F., Leung, K. C., Xu, M. K., Nagengast, B., & Parker, P. (2013). Factorial, convergent, and discriminant validity of TIMSS math and science motivation measures: A comparison of Arab and Anglo-Saxon countries. Journal of Educational Psychology, 105(1), 108–128. https://doi.org/10.1037/a0029907
  • Martin, M. O., von Davier, M., & Mullis, I. V. S. (Eds.). (2020). Methods and procedures: TIMSS 2019 technical report. TIMSS & PIRLS International Study Center, Lynch School of Education and Human Development, Boston College and International Association for the Evaluation of Educational Achievement (IEA).
  • Meinck, S. (2020). Sampling, weighting, and variance estimation. In H. Wagemaker (Ed.), Reliability and validity of international large-scale assessment (Vol. 10, pp. 113–129). Springer International Publishing. https://doi.org/10.1007/978-3-030-53081-5_7
  • Melnick, S. A., & Gable, R. K. (1990). The use of negative item stems. Educational Research Quarterly, 14(3), 31–36.
  • Michaelides, M. P. (2019). Negative keying effects in the factor structure of TIMSS 2011 motivation scales and associations with reading achievement. Applied Measurement in Education, 32(4), 365–378. https://doi.org/10.1080/08957347.2019.1660349
  • Mullis, I. V. S., Martin, M. O., Foy, P., & Hooper, M. (2017). PIRLS 2016 international results in reading. TIMSS & PIRLS International Study Center, Lynch School of Education, Boston College and the International Association for the Evaluation of Educational Achievement (IEA). https://pirls2016.org/wp-content/uploads/structure/CompletePDF/P16-PIRLS-International-Results-in-Reading.pdf
  • Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). McGraw-Hill.
  • OECD. (2019). PISA 2018 results (volume II): Where all students can succeed. OECD Publishing. https://doi.org/10.1787/b5fd1b8f-en
  • Patton, J. M., Cheng, Y., Hong, M., & Diao, Q. (2019). Detection and treatment of careless responses to improve item parameter estimation. Journal of Educational and Behavioral Statistics, 44(3), 309–341. https://doi.org/10.3102/1076998618825116
  • Peng, K., & Nisbett, R. E. (1999). Culture, dialectics, and reasoning about contradiction. American Psychologist, 54(9), 741–754. https://doi.org/10.1037/0003-066X.54.9.741
  • Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88(5), 879–903. https://doi.org/10.1037/0021-9010.88.5.879
  • Quilty, L. C., Oakman, J. M., & Risko, E. (2006). Correlates of the Rosenberg self-esteem scale method effects. Structural Equation Modeling: A Multidisciplinary Journal, 13(1), 99–117. https://doi.org/10.1207/s15328007sem1301_5
  • R Core Team. (2022). R: A language and environment for statistical computing [Computer software]. R Foundation for Statistical Computing. https://www.R-project.org/
  • Rohde, T. E., & Thompson, L. A. (2007). Predicting academic achievement with cognitive ability. Intelligence, 35(1), 83–92. https://doi.org/10.1016/j.intell.2006.05.004
  • Roszkowski, M. J., & Soven, M. (2010). Shifting gears: Consequences of including two negatively worded items in the middle of a positively worded questionnaire. Assessment & Evaluation in Higher Education, 35(1), 113–130. https://doi.org/10.1080/02602930802618344
  • Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.
  • Schmitt, N., & Stults, D. M. (1985). Factors defined by negatively keyed items: The result of careless respondents? Applied Psychological Measurement, 9(4), 367–373. https://doi.org/10.1177/014662168500900405
  • Schulz, W., & Carstens, R. (2020). Questionnaire development in international large-scale assessment studies. In H. Wagemaker (Ed.), Reliability and validity of international large-scale assessment (Vol. 10, pp. 61–83). Springer International Publishing. https://doi.org/10.1007/978-3-030-53081-5_5
  • Silm, G., Pedaste, M., & Täht, K. (2020). The relationship between performance and test-taking effort when measured with self-report or time-based instruments: A meta-analytic review. Educational Research Review, 31, 100335. https://doi.org/10.1016/j.edurev.2020.100335
  • Steedle, J. T., Hong, M., & Cheng, Y. (2019). The effects of inattentive responding on construct validity evidence when measuring social–emotional learning competencies. Educational Measurement: Issues and Practice, 38(2), 101–111. https://doi.org/10.1111/emip.12256
  • Steinmann, I., & Olsen, R. V. (2022). Equal opportunities for all? Analyzing within-country variation in school effectiveness. Large-Scale Assessments in Education, 10(1), 2. https://doi.org/10.1186/s40536-022-00120-0
  • Steinmann, I., Sánchez, D., van Laar, S., & Braeken, J. (2022). The impact of inconsistent responders to mixed-worded scales on inferences in international large-scale assessments. Assessment in Education: Principles, Policy & Practice, 29(1), 5–26. https://doi.org/10.1080/0969594X.2021.2005302
  • Steinmann, I., Strietholt, R., & Braeken, J. (2022). A constrained factor mixture analysis model for consistent and inconsistent respondents to mixed-worded scales. Psychological Methods, 27(4), 667–702. https://doi.org/10.1037/met0000392
  • Swain, S. D., Weathers, D., & Niedrich, R. W. (2008). Assessing three sources of misresponse to reversed Likert items. Journal of Marketing Research, 45(1), 116–131. https://doi.org/10.1509/jmkr.45.1.116
  • TIMSS & PIRLS International Study Center. (2019). TIMSS: Trends in International Mathematics and Science Study. https://timssandpirls.bc.edu/timss-landing.html
  • Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The psychology of survey response (1st ed.). Cambridge University Press. https://doi.org/10.1017/CBO9780511819322
  • van Buuren, S. (2011). Multiple imputation of multilevel data. In J. J. Hox & J. K. Roberts (Eds.), Handbook of advanced multilevel analysis (pp. 173–196). Routledge.
  • Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48. https://doi.org/10.18637/jss.v036.i03
  • von Davier, M., Gonzalez, E. J., & Mislevy, R. (2009). What are plausible values and why are they useful? In M. von Davier & D. Hastedt (Eds.), IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments (Vol. 2, pp. 9–36). IERI.
  • Woods, C. M. (2006). Careless responding to reverse-worded items: Implications for confirmatory factor analysis. Journal of Psychopathology and Behavioral Assessment, 28(3), 186–191. https://doi.org/10.1007/s10862-005-9004-7
  • Zeng, B., Wen, H., & Zhang, J. (2020). How does the valence of wording affect features of a scale? The method effects in the undergraduate learning burnout scale. Frontiers in Psychology, 11, 585179. https://doi.org/10.3389/fpsyg.2020.585179