Validating a partial-credit scoring approach for multiple-choice science items: an application of fundamental ideas in science

Pages 1640-1666 | Received 29 Oct 2020, Accepted 26 Apr 2021, Published online: 16 May 2021

References

  • Adams, R. J., Wu, M. L., & Wilson, M. R. (2012). ACER ConQuest 3.0.1 [Computer software]. Australian Council for Educational Research.
  • Alonzo, A. C., & Steedle, J. T. (2009). Developing and assessing a forces and motion learning progression. Science Education, 93(3), 389–421. https://doi.org/10.1002/sce.20303
  • Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42(1), 69–81. https://doi.org/10.1007/BF02293746
  • Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561–573. https://doi.org/10.1007/BF02293814
  • Andrich, D. (2005). The Rasch model explained. In S. Alagumalai, D. D. Curtis, & N. Hungi (Eds.), Applied Rasch measurement: A book of exemplars (Vol. 4, pp. 27–59). Springer.
  • Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika, 75(2), 292–308. https://doi.org/10.1007/s11336-010-9154-8
  • Arslan, H. O., Cigdemoglu, C., & Moseley, C. (2012). A three-tier diagnostic test to assess pre-service teachers’ misconceptions about global warming, greenhouse effect, ozone layer depletion, and acid rain. International Journal of Science Education, 34(11), 1667–1686. https://doi.org/10.1080/09500693.2012.680618
  • Bejar, I. I., & Weiss, D. J. (1977). A comparison of empirical differential option weighting scoring procedures as a function of inter-item correlation. Educational and Psychological Measurement, 37(2), 335–340.
  • Ben-Simon, A., Budescu, D. V., & Nevo, B. (1997). A comparative study of measures of partial knowledge in multiple-choice tests. Applied Psychological Measurement, 21(1), 65–88.
  • Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): A preliminary theory of action for summative and formative assessment. Measurement, 8(2-3), 70–91. https://doi.org/10.1080/15366367.2010.508686
  • Bo, Y. E., Lewis, C., & Budescu, D. V. (2015). An option-based partial credit item response model. In R. Millsap, D. Bolt, L. van der Ark, & W. C. Wang (Eds.), Quantitative psychology research (pp. 45–72). Springer.
  • Bond, T. G., Fox, C. M., & Lacey, H. (2007). Applying the Rasch model: Fundamental measurement in the social sciences (2nd ed.). Routledge Taylor and Francis Group.
  • Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33–63. https://doi.org/10.1207/s15326977ea1101_2
  • Brown, J. (1965). Multiple response evaluation of discrimination. British Journal of Mathematical and Statistical Psychology, 18(1), 125–137. https://doi.org/10.1111/j.2044-8317.1965.tb00696.x
  • Bush, M. (2001). A multiple choice test that rewards partial knowledge. Journal of Further and Higher Education, 25(2), 157–163. https://doi.org/10.1080/03098770120050828
  • Bush, M. (2015). Reducing the need for guesswork in multiple-choice tests. Assessment & Evaluation in Higher Education, 40(2), 218–231. https://doi.org/10.1080/02602938.2014.902192
  • Cavers, M., & Ling, J. (2016). Confidence weighting procedures for multiple-choice tests. In D. G. Chen, J. Chen, X. Lu, G. Yi, & H. Yu (Eds.), Advanced statistical methods in data science (pp. 171–181). Springer.
  • Champagne, A. B., Klopfer, L. E., & Anderson, J. H. (1980). Factors influencing the learning of classical mechanics. American Journal of Physics, 48(12), 1074–1079. https://doi.org/10.1119/1.12290
  • Chi, S., Wang, Z., & Liu, X. (2019). Investigating disciplinary context effect on student scientific inquiry competence. International Journal of Science Education, 41(18), 2736–2764.
  • Clement, J. (1982). Students’ preconceptions in introductory mechanics. American Journal of Physics, 50(1), 66–71. https://doi.org/10.1119/1.12989
  • Coombs, C. H. (1953). On the use of objective examinations. Educational and Psychological Measurement, 13(2), 308–310.
  • Coombs, C. H., Milholland, J. E., & Womer, F. B. (1956). The assessment of partial knowledge. Educational and Psychological Measurement, 16(1), 13–37.
  • Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
  • Cross, L. H., Ross, F. K., & Geller, E. S. (1980). Using choice-weighted scoring of multiple-choice tests for determination of grades in college courses. The Journal of Experimental Education, 48(4), 296–301.
  • Davis, F. B., & Fifer, G. (1959). The effect on test reliability and validity of scoring aptitude and achievement tests with weights for every choice. Educational and Psychological Measurement, 19(2), 159–170. https://doi.org/10.1177/001316445901900202
  • DeMars, C. E. (2008, March). Scoring multiple choice items: A comparison of IRT and classical polytomous and dichotomous methods. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY, USA.
  • Diedenhofen, B., & Musch, J. (2015). Empirical option weights improve the validity of a multiple-choice knowledge test. European Journal of Psychological Assessment, 33(5), 336–344. https://doi.org/10.1027/1015-5759/a000295
  • diSessa, A. A. (2014). A history of conceptual change research: Threads and fault lines. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (2nd ed.). Cambridge University Press.
  • diSessa, A. (1983). Phenomenology and the evolution of intuition. In D. Gentner, & A. L. Stevens (Eds.), Mental models (pp. 15–33). Erlbaum.
  • diSessa, A. (2007). Changing conceptual change. Human Development, 50(1), 39–46. https://doi.org/10.1159/000097683
  • diSessa, A. (2013). A bird’s-eye view of the “pieces” vs “coherence” controversy (from the “pieces” side of the fence). In S. Vosniadou (Ed.), International handbook of research on conceptual change (2nd ed., pp. 31–48). Routledge.
  • diSessa, A. A., & Sherin, B. L. (1998). What changes in conceptual change? International Journal of Science Education, 20(10), 1155–1191.
  • Dressel, P., & Schmid, J. (1953). Some modifications of the multiple-choice item. Educational and Psychological Measurement, 13(4), 574–595. https://doi.org/10.1177/001316445301300404
  • Echternacht, G. (1976). Reliability and validity of item option weighting schemes. Educational and Psychological Measurement, 36(2), 301–309.
  • Frary, R. B. (1989). Partial-credit scoring methods for multiple-choice tests. Applied Measurement in Education, 2(1), 79–96. https://doi.org/10.1207/s15324818ame0201_5
  • Fulmer, G. W., Liang, L. L., & Liu, X. (2014). Applying a forces and motion learning progression over an extended time span using the force concept inventory. International Journal of Science Education, 36(17), 2918–2936. https://doi.org/10.1080/09500693.2014.939120
  • Gao, Y., Zhai, X., Andersson, B., Zeng, P., & Xin, T. (2020). Developing a learning progression of buoyancy to model conceptual change: A latent class and rule space model analysis. Research in Science Education, 50(4), 1369–1388.
  • Gilman, D. A., & Ferry, P. (1972). Increasing test reliability through self-scoring procedures. Journal of Educational Measurement, 9(3), 205–207.
  • Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Lawrence Erlbaum.
  • Halloun, I. A., & Hestenes, D. (1985). Common sense concepts about motion. American Journal of Physics, 53(11), 1056–1065. https://doi.org/10.1119/1.14031
  • Hardy, J., Bates, S. P., Casey, M. M., Galloway, K. W., Galloway, R. K., Kay, A. E., … McQueen, H. A. (2014). Student-generated content: Enhancing learning through sharing multiple-choice questions. International Journal of Science Education, 36(13), 2180–2194. https://doi.org/10.1080/09500693.2014.916831
  • Harris, C. J., Krajcik, J. S., Pellegrino, J. W., & Debarger, A. H. (2019). Designing knowledge-in-use assessments to promote deeper learning. Educational Measurement: Issues and Practice, 38(2), 53–67.
  • Härtig, H., Nordine, J. C., & Neumann, K. (2020). Contextualisation in the assessment of students’ learning about science. In I. Sánchez Tapia (Ed.), International perspectives on the contextualisation of science education (pp. 113–144). Springer.
  • Higham, P. A. (2013). Regulating accuracy on university tests with the plurality option. Learning and Instruction, 24, 26–36. https://doi.org/10.1016/j.learninstruc.2012.08.001
  • Kansup, W., & Hakstian, A. R. (1975). A comparison of several methods of assessing partial knowledge in multiple-choice tests: I. Scoring procedures. Journal of Educational Measurement, 12, 219–230.
  • Koretsky, M. D., Brooks, B. J., & Higgins, A. Z. (2016). Written justifications to multiple-choice concept questions during active learning in class. International Journal of Science Education, 38(11), 1747–1765. https://doi.org/10.1080/09500693.2016.1214303
  • Lesage, E., Valcke, M., & Sabbe, E. (2013). Scoring methods for multiple choice assessment in higher education–Is it still a matter of number right scoring or negative marking? Studies in Educational Evaluation, 39(3), 188–193. https://doi.org/10.1016/j.stueduc.2013.07.001
  • Linacre, J. (2002). Judging debacle in pairs figure skating. Rasch Measurement Transactions, 15(4), 839–840. https://www.rasch.org/rmt/rmt154a.htm
  • Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
  • Minstrell, J. (1992). Facets of students’ knowledge and relevant instruction. In R. Duit, F. Goldberg, & H. Niedderer (Eds.), Research in physics learning: Theoretical issues and empirical studies (pp. 110–128). IPN.
  • Mortaz Hejri, S., Khabaz Mafinezhad, M., & Jalili, M. (2014). Guessing in multiple choice questions: Challenges and strategies. Iranian Journal of Medical Education, 14(7), 594–604. https://www.sid.ir/en/journal/ViewPaper.aspx?id=428372
  • National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.
  • Nedelsky, L. (1954). Ability to avoid gross error as a measure of achievement. Educational and Psychological Measurement, 14(3), 459–472. https://doi.org/10.1177/001316445401400303
  • Neumann, I., Neumann, K., & Nehm, R. (2011). Evaluating instrument quality in science education: Rasch-based analyses of a nature of science test. International Journal of Science Education, 33(10), 1373–1405.
  • Neumann, K., Viering, T., Boone, W. J., & Fischer, H. E. (2013). Towards a learning progression of energy. Journal of Research in Science Teaching, 50(2), 162–188. https://doi.org/10.1002/tea.21061
  • NGSS Lead States. (2013). Next generation science standards: For states, by states. National Academies Press.
  • Oon, P. T., & Fan, X. (2017). Rasch analysis for psychometric improvement of science attitude rating scales. International Journal of Science Education, 39(6), 683–700. https://doi.org/10.1080/09500693.2017.1299951
  • Opitz, S. T., Harms, U., Neumann, K., Kowalzik, K., & Frank, A. (2015). Students’ energy concepts at the transition between primary and secondary school. Research in Science Education, 45(5), 691–715. https://doi.org/10.1007/s11165-014-9444-8
  • Opitz, S. T., Neumann, K., Bernholt, S., & Harms, U. (2017). How do students understand energy in biology, chemistry, and physics? Development and validation of an assessment instrument. Eurasia Journal of Mathematics, Science and Technology Education, 13(7), 3019–3042. https://doi.org/10.12973/eurasia.2017.00703a
  • Patnaik, D., & Traub, R. E. (1973). Differential weighting by judged degree of correctness. Journal of Educational Measurement, 10(4), 281–286.
  • Piaget, J. (1966). The child’s conception of physical causality (M. Gabain, Trans.). Littlefield, Adams.
  • Rasch, G. (1961, June). On general laws and the meaning of measurement in psychology. Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, 4, 321–333.
  • Romine, W. L., Barrow, L. H., & Folk, W. R. (2013). Exploring secondary students’ knowledge and misconceptions about influenza: Development, validation, and implementation of a multiple-choice influenza knowledge scale. International Journal of Science Education, 35(11), 1874–1901. https://doi.org/10.1080/09500693.2013.778439
  • Romine, W. L., Schaffer, D. L., & Barrow, L. (2015). Development and application of a novel Rasch-based methodology for evaluating multi-tiered assessment instruments: Validation and utilization of an undergraduate diagnostic test of the water cycle. International Journal of Science Education, 37(16), 2740–2768.
  • Ruiz-Primo, M. A., Zhai, X., Li, M., Hernandez, D., Kanopka, K., Dong, D., & Minstrell, J. (2019, April). Contextualised science assessments: Addressing the use of information and generalisation of inferences of students’ performance. Paper presented at the annual conference of the AERA, Toronto, Canada.
  • Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35(3), 265–296.
  • Shuford, E. H., Albert, A., & Massengill, H. E. (1966). Admissible probability measurement procedures. Psychometrika, 31(2), 125–145.
  • Slepkov, A. D., & Godfrey, A. T. K. (2019). Partial credit in answer-until-correct multiple-choice tests deployed in a classroom setting. Applied Measurement in Education, 32(2), 138–150. https://doi.org/10.1080/08957347.2019.1577249
  • Slepkov, A. D., Vreugdenhil, A. J., & Shiell, R. C. (2016). Score increase and partial-credit validity when administering multiple-choice tests using an answer-until-correct format. Journal of Chemical Education, 93(11), 1839–1846. https://doi.org/10.1021/acs.jchemed.6b00028
  • Socan, G. (2015). Empirical option weights for multiple-choice items: Interactions with item properties and testing design. Advances in Methodology & Statistics / Metodoloski Zvezki, 12(1), 25–43.
  • Toffoli, S. F. L., de Andrade, D. F., & Bornia, A. C. (2016). Evaluation of open items using the many-facet Rasch model. Journal of Applied Statistics, 43(2), 299–316. https://doi.org/10.1080/02664763.2015.1049938
  • Woitkowski, D. (2020). Tracing physics content knowledge gains using content complexity levels. International Journal of Science Education, 42(10), 1585–1608. https://doi.org/10.1080/09500693.2020.1772520
  • Wright, B. D., & Stone, M. H. (1979). Best test design. MESA Press.
  • Wu, M. L., Adams, R., Wilson, M., & Haldane, S. (2007). ACER ConQuest version 2.0 [Computer software]. ACER Press, Australian Council for Educational Research.
  • Zhai, X., Haudek, K. C., Stuhlsatz, M. A., & Wilson, C. (2020). Evaluation of construct-irrelevant variance yielded by machine and human scoring of a science teacher PCK constructed response assessment. Studies in Educational Evaluation, 67, 100916. https://doi.org/10.1016/j.stueduc.2020.100916
  • Zhai, X., Li, M., & Guo, Y. (2018). Teachers' use of learning progression-based formative assessment to inform teachers' instructional adjustment: A case study of two physics teachers' instruction. International Journal of Science Education, 40(15), 1832–1856. https://doi.org/10.1080/09500693.2018.1512772
