References
- Adams, R. J., Wu, M. L., & Wilson, M. R. (2012). ACER ConQuest 3.0.1 [Computer software]. Australian Council for Educational Research.
- Alonzo, A. C., & Steedle, J. T. (2009). Developing and assessing a forces and motion learning progression. Science Education, 93(3), 389–421. https://doi.org/10.1002/sce.20303
- Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42(1), 69–81. https://doi.org/10.1007/BF02293746
- Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43(4), 561–573. https://doi.org/10.1007/BF02293814
- Andrich, D. (2005). The Rasch model explained. In R. Maclean et al. (Eds.), Applied Rasch measurement: A book of exemplars (Vol. 4, pp. 27–59). Springer.
- Andrich, D. (2010). Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika, 75(2), 292–308. https://doi.org/10.1007/s11336-010-9154-8
- Arslan, H. O., Cigdemoglu, C., & Moseley, C. (2012). A three-tier diagnostic test to assess pre-service teachers’ misconceptions about global warming, greenhouse effect, ozone layer depletion, and acid rain. International Journal of Science Education, 34(11), 1667–1686. https://doi.org/10.1080/09500693.2012.680618
- Bejar, I. I., & Weiss, D. J. (1977). A comparison of empirical differential option weighting scoring procedures as a function of inter-item correlation. Educational and Psychological Measurement, 37(2), 335–340.
- Ben-Simon, A., Budescu, D. V., & Nevo, B. (1997). A comparative study of measures of partial knowledge in multiple-choice tests. Applied Psychological Measurement, 21(1), 65–88.
- Bennett, R. E. (2010). Cognitively based assessment of, for, and as learning (CBAL): A preliminary theory of action for summative and formative assessment. Measurement, 8(2-3), 70–91. https://doi.org/10.1080/15366367.2010.508686
- Bo, Y. E., Lewis, C., & Budescu, D. V. (2015). An option-based partial credit item response model. In R. Millsap, D. Bolt, L. van der Ark, & W. C. Wang (Eds.), Quantitative psychology research (pp. 45–72). Springer.
- Bond, T. G., Fox, C. M., & Lacey, H. (2007). Applying the Rasch model: Fundamental measurement in the social sciences (2nd ed.). Routledge Taylor and Francis Group.
- Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33–63. https://doi.org/10.1207/s15326977ea1101_2
- Brown, J. (1965). Multiple response evaluation of discrimination. British Journal of Mathematical and Statistical Psychology, 18(1), 125–137. https://doi.org/10.1111/j.2044-8317.1965.tb00696.x
- Bush, M. (2001). A multiple choice test that rewards partial knowledge. Journal of Further and Higher Education, 25(2), 157–163. https://doi.org/10.1080/03098770120050828
- Bush, M. (2015). Reducing the need for guesswork in multiple-choice tests. Assessment & Evaluation in Higher Education, 40(2), 218–231. https://doi.org/10.1080/02602938.2014.902192
- Cavers, M., & Ling, J. (2016). Confidence weighting procedures for multiple-choice tests. In D. G. Chen, J. Chen, X. Lu, G. Yi, & H. Yu (Eds.), Advanced statistical methods in data science (pp. 171–181). Springer.
- Champagne, A. B., Klopfer, L. E., & Anderson, J. H. (1980). Factors influencing the learning of classical mechanics. American Journal of Physics, 48(12), 1074–1079. https://doi.org/10.1119/1.12290
- Chi, S., Wang, Z., & Liu, X. (2019). Investigating disciplinary context effect on student scientific inquiry competence. International Journal of Science Education, 41(18), 2736–2764.
- Clement, J. (1982). Students’ preconceptions in introductory mechanics. American Journal of Physics, 50(1), 66–71. https://doi.org/10.1119/1.12989
- Coombs, C. H. (1953). On the use of objective examinations. Educational and Psychological Measurement, 13(2), 308–310.
- Coombs, C. H., Milholland, J. E., & Womer, F. B. (1956). The assessment of partial knowledge. Educational and Psychological Measurement, 16(1), 13–37.
- Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 1–28.
- Cross, L. H., Ross, F. K., & Geller, E. S. (1980). Using choice-weighted scoring of multiple-choice tests for determination of grades in college courses. The Journal of Experimental Education, 48(4), 296–301.
- Davis, F. B., & Fifer, G. (1959). The effect on test reliability and validity of scoring aptitude and achievement tests with weights for every choice. Educational and Psychological Measurement, 19(2), 159–170. https://doi.org/10.1177/001316445901900202
- DeMars, C. E. (2008, March). Scoring multiple choice items: A comparison of IRT and classical polytomous and dichotomous methods. Paper presented at the annual meeting of the National Council on Measurement in Education, New York, NY, USA.
- Diedenhofen, B., & Musch, J. (2015). Empirical option weights improve the validity of a multiple-choice knowledge test. European Journal of Psychological Assessment, 33(5), 336–344. https://doi.org/10.1027/1015-5759/a000295
- diSessa, A. A. (2014). A history of conceptual change research: Threads and fault lines. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (2nd ed.). Cambridge University Press.
- diSessa, A. (1983). Phenomenology and the evolution of intuition. In D. Gentner, & A. L. Stevens (Eds.), Mental models (pp. 15–33). Erlbaum.
- diSessa, A. (2007). Changing conceptual change. Human Development, 50(1), 39–46. https://doi.org/10.1159/000097683
- diSessa, A. (2013). A bird’s-eye view of the “pieces” vs. “coherence” controversy (from the “pieces” side of the fence). In S. Vosniadou (Ed.), International handbook of research on conceptual change (2nd ed., pp. 31–48). Routledge.
- diSessa, A. A., & Sherin, B. L. (1998). What changes in conceptual change? International Journal of Science Education, 20(10), 1155–1191.
- Dressel, P., & Schmid, J. (1953). Some modifications of the multiple-choice item. Educational and Psychological Measurement, 13(4), 574–595. https://doi.org/10.1177/001316445301300404
- Echternacht, G. (1976). Reliability and validity of item option weighting schemes. Educational and Psychological Measurement, 36(2), 301–309.
- Frary, R. B. (1989). Partial-credit scoring methods for multiple-choice tests. Applied Measurement in Education, 2(1), 79–96. https://doi.org/10.1207/s15324818ame0201_5
- Fulmer, G. W., Liang, L. L., & Liu, X. (2014). Applying a forces and motion learning progression over an extended time span using the force concept inventory. International Journal of Science Education, 36(17), 2918–2936. https://doi.org/10.1080/09500693.2014.939120
- Gao, Y., Zhai, X., Andersson, B., Zeng, P., & Xin, T. (2020). Developing a learning progression of buoyancy to model conceptual change: A latent class and rule space model analysis. Research in Science Education, 50(4), 1369–1388.
- Gilman, D. A., & Ferry, P. (1972). Increasing test reliability through self-scoring procedures. Journal of Educational Measurement, 9(3), 205–207.
- Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Lawrence Erlbaum.
- Halloun, I. A., & Hestenes, D. (1985). Common sense concepts about motion. American Journal of Physics, 53(11), 1056–1065. https://doi.org/10.1119/1.14031
- Hardy, J., Bates, S. P., Casey, M. M., Galloway, K. W., Galloway, R. K., Kay, A. E., … McQueen, H. A. (2014). Student-generated content: Enhancing learning through sharing multiple-choice questions. International Journal of Science Education, 36(13), 2180–2194. https://doi.org/10.1080/09500693.2014.916831
- Harris, C. J., Krajcik, J. S., Pellegrino, J. W., & Debarger, A. H. (2019). Designing knowledge-in-use assessments to promote deeper learning. Educational Measurement: Issues and Practice, 38(2), 53–67.
- Härtig, H., Nordine, J. C., & Neumann, K. (2020). Contextualisation in the assessment of students’ learning about science. In T. I. Sánchez (Ed.), International perspectives on the contextualisation of science education (pp. 113–144). Springer.
- Higham, P. A. (2013). Regulating accuracy on university tests with the plurality option. Learning and Instruction, 24, 26–36. https://doi.org/10.1016/j.learninstruc.2012.08.001
- Kansup, W., & Hakstian, A. R. (1975). A comparison of several methods of assessing partial knowledge in multiple-choice tests: I. Scoring procedures. Journal of Educational Measurement, 12, 219–230.
- Koretsky, M. D., Brooks, B. J., & Higgins, A. Z. (2016). Written justifications to multiple-choice concept questions during active learning in class. International Journal of Science Education, 38(11), 1747–1765. https://doi.org/10.1080/09500693.2016.1214303
- Lesage, E., Valcke, M., & Sabbe, E. (2013). Scoring methods for multiple choice assessment in higher education–Is it still a matter of number right scoring or negative marking? Studies in Educational Evaluation, 39(3), 188–193. https://doi.org/10.1016/j.stueduc.2013.07.001
- Linacre, J. (2002). Judging debacle in pairs figure skating. Rasch Measurement Transactions, 15(4), 839–840. https://www.rasch.org/rmt/rmt154a.htm
- Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
- Minstrell, J. (1992). Facets of students’ knowledge and relevant instruction. Research in Physics Learning: Theoretical Issues and Empirical Studies, 110–128.
- Mortaz Hejri, S., Khabaz Mafinezhad, M., & Jalili, M. (2014). Guessing in multiple choice questions: Challenges and strategies. Iranian Journal of Medical Education, 14(7), 594–604. https://www.sid.ir/en/journal/ViewPaper.aspx?id=428372
- National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.
- Nedelsky, L. (1954). Ability to avoid gross error as a measure of achievement. Educational and Psychological Measurement, 14(3), 459–472. https://doi.org/10.1177/001316445401400303
- Neumann, I., Neumann, K., & Nehm, R. (2011). Evaluating instrument quality in science education: Rasch-based analyses of a nature of science test. International Journal of Science Education, 33(10), 1373–1405.
- Neumann, K., Viering, T., Boone, W. J., & Fischer, H. E. (2013). Towards a learning progression of energy. Journal of Research in Science Teaching, 50(2), 162–188. https://doi.org/10.1002/tea.21061
- NGSS Lead States. (2013). Next generation science standards: For states, by states. National Academies Press.
- Oon, P. T., & Fan, X. (2017). Rasch analysis for psychometric improvement of science attitude rating scales. International Journal of Science Education, 39(6), 683–700. https://doi.org/10.1080/09500693.2017.1299951
- Opitz, S. T., Harms, U., Neumann, K., Kowalzik, K., & Frank, A. (2015). Students’ energy concepts at the transition between primary and secondary school. Research in Science Education, 45(5), 691–715. https://doi.org/10.1007/s11165-014-9444-8
- Opitz, S. T., Neumann, K., Bernholt, S., & Harms, U. (2017). How do students understand energy in biology, chemistry, and physics? Development and validation of an assessment instrument. Eurasia Journal of Mathematics, Science and Technology Education, 13(7), 3019–3042. https://doi.org/10.12973/eurasia.2017.00703a
- Patnaik, D., & Traub, R. E. (1973). Differential weighting by judged degree of correctness. Journal of Educational Measurement, 10(4), 281–286.
- Piaget, J. (1966). The child’s conception of physical causality (M. Gabain, Trans.). Littlefield, Adams.
- Rasch, G. (1961, June). On general laws and the meaning of measurement in psychology. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 4, 321–333.
- Romine, W. L., Barrow, L. H., & Folk, W. R. (2013). Exploring secondary students’ knowledge and misconceptions about influenza: Development, validation, and implementation of a multiple-choice influenza knowledge scale. International Journal of Science Education, 35(11), 1874–1901. https://doi.org/10.1080/09500693.2013.778439
- Romine, W. L., Schaffer, D. L., & Barrow, L. (2015). Development and application of a novel Rasch-based methodology for evaluating multi-tiered assessment instruments: Validation and utilization of an undergraduate diagnostic test of the water cycle. International Journal of Science Education, 37(16), 2740–2768.
- Ruiz-Primo, M. A., Zhai, X., Li, M., Hernandez, D., Kanopka, K., Dong, D., & Minstrell, J. (2019, April). Contextualised science assessments: Addressing the use of information and generalisation of inferences of students’ performance. Paper presented at the annual meeting of the American Educational Research Association, Toronto, Canada.
- Sadler, P. M. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35(3), 265–296.
- Shuford, E. H., Albert, A., & Massengill, H. E. (1966). Admissible probability measurement procedures. Psychometrika, 31(2), 125–145.
- Slepkov, A. D., & Godfrey, A. T. K. (2019). Partial credit in answer-until-correct multiple-choice tests deployed in a classroom setting. Applied Measurement in Education, 32(2), 138–150. https://doi.org/10.1080/08957347.2019.1577249
- Slepkov, A. D., Vreugdenhil, A. J., & Shiell, R. C. (2016). Score increase and partial-credit validity when administering multiple-choice tests using an answer-until-correct format. Journal of Chemical Education, 93(11), 1839–1846. https://doi.org/10.1021/acs.jchemed.6b00028
- Socan, G. (2015). Empirical option weights for multiple-choice items: Interactions with item properties and testing design. Advances in Methodology & Statistics / Metodoloski Zvezki, 12(1), 25–43.
- Toffoli, S. F. L., de Andrade, D. F., & Bornia, A. C. (2016). Evaluation of open items using the many-facet Rasch model. Journal of Applied Statistics, 43(2), 299–316. https://doi.org/10.1080/02664763.2015.1049938
- Woitkowski, D. (2020). Tracing physics content knowledge gains using content complexity levels. International Journal of Science Education, 42(10), 1585–1608. https://doi.org/10.1080/09500693.2020.1772520
- Wright, B. D., & Stone, M. H. (1979). Best test design. MESA Press.
- Wu, M. L., Adams, R., Wilson, M., & Haldane, S. (2007). ACER ConQuest version 2.0 [Computer software]. Australian Council for Educational Research.
- Zhai, X., Haudek, K. C., Stuhlsatz, M. A., & Wilson, C. (2020). Evaluation of construct-irrelevant variance yielded by machine and human scoring of a science teacher PCK constructed response assessment. Studies in Educational Evaluation, 67, 100916. https://doi.org/10.1016/j.stueduc.2020.100916
- Zhai, X., Li, M., & Guo, Y. (2018). Teachers' use of learning progression-based formative assessment to inform teachers' instructional adjustment: A case study of two physics teachers' instruction. International Journal of Science Education, 40(15), 1832–1856. https://doi.org/10.1080/09500693.2018.1512772