
Applying machine learning in science assessment: a systematic review

Pages 111-151 | Received 02 Aug 2019, Accepted 05 Feb 2020, Published online: 18 Mar 2020

References

  • Abrahams, I., Reiss, M. J., & Sharpe, R. M. (2013). The assessment of practical work in school science. Studies in Science Education, 49(2), 209–251.
  • Anderson, C. W., Santos, E. X. D. L., Bodby, S., Covitt, B. A., Edwards, K. D., Hancock, J. B., … Welch, M. M. (2018). Designing educational systems to support enactment of the next generation science standards. Journal of Research in Science Teaching, 55(7), 1026–1052.
  • Ary, D., Jacobs, L. C., Irvine, C. K. S., & Walker, D. (2018). Introduction to research in education. Boston, MA: Cengage Learning.
  • Assis Gomes, C. M., & Almeida, L. S. (2017). Advocating the broad use of the decision tree method in education. Practical Assessment, Research & Evaluation, 22(10), 1–10.
  • Beggrow, E. P., Ha, M., Nehm, R. H., Pearl, D., & Boone, W. J. (2014). Assessing scientific practices using machine-learning methods: How closely do they match clinical interview performance? Journal of Science Education and Technology, 23(1), 160–182.
  • Bejar, I. I., Mislevy, R. J., & Zhang, M. (2016). Automated scoring with validity in mind. In A. A. Rupp & J. P. Leighton (Eds.),  The handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 226–246). Chichester, England: Wiley-Blackwell.
  • Bennett, R. (2018). Educational assessment: What to watch in a rapidly changing world. Educational Measurement: Issues and Practice, 37(4), 7–15.
  • Bennett, R. E., & Bejar, I. I. (1998). Validity and automated scoring: It’s not only the scoring. Educational Measurement: Issues and Practice, 17(4), 9–17.
  • Bennett, R. E., & Zhang, M. (2016). Validity and automated scoring. In Technology and testing: Improving educational and psychological measurement (pp. 142–173).
  • Breiman, L. (2001). Statistical modelling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231.
  • Chen, C.-K. (2010). Curriculum assessment using artificial neural network and support vector machine modelling approaches: A case study. IR Applications, 9, 1–24.
  • Clauser, B., Kane, M., & Swanson, D. (2002). Validity issues for performance-based tests scored with computer-automated scoring systems. Applied Measurement in Education, 15(4), 413–432.
  • Crooks, T., Kane, M., & Cohen, A. (1996). Threats to the valid use of assessments. Assessment in Education, 3, 265–285.
  • Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In J. Kittler & F. Roli (Eds.), International workshop on multiple classifier systems (pp. 1–15). Berlin, Heidelberg: Springer.
  • Donnelly, D. F., Vitale, J. M., & Linn, M. C. (2015). Automated guidance for thermodynamics essays: Critiquing versus revisiting. Journal of Science Education and Technology, 24(6), 861–874.
  • Elluri, S. (2017). A machine learning approach for identifying the effectiveness of simulation tools for conceptual understanding (Master’s thesis, Publication No. 10686333). Purdue University, Ann Arbor.
  • Fleiss, J. L., Levin, B., & Paik, M. C. (1981). The measurement of interrater agreement. In Statistical methods for rates and proportions (2nd ed., pp. 212–236).
  • Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization (Working paper 98/2). Hamilton, New Zealand: University of Waikato, Department of Computer Science.
  • Gane, B. D., Zaidi, S. Z., & Pellegrino, J. W. (2018). Measuring what matters: Using technology to assess multidimensional learning. European Journal of Education, 53(2), 176–187.
  • Gerard, L., Kidron, A., & Linn, M. C. (2019). Guiding collaborative revision of science explanations. International Journal of Computer-Supported Collaborative Learning, 14, 1–34.
  • Gerard, L., Matuk, C., McElhaney, K., & Linn, M. C. (2015). Automated, adaptive guidance for K-12 education. Educational Research Review, 15, 41–58.
  • Gerard, L. F., & Linn, M. C. (2016). Using automated scores of student essays to support teacher guidance in classroom inquiry. Journal of Science Teacher Education, 27(1), 111–129.
  • Gerard, L. F., Ryoo, K., McElhaney, K. W., Liu, O. L., Rafferty, A. N., & Linn, M. C. (2016). Automated guidance for student inquiry. Journal of Educational Psychology, 108(1), 60–81.
  • Ghali, R., Ouellet, S., & Frasson, C. (2016). LewiSpace: An exploratory study with a machine learning model in an educational game. Journal of Education and Training Studies, 4(1), 192–201.
  • Gobert, J. D., Baker, R., & Pedro, M. S. (2011). Using machine-learned detectors to assess and predict students’ inquiry performance. Retrieved from https://search.proquest.com/docview/964185951
  • Gobert, J. D., Baker, R., & Wixon, M. B. (2015). Operationalizing and detecting disengagement within online science microworlds. Educational Psychologist, 50(1), 43–57.
  • Gobert, J. D., Sao Pedro, M., Raziuddin, J., & Baker, R. S. (2013). From log files to assessment metrics: Measuring students’ science enquiry skills using educational data mining. Journal of the Learning Sciences, 22(4), 521–563.
  • Ha, M. (2015). Assessing scientific practices using machine learning methods: Development of automated computer scoring models for written evolutionary explanations (Doctoral dissertation). The Ohio State University.
  • Ha, M., & Nehm, R. (2016a). The impact of misspelled words on automated computer scoring: A case study of scientific explanations. Journal of Science Education and Technology, 25(3), 358–374.
  • Ha, M., & Nehm, R. (2016b, April 14–17). Predicting the accuracy of computer scoring of text: Probabilistic, multi-model, and semantic similarity approaches. Paper presented at the National Association for Research in Science Teaching annual international conference, Baltimore, MD.
  • Ha, M., Nehm, R. H., Urban-Lurain, M., & Merrill, J. E. (2011). Applying computerized-scoring models of written biological explanations across courses and colleges: Prospects and limitations. CBE—Life Sciences Education, 10(4), 379–393.
  • Haudek, K. C., Prevost, L. B., Moscarella, R. A., Merrill, J., & Urban-Lurain, M. (2012). What are they thinking? Automated analysis of student writing about acid-base chemistry in introductory biology. CBE—Life Sciences Education, 11(3), 283–293.
  • Huang, C. C., Yeh, T. K., Li, T. Y., & Chang, C. Y. (2010). The idea storming cube: Evaluating the effects of using game and computer agent to support divergent thinking. Journal of Educational Technology & Society, 13(4), 180–191.
  • Huang, C.-J., Wang, Y.-W., Huang, T.-H., Chen, Y.-C., Chen, H.-M., & Chang, S.-C. (2011). Performance evaluation of an online argumentation learning assistance agent. Computers & Education, 57(1), 1270–1280.
  • Hutson, M. (2018). AI researchers allege that machine learning is alchemy. Science, 360(6388), 861.
  • Jordan, M. I. (2014). Statistics and machine learning. Retrieved from https://www.reddit.com/r/MachineLearning/comments/2fxi6v/ama_michael_i_jordan/ckelmtt/?context=3
  • Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260.
  • Jovic, A., Brkic, K., & Bogunovic, N. (2014). An overview of free software tools for general data mining. Paper presented at the 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia.
  • Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319–342.
  • Kim, K. J., Pope, D. S., Wendel, D., & Meir, E. (2017). WordBytes: Exploring an intermediate constraint format for rapid classification of student answers on constructed response assessments. Journal of Educational Data Mining, 9(2), 45–71.
  • Klebanov, B., Burstein, J., Harackiewicz, J. M., Priniski, S. J., & Mulholland, M. (2017). Reflective writing about the utility value of science as a tool for increasing STEM motivation and retention – Can AI help scale up? International Journal of Artificial Intelligence in Education, 27(4), 791–818.
  • Kyrilov, A., & Noelle, D. C. (2014). Using case-based reasoning to improve the quality of feedback generated by automated grading systems. International Association for the Development of the Information Society. doi:10.1145/2632320
  • Lead States, NGSS. (2013). Next generation science standards: For states, by states. Washington, DC: The National Academies Press.
  • Lee, H.-S., Pallant, A., Pryputniewicz, S., Lord, T., Mulholland, M., & Liu, O. L. (2019). Automated text scoring and real-time adjustable feedback: Supporting revision of scientific arguments involving uncertainty. Science Education, 103(3), 590–622.
  • Lintean, M., Rus, V., & Azevedo, R. (2012). Automatic detection of student mental models based on natural language student input during metacognitive skill training. International Journal of Artificial Intelligence in Education, 21(3), 169–190.
  • Liu, O. L., Brew, C., Blackmore, J., Gerard, L., Madhok, J., & Linn, M. C. (2014). Automated scoring of constructed‐response science items: Prospects and obstacles. Educational Measurement: Issues and Practice, 33(2), 19–28.
  • Liu, O. L., Rios, J. A., Heilman, M., Gerard, L., & Linn, M. C. (2016). Validation of automated scoring of science assessments. Journal of Research in Science Teaching, 53(2), 215–233.
  • Lottridge, S., Wood, S., & Shaw, D. (2018). The effectiveness of machine score-ability ratings in predicting automated scoring performance. Applied Measurement in Education, 31(3), 215–232.
  • Luaces, O., Díez, J., & Bahamonde, A. (2018). A peer assessment method to provide feedback, consistent grading and reduce students’ burden in massive teaching settings. Computers & Education, 126, 283–295.
  • Mao, L., Liu, O. L., Roohr, K., Belur, V., Mulholland, M., Lee, H.-S., & Pallant, A. (2018). Validation of automated scoring for a formative assessment that employs scientific argumentation. Educational Assessment, 23(2), 121–138.
  • Mason, R. A., & Just, M. A. (2016). Neural representations of physics concepts. Psychological Science, 27(6), 904–913.
  • Mayfield, E., & Rosé, C. (2010). SIDE: The summarization IDE (User’s manual). Carnegie Mellon University.
  • Mayfield, E., & Rosé, C. (2013). LightSIDE: Open source machine learning for text. In Handbook of automated essay evaluation (pp. 146–157). New York, NY: Routledge.
  • Mertens, D. M. (2015). Research and evaluation in education and psychology: Integrating diversity with quantitative, qualitative, and mixed methods (4th ed.). Thousand Oaks, CA: SAGE.
  • Mitchell, T. (1997). Machine learning. New York, NY: McGraw Hill.
  • Moharreri, K., Ha, M., & Nehm, R. H. (2014). EvoGrader: An online formative assessment tool for automatically evaluating written evolutionary explanations. Evolution: Education and Outreach, 7(1), 15.
  • Montalvo, O., Baker, R. S., Sao Pedro, M. A., Nakama, A., & Gobert, J. D. (2010). Identifying students’ inquiry planning using machine learning. Paper presented at the Educational Data Mining 2010. Pittsburgh, PA.
  • Moscarella, R. A., Urban-Lurain, M., Merritt, B., Long, T., Richmond, G., Merrill, J., … Wilson, C. (2008). Understanding undergraduate students’ conceptions in science: Using lexical analysis software to analyze students’ constructed responses in biology. Paper presented at the National Association for Research in Science Teaching 2008 Annual International Conference, Baltimore, MD.
  • Muldner, K., Burleson, W., Van de Sande, B., & VanLehn, K. (2011). An analysis of students’ gaming behaviours in an intelligent tutoring system: Predictors and impacts. User Modeling and User-adapted Interaction, 21(1–2), 99–135.
  • Nakamura, C. M. (2012). The pathway active learning environment: An interactive web-based tool for physics education (Doctoral dissertation). Kansas State University.
  • Nakamura, C. M., Murphy, S. K., Christel, M. G., Stevens, S. M., & Zollman, D. A. (2016). Automated analysis of short responses in an interactive synthetic tutoring system for introductory physics. Physical Review Physics Education Research, 12(1), 010122.
  • National Research Council (2012). A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Committee on a Conceptual Framework for New K-12 Science Education Standards. Board on Science Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press.
  • Nehm, R. H., Ha, M., & Mayfield, E. (2012). Transforming biology assessment with machine learning: Automated scoring of written evolutionary explanations. Journal of Science Education and Technology, 21(1), 183–196.
  • Nehm, R. H., & Haertig, H. (2012). Human vs. computer diagnosis of students’ natural selection knowledge: Testing the efficacy of text analytic software. Journal of Science Education and Technology, 21(1), 56–73.
  • Okoye, I., Sumner, T., & Bethard, S. (2013). Automatic extraction of core learning goals and generation of pedagogical sequences through a collection of digital library resources. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, New York, NY, USA.
  • Okoye, I. U. (2015). Building an educational recommender system based on conceptual change learning theory to improve students’ understanding of science concepts (Doctoral dissertation, Publication No. AAI3704786). University of Colorado at Boulder.
  • Pellegrino, J. W., DiBello, L. V., & Goldman, S. R. (2016). A framework for conceptualizing and evaluating the validity of instructionally relevant assessments. Educational Psychologist, 51(1), 59–81.
  • Pellegrino, J. W., Wilson, M. R., Koenig, J. A., & Beatty, A. S. (2014). Developing assessments for the next generation science standards. Washington, DC: The National Academies Press.
  • Pelletreau, K. N., Andrews, T., Armstrong, N., Bedell, M. A., Dastoor, F., Dean, N., … Hall, D. (2016). A clicker-based study that untangles student thinking about the processes in the central dogma. CourseSource.
  • Prevost, L. B., Smith, M. K., & Knight, J. K. (2016). Using student writing and lexical analysis to reveal student thinking about the role of stop codons in the central dogma. CBE—Life Sciences Education, 15(4), ar65.
  • Rodriguez, G., Pérez, J., Cueva, S., & Torres, R. (2017). A framework for improving web accessibility and usability of open course ware sites. Computers & Education, 109, 197–215.
  • Rupp, A. A. (2018). Designing, evaluating, and deploying automated scoring systems with validity in mind: Methodological design decisions. Applied Measurement in Education, 31(3), 191–214.
  • Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3), 210–229. doi:10.1147/rd.33.0210
  • Sao Pedro, M., Baker, R. S., Montalvo, O., Nakama, A., & Gobert, J. D. (2010). Using text replay tagging to produce detectors of systematic experimentation behaviour patterns. Paper presented at the Educational Data Mining 2010, Pittsburgh, PA.
  • Shermis, M. D. (2015). Contrasting state-of-the-art in the machine scoring of short-form constructed responses. Educational Assessment, 20(1), 46–65.
  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., … Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
  • Singh, S., Okun, A., & Jackson, A. (2017). Artificial intelligence: Learning to play Go from scratch. Nature, 550(7676), 336.
  • Steele, M. M., Merrill, J., Haudek, K., & Urban-Lurain, M. (2016). The development of constructed response astronomy assessment items. Paper presented at the National Association for Research in Science Teaching (NARST), Baltimore, MD.
  • Tansomboon, C., Gerard, L. F., Vitale, J. M., & Linn, M. C. (2017). Designing automated guidance to promote the productive revision of science explanations. International Journal of Artificial Intelligence in Education, 27(4), 729–757.
  • Vitale, J., Lai, K., & Linn, M. (2015). Taking advantage of automated assessment of student-constructed graphs in science. Journal of Research in Science Teaching, 52(10), 1426–1450.
  • Vitale, J. M., McBride, E., & Linn, M. C. (2016). Distinguishing complex ideas about climate change: Knowledge integration vs. specific guidance. International Journal of Science Education, 38(9), 1548–1569.
  • Wang, H.-C., Chang, C.-Y., & Li, T.-Y. (2008). Assessing creative problem-solving with automated text grading. Computers & Education, 51(4), 1450–1466.
  • Wiley, J., Hastings, P., Blaum, D., Jaeger, A. J., Hughes, S., Wallace, P., … Britt, M. A. (2017). Different approaches to assessing the quality of explanations following a multiple-document inquiry activity in science. International Journal of Artificial Intelligence in Education, 27(4), 758–790.
  • Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and uses of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2–13.
  • WISE. (2019). Embedded automated scoring in Web-based Inquiry Science Environment. Retrieved from https://wise-research.berkeley.edu/projects/
  • Yan, J. (2014). A computer-based approach for identifying student conceptual change (Master’s thesis, Publication No. 1565371). Purdue University, Ann Arbor.
  • Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer-automated scoring. Applied Measurement in Education, 15(4), 391–412.
  • Yoo, J., & Kim, J. (2014). Can online discussion participation predict group project performance? Investigating the roles of linguistic features and participation patterns. International Journal of Artificial Intelligence in Education, 24(1), 8–32.
  • Zehner, F., Saelzer, C., & Goldhammer, F. (2016). Automatic coding of short text responses via clustering in educational assessment. Educational and Psychological Measurement, 76(2), 280–303.
  • Zhai, X. (2019, June). Applying machine learning in science assessment: Opportunity and challenge. Manuscript prepared for the Journal of Science Education and Technology. doi:10.13140/RG.2.2.10914.07365
  • Zhai, X. (2019, June). Parameters for machine learning (internal document, unpublished). doi:10.13140/RG.2.2.33563.31525
  • Zhai, X., Haudek, K., Shi, L., Nehm, R., & Urban-Lurain, M. (In press). From substitution to redefinition: A framework of machine learning-based science assessment. Journal of Research in Science Teaching.
  • Zhai, X., Zhang, M., & Li, M. (2018). One‐to‐one mobile technology in high school physics classrooms: Understanding its use and outcome. British Journal of Educational Technology, 49(3), 516–532.
  • Zhu, M., Lee, H.-S., Wang, T., Liu, O. L., Belur, V., & Pallant, A. (2017). Investigating the impact of automated feedback on students’ scientific argumentation. International Journal of Science Education, 39(12), 1648–1668.
