
Unattended consequences: how text responses alter alongside PISA’s mode change from 2012 to 2015

References

  • Bär, D., Zesch, T., & Gurevych, I. (2013). DKPro Similarity: An open source framework for text similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 121–126). Sofia, Bulgaria: Association for Computational Linguistics.
  • Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
  • Barton, K. (2017). MuMIn: Multi-model Inference. Retrieved from https://CRAN.R-project.org/package=MuMIn
  • Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
  • Bridgeman, B., Lennon, M. L., & Jackenthal, A. (2001). Effects of screen size, screen resolution, and display rate on computer-based test performance (Research Report RR-01-23). Princeton, NJ: Educational Testing Service.
  • Brown, C., Snodgrass, T., Kemper, S. J., Herman, R., & Covington, M. A. (2008). Automatic measurement of propositional idea density from part-of-speech tagging. Behavior Research Methods, 40(2), 540–545.
  • Choi, S. W., & Tinkler, T. (2002). Evaluating comparability of paper-and-pencil and computer-based assessment in a K-12 setting. Paper presented at the annual meeting of the National Council on Measurement in Education. New Orleans, LA: NCME.
  • Clariana, R., & Wallace, P. (2002). Paper-based versus computer-based assessment: Key factors associated with the test mode effect. British Journal of Educational Technology, 33(5), 593–602.
  • Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.
  • Eklöf, H., & Knekta, E. (2017). Using large-scale educational data to test motivation theories: A synthesis of findings from Swedish studies on test-taking motivation. International Journal of Quantitative Research in Education, 4(1/2), 52.
  • Ercikan, K., & Pellegrino, J. W. (2015). Validation of score meaning for the next generation of assessments: The use of response processes. London: Taylor and Francis.
  • Feng, L., Lindner, A., Ji, X. R., & Malatesha Joshi, R. (2017). The roles of handwriting and keyboarding in writing: A meta-analytic review. Reading and Writing, 85, 1–31.
  • Goldhammer, F., Martens, T., & Lüdtke, O. (2017). Conditioning factors of test-taking engagement in PIAAC: An exploratory IRT modelling approach considering person and item characteristics. Large-Scale Assessments in Education, 5(1), 18. Retrieved from https://doi.org/10.1186/s40536-017-0051-9.
  • Goldhammer, F., Naumann, J., Rölke, H., Stelter, A., & Tóth, K. (2017). Relating product data to process data from computer-based competency assessment. In D. Leutner, J. Fleischer, J. Grünkorn, & E. Klieme (Eds.), Competence assessment in education: Research, models and instruments (pp. 407–425). Cham: Springer International Publishing. Retrieved from https://doi.org/10.1007/978-3-319-50030-0_24
  • Goldhammer, F., & Zehner, F. (2017). What to make of and how to interpret process data. Measurement: Interdisciplinary Research and Perspectives, 15(3–4), 128–132.
  • Graesser, A. C., & Clark, L. F. (1985). Structures and procedures of implicit knowledge (Vol. 17). Norwood, N.J.: Ablex.
  • Graesser, A. C., & Franklin, S. P. (1990). QUEST: A cognitive model of question answering. Discourse Processes, 13(3), 279–303.
  • Graesser, A. C., & Murachver, T. (1985). Symbolic procedures of question answering. In A. C. Graesser & J. Black (Eds.), The psychology of questions (pp. 15–88). Hillsdale, NJ: Erlbaum.
  • Gurevych, I., Mühlhäuser, M., Müller, C., Steimle, J., Weimer, M., & Zesch, T. (2007). Darmstadt knowledge processing repository based on UIMA. In Proceedings of the First Workshop on Unstructured Information Management Architecture at Biannual Conference of the Society for Computational Linguistics and Language Technology. Tübingen, Germany.
  • Horkay, N., Bennett, R. E., Allen, N., Kaplan, B., & Yan, F. (2006). Does it matter if I take my writing test on computer? An empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 5 (2).
  • Jaeger, B. (2017). r2glmm: Computes R squared for mixed (multilevel) models. Retrieved from https://CRAN.R-project.org/package=r2glmm
  • Jurgens, D., & Stevens, K. (2010). The S-Space package: An open source package for word space models. In Association for Computational Linguistics (Ed.), 48th Annual Meeting of the Association for Computational Linguistics (pp. 30–35).
  • Kintsch, W., & van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85(5), 363–394.
  • Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices. (3rd. ed.). New York, NY: Springer.
  • Kröhne, U., & Martens, T. (2011). Computer-based competence tests in the National educational panel study: The challenge of mode effects. Zeitschrift für Erziehungswissenschaft, 14(S2), 169–186.
  • Lafontaine, D., & Monseur, C. (2009). Gender gap in comparative studies of reading comprehension: To what extent do the test characteristics make a difference? European Educational Research Journal, 8(1), 69–79.
  • Mazzeo, J., & von Davier, M. (2008). Review of the Programme for International Student Assessment (PISA) test design: Recommendations for fostering stability in assessment results. Education Working Papers EDU/PISA/GB (2008), 28, 23–24.
  • Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114(3), 449–458.
  • Mullis, I. V. S., Martin, M. O., Foy, P., & Drucker, K. T. (2012). PIRLS 2011 international results in reading. Chestnut Hill, MA: Boston College.
  • Nakagawa, S., Schielzeth, H., & O’Hara, R. B. (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4(2), 133–142.
  • Naumann, J., & Sälzer, C. (2017). Digital reading proficiency in German 15-year olds: Evidence from PISA 2012. Zeitschrift für Erziehungswissenschaft, 20(4), 585–603.
  • NCES. (2015). The nation’s report card: 2015 mathematics and reading assessments. Retrieved January 16, 2016, from http://www.nationsreportcard.gov/reading_math_2015
  • Noyes, J. M., & Garland, K. J. (2008). Computer- vs. paper-based tasks: Are they equivalent? Ergonomics, 51(9), 1352–1375.
  • OECD. (2006). PISA released items – reading. Retrieved February 18, 2016, from http://www.oecd.org/pisa/38709396.pdf
  • OECD. (2010). PISA 2009 assessment framework: Key competencies in reading, mathematics and science. Paris: OECD Publishing.
  • OECD. (2016). PISA 2015 results (volume I). Paris: OECD Publishing.
  • OECD. (2017a). PISA 2015 assessment and analytical framework: Science, reading, mathematic, financial literacy and collaborative problem solving. Paris: OECD Publishing.
  • OECD. (2017b). PISA 2015 technical report. Paris: OECD Publishing.
  • Piaw, C. Y. (2011). Comparisons between computer-based testing and paper-pencil testing: Testing effect, test scores, testing time and testing motivation. In Proceedings of the Informatics Conference (pp. 1–9).
  • Porter, M. (2001). Snowball: A language for stemming algorithms. Retrieved from http://snowball.tartarus.org/texts/introduction.html
  • Prenzel, M., Sälzer, C., Klieme, E., & Köller, O. (Eds.). (2013). PISA 2012: Fortschritte und Herausforderungen in Deutschland. Münster: Waxmann.
  • R Core Team. (2017). R: A language and environment for statistical computing. Vienna. Retrieved from http://www.R-project.org/
  • Rafferty, A. N., & Manning, C. D. (2008). Parsing three German treebanks: Lexicalized and unlexicalized baselines. In Proceedings of the Workshop on Parsing German (pp. 40–46).
  • Reiss, K., Sälzer, C., Schiepe-Tiska, A., Klieme, E., & Köller, O. (Eds.). (2016). PISA 2015: Eine Studie zwischen Kontinuität und Innovation. Münster: Waxmann.
  • Robitzsch, A., Lüdtke, O., Köller, O., Kröhne, U., Goldhammer, F., & Heine, J.-H. (2017). Herausforderungen bei der Schätzung von Trends in Schulleistungsstudien [Challenges for Trend Estimation in Educational Assessments]. Diagnostica, 63(2), 148–165.
  • Schiller, A., Teufel, S., Stöckert, C., & Thielen, C. (1999). Guidelines für das Tagging deutscher Textkorpora mit STTS. University of Stuttgart and University of Tübingen.
  • Schwabe, F., McElvany, N., & Trendtel, M. (2015). The school age gender gap in reading achievement: Examining the influences of item format and intrinsic reading motivation. Reading Research Quarterly, 50(2), 219–232.
  • Singer, L. M., & Alexander, P. A. (2017). Reading on paper and digitally: What the past decades of empirical research reveal. Review of Educational Research, 87(6), 1007–1041.
  • Stanat, P., Böhme, K., Schipolowski, S., & Haag, N. (Eds.). (2016). IQB-Bildungstrend 2015: Sprachliche Kompetenzen am Ende der 9. Jahrgangsstufe im zweiten Ländervergleich. Münster and New York: Waxmann.
  • Stroup, W. W. (2012). Generalized linear mixed models: Modern concepts, methods and applications. Hoboken: CRC Press.
  • Tierney, L., Rossini, A. J., Li, N., & Sevcikova, H. (2016). Snow: Simple network of workstations. Retrieved from https://CRAN.R-project.org/package=snow
  • Tourangeau, R., Rips, L. J., & Rasinski, K. A. (2009). The psychology of survey response (10th print ed.). Cambridge: Cambridge Univ. Press.
  • Wang, S., Jiao, H., Young, M. J., Brooks, T., & Olson, J. (2007). Comparability of computer-based and paper-and-pencil testing in K–12 reading assessments. Educational and Psychological Measurement, 68(1), 5–24.
  • Weis, M., Zehner, F., Sälzer, C., Strohmaier, A., Artelt, C., & Pfost, M. (2016). Lesekompetenz in PISA 2015: Ergebnisse, Veränderungen und Perspektiven. In K. Reiss, C. Sälzer, A. Schiepe-Tiska, E. Klieme, & O. Köller (Eds.), PISA 2015: Eine Studie zwischen Kontinuität und Innovation (pp. 249–283). Münster: Waxmann.
  • White, S., Kim, Y. Y., Chen, J., & Liu, F. (2015). Performance of fourth-grade students in the 2012 NAEP computer-based writing pilot assessment: Scores, text length, and use of editing tools. Working Paper Series, NCES 2015–119.
  • Zehner, F., Goldhammer, F., & Sälzer, C. (2018). Automatically analyzing text responses for exploring gender-specific cognitions in PISA reading. Large-scale Assessments in Education, 6(7). Retrieved from https://doi.org/10.1186/s40536-018-0060-3.
  • Zehner, F., Sälzer, C., & Goldhammer, F. (2016). Automatic coding of short text responses via clustering in educational assessment. Educational and Psychological Measurement, 76(2), 280–303. Retrieved from https://doi.org/10.1177/0013164415590022.
  • Zesch, T., Müller, C., & Gurevych, I. (2008). Extracting lexical semantic knowledge from Wikipedia and Wiktionary. In Proceedings of the 6th International Conference on Language Resources and Evaluation. Marrakech, Morocco.