
Validating performance assessments: measures that may help to evaluate students’ expertise in ‘doing science’

References

  • Berry, D. C. 1991. “The Role of Action in Implicit Learning.” The Quarterly Journal of Experimental Psychology 43A (4). doi:10.1080/14640749108400961.
  • Brennan, R. L. 1996. “Generalizability of Performance Assessments.” In Technical Issues in Large-Scale Performance Assessment, edited by G. W. Phillips, 19–58. Washington, DC: National Center for Education Statistics.
  • Brennan, R. L. 2000. “Performance Assessments from the Perspective of Generalizability Theory.” Applied Psychological Measurement 24 (4): 339–353. doi:10.1177/01466210022031796.
  • Commons, M. L., E. A. Goodheart, A. Pekker, T. L. Dawson, K. Draney, and K. M. Adams. 2008. “Using Rasch Scaled Stage Scores to Validate Orders of Hierarchical Complexity of Balance Beam Task Sequences.” Journal of Applied Measurement 9 (2): 182.
  • Cronbach, L. J., N. Rajaratnam, and G. C. Gleser. 1963. “Theory of Generalizability: A Liberalization of Reliability Theory.” British Journal of Mathematical and Statistical Psychology 16 (2). doi:10.1111/j.2044-8317.1963.tb00206.x.
  • Cronbach, L. J., R. L. Linn, R. L. Brennan, and E. H. Haertel. 1997. “Generalizability Analysis for Performance Assessments of Student Achievement or School Effectiveness.” Educational and Psychological Measurement 57 (3): 373–399. doi:10.1177/0013164497057003001.
  • DeBoer, G. E. 2000. “Scientific Literacy: Another Look at Its Historical and Contemporary Meanings and Its Relationship to Science Education Reform.” Journal of Research in Science Teaching 37 (6): 582–601. doi:10.1002/1098-2736(200008)37:6<582::AID-TEA5>3.0.CO;2-L.
  • Erickson, G. 1994. “Pupils’ Understanding of Magnetism in a Practical Assessment Context: The Relationship between Content, Process and Progression.” In The Content of Science, edited by P. Fensham, G. Richard, and R. White, 80–97. London: Falmer.
  • Fechner, S. 2009. “Effects of Context-Oriented Learning on Student Interest and Achievement in Chemistry Education.” Dissertation. Berlin: Logos-Verlag.
  • Gao, X., R. J. Shavelson, and G. P. Baxter. 1994. “Generalizability of Large-Scale Performance Assessments in Science: Promises and Problems.” Applied Measurement in Education 7 (4): 323–342. doi:10.1207/s15324818ame0704_4.
  • Gott, R., and S. Duggan. 1996. “Practical Work: Its Role in the Understanding of Evidence in Science.” International Journal of Science Education 18 (7): 791–806. doi:10.1080/0950069960180705.
  • Gott, R., and S. Duggan. 2002. “Problems with the Assessment of Performance in Practical Science: Which Way Now?” Cambridge Journal of Education 32 (2). doi:10.1080/03057640220147540.
  • Gut, C. 2012. “Modellierung und Messung experimenteller Kompetenz: Analyse eines large-scale Experimentiertests.” [Modelling and measuring experimental competence: analysis of a large-scale experimentation test.] Dissertation. Berlin: Logos-Verlag.
  • Gut, C., P. Hild, S. Metzger, and J. Tardent. 2017. “Vorvalidierung des ExKoNawi-Modells.” [Pre-validation of the ExKoNawi model.] In Implementation fachdidaktischer Innovation im Spiegel von Forschung und Praxis, edited by C. Maurer. Gesellschaft für Didaktik der Chemie und Physik, Jahrestagung in Zürich 2016, 328–331.
  • Gut, C., S. Metzger, P. Hild, and J. Tardent. 2014. “Problemtypenbasierte Modellierung und Messung experimenteller Kompetenzen von 12- bis 15-jährigen Jugendlichen.” [Problem-type-based modelling and measurement of the experimental competences of 12- to 15-year-old students.] PhyDid B, Beiträge zur DPG-Frühjahrstagung 2014.
  • Haertel, E. H., and R. L. Linn. 1996. “Comparability.” In Technical Issues in Large-Scale Performance Assessment, edited by G. W. Phillips, 59–78. Washington, DC: National Center for Education Statistics.
  • Hammann, M., T. T. H. Phan, M. Ehmer, and T. Grimm. 2008. “Assessing Pupils’ Skills in Experimentation.” Journal of Biological Education 42 (2): 66–72. doi:10.1080/00219266.2008.9656113.
  • Harmon, M., T. A. Smith, M. O. Martin, D. L. Kelley, A. E. Beaton, I. V. S. Mullis, E. J. Gonzalez, et al. 1997. Performance Assessment in IEA’s Third International Mathematics and Science Study. Chestnut Hill, MA: Boston College.
  • Hart, C., P. Mulhall, A. Berry, J. Loughran, and R. Gunstone. 2000. “What Is the Purpose of This Experiment? or Can Students Learn Something from Doing Experiments?” Journal of Research in Science Teaching 37 (7): 655–675. doi:10.1002/1098-2736(200009)37:7<655::AID-TEA3>3.0.CO;2-E.
  • Hild, P., J. Tardent, C. Gut, and S. Metzger. 2015. “Typenspezifische Kompetenzprogressionen bei hands-on Testaufgaben.” [Type-specific competence progressions in hands-on test tasks.] In Heterogenität und Diversität - Vielfalt der Voraussetzungen im naturwissenschaftlichen Unterricht, edited by S. Bernholt. Gesellschaft für Didaktik der Chemie und Physik, Jahrestagung in Bremen 2014, 145–147.
  • Hild, P., C. Gut, S. Metzger, and J. Tardent. 2018. “Zur Generalisierbarkeit bei Experimentiertests.” [On the generalisability of experimentation tests.] In Qualitätsvoller Chemie- und Physikunterricht - normative und empirische Dimensionen, edited by C. Maurer. Gesellschaft für Didaktik der Chemie und Physik, Jahrestagung in Regensburg 2017, 348–351.
  • Hild, P., S. Metzger, and I. Parchmann. 2018. “Beurteilung und Förderung experimenteller Kompetenzen anhand von Aufgaben zum effektbasierten Vergleichen.” [Assessing and fostering experimental competences through tasks on effect-based comparison.] ChemKon (3): 90–97. doi:10.1002/ckon.201810322.
  • Hodson, D. 2009. Teaching and Learning about Science: Language, Theories, Methods, History, Traditions and Values. Rotterdam: Sense Publ.
  • Hofstein, A. 2004. “The Laboratory in Chemistry Education: Thirty Years of Experience with Developments, Implementation, and Research.” Chemistry Education Research and Practice 5 (3): 247–264. doi:10.1039/B4RP90027H.
  • Hungerford, H. R., and D. T. Miles. 1969. “A Test to Measure Observation and Comparison Skills in Science.” Science Education 53 (1): 61–66. doi:10.1002/sce.3730530115.
  • Jovanovic, J., G. Solano-Flores, and R. J. Shavelson. 1994. “Performance-Based Assessments.” Education and Urban Society 26: 352–366. doi:10.1177/0013124594026004004.
  • Kane, M., T. Crooks, and A. Cohen. 1999. “Validating Measures of Performance.” Educational Measurement: Issues and Practice 18 (2): 5–17. doi:10.1111/j.1745-3992.1999.tb00010.x.
  • Labudde, P., C. Nidegger, M. Adamina, and F. Gingins. 2012. “The Development, Validation, and Implementation of Standards in Science Education: Chances and Difficulties in the Swiss Project HarmoS.” In Making It Tangible. Learning Outcomes in Science Education, edited by S. Bernholt, K. Neumann, and P. Nentwig, 235–259. Münster: Waxmann.
  • Marsh, H. W., U. Trautwein, O. Lüdtke, O. Köller, and J. Baumert. 2005. “Academic Self-Concept, Interest, Grades, and Standardized Test Scores: Reciprocal Effects Models of Causal Ordering.” Child Development 76 (2): 397–416. doi:10.1111/j.1467-8624.2005.00853.x.
  • Messick, S. 1989. “Validity.” In Educational Measurement, edited by R. L. Linn, 13–103. 3rd ed. New York: Macmillan.
  • Messick, S. 1994. “The Interplay of Evidence and Consequences in the Validation of Performance Assessments.” Educational Researcher 23 (2): 13–23. doi:10.3102/0013189X023002013.
  • Messick, S. 1996. “Validity of Performance Assessments.” In Technical Issues in Large-Scale Performance Assessment, edited by G. W. Phillips, 1–18. Washington, DC: National Center for Education Statistics.
  • Metzger, S., C. Gut, P. Hild, and J. Tardent. 2014. “Modelling and Assessing Experimental Competence: An Interdisciplinary Progression Model for Hands-on Assessments.” In E-Proceedings of the ESERA 2013 Conference, Nicosia.
  • Meyer, K., and R. Carlisle. 1996. “Children as Experimenters.” International Journal of Science Education 18 (2): 231–248. doi:10.1080/0950069960180207.
  • Millar, R., R. Gott, F. Lubben, and S. Duggan. 1996. “Children’s Performance of Investigative Tasks in Science: A Framework for Considering Progression.” In Progression in Learning, edited by M. Hughes, 82–108. Clevedon, UK: Multilingual Matters.
  • Miller, M. D. 1998. Generalizability of Performance-Based Assessments. Washington, DC: Council of Chief State School Officers.
  • Miller, M. D., and R. L. Linn. 2000. “Validation of Performance-Based Assessments.” Applied Psychological Measurement 24 (4): 367–378. doi:10.1177/01466210022031813.
  • Mushquash, C., and B. P. O’Connor. 2006. “SPSS and SAS Programs for Generalizability Theory Analyses.” Behavior Research Methods 38 (3): 542–547. doi:10.3758/BF03192810.
  • Pintrich, P. R., and E. V. DeGroot. 1990. “Motivational and Self-Regulated Learning Components of Classroom Academic Performance.” Journal of Educational Psychology 82 (1): 33. doi:10.1037/0022-0663.82.1.33.
  • Ruiz-Primo, M. A., G. P. Baxter, and R. J. Shavelson. 1993. “On the Stability of Performance Assessments.” Journal of Educational Measurement 30 (1): 41–53. doi:10.1111/j.1745-3984.1993.tb00421.x.
  • Ruiz-Primo, M. A., and R. J. Shavelson. 1996. “Rhetoric and Reality in Science Performance Assessments: An Update.” Journal of Research in Science Teaching 33 (10): 1045–1063. doi:10.1002/(SICI)1098-2736(199612)33:10<1045::AID-TEA1>3.0.CO;2-S.
  • Schauble, L., L. E. Klopfer, and K. Raghavan. 1991. “Students’ Transition from an Engineering Model to a Science Model of Experimentation.” Journal of Research in Science Teaching 28 (9): 859–882. doi:10.1002/tea.3660280910.
  • Schreiber, N., H. Theyßen, and H. Schecker. 2014. “Diagnostik experimenteller Kompetenz: Kann man Realexperimente durch Simulationen ersetzen?” [Diagnostics of experimental skills: On the exchangeability of hands-on and simulation-based assessment tools.] Zeitschrift für Didaktik der Naturwissenschaften 20 (1): 161–173. doi:10.1007/s40573-014-0017-1.
  • Schreiber, N., H. Theyßen, and H. Schecker. 2016. “Process-Oriented and Product-Oriented Assessment of Experimental Skills in Physics: A Comparison.” In Insights from Research in Science Teaching and Learning, Contributions from Science Education Research, edited by N. Papadouris, A. Hadjigeorgiou, and C. Constantinou, 29–43. Switzerland: Springer-Verlag.
  • Schwichow, M., C. Zimmerman, S. Croker, and H. Härtig. 2016. “What Students Learn from Hands-On Activities: Hands-On versus Paper-and-Pencil.” Journal of Research in Science Teaching 53 (7): 980–1002. doi:10.1002/tea.21320.
  • Shavelson, R. J., G. Solano-Flores, and M. A. Ruiz-Primo. 1998. “Toward a Science Performance Assessment Technology.” Evaluation and Program Planning 21. doi:10.1016/S0149-7189(98)00005-6.
  • Shavelson, R. J., G. P. Baxter, and J. Pine. 1991. “Performance Assessment in Science.” Applied Measurement in Education 4 (4): 347–362. doi:10.1207/s15324818ame0404_7.
  • Shavelson, R. J., and M. A. Ruiz-Primo. 1998. On the Assessment of Science Achievement - Conceptual Underpinnings for the Design of Performance Assessments. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Report 491.
  • Shavelson, R. J., M. A. Ruiz-Primo, and E. W. Wiley. 1999. “Note on Sources of Sampling Variability in Science Performance Assessments.” Journal of Educational Measurement 36 (1): 61–71. doi:10.1111/j.1745-3984.1999.tb00546.x.
  • Shavelson, R. J., and N. M. Webb. 1981. “Generalizability Theory: 1973–1980.” British Journal of Mathematical and Statistical Psychology 34: 133–166. doi:10.1111/j.2044-8317.1981.tb00625.x.
  • Shavelson, R. J., X. Gao, and G. P. Baxter. 1993. Sampling Variability of Performance Assessments. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Report 142.
  • Solano-Flores, G. 1994. “A Logical Model for the Development of Science Performance Assessments.” Dissertation, University of California, Santa Barbara.
  • Solano-Flores, G., J. Jovanovic, R. J. Shavelson, and M. Bachman. 1999. “On the Development and Evaluation of a Shell for Generating Science Performance Assessments.” International Journal of Science Education 21 (3): 293–315. doi:10.1080/095006999290714.
  • Solano-Flores, G., and R. J. Shavelson. 1997. “Development of Performance Assessments in Science: Conceptual, Practical, and Logistical Issues.” Educational Measurement: Issues and Practice 16 (3): 16–24. doi:10.1111/j.1745-3992.1997.tb00596.x.
  • Solano-Flores, G., R. J. Shavelson, S. E. Schultz, and E. W. Wiley. 1997. On the Development and Scoring of Classification and Observation Science Performance Assessments. Los Angeles: University of California, National Center for Research on Evaluation, Standards, and Student Testing. Report 458.
  • Stecher, B. M. 1996. Performance Assessments in Science. Santa Monica: RAND.
  • Stecher, B. M., S. P. Klein, G. Solano-Flores, D. McCaffrey, A. Robyn, R. J. Shavelson, and E. H. Haertel. 2000. “The Effects of Content, Format, and Inquiry Level on Science Performance Assessment Scores.” Applied Measurement in Education 13 (2): 139–160. doi:10.1207/S15324818AME1302_2.
  • Taut, S., and K. Rakoczy. 2016. “Observing Instructional Quality in the Context of School Evaluation.” Learning and Instruction 46: 45–60. doi:10.1016/j.learninstruc.2016.08.003.
  • Toh, K.-A., and B. E. Woolnough. 1990. “Assessing, through Reporting, the Outcomes of Scientific Investigations.” Educational Research 32 (1): 59–65. doi:10.1080/0013188900320107.
  • Tomera, A. N. 1974. “Transfer and Retention of Transfer of the Science Process of Observation and Comparison in Junior High School Students.” Science Education 58 (2). doi:10.1002/sce.3730580209.
  • Vorholzer, A., C. von Aufschnaiter, and S. Kirschner. 2016. “Entwicklung und Erprobung eines Tests zur Erfassung des Verständnisses experimenteller Denk- und Arbeitsweisen.” [Development of an instrument to assess students’ knowledge of scientific inquiry.] Zeitschrift für Didaktik der Naturwissenschaften 22: 25–41. doi:10.1007/s40573-015-0039-3.
  • Webb, N. M., J. Schlackman, and B. Sugrue. 2000. “The Dependability and Interchangeability of Assessment Methods in Science.” Applied Measurement in Education 13 (3): 277–301. doi:10.1207/S15324818AME1303_4.
  • Webb, N. M., R. J. Shavelson, and E. H. Haertel. 2006. “Reliability Coefficients and Generalizability Theory.” In Handbook of Statistics, Vol. 26, Psychometrics, 1st ed., 81–124. Amsterdam: North-Holland. doi:10.1016/S0169-7161(06)26004-8.
