
Validating performance assessments: measures that may help to evaluate students’ expertise in ‘doing science’


ABSTRACT

Background: Several measures have been proposed to address persistent validity problems, such as high task-sampling variability, in assessing students’ expertise in ‘doing science’. These measures include working with a priori progression models, using standardised item shells and rating manuals, increasing the number of tasks per student and comparing different measurement methods.

Purpose: The impact of these measures on instrument validity is examined here with respect to three aspects: structural validity, generalisability and external validity.

Sample: Performance assessments were administered to 418 students (187 girls; ages 12–16) in grades 7, 8 and 9 in the two lowest performance tracks of (lower) secondary school in the Swiss canton of Zurich.

Design and methods: Students worked with printed test sheets on which they were asked to report the outcomes of their investigations. In addition to the written protocols, direct observations and interviews were used as measurement methods. Evidence of the instruments’ validity was reported using different reliability and generalisability coefficients and by comparing our results with those reported in the literature.

Results: An a priori progression model was successfully used to improve the instrument’s structural validity. The use of a standardised item shell and rating manual ensured reliable rating of the written protocols (.79 ≤ p0 ≤ .98; .56 ≤ κ ≤ .97). Increasing the number of tasks per student did not substantially reduce task-sampling variability. The performance observed directly differed from the performance assessed via the written protocols.
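The inter-rater statistics reported above can be illustrated with a short sketch. Percent agreement (p0) is the share of protocols both raters scored identically, and Cohen’s κ corrects it for chance agreement. The function names and the rater scores below are illustrative, not the study’s data.

```python
# Sketch of the two inter-rater reliability statistics mentioned in the
# Results: percent agreement (p0) and Cohen's kappa. Illustrative data only.
from collections import Counter

def percent_agreement(a, b):
    # Share of items on which the two raters gave the same score.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    # kappa = (p0 - pe) / (1 - pe), where pe is the agreement
    # expected by chance given each rater's marginal score frequencies.
    n = len(a)
    p0 = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (p0 - pe) / (1 - pe)

# Hypothetical scores (0-2) from two raters for ten written protocols.
rater1 = [0, 1, 1, 2, 2, 0, 1, 2, 1, 0]
rater2 = [0, 1, 2, 2, 2, 0, 1, 1, 1, 0]
print(percent_agreement(rater1, rater2))        # 0.8
print(round(cohens_kappa(rater1, rater2), 3))   # 0.697
```

Values of κ below p0, as in the study’s ranges, are expected whenever chance agreement is non-trivial.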

Conclusions: Students’ performance in doing science can be reliably assessed with instruments that show good generalisability coefficients (ρ² = 0.72 in this case). Even after implementing the different measures, task-sampling variability remains high (σ̂²pt = 47.2% of total variance). More elaborate studies focusing on the substantive aspect of validity must be conducted to understand why students’ expertise as shown in written protocols differs so markedly from their observed performance.
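For readers unfamiliar with generalisability theory, the relative G-coefficient for a crossed person × task design is ρ² = σ²p / (σ²p + σ²pt / nt), so a large person-by-task component (here 47.2% of variance) can only be offset by averaging over more tasks. The variance-component values below are illustrative assumptions, chosen only to reproduce a coefficient near the reported 0.72; they are not the study’s estimates.

```python
# Sketch of a relative generalisability coefficient for a person x task
# (p x t) design: rho^2 = var_p / (var_p + var_pt / n_tasks).
# Variance components below are assumed for illustration.
def g_coefficient(var_p, var_pt, n_tasks):
    # var_p: person (universe-score) variance component
    # var_pt: person-by-task interaction variance component
    # n_tasks: number of tasks each student works on
    return var_p / (var_p + var_pt / n_tasks)

# With an assumed person component of 0.30 and a person-by-task
# component of 0.472 (47.2% of a unit total), four tasks per student
# yield a coefficient close to the reported 0.72.
print(round(g_coefficient(0.30, 0.472, 4), 2))  # 0.72
```

The formula makes the paper’s point concrete: doubling the number of tasks shrinks only the σ²pt / nt term, so a dominant person-by-task interaction keeps ρ² from improving quickly.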

Disclosure statement

No potential conflict of interest was reported by the authors.
