
Measuring Classroom Assessment Practice Using Instructional Artifacts: A Validation Study of the QAS Notebook

Pages 107-131 | Published online: 20 Sep 2012
 

Abstract

We report the results of a pilot validation study of the Quality Assessment in Science Notebook, a portfolio-like instrument for measuring teacher assessment practices in middle school science classrooms. A statewide sample of 42 teachers collected 2 notebooks during the school year, corresponding to science topics taught in the fall and spring. Each notebook was scored on 9 dimensions of assessment practice by 3 trained raters. Our analysis investigated the reliability and validity of notebook ratings, with particular emphasis on identifying key sources of error in the ratings. The results suggest that variation in teacher practice across notebooks (i.e., over time) was more important than idiosyncratic rater inconsistencies as a source of error in the scores. The validity results point to a dominant factor underlying the ratings and some predictive power of notebook ratings on student achievement. We discuss implications of the results for measuring assessment practice through artifacts, drawing conceptual and methodological lessons about our model of assessment practice, the consistency of raters, and the estimation of variance over time with classroom-based measures of instruction.

Notes

1The 1996 Standards are in the process of being replaced with new standards based on the Framework for K–12 Science Education (NRC, 2011). It is important to note that the two frameworks share many key features as they relate to instruction and particularly to assessment practice (NRC, 2001).

2The group included two university-based science education experts and two experienced science teachers. To operationalize the model of classroom assessment, the research team and expert advisors engaged in an iterative process of drafting, reviewing, discussing, and refining the dimensions and the accompanying rubrics over a period of 5 months. The two teachers then piloted a draft version of the notebook instrument and were asked to provide in-depth feedback on it and on the draft dimensions at that point (in writing and during a debriefing interview).

3Because teachers were dispersed across the state, we could not meet in person to offer training for completing the notebook. In addition to the detailed instructions offered in the notebook, we developed a training video for teachers, available for viewing over the Internet at each teacher's convenience, with a detailed account of notebook contents and the steps needed to complete the notebook.

4Participant teachers received $400 for collecting two notebooks. Students who returned signed consent forms were entered into a raffle for an iPod nano. Raters received an honorarium of $1,000 for attending training sessions and rating 28 notebooks over 1 week's time.

5The eight topics in the eighth-grade California Science Standards are motion, forces, structure of matter, solar system, chemical reactions, organic chemistry, periodic table, and density and buoyancy.

6Preliminary analyses of sets of notebooks covering the same science topics showed no consistent pattern suggesting that scores were systematically higher or lower for some topics.

7A true fully crossed rating design with all raters (n = 11) scoring all notebooks (n = 84) was infeasible due to time and resource constraints (each rater would have had to rate notebooks for three weeks). We used the method proposed by Chiu and Wolfe (2002) to subdivide the design into three independent fully crossed segments (i.e., three groups of raters scored both notebooks for a different sample of 14 teachers), combining the results to obtain a single parameter estimate for the whole sample. Two remaining raters were assigned to teachers across blocks using a modified balanced incomplete block design to compare the results from a sparse data matrix with those of Chiu and Wolfe's subdivision method. The results of this comparison are not presented here.
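The subdivision described above can be sketched as a simple partition: raters and teachers are split into independent blocks, each internally fully crossed, and the blocks are analyzed separately before averaging. The group sizes and labels below are illustrative assumptions, not the study's actual assignments.

```python
# Sketch of subdividing a rating design into fully crossed segments
# (Chiu & Wolfe, 2002). Labels and sizes are illustrative.

def make_segments(raters, teachers, n_segments=3):
    """Split raters and teachers into n_segments fully crossed blocks.

    Within each segment, every rater scores every teacher's notebooks;
    segments are analyzed separately and the variance-component
    estimates are averaged across segments.
    """
    r_per = len(raters) // n_segments
    t_per = len(teachers) // n_segments
    return [
        (raters[i * r_per:(i + 1) * r_per],
         teachers[i * t_per:(i + 1) * t_per])
        for i in range(n_segments)
    ]

raters = [f"R{i}" for i in range(9)]     # 9 raters in 3 groups of 3
teachers = [f"T{i}" for i in range(42)]  # 42 teachers, 2 notebooks each
segments = make_segments(raters, teachers)
# Each segment: 3 raters fully crossed with 14 teachers (28 notebooks).
```

Each segment is small enough to rate within the available time, while remaining fully crossed internally so that standard variance-component estimation applies within it.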

8The formula for relative reliability (ρ) resembles (1) and (2) but omits the rater and notebook main-effect terms (σ²_r, σ²_n) from the denominator, because absolute error (e.g., variation in rater stringency) shifts all teachers equally and would not affect rank orders in a crossed design.

9Descriptive analyses of the scores revealed systematic scoring patterns for some raters. One rater in particular (H) showed erratic rating behavior, often scoring much higher or lower than the other raters for a given notebook; this rater was removed from the data set for all subsequent analyses.

aVariance components are averages computed over three fully crossed segments (Chiu & Wolfe, 2002).

bVariance components accounting for less than 5% of the variance are not shown, for ease of interpretation.


*Correlation is statistically significant at p < .10.


a n = 42.

b n = 310.

aVariance components shown are averages computed over eight topic segments (Chiu & Wolfe, 2002); a separate design was estimated for each topic and the results averaged (n = 40).

