
Formative assessment: a critical review

Pages 5-25 | Published online: 25 Jan 2011
 

Abstract

This paper covers six interrelated issues in formative assessment (also known as ‘assessment for learning’). The issues concern the definition of formative assessment, the claims commonly made for its effectiveness, the limited attention given to domain considerations in its conceptualisation, the under‐representation of measurement principles in that conceptualisation, the teacher‐support demands formative assessment entails, and the impact of the larger educational system. The paper concludes that the term ‘formative assessment’ does not yet represent a well‐defined set of artefacts or practices. Although research suggests that the general practices associated with formative assessment can facilitate learning, existing definitions admit such a wide variety of implementations that effects should be expected to vary widely from one implementation and student population to the next. In addition, the magnitude of commonly made quantitative claims for effectiveness is suspect, deriving from untraceable, flawed, dated, or unpublished sources. To realise maximum benefit from formative assessment, new development should focus on conceptualising well‐specified approaches built around process and methodology rooted within specific content domains. Those conceptualisations should incorporate fundamental measurement principles that encourage teachers and students to recognise the inferential nature of assessment. They should also allow for the substantial time and professional support needed if the vast majority of teachers are to become proficient users of formative assessment. Finally, for greatest benefit, formative approaches should be conceptualised as part of a comprehensive system in which all components work together to facilitate learning.

Acknowledgements

I am grateful to Steve Chappuis, Joe Ciofalo, Terry Egan, Dan Eignor, Drew Gitomer, Steve Lazer, Christy Lyon, Yasuyo Sawaki, Cindy Tocci, Caroline Wylie, and two anonymous reviewers for their helpful comments on earlier drafts of this paper or the presentation upon which the paper was based; to Brent Bridgeman, Shelby Haberman, and Don Powers for their critique of selected effectiveness studies; to Dylan Wiliam, Jim Popham, and Rick Stiggins for their willingness to consider differing points of view; and to Caroline Gipps for suggesting (however unintentionally) the need for a paper such as this one.

Notes

1. Influential members of the group have included Paul Black, Patricia Broadfoot, Caroline Gipps, Wynne Harlen, Gordon Stobart, and Dylan Wiliam. See http://www.assessment-reform-group.org/ for more information on the Assessment Reform Group.

2. How does formative assessment differ from diagnostic assessment? Wiliam and Thompson (2008, 62) consider an assessment to be diagnostic when it provides information about what is going amiss and formative when it provides guidance about what action to take. They note that not all diagnoses are instructionally actionable. Black (1998, 26) offers a somewhat different view, stating that: ‘… diagnostic assessment is an expert and detailed enquiry into underlying difficulties, and can lead to a radical re‐appraisal of a pupil's needs, whereas formative assessment is more superficial in assessing problems with particular classwork, and can lead to short‐term and local changes in the learning work of a pupil’.

3. Expected growth was calculated from the norms of the Metropolitan Achievement Test Eighth Edition (Harcourt Educational Measurement 2002), the Iowa Tests of Basic Skills Complete Battery (Hoover, Dunbar, and Frisbie 2001), and the Stanford Achievement Test Series Tenth Edition (Pearson 2004).

4. Stiggins is reported to no longer stand by the claims quoted here (S. Chappuis, April 6, 2009, personal communication). I have included them because they are published claims still frequently taken by others as fact. See Kahl (2007) for an example.

5. Cohen (1988, 25–7) considers effects of .2 to be small, .5 to be medium, and .8 to be large.
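For reference, these benchmarks apply to Cohen's d, the standardised difference between two group means. A minimal sketch of the standard two‐group formulation (a textbook restatement, not reproduced from this paper):

d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}}, \qquad s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}

On this scale, an effect of .4 would mean the average treated student scored about .4 pooled standard deviations above the average control student.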

6. It is possible that these values represent Black and Wiliam's retrospective extraction from the 1998 review of the range of mean effects found across multiple meta‐analytical studies done by other investigators on different topics (i.e., the mean effect found in a meta‐analysis on one topic was .4 and the mean effect found in a meta‐analysis on a second topic was .7). If so, the range of observed effects across individual studies would, in fact, be wider than the oft‐quoted .4 to .7 range, as each meta‐analytic mean itself represents a distribution of study effects. But more fundamentally, the construction of any such range would seem specious according to Black and Wiliam's (1998c) own critique – i.e., ‘… the underlying differences between the studies are such that any amalgamations of their results would have little meaning’ (53).

7. A partial list of concerns includes: confusing association with causation in the interpretation of results; ignoring, in that interpretation, the finding that results could be explained by (irrelevant) method factors; seemingly computing effect sizes before coding the same studies for extent of use of formative assessment (introducing the possibility of bias in coding); giving no information on the reliability of the coding; and including many dated studies (57 of the 86 included articles were 30 or more years old) without considering publication date as a moderator variable.

8. The replicability of inferences and adjustments may be challenging to evaluate. It would be easiest to assess in team‐teaching situations in which both teachers might be expected to have a shared understanding of their classroom context and students. Outside of team contexts, replicability might be evaluated through video recording of teachers' formative assessment practice; annotation of the recording by those teachers to indicate their inferences, adjustments, and associated rationales; and review of the recordings and annotations by expert teachers for reasonableness.

9. Kane (2006, 23) uses ‘interpretive argument’ to refer to claims and ‘validity argument’ to refer to the backing. For simplicity, I've used ‘validity argument’ to refer to both claims and backing.

10. One could certainly conceptualise the relationship between the validity and efficacy arguments the other way around; that is, with the efficacy argument being part of a broader validity argument, a formulation that would be consistent with Kane's (2006, 53–6) views. Regardless of which argument is considered to be overarching, there is no disagreement on the essential point: both arguments are needed.

11. As suggested, there are other possible underlying causes for student error, some of which may be cognitive and others of which may be affective (e.g., not trying one's hardest to respond). Black and Wiliam (2009, 17) suggest a variety of cognitive causes, including misinterpretation of the language, purpose, or context of a question, or of the requirements of the task itself. Affective causes may be situational ones related, for instance, to the type of feedback associated with a particular task or teacher, or such causes may be more deeply rooted, as when a student's history of academic failure dampens motivation to respond even when he or she possesses the requisite knowledge. Boekaerts (as cited in Boekaerts and Corno 2005, 202–3) offers a model to explain how students attempt to balance achievement goals and emotional well‐being in classroom situations.
