Research Article

Applying Sadler’s principles in holistic assessment design: a retrospective account

Received 17 Feb 2023, Accepted 23 Jun 2023, Published online: 10 Aug 2023

ABSTRACT

Holistic assessment is an evaluative approach in which assessors work backwards from an overall appraisal of work to determine the criteria relevant to individual student responses. One of the strongest proponents of this approach in higher education is Royce Sadler, whose theoretical contributions over recent decades provide a strong conceptual rationale for holistic assessment. While Sadler’s contributions are widely renowned, few explicit accounts are available of Sadlerian theory in practice. The purpose of this reflective paper, then, is to explain how we have attempted to synthesise and apply a Sadlerian theory of holistic assessment design in one of our own courses. Our intended contribution is to provide a relatively concrete interpretation of holistic assessment design which may serve as a point of reference for assessment design in higher education coursework.

Introduction

This paper is a retrospective account of the authors’ collaborative efforts in designing a holistic assessment task according to the theoretical propositions put forward by Sadler over the past two decades (see Sadler 2005, 2007, 2009a, 2009b, 2010, 2011, 2013, 2014, 2015, 2016, 2020). While Sadler’s critiques and propositions relating to assessment in higher education have been extremely influential from a conceptual standpoint, they remain difficult to deploy in practice, because they reject the practice of formulating predetermined, fine-grained codifications of criteria and standards which remains the norm in higher education (see Panadero and Jonsson 2020; Sadler 2014). In light of the course redesign necessitated by the global pandemic in 2020, the authors of this paper collaborated in developing a holistic assessment task for use in the second author’s English for Academic Purposes (EAP) course at a Canadian university. Our purpose in this paper is to reflect on our own experiences of developing a holistic task from a Sadlerian standpoint, such that it may serve as a resource for others in higher education who wish to explore similar possibilities.

In the first part of this paper, we identify aspects of Sadler’s theoretical oeuvre which have underpinned our approach – our aim here has been to synthesise an accessible impression of Sadler’s key propositions as borne out by our own experience and reading of Sadler’s work. Following this, we introduce the more specific context for which our holistic assessment task was designed, as well as the specific features of the task itself. The latter part of this paper reflects on the apparent affordances of this design, while also identifying challenges inherent in its application.

A Sadlerian view of assessment

A general trend in assessment scholars’ views over recent decades has been the shift from a measurement-oriented view of assessment towards an interpretation of assessment as an opportunity for students to undertake authentic, complex tasks (Boud et al. 2018). Such tasks tend to invite responses from students which do not (and should not) look exactly alike; a characteristic which Sadler (2009b) describes in terms of divergence, highlighting that there are diverse paths students may take in the production of responses to these tasks. It is worth noting that this label does not apply to all tasks encountered in higher education. Some tasks involve entirely convergent responses, in which there is no latitude at all for variety within students’ responses to a given task (Walton 2020). One example of this type of task would be the multiple-choice exam, which requires students to make a definite selection from a predetermined set of options. Tasks of this nature fall outside the scope of this paper.

For the assessment of divergent work, Sadler has proposed holistic assessment as a strategy for accommodating divergence while retaining trustworthiness, with respect to both the learning and certification purposes of assessment (Sadler 2009a, 2009b). Though influential, Sadler’s conception of holistic assessment remains challenging to implement, because it runs counter to the predominant values and structures of higher education, which favour a segmental approach to assessment design (Panadero and Jonsson 2020; Sadler 2014). Prior to defining these concepts more explicitly, it is contextually important to note that, in higher education, formal assessments of students’ work are generally expected to be criterion-referenced and undertaken with respect to some kind of ostensibly fixed standards framework. These two concepts – criteria and standards – are frequently used in ways that conflate their meanings (Sadler 1987), and indeed, some scholars do not separate the two (e.g. Ajjawi, Bearman, and Boud 2021). For the purpose of this discussion, however, their distinction usefully separates qualitative reference points (criteria) from quantitative degrees of quality (standards). Sadler (1987) elaborates these concepts as follows:

criterion A distinguishing property or characteristic of any thing, by which its quality can be judged or estimated, or by which a decision or classification may be made. (From Greek kriterion, a means for judging).

standard A definite level of excellence or attainment, or a definite degree of any quality viewed as a prescribed object of endeavour or as the recognised measure of what is adequate for some purpose, so established by authority, custom, or consensus. (From Roman estendre, to extend). (194, italics original)

The reason for beginning with this distinction is that the basic premise underlying Sadler's conception of holistic assessment involves looking at students’ work as a whole to evaluate the overall standard of achievement, and then working backwards to distinguish and comment on the criteria that are relevant to the assertion made about the quality of the work by the assessor (Sadler 2009a, 2009b, 2015). This is distinct from an analytic approach (Sadler’s term, see Sadler 2009a, 2009b) in that the latter involves the specification of criteria for use in an assessment event in advance of that event – what Sadler (2011) refers to as preset criteria. A key limitation of preset criteria, in this respect, is that such criteria (alongside the weightings commonly applied to them) cannot apply in the same way to each individual work, precisely because the works are themselves divergent from one another. Put another way, the exact mix of criteria relevant to each assessment of students’ responses will differ, and relevant expressions of criteria in a holistic approach may therefore emerge during the assessment event in ways that are not fully predictable by the assessor (Sadler 2009b, 2014, 2015). This issue likewise manifests in the selection of preset criteria for artefacts such as rubrics, since the formulation of these criteria involves selecting some material from a greater pool of possibilities (Sadler 2009b).

An alternative strategy is, therefore, necessary to allow assessors to more fully acknowledge actual bases for students’ achievement. For holistic assessment, the functional roles accorded to standards and criteria relate in an important way to the ontological positions by which we understand them, because these positions directly affect claims about the trustworthiness of our assessment practices. Both standards and criteria are understood here as social and intersubjectively constituted, and therefore abstract (see Sadler 2014). This is the principal reason why preset criteria cannot, by definition, be guaranteed to encapsulate the totality of relevant criteria, and the proportions in which those criteria are applicable.

Allowing criteria to occur inductively to an assessor within a holistic assessment approach does not mean students should be surprised by these criteria – this would be patently unfair to students. In lieu of preset criteria, students’ evaluative capabilities must be developed, in the sense that they learn to distinguish what is valued in advance of the assessment (and therefore are unsurprised by the criteria invoked by assessors, see Sadler 2015, 2016). Accordingly, students need to be provided with opportunities throughout their coursework to develop their sense of what constitutes success and productivity, so that when they are presented with complex or divergent tasks they are capable of making judgements about quality (Sadler 1989; see also Tai et al. 2018). Disciplinary criteria, it follows, are not the exclusive purview of artefacts such as rubrics, attached to single, isolated assessment tasks, but should rather be evoked by the whole of a course, including content instruction, activities, prior formative and summative assessments, and the assessment instructions themselves. This approach breaks down the conceptual division so often seen in assessment design, between formal assessment events and the ‘usual business’ of learning and teaching that students are engaged in outside of these.

In the absence of preset criteria, on what grounds may we be assured students’ work is actually assessed according to an appropriate and consistent standard? The answer is that the assessor themself is the source of trustworthiness, in that they should know what they are doing and be capable of appropriately exercising expert knowledge in relation to the work being assessed. This is particularly relevant in the current environment of farming out of marking to casual assessors, and of filling assessment quotas with additional tasks for the sake of policy strictures (e.g. inserting an additional task into a course due to a requirement for at least three assessment items). The holistic assessor’s trustworthiness is developed through education and experience in negotiation with other domain experts, a process which Sadler (2013) calls calibration. For this reason,

[a]s Scriven (1972) pointed out, the quality of a judgement made by a single assessor is not automatically suspect and deserving of dismissal merely because it has been made without collaboration and without the help of instrumentation. Academics as professionals who consistently arrive at sound judgements are effectively ‘calibrated’ against their competent peers and also, in professional contexts, against any relevant socially constructed external norms. (Sadler 2013, 14)

The focus of calibration negotiations is on the criteria and standards relevant to exemplar artefacts from within the domain of expertise. Sadler (2013) suggests that calibration can be formally achieved through implementation of periodic exercises, similar in a sense to the common practice of moderation, but broader in scope, not being restricted to individual assessment events. It follows that the logistical implications of achieving calibration will vary, depending on the size of a given course. While it is beyond the scope of this paper to provide a solution to this issue, we note that achieving trustworthiness of individual calibrated assessors’ independent judgements is an important step, since this is a key mechanism to reduce moderation loads. We suggest that a pragmatic approach would involve establishing an initial consistency amongst a group of assessors which may then be maintained through iterative calibration exercises, including the checking of a sample of assessments in a more traditional moderation fashion.
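To make the notion of iterative calibration exercises slightly more concrete, the following sketch (our own illustration, not a procedure proposed by Sadler) computes two simple consistency indicators a course team might review between calibration rounds: the rate at which two assessors’ independent holistic grades on shared exemplars agree exactly, and the rate at which they fall within one grade band of each other. The A–F scale and the sample data are assumptions for illustration only.

    # Illustrative sketch: agreement between two assessors' independent
    # holistic grades on a shared set of exemplar works (A-F scale assumed).
    GRADE_ORDER = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

    def agreement(grades_a, grades_b):
        """Return (exact, within-one-band) agreement rates for paired grades."""
        pairs = list(zip(grades_a, grades_b))
        exact = sum(x == y for x, y in pairs)
        adjacent = sum(abs(GRADE_ORDER[x] - GRADE_ORDER[y]) <= 1 for x, y in pairs)
        return exact / len(pairs), adjacent / len(pairs)

    # Hypothetical grades assigned independently after a calibration meeting.
    assessor_1 = ["A", "B", "B", "C", "A", "D", "B", "C", "C", "F"]
    assessor_2 = ["A", "B", "C", "C", "A", "C", "B", "B", "C", "F"]
    exact, adjacent = agreement(assessor_1, assessor_2)
    print(f"Exact agreement: {exact:.0%}; within one band: {adjacent:.0%}")

Low or falling agreement on such a check would signal the need for further negotiation over exemplars; it would not function as a grading instrument in its own right.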

Applying Sadlerian principles of holistic assessment in task design

Having reviewed the overarching premise of Sadlerian holistic assessment design, we now turn to an explanation of a holistic assessment task we designed for use in an EAP course in Canada within a programme for first-year international Science students. The purpose of this course is to support students’ language development with a view to enhancing their experiences of their concurrent science coursework, and especially their capacity to succeed in the assessment items associated with this coursework.

Context of the course

The course is distinctive in that it is informed by systemic functional linguistics (Halliday and Matthiessen 2013) and genre-based pedagogy (J.R. Martin and Rose 2007, 2008; Dreyfus et al. 2015), and specifically aims to support students’ use of English in physics, chemistry and maths (calculus). The course description states:

[The course] provides students with specific practical strategies for understanding and using the language of science that emerges in the science stream courses. The overarching goal of the [course] tutorials is to help students direct their own engagement in the sciences using new knowledge of the language of science. The basis of teaching and learning in this course is the strong link between the ways language is used in science, and the ways knowledge is built up in science.

As part of the goal of supporting students’ scientific language development, one of the needs identified was to help students deconstruct the technicality and complexity of language in their textbooks. During the COVID-19 pandemic, previously established assessment approaches within the course became infeasible. While the programme is typically for international students who have travelled to Canada to study, travel restrictions meant that the majority of students remained in their home countries, making synchronous online interactions particularly difficult to manage in view of problematic time differences. This situation provided impetus for the design of a novel form of assessment that addressed these challenges while continuing to support students’ learning and offer the instructor an opportunity for useful evaluation of their capabilities.

The assessment served two purposes within its general remit to support students’ scientific language development for their concurrent courses. Firstly, the assessment evaluated students’ individual development of language knowledge, in this case through systemic functional grammar, and secondly, it assessed their ability to apply that knowledge within their science courses, in this case reading their textbooks. Therefore, the assessment involved a prior grammatical analysis undertaken by the students but focused the evaluation on the conversation with the instructor, where students explained the impact of the language choices and how they related to the topic and purpose of the texts. In conjunction with these purposes, the design of the assessment was also informed by several additional contextual factors. Notably, it provided a synchronous experience following what was an asynchronous term of instruction, enabling individual spoken contact between instructor and student, scaffolded in the form of a conversation-type task. It also aimed to be authentic to students’ needs at the time, insofar as they did not at this point in the course need to write about grammatical analysis, but could productively use it to read their textbooks and discuss with their colleagues. It related as well to activities students had been involved in throughout the term, including small group discussions focused on how content from the EAP course arose in the science courses they were undertaking at the same time. It further provided students with an individualised assessment experience, in contrast to a prior series of group tasks, offering the instructor an opportunity to engage with students in a personalised manner, and to identify students who might be struggling in their studies. Finally, it mitigated the logistical challenges of assessing 65 students across three sections in a meaningful way.

In designing this assessment, the instructor initially attempted to produce a rubric for use in grading the students’ responses, drawing on both linguistic and sociological frameworks for understanding language and knowledge (see Monbec et al. 2021, for an example of this), but struggled to account for variations in what students at different ability levels would achieve. Notably, the instructor found that initial designs did not provide sufficient space to reward students for providing sophisticated interpretations while not overly penalising students for providing acceptable descriptions. Since students were able to formulate divergent responses to the task, an alternative strategy was required that would provide adequate flexibility in this respect. This led to a sustained discussion between the authors of this paper about holistic assessment as an alternative to the use of predetermined criteria. This discussion – and in particular, the principles outlined in the following section – informed the fuller design of the assessment, which was implemented in 2020 and repeated in 2021 and 2022.

Designing the assessment

The new assessment design was driven by the need to address a number of socio-pedagogical factors, some of which were particular to Author 2’s context of 2020 emergency remote teaching, and others to the remit of the EAP course in supporting international students in their science courses. Although the design described here is therefore context-specific, part of our aim in pursuing this specificity is to distinguish aspects that could be adapted to, or inspire, similar approaches in other contexts and courses. At a general level, the synchronous nature of the assessment provided an intentional change of pace and an opportunity to engage with students in one-on-one interactions in ways that had largely been unfeasible through the asynchronous delivery of the course thus far. The task was constructed as a spoken activity since students did not need to write about language or grammar at this point in their coursework (as in Shoecraft, J.L. Martin, and Perris 2022), but rather, would benefit from talking about the language used in their science textbooks, which was a skill that had been addressed in weekly unsupervised small group discussions throughout the term. In this, the assessment specifically addressed two of the overall course learning outcomes:

  1. identify and employ strategies to pack and unpack information, concepts, and/or arguments

  2. utilise critical thinking skills (in listening, speaking, reading, and writing) to engage productively with ideas and practices that emerge in the science courses.

The assessment took the form of one-on-one discussions for five to seven minutes between the instructor and the student over Zoom. The discussion focused on the student’s analysis, within one or more of three textbook excerpts, of one or more language features that had been taught in the course. The excerpts were nominated by the instructors of each of the science courses as useful to review in preparation for the exams the following week, and vetted by the EAP instructor to ensure the analysis was not inordinately difficult yet offered a range of complexity so that students could demonstrate a range of analytical ability and interpretive insight. The three texts were:

  • Physics: chapter introduction on rotational motion (Ling, Sanny, and Moebs 2018)

  • Chemistry: the structure of an organic molecule used in the synthesis of rubber (Stewart et al. 2020)

  • Maths: application of derivatives related to velocity and acceleration (Feldman, Rechnitzer, and Yeager 2016–2021)

Students were expected to prepare their analysis ahead of time and could display their analysis through screen share. Based on the analysis they showed and their initial responses to general questions, the instructor then co-constructed a dialogue to help students reveal their analytical and interpretive understanding. The discussions were recorded to allow for subsequent review, enabling the instructor to engage fully in the discussion. Students thus interrogated texts that they were motivated to read for subsequent exams, using language knowledge to deconstruct those texts and connect the use of language to how scientific knowledge was represented in the text.

Example 1 below shows the excerpt from the physics textbook, constituted by two paragraphs from the introduction of a chapter on rotational motion.

Example 1. Physics text excerpt (Ling, Sanny, and Moebs 2018, 471).

Chapter 10: Introduction

In previous chapters, we described motion (kinematics) and how to change motion (dynamics), and we defined important concepts such as energy for objects that can be considered as point masses. Point masses, by definition, have no shape and so can only undergo translational motion. However, we know from everyday life that rotational motion is also very important and that many objects that move have both translation and rotation. The wind turbines in our chapter opening image are a prime example of how rotational motion impacts our daily lives, as the market for clean energy sources continues to grow.

We begin to address rotational motion in this chapter, starting with fixed-axis rotation. Fixed-axis rotation describes the rotation around a fixed axis of a rigid body; that is, an object that does not deform as it moves. We will show how to apply all the ideas we’ve developed up to this point about translational motion to an object rotating around a fixed axis. In the next chapter, we extend these ideas to more complex rotational motion, including objects that both rotate and translate, and objects that do not have a fixed rotational axis.

While introducing the technical concept of rotational motion, the text is written in a relatively conversational style, perhaps mimicking a friendly teacher’s voice, reminding students of what they had already studied and connecting that knowledge to the new topic. Although written to be accessible to novice learners, the conversational style could be a challenge for English as an Additional Language users, who may be more familiar with and educated in more formal written styles (see Tian 2013; To 2017). The majority of students chose to analyse the physics text, which was perceived by them to be easier, and a few compared the physics text with either the chemistry or the maths text.

Students were required to analyse the text for at least one language feature discussed during the term, drawing on concepts from systemic functional linguistics. These included finiteness of verbs, clause structure, logical relationships, and circumstances. Sample analyses (see Example 2), covering the language features students could have chosen to focus on, were provided to students along with their grades after the assessment.

Example 2. Physics text excerpt annotated (Ling, Sanny, and Moebs 2018, 471).¹

KEY: annotation legend presented graphically in the original and not reproduced here.

Chapter 10: Introduction

In previous chapters Location: place, we described F motion (kinematics) and [[how to change NF motion (dynamics)]], || and + addition we defined F important concepts such as energy [for objects [[that can be considered MPF as point masses Role: guise]]] Cause: behalf. || Point masses, by definition Manner: means, have F no shape || and so x cause can <only> undergo MF translational motion. || However + contrast, we know F from everyday life Location: place / Manner: means || that ' mental rotational motion is F also very important || and that +' addition mental many objects [[that move F]] have F both translation and rotation. || The wind turbines in our chapter opening image Location: place are F a prime example of [[how rotational motion impacts F our daily lives]], || as x time the market [for clean energy sources] Cause: behalf continues to grow F.

We begin to address F rotational motion in this chapter Location: place, || starting with NF fixed-axis rotation. || Fixed-axis rotation describes F the rotation [around a fixed axis of a rigid body] Location: place; || that is = reformulating, an object [[that does not deform F || as x time it moves F]]. || We will show [[how to apply NF all the ideas [[we’ve developed F up to this point Location: time/place]] about translational motion Matter to an object [[rotating NF around a fixed axis Location: place]] ]]. || In the next chapter Location: place, we extend F these ideas to more complex rotational motion, including objects [[that both rotate F || and + addition translate F]], and objects [[that do not have F a fixed rotational axis]]. ||
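For readers unfamiliar with this flattened notation, in which labels such as F, NF and circumstance types follow the words they annotate, the sketch below is our own purely illustrative rendering (not part of the course materials) of how one annotated clause from Example 2 might be captured as structured data; all names in it are hypothetical.

    # Illustrative only: encoding one clause from Example 2 as data.
    # Labels follow the systemic functional categories used in the course:
    # F = finite verb, NF = non-finite, MF = modal finite; circumstances
    # carry a type such as "Location: place" or "Manner: means".
    from dataclasses import dataclass, field

    @dataclass
    class Circumstance:
        text: str
        kind: str  # e.g. "Location: place"

    @dataclass
    class Clause:
        text: str
        verb: str
        finiteness: str  # "F", "NF" or "MF"
        circumstances: list = field(default_factory=list)

    clause = Clause(
        text="In previous chapters, we described motion (kinematics)",
        verb="described",
        finiteness="F",
        circumstances=[Circumstance("In previous chapters", "Location: place")],
    )
    print(f"'{clause.verb}' [{clause.finiteness}] with "
          f"{len(clause.circumstances)} circumstance(s)")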

The conversational format was chosen to enable the instructor to scaffold students to demonstrate a deeper or more nuanced understanding than a monologic presentation from the students might provide. The assessment instructions given to students (see Appendix A) provided a rationale for the task, description of available topics, and guidance on how the discussion might unfold, drawing on related concepts and structures from another EAP course that the students were taking concurrently.

Before the assessment, the instructor provided models of analysis reflecting what was expected of students, and aligned the assessment to similar learning activities throughout the term. During the assessment conversations, which were recorded through video call software, the instructor tracked noteworthy features of students’ performances that related to their achievement, along with initial grades. After the conversations were completed, the instructor summarised those criteria that emerged as most relevant to the determination of students’ grades as a cohort – a simple analysis was produced to indicate how different criteria tended to be associated with different levels of achievement (see Example 3). Each conversation video recording was then reviewed by the instructor, and individual students’ conversation points documented. These were triangulated with the cohort criteria to finalise the grade (an illustrative sketch of this triangulation step follows Example 3). It is worth noting that it would be impossible for students to achieve all the criteria at any of the bands, yet it is relatively easy to allocate a grade based on the specific combination and range of criteria; for example, students with mostly signs of excellence received As, while those with no signs of excellence were more likely to receive Cs or Ds (where A is the highest possible grade and F the lowest). Students received their individual conversation summary as well as the instructor’s summary of emergent evaluative criteria (see Example 3) to help explain how their individual grades were determined.

Example 3: Summary of Evaluative Criteria.

Potential signs of excellence
  • Connecting language feature to textbook purpose, pedagogy, science

  • Accurate identification: form, type

  • Working with multiple language features which complement each other

  • Comparing two texts

  • Importance of language feature for reading/science/writing/specific topic

  • Using an example without the language feature in it to show the importance of the language feature

  • Triangulating observations with register (field, tenor, mode)

  • Functional mistakes: ones that still help make meanings without obscuring or confusing

  • Focusing on a difficult example and finding a plausible explanation

  • Not starting at the beginning of the text

  • Identifying an interesting example to talk about

  • Personal connection to language feature (I like it because … /It helps me to do X …)

  • Conducting analysis of multiple language features or of multiple texts to choose which to talk about

Potential signs of competency
  • Defining language feature and explaining how to identify

  • Accurate identification of basic forms and types

  • Reading script

  • Choosing language feature or text because it’s easy

  • Irrelevant everyday example of language feature

  • Working with multiple language features superficially

  • Starting at the beginning of the text and working through the first few sentences only

Potential signs of failure
  • Incorrect analysis which actively confuses or obscures accurate interpretations

  • Unable to give specific examples

  • Unable to answer questions that haven’t been specifically prepared for
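As foreshadowed above, the following sketch makes the triangulation step concrete. It is entirely hypothetical: the instructor’s judgement was holistic rather than algorithmic, and this fragment only illustrates the general logic of matching documented conversation points against emergent cohort criteria to suggest an indicative grade band. The criteria strings and thresholds are our own simplifications of Example 3.

    # Hypothetical sketch of triangulating documented conversation points
    # against the cohort's emergent criteria (simplified from Example 3).
    # It suggests a band for review; it does not replace holistic judgement.
    EMERGENT_CRITERIA = {
        "excellence": {"connects feature to textbook purpose",
                       "compares two texts", "triangulates with register"},
        "competency": {"defines language feature", "reads script",
                       "accurate identification of basic forms"},
        "failure": {"incorrect analysis that obscures interpretation",
                    "unable to give specific examples"},
    }

    def indicative_band(observed: set) -> str:
        """Map a student's documented points to an indicative grade band."""
        hits = {band: len(observed & crits)
                for band, crits in EMERGENT_CRITERIA.items()}
        if hits["failure"] and not hits["excellence"]:
            return "D/F range - review closely"
        if hits["excellence"] >= 2:   # mostly signs of excellence
            return "A range"
        if hits["excellence"] == 1:
            return "B range"
        return "C range"              # competent, no signs of excellence

    print(indicative_band({"compares two texts",
                           "triangulates with register"}))  # -> A range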

It is worth highlighting that in the second iteration of this assessment in 2021 the evaluative criteria were similar but not identical (see Appendix B), and the instructor’s summary included a revision of the standards to four bands (excellent, good, acceptable, problematic) instead of three. For example, ‘reading a script’ was no longer included as a criterion, as students were more prepared for discussion from in-class conversations and did not use scripts. This reflects a strength of holistic assessment insofar as it does not over-privilege students’ responses to previous iterations of an assessment task as the basis for distinguishing relevant criteria – rather, it provides for the nuanced differences that may arise in students’ responses as a function of correspondingly nuanced changes in their coursework. While similarities would seem likely to appear in the bodies of criteria that emerge as most relevant year-to-year, resources such as the summary of criteria listed above should remain resources for calibration rather than becoming formally required or expected criteria – adopting such criteria as mandatory would effect a transition from a holistic assessment design into an analytic one. The fact that these two iterations of the assessment span emergency remote teaching in 2020–2021 and the return to in-person instruction in 2021–2022 accentuates this difference between pedagogical experiences. At the same time, differences between realisations of a course pedagogy can be a significant factor in differences in assessment even when instruction is in the same mode across iterations. What students do in response to an assessment task will invariably change over time in relation to broader changes in the world around them. While students’ responses to tasks will vary with respect to contextual influences (e.g. changes to pedagogy due to the pandemic, adjustments to the assessment task description, and so forth), the amount of expected variation is not related to the question of assessment strategy – the key issue is whether the task presented to the student is divergent or not. Variation in students’ responses to tasks from year to year does, however, have implications for the amount of effort required to achieve calibration.

Insights and implications

This section reviews the whole-course and contextual factors that we have retrospectively determined to contribute to the successful deployment of the holistic assessment, where we consider success in terms of its capacity to function cohesively with the broader context of the course. Author 2 noted that students’ achievement on this task appeared commensurate with their standards of achievement in other courses and on subsequent tasks (i.e. there were no radical outliers), and also that it was reasonably achievable for both students and instructor without requiring additional resources of time or expertise. A notable source of efficiency, in this respect, was the inclusion of the assessment activities within scheduled class time, meaning that the only burden on the assessor’s time was review of the relatively short recordings as a confirmation of the original grades assigned to the students’ performances. Compared to assessment designs which require individual after-the-fact marking of submitted works (e.g. assignments), this represents a considerable saving of time. Specifically, Author 2 observed this process to take the same or less time (approximately 10–15 min per student) than marking an equivalent written assignment with a rubric in a concurrent course (approximately 20–30 min). While further research may be required to see if this economy holds true in other contexts, it provides a promising indication of potential. Additionally, no students within the assessed cohorts asked for extensions or sought additional coaching in relation to the assessment. It is worth noting, however, that the time saved may be attributed in part to earlier preparation and course design. Indeed, in order to integrate holistic assessment in practice, we argue that it is necessary to consider the whole course in designing assessments that speak to pedagogy and pedagogy that speaks to assessment. While we agree that holistic assessment can require a potentially challenging shift of perspective towards assessment practices, this study highlights that a holistic perspective of pedagogy and assessment is more feasible than perhaps anticipated, as it aligns with other fundamentals of effective course design. Indeed, any of the strategies described could easily and productively be deployed in other contexts, whether or not holistic assessment is implemented. The onus is especially on assessors not to design tasks which assess things the students have not been taught, or for which prior learning cannot reasonably account. This, in fact, is the crux of academic language teaching: it asserts that students should be taught how to complete the assignments they are asked to write, and should not be left to guess or luck into meeting assessor expectations.
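To indicate the scale of the economy mentioned above, a back-of-envelope calculation using the reported per-student figures and the cohort of 65 students (our own arithmetic, offered as a rough illustration only):

    # Rough cohort-level arithmetic using the figures reported above.
    students = 65
    holistic_lo, holistic_hi = 10, 15   # min per recorded conversation
    rubric_lo, rubric_hi = 20, 30       # min per rubric-marked assignment

    min_saving = students * (rubric_lo - holistic_hi) / 60  # hours
    max_saving = students * (rubric_hi - holistic_lo) / 60  # hours
    print(f"Estimated saving: {min_saving:.1f} to {max_saving:.1f} hours")
    # -> roughly 5.4 to 21.7 hours across the cohort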

As noted earlier in this paper, a key factor in the success of holistic assessment design is that students’ evaluative skills are developed through the course before the assessment, such that they understand what they need to do to succeed (Sadler 2015). The course incorporated a number of elements which were intended to scaffold students’ success. Firstly, students were holding similar discussions in small groups throughout the term, either synchronously through live video calls or asynchronously through file share and chat. In this way, they developed a sense of both conducting analysis and using it to productively generate insight into science. Secondly, when introducing the task, a variety of resources were provided. Both a model and a practice analysis text with answers were provided to the students from textbook excerpts also suggested by the science instructors, and a recording was made of a real-time analysis of the model text with commentary by the instructor. In the video, the instructor also discussed how various elements of the analysis might be talked about in the assessment itself. Therefore, although the grammatical analysis was only one component of the assessment, a lot of scaffolding was provided to ensure students were supported to conduct their own analysis, and had a strong basis for identifying valuable elements for discussion. Lastly, by purposefully prescribing a conversation with rather than a presentation by the students, the instructor was in a position to scaffold the speech where students struggled with expression. Individually, none of these design choices – joint construction through small groups, modelling and practice examples, and scaffolding during assessment – is particularly ground-breaking, nor were they originally pursued with the intention of facilitating holistic assessment. Nevertheless, they contributed to the success of holistic assessment since they align with the principles discussed earlier in this paper. Additionally, such approaches are not contextually bound by the circumstances of the COVID-19 pandemic response, and were maintained with the return to in-person teaching the following year (2021). These approaches demonstrate practical features of applying holistic assessment.

An important basis for legitimacy of assessors’ evaluations in this holistic model is the degree to which the assessors involved are calibrated to the local standards of the context. While Sadler (2013) provides an aspirational model for a formalised calibration process involving a regular, thorough process of exemplar comparisons by assessors, the reality is that assessors may be operating as individuals without co-assessor colleagues to provide a direct calibration experience, or may simply lack local support for such efforts. It is important, therefore, to consider ways in which assessors’ broader professional activities support their capacity to interpret criteria and standards appropriately. Here, the assessor in the course was an expert in the linguistic content being taught. While it might seem obvious that discipline experts should reasonably be expected to assess within their disciplinary remit, the reality is that this is not as often the case as might be desired (consider again, for instance, the common practice of casualising assignment grading). This is especially true in liminal disciplines such as EAP, where the field knowledge base of EAP itself remains contested (Campion 2016; Ding and Bruce 2017) or where instructors may lack disciplinary (as contrasted with linguistic) content knowledge (e.g. physics). The legitimacy of a holistic design relies, in our view, on the assessor’s possession of disciplinary expertise; in this case, an intimate knowledge of systemic functional linguistics. In this instance, Author 2 has (a) formal education in systemic functional grammar, (b) research and teaching experience relating to writing through examination of grammar, and (c) two years of prior experience teaching within the EAP programme discussed in this paper, with frequent opportunities for discussion with other Academic English instructors and with the science content instructors her course supported. Furthermore, Author 2 was embedded within a disciplinary cluster of other language academics with similar skill sets, and was thereby provided with constant opportunities to engage in disciplinary discourse, which in turn provides a basis for calibration within her local context. Finally, the instructor’s derivation of evaluative criteria from the assessment process provided a resource for internal moderation across the cohort. In our view, instructors seeking to implement holistic assessment would productively engage in regular discussions with colleagues as a means of calibrating their own evaluative practices to the greater contexts of their institution, field, or discipline.

Although we find ourselves optimistic about the potentials of holistic assessment, we are conscious that other instructors may be put off by the amount of perceived time and energy required to implement it. While we have noted potential practical affordances in terms of the time saving mentioned above, it is difficult to account for the amount of time and energy required to effect a change in practice from psychological, social, and cultural standpoints, since these factors will invariably be context-dependent. It is worth noting that in the case of this project, both authors were intrinsically motivated to collaborate on a proactive change to the pre-existing assessment design, and this paper is itself evidence for the degree of development in personal knowledge and perspective that may be required to effect such a change. Where the appropriate motivation does not exist, it may be difficult to replicate this process. Such challenges notwithstanding, Author 2 noted several positive outcomes worth mentioning here. Firstly, the absence of predetermined criteria allowed her to fully engage in the assessment discussions, providing an opportunity to experience a range of responses and internalise emergent standards of achievement. This reduced pressure to generate an on-the-spot final evaluation of students’ achievement, which proved especially useful during the initial assessments as Author 2 developed a feel for the range of differences in students’ responses to the task, or when many students’ assessment bookings were back-to-back and her evaluative capacity was reduced in the moment. Secondly, it provided an efficient and effective method of marking: for each 5-minute recording, the marking could generally be completed within 10 min. Specifically, this involved taking notes of exactly what the students talked about while watching the recording of the conversation, and then triangulating against the evaluative criteria list to finalise the mark. Positioning part of the assessment process after the synchronous assessment event (through the use of recordings) provided an opportunity for evaluation to take place separate from the logistical challenges and stresses of the assessment event. While valuing the reality of the lived experience, the instructor found it challenging to act simultaneously as participant and assessor within a single language event. Thirdly, the results of the assessment were anecdotally suggestive of students’ subsequent performance in term two: students who performed poorly on this task (receiving a grade of 60% or less) were the ones who subsequently performed poorly at the end of the course. This correlation provides the opportunity to prompt earlier intervention for students at risk of receiving a poor mark, especially since the programme requires students to achieve 60% in both of their Academic English courses to progress to second year and integrate with the main programme. Overall, these outcomes suggest that the holistic assessment was an effective means of assessment with strong potential for further application beyond the given context.

Conclusion

In our experience, the holistic assessment practice was interesting, effective, and valuable for both the students and instructor involved in the EAP course detailed here. Our hope is that this work will serve as something of a reference point – a resource that can inform practice in other contexts, but which need not be followed exactly, and which can no doubt be productively refined. A general issue for curricular design work in higher education is the challenge of aligning educational theories with the realities of institutional life. Working with theories takes time and effort, and sometimes involves aligning to a particular stance that may be contested. This is especially true for holistic assessment, which challenges enculturated approaches to the specification of criteria and standards. Moving from theory to practice is therefore likely to be complicated, both from the standpoint of understanding theory adequately, as well as satisfying the sociocultural obligations of the institution in question. It is important to note that the assessment design and application of holistic assessment principles described in this paper were able to occur in part thanks to the autonomy afforded the instructor by their institution. At the same time, this institution provides a cohesive environment for its academic workforce, in subscribing to a unified set of theoretical paradigms for the teaching of EAP, meaning that instructors have opportunities for professional calibration that may not be mirrored in other contexts. While the specific challenges of providing EAP coursework through the COVID-19 pandemic response with a remote international cohort were a unique driver for the assessment design, the reduction in time commitments of teaching due to the move to asynchronous instruction provided a unique opportunity to grapple with the assessment process and embrace new possibilities. The resultant holistic assessment was successful enough that the instructor chose to repeat the assessment in 2021 with the return to in-person, on-campus instruction, with only minor changes. Furthermore, it has influenced the redesign of other assessments within the course to incorporate more holistic principles, particularly for presentations.

One issue with the assessment design was that in the first year, a small number of students were limited in their achievement due to misunderstanding the task. This was likely due at least in part to the asynchronous teaching context which placed an onus on the students to autonomously engage with the scaffolding materials. It is worth noting that in the second iteration of the course the following year, fewer students struggled in this respect, and when they did, they were nonetheless successful in achieving the core purpose of explaining the scientific text through its language use. Overall, students reported finding value in reviewing content in preparation for their science exams, and none contested their marks or assessment design, even though they had adequate opportunity given the two-term duration of the course.

A potential challenge to deploying holistic assessment may be the need for instructors to possess relevant expertise and experience to enact it, including both disciplinary content knowledge and knowledge of the theoretical underpinnings of holistic assessment itself. As argued earlier in this paper, however, the theoretical requirements for trustworthy holistic assessment align closely with good teaching principles and can be applied in part or in combination in other educational contexts. Just as reliable holistic assessment relies on the cultivation of evaluative capacity amongst students, so too should each instructor possess sufficient expertise in what they are teaching their students to do – the students’ responses will likely reflect that teaching. While some may see the absence of predetermined criteria as eliminating a safeguard to trustworthiness, the potential advantage of establishing the personal expertise of assessors as reliable is that it affords greater fidelity in the formulation of appraisals of students’ work, and provides students with latitude to express individuality in their responses. The integrity of the design outlined in this paper is, importantly, contingent on a whole-of-course approach: under a holistic assessment approach, the assessment necessarily shapes and is shaped by everything that comes before it.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Some analytical codings were decided according to the logic of the concepts as presented in the classroom, rather than the full complexity of a grammatical analysis.

References

  • Ajjawi, R., M. Bearman, and D. Boud. 2021. Performing standards: A critical perspective on the contemporary use of standards in assessment. Teaching in Higher Education 26, no. 5: 728–41. doi:10.1080/13562517.2019.1678579.
  • Boud, D., P. Dawson, M. Bearman, S. Bennett, G. Joughin, and E. Molloy. 2018. Reframing assessment research: through a practice perspective. Studies in Higher Education 43, no. 7: 1107–18. doi:10.1080/03075079.2016.1202913.
  • Campion, G.C. 2016. ‘The learning never ends’: exploring teachers’ views on the transition from general English to EAP. Journal of English for Academic Purposes 23: 59–70. doi:10.1016/j.jeap.2016.06.003.
  • Ding, A., and I. Bruce. 2017. The English for academic purposes practitioner: operating on the edge of academia. Cham: Palgrave Macmillan.
  • Dreyfus, S.J., S. Humphrey, A. Mahboob, and J.R. Martin. 2015. Genre Pedagogy in Higher Education: The SLATE Project. Palgrave Macmillan.
  • Feldman, J., A. Rechnitzer, and E. Yeager. 2016–2021. CLP-1 differential calculus. Department of Mathematics, UBC. http://www.math.ubc.ca/~CLP/CLP1/.
  • Halliday, M.A.K., and C.M.I.M. Matthiessen. 2013. Halliday’s introduction to functional grammar (4th ed.). London: Routledge.
  • Ling, S.J., J. Sanny, and W. Moebs. 2018. University physics. Houston: OpenStax.
  • Martin, J.R., and D. Rose. 2007. Working with discourse: meaning beyond the clause (1st ed.). London: Bloomsbury.
  • Martin, J.R., and D. Rose. 2008. Genre relations: mapping culture. London: Equinox.
  • Monbec, L., N. Tilakaratna, M. Brooke, S.T. Lau, Y.S. Chan, and V. Wu. 2021. Designing a rubric for reflection in nursing: A legitimation code theory and systemic functional linguistics-informed framework. Assessment & Evaluation in Higher Education 46, no. 8: 1157–72. doi:10.1080/02602938.2020.1855414.
  • Panadero, E., and A. Jonsson. 2020. A critical review of the arguments against the use of rubrics. Educational Research Review 30: 100329. doi:10.1016/j.edurev.2020.100329.
  • Sadler, D.R. 1987. Specifying and promulgating achievement standards. Oxford Review of Education 13, no. 2: 191–209.
  • Sadler, D.R. 1989. Formative assessment and the design of instructional systems. Instructional Science 18, no. 2: 119–44.
  • Sadler, D.R. 2005. Interpretations of criteria-based assessment and grading in higher education. Assessment & Evaluation in Higher Education 30, no. 2: 175–94. doi:10.1080/0260293042000264262.
  • Sadler, D.R. 2007. Perils in the meticulous specification of goals and assessment criteria. Assessment in Education: Principles, Policy & Practice 14, no. 3: 387–92. doi:10.1080/09695940701592097.
  • Sadler, D.R. 2009a. Transforming holistic assessment and grading into a vehicle for complex learning. In Assessment, learning and judgement in higher education, ed. Gordon Joughin, 1–19. Dordrecht: Springer Netherlands.
  • Sadler, D.R. 2009b. Indeterminacy in the use of preset criteria for assessment and grading. Assessment & Evaluation in Higher Education 34, no. 2: 159–79. doi:10.1080/02602930801956059.
  • Sadler, D.R. 2010. Fidelity as a precondition for integrity in grading academic achievement. Assessment & Evaluation in Higher Education 35, no. 6: 727–43. doi:10.1080/02602930902977756.
  • Sadler, D.R. 2011. Academic freedom, achievement standards and professional identity. Quality in Higher Education 17, no. 1: 85–100. doi:10.1080/13538322.2011.554639.
  • Sadler, D.R. 2013. Assuring academic achievement standards: from moderation to calibration. Assessment in Education: Principles, Policy & Practice 20, no. 1: 5–19. doi:10.1080/0969594X.2012.714742.
  • Sadler, D.R. 2014. The futility of attempting to codify academic achievement standards. Higher Education 67, no. 3: 273–88. doi:10.1007/s10734-013-9649-1.
  • Sadler, D.R. 2015. Backwards assessment explanations: implications for teaching and assessment practice. In Assessment in music education: from policy to practice, ed. D. Lebler, G. Carey, and S.D. Harrison, 9–19. Landscapes: the arts, aesthetics, and education, vol. 16. Cham: Springer.
  • Sadler, D.R. 2016. Three in-course assessment reforms to improve higher education learning outcomes. Assessment & Evaluation in Higher Education 41, no. 7: 1081–99. doi:10.1080/02602938.2015.1064858.
  • Sadler, D.R. 2020. Assessment tasks as curriculum statements: A turn to attained outcomes. The Japanese Journal of Curriculum Studies 29: 101–9.
  • Shoecraft, K., J.L. Martin, and G. Perris. 2022. EAP learners as discourse analysts: empowering emergent multilingual students. BC TEAL Journal 7, no. 1: 23–41. doi:10.14288/bctj.v7i1.452.
  • Stewart, J.J., D. Gates, M. Wolf, A. Bertram, L. Burtnick, S. Chong, K. Melzak, et al. 2020. Chemistry 110/111 and 120/121 integrated resource package 2020-2021. UBC Department of Chemistry.
  • Tai, J., R. Ajjawi, D. Boud, P. Dawson, and E. Panadero. 2018. Developing evaluative judgement: enabling students to make decisions about the quality of work. Higher Education 76, no. 3: 467–81. doi:10.1007/s10734-017-0220-3.
  • Tian, X. 2013. Distinguish spoken English from written English: rich feature analysis. English Language Teaching 6, no. 7: 72–78.
  • To, V. 2017. Grammatical intricacy in EFL textbooks. International Journal of English Language Education 5, no. 2: 127–40. doi:10.5296/ijele.v5i2.12087.
  • Walton, J. 2020. Making the grade: theorising musical performance assessment. PhD diss., Griffith University.

Appendix A: Assignment Instructions

Assignment instructions:

You will be given a short text from one of the science courses. In a recorded 5–10 minute synchronous video conversation with [Author 2] and/or the GTAs, you will use a spoken form of a data commentary to identify and describe the use of one of the language features we have discussed in [the course] in that text, and explain how that helps the text achieve its purpose.

Assignment topics:

Short texts will be taken from chemistry, calculus and physics. They will be on topics that you need to know for your final exams in those courses. You will be given one of the texts before the conversation in order to prepare. The time you receive the texts will be announced later in the term.

You can choose to discuss one or more of the following language features:

  • circumstances (weeks 4-5)

  • circumstantiation (week 6)

  • clauses and logical connections (weeks 8-9)

  • finites, non-finites and modal finites (week 2, and week 11)

You can explain the use of language features by connecting to the register variables of field (the topic or content), tenor (the relationships built between writer and reader), mode (the organisation of the text), and to the overall purpose of the text.

Assignment structure:

The conversation will follow the stages of a data commentary, using appropriate spoken language and turn-taking for a conversation.

  • Indicative summary/highlighting statement: explain what the text is about in brief and what language feature you wish to discuss

  • Description: describe the use of the language feature in the text

  • Explanation: explain why the language features were used, or how they help the text achieve its purpose, build its field, communicate its tenor or organise its mode.

  • Summary: briefly summarise the use of the language feature in the text and justify its use

You can prepare for the final by answering these questions, which you will be asked in the conversation if needed:

Appendix B: Evaluative criteria in 2021

Excellent

  • Describing how analysis helped you prepare for exams or understand the course or text

  • Connecting grammar to solving (maths/physics) problems or to writing

  • Identifying parts or stages within the text

  • Saying something meaningful about multiple language features

  • Talking about a few instances of other language features when helpful or relevant

  • Nuanced interpretation of the purpose of the text within the course, textbook, class, for the audience, by the author

  • Explaining why you chose one type for a problematic example, or suggesting multiple options

  • Looked at the textbook the text came from and checked what came before and after to contextualise the text

  • Used formatting (highlighting in a different colour) or added comments to make it easy to navigate to examples to talk about

  • Calculated grammatical intricacy and connected to language feature usage

  • Made a joke about language feature

  • Covered a range of types with examples and language features, e.g. 5 types of logic, both most common and least common circumstances

  • Almost perfect analysis – differences to answers are plausible and could be argued for

  • Hypothesising the different roles a language feature plays within the text

  • Analysing the maths text and dealing well with the complexity of formulas

  • Relating the language use to equations, diagrams and figures

Good

  • Briefly introducing what the text is about and where it comes from

  • Briefly introducing which language features you will be talking about

  • Strategic selection of examples to talk about

  • Analysis includes all language features from [course]

  • Adds insight into language features from [other EAP course] (theme patterns, participants, self-mention) as well as [course]

  • Analytical errors are nevertheless used to enhance interpretation, e.g. identifying manner in both circumstances and clauses

  • Revising analysis when challenged to review form

  • Giving a personal response – describing what you liked or how you found analysis

  • Provided a good rationale for selection of text, language feature

  • Formatting of analysis made it easier to read analysis by itself

  • Suggesting how text might be changed to improve its effectiveness in instructing students

  • Choosing a text or language feature to challenge yourself

  • Choosing either a text or a language feature which you find easier so that you can focus more on a text or language feature which is more challenging (e.g. choosing physics because you find it easy but then choosing circumstances as challenging)

  • Comparing similar examples which are worded differently, or coded differently

  • Used language features to say something about calculus / forces / 1,3-butadiene

  • Explaining that circumstances make the text specific and detailed, referring to physics/chemistry/calculus details

  • Choosing a section of the text to focus on based on the range of language features it includes

  • Used comparative statistics – the most/least common types

  • Gave well-chosen examples that were accurately coded and clear to understand

  • Misses or miscodes a few trickier instances of language feature

Acceptable

  • Comparing how easy it was to identify language feature in different texts

  • Basic explanation of usefulness: helps the author to make the text easier to understand

  • Analysis shows an acceptable understanding of basic language feature (e.g. circumstances but not embedded)

  • Identifies most instances of language feature

  • Giving a basic example not in the text when asked to clarify embedded

  • Pre-writing a summary but not using it as a script

  • Used a definition from traditional grammar, not from systemic functional grammar or from class

  • Formatting of analysis required verbal explanation to make sense of it

  • Explaining that circumstances make the text specific and detailed

  • Deleted language feature to show how important it is to make meanings

  • Defined a language feature

  • Misses a few significant instances of language feature

Problematic

  • Problematic PPCC analysis – breaking groups apart or joining together

  • Misidentified connectors as circumstances

  • Misidentified clauses as circumstances

  • Circumstance identification is incomplete: prepositional phrases are not fully identified (only preposition or only noun group but not whole prepositional phrase)

  • Calling clauses prepositional phrases when questioned

  • Pre-writing a summary and not being able to answer questions or move to relevant parts when asked

  • Obviously reading a script and not responding to the actual question asked

  • Mislabelled all -ing forms as non-finite

  • Over-applying one type of circumstance or logic, both within and across grammatical forms

  • Formatting of analysis obscured some of the analysis, e.g. making it unclear where one group started and another stopped, or not much annotated.