High stakes assessment policy implementation in the time of COVID-19: the case of calculated grades in Ireland

Audrey Doyle, Zita Lysaght & Michael O'Leary
Pages 385-398 | Received 04 Mar 2021, Accepted 07 Apr 2021, Published online: 10 May 2021

Abstract

This paper provides a perspective on the manner in which Irish post-primary teachers interpreted and implemented a set of guidelines created by the Department of Education and Skills (DES) in Ireland when faced with the cancellation of the traditional high stakes Leaving Certificate (LC) examination due to COVID-19. Subject teachers were asked to engage with a system of calculated grades whereby they would estimate a percentage mark and a class rank for each of their students before meeting with school colleagues to agree a final set of data to be submitted for national standardisation. This was a remarkable event in Irish education as teachers had never before been directly involved in assessing their own students for certification purposes. Data from a survey conducted with teachers (n = 713) show that a wide variety of evidence was used to support their judgements and that the DES guidelines were not always implemented as intended. Challenges highlighted in the paper include decision making around grade boundaries, the lack of evidence for newer subjects, negotiating with school colleagues, and anticipating the impact of national standardisation. The study findings will be of interest to future initiatives involving the professional judgement of teachers in high stakes contexts.

Research background

At the heart of this paper is an analysis of a set of data gathered to shed light on how post-primary teachers in Ireland responded to the alternative assessment arrangements put in place when the traditional Leaving Certificate Examination was cancelled following the outbreak of Covid-19.

The Leaving Certificate (LC) examination is a terminal assessment carried out at the end of secondary education that has been in existence in Ireland since the 1920s. While project work and performance tasks are used to assess elements of some curriculum areas, most subjects are assessed using traditional paper-and-pencil timed summative examinations held in the month of June. Currently, students take one of three LC programmes during their final two years in secondary education: the more academic Leaving Certificate Established (LCE); the academic/vocational Leaving Certificate Vocational Programme (LCVP); and the more practical/vocational Leaving Certificate Applied (LCA). The combined results of each student’s best six subjects at higher and/or ordinary level feed into a points system overseen by the Central Applications Office (CAO) that is used for entry to higher and further education. As a consequence, the stakes associated with performance in the LC examination are very high for many students.

In general, the LC examination enjoys high levels of public trust in Ireland (e.g. Gleeson 2010) and the external process of assessment carried out by the State Examinations Commission (SEC) is considered to be fair, reliable and transparent. However, the system is not without its critics in terms of its fitness for purpose (Coolahan 2017; NCCA 2019; OECD 2020; O’Leary & Scully 2018), its subject-focused curriculum (Neumann et al. 2020) and the extent to which its high-stakes nature and its strong connection to the CAO points system lead to negative back-wash effects (Jeffers 2011; Hennessy and Mannix McNamara 2013).

On 12th March 2020, the Irish government announced that all schools and colleges would be closed in response to growing concern about the spread of Covid-19. A week later the Minister of Education held a press conference to explain that all oral and practical examinations for the Leaving Certificate examination would be cancelled and that the written exams would be postponed until the end of July. However, at the same time the Minister set up a Technical Working Group (TWG) comprising personnel from the SEC, the Educational Research Centre, the Department of Education and Skills (DES) and individuals with statistical and psychometric expertise to examine the feasibility of establishing an alternative to the written exams (DES 2020a). Following consultation with key stakeholders including the teacher unions, the Minister announced on May 8th that the LC2020 would be postponed until the Autumn and a system of Calculated Grades (CG) introduced in its place. The CG system was to be overseen by a non-statutory body within the DES (the Calculated Grades Executive Office) as, legally, the SEC could not be involved.

The DES (2020b) made it clear that a calculated grade should result from the combination of two data sets: a school-based estimation of an overall percentage mark and ranking to be awarded to a student in a particular subject, and data on past performance of students in each school and nationally (the standardisation process). The school-based phase involved each teacher using evidence to estimate a mark and rank for each of his/her students and then agreeing a final set of these marks/ranks with school colleagues at one or more alignment meetings before the data were submitted to the DES. The standardisation phase involved using four sub-sets of data: a class-based performance distribution for LC2020 students in the Junior Cert (JC) as measured by a composite score in Irish, English and Maths and the two best subjects; the school’s historical performance in the LC and JC from 2017 to 2019; the distributions based on the relationship between the school’s performance in LC and JC (2017–2019); and the historical distribution of grades nationally for each subject. Other data such as student gender and DEIS status were used to validate rather than predict the outcomes of the calculated grades process. A National Standardisation Group (NSG) was set up to carry out all the statistical/technical work involved in the national standardisation (DES 2020c).

Following controversy in the UK about how the use of calculated grades there had resulted in students attending disadvantaged schools being unfairly treated, a decision was announced on August 24th that the standardisation process in Ireland would operate without recourse to the school historical data (DES 2020c, 82).

Theoretical perspective and rationale

The introduction of a CG system in Ireland represented a radical departure in national assessment policy at second level: for the first time in Irish educational history, post-primary teachers’ judgements, in the form of grades and ranks, were required to assess students’ performance for high-stakes certification purposes. Overnight, the educational landscape shifted, repositioning teachers as both advocates and judges of their students, a position steadfastly resisted by many teachers and their trade unions heretofore (Murchan 2018). From a policy-implementation perspective, this cast teachers in a pivotal gate-keeping role with responsibility for interpreting and implementing hastily crafted emergency assessment guidelines issued by the DES. The turbulence brought about by the pandemic crisis in Ireland, as elsewhere, served to intensify the reciprocal relationship between what Lipsky (1980) termed street-level bureaucrats (second-level teachers) and decision-makers in the DES. This dependence necessitated a ‘bi-directional flow of information between policy formation exercised in decision-making venues and on-the-ground implementation’ (Gofen and Lotta 2021, 10).

For its part, on May 21st 2020, the DES (2020b) issued a single policy document to schools, identifying teacher professional judgment as the ‘cornerstone’ of the CG model and requesting that teachers draw on ‘existing records and available evidence’ and, in the interests of ‘fairness and high quality data,’ estimate as accurately as possible a percentage mark for each of their students and a rank order for each of their classes (11–12).

With respect to evidence, the DES urged that teachers’ professional judgments needed to be ‘suitably informed by relevant data’ but should not be ‘overly constrained or dominated by such data’. It was acknowledged that while ‘not all forms of evidence would be grounded in records’ … ‘only evidence that related to student performance’ should be considered. Most importantly, teachers were advised to submit the ‘most likely’ percentage mark for each student rather than the mark they (the teacher) might ‘hope’ or ‘think’ a student would have ‘a reasonable chance of getting on a good day’ (DES 2020b, 4).

More specifically, the guidelines directed teachers to:

  • space their estimated marks appropriately so that students were not only placed in the correct rank order, but the gaps between marks were a ‘true reflection’ of differences between the individual students

  • avoid inappropriate clustering or the tendency to subconsciously mark in multiples of 5 and 10, and to ‘gravitate towards grade boundaries’

  • be mindful of tendencies to either ‘bring an estimate down so as to avoid having it too close to the next grade boundary’ or to move marks that were originally close to a boundary, above that boundary and

  • remain alert to possible sources of unconscious bias based on perceptions of the student’s classroom behaviour or ‘what they know or think they know about students’ backgrounds, such as their socio-economic or family background.’

Within each school, teachers were also expected to ‘engage with their subject alignment group to make sure that all teachers of the subject with final year Leaving Certificate classes are applying a similar standard in respect of the same subject’ (DES 2020b, 9).

While the policy guidelines document is publicly available, in and of itself, it provides no evidence of what actually transpired in schools. However, policy-implementation research, particularly as it relates to professional street-level bureaucrats, has repeatedly demonstrated that the fidelity with which a policy is implemented is based, at least in part, on the congruence between the policy as originally conceived and communicated and the professional’s perceptions of both the merits of the policy and their evolving professional roles and responsibilities (e.g. Cohen 1990; Spillane 2000; Hill 2005). Given the fundamental shift in assessment role and identity implicit in the DES CG policy guidelines of May 2020 and the growing body of research promoting various reconceptualisations of teacher assessment literacy (e.g. Teacher Assessment Literacy in Practice [TALiP] [Xu and Brown 2016] and Teacher Assessment Identity [TAI] [Looney et al. 2017]), this paper seeks to provide a unique insight into how the guidelines were implemented in practice. Based on the findings from a survey of post-primary teachers’ reflections on the first phase of data collection (i.e. the work conducted by teachers to estimate marks and ranks for their students in advance of and during the school moderation/alignment meetings), this paper addresses two policy-implementation-related research questions: (1) to what extent did teachers implement the DES guidelines and (2) what challenges did they face while attempting to do so?

Another set of findings related to the impact of the CG process on how teachers view their role as assessors will be published elsewhere. The full set of data is available to review in a report on the preliminary findings from the study published on the CARPE/DCU website – see Doyle, Lysaght & O'Leary (2021). Ethical approval for the study was granted by DCU’s Research Ethics Committee (DCUREC/2020/189).

Methodology

A questionnaire survey involving predominantly multiple-choice and Likert-type items was piloted with 12 post-primary teachers in September/October 2020. Following revisions, the main study was conducted during the months of November and December using a secure online platform. The questionnaire instrument was designed to gather data on the respondents themselves (e.g. gender, teaching experience etc.), the kind of schools they worked in, the subject(s) they taught for LC2020, their reflections on the process of estimating marks and ranks for their students, and how engagement in the process had influenced their perceptions of assessment and their role as a teacher.

Three forms of volunteer sampling were implemented for the survey:

  • Principals known personally to the researchers were contacted and asked to bring the research and the survey weblink to the attention of their school colleagues.

  • A list of contact details for all post-primary schools in the Republic of Ireland was obtained through the DES website and principals were emailed and asked to bring the research and the survey weblink to the attention of their school colleagues.

  • Contact was made via email and Twitter with a range of national educational bodies such as the Teaching Council, teaching unions, subject associations, education centres and managerial bodies alerting them to the study.

Response

Data from 713 respondents are included for analysis here. The ratio of females to males was 2:1 and most respondents (70%) had over 10 years’ teaching experience. Most respondents worked in mixed-gender, non-DEIS schools with between 600 and 999 students where English was the language of instruction. A quarter (25%) worked in DEIS schools with a further 12% working in fee-paying schools.

About a third of respondents were experienced Senior Cycle teachers, having taught LC classes eleven times or more. Conversely, a third were inexperienced, having taught an LC class to completion just once before. Teachers of 29 of the 36 LC (established) subjects responded to the survey. In all, about one third were either English (15%), Gaeilge (11%) or Maths (10%) teachers. The vast majority of respondents (84%) taught their subject at higher level.

Most (75%) submitted data for between eleven and thirty students in their school, with just 6% involved in estimating marks/ranks for a greater number than that.

About one third of teachers (37%) worked alone in their subject area with another third (37%) working with one or two other colleagues. A quarter responded that, including themselves, four or more teachers submitted data for students in their schools.

Findings

In the survey, teachers were presented with a list of items describing different kinds of evidence that they might have used to estimate their students’ marks/ranks and asked to rate the importance of each to their decision making prior to attending alignment meetings. The list is presented in Table 1, ordered from high to low importance. It should be noted that many of the items in the table were drawn from the guidelines document sent to schools (see DES 2020b, 13).

Table 1. Evidence used by respondents to estimate marks and ranks.*

Data in Table 1 reveal that a wide range of assessment information was considered by teachers to be important when estimating marks/ranks for their students. Not surprisingly, the outcomes from exams in 5th and 6th year, as well as the mock exams, provided important information for the vast majority (87%+) of respondents. Respondents also valued the information they had from continuous assessments and/or in-class formative assessments from 5th and 6th year. For four of every five teachers, their knowledge of how students in previous LC classes had performed in the LC was important to their decision-making. Interestingly, knowledge of how students in other LC classes in the school had performed in previous LCs (school historical data) was considered to be of value by just 47% (with 45% saying it was unimportant to them). While engagement within class, especially in 6th year, was considered to be important, about half or less of the respondents indicated they felt engagement outside of class was of value in the term before Christmas and prior to lockdown. The vast majority of teachers of a subject with a course-work component indicated that this information was important (just 9% taking the opposite view). The experience of marking for the State Examinations Commission (SEC) was considered to be important for most teachers with that experience (just 16% indicating that it was unimportant). Not surprisingly given DES guidelines, Junior Cert results were not considered to be important sources of information for most respondents.

In addition to rating the importance of each of the items listed in Table 1, study participants were also asked to comment on other evidence they had used prior to attending alignment meetings. In total, 71 responses were received. Open coding of these data led to the identification of a number of evidence-related themes including, inter alia, experience, tracking, past LC papers, school historical data, student characteristics and lack of evidence. All direct quotes to follow are in italics and are simply indicative of the theme being addressed. The full set of comments is available to read in a report on the preliminary findings from the study published on the CARPE/DCU website – see Doyle, Lysaght & O'Leary (2021).

Many respondents pointed to what was referred to as ‘nous’, or tacit understanding built up from experience. For example, one respondent said: My knowledge of how students of similar ability and work ethic had performed in previous years. However, another readily acknowledged that: Very difficult process as other years I would guess a grade a student would receive in the LC but then would be dumb founded by their achieved grade. For some, this nous was gained outside the context of the LC: Previous teaching experience in England where they already have predicted grades made this process easier for me.

A number of teachers highlighted the fact that they were good at tracking student progress and that this made the estimation of marks easier: Having kept detailed records of assessment of all elements of their work since 5th year, I found it useful in estimating their grades. Some respondents indicated that this involved using other teachers’ data also: Consulted with teachers who had previously taught class before me to gather data but did not disclose this information. In addition, some highlighted the fact they drew on a school-level system of record keeping: We have a process of systematic tracking of students’ performance in each subject. We also compared performance in 2020 mock exams relative to previous years to gauge their relative strength.

The use of past leaving cert papers also played an important part in some teachers’ decision making: I give many class tests over the two-year period, all based on past higher biology papers. In fact, my students sat over 20 biology tests in two years and this gave me a very clear idea of their abilities and how they perform in an exam situation. This was vital to my predictive grades.

Many respondents referred to the fact that they and/or their colleagues drew on their school’s historical data. Typical comments included: I looked at the previous 3–4 years’ exam results at all levels and, I calculated the average increase from mocks to leaving cert. It was clear that these data were also used in alignment meetings: Historical data from 2017, 18 and 19 for each subject department was used to moderate results. Some schools used national comparisons as well: Comparison of School’s past performance in chosen subject in comparison to national averages. Interestingly, some Junior Cert results were used when LC data were not sufficient. As one person explained: JC result was taken into account more so with students who were borderline but main emphasis was on LC results.

It was clear from some commentary that student characteristics were an important source of evidence used by some teachers to inform their decision making. Attitude, application, effort was identified by one as what was important for them specifically. Another noted the importance of Student work ethic and focus and willingness to take feedback from assignments and formative assessments on board in order to improve. A dilemma that must have been faced by many teachers once exams were cancelled in March 2020 is captured well by this respondent: Potential to improve between March and June – Important, almost all students improve in the last few months.

Finally, the lack of evidence was highlighted as an issue by a number of teachers of new LC subjects such as Politics and Society, PE and Computer Science. The dilemma faced by these teachers was captured well by this respondent: This was the first year for computer science leaving certificate. We were a pilot school. There were no past papers, no junior cert subject. One sample paper, no marking scheme. No mock exams etc. No guidelines on what a H1 looked like or a H8 for that matter.

In summary, both sets of data (quantitative and qualitative) point to the fact that a wide range of evidence was used to inform teacher judgements even if not all of it was related directly to student performance as requested by the DES.

The data in Table 2 derive from a question teachers were asked about different decisions they made when awarding marks to their students in advance of the alignment meetings. Each decision is described in a statement (labelled a to k) and two sets of percentages apply to each – a percentage of respondents selecting each statement and a percentage of students for whom the statement is true. The statements (items) are rank ordered from high to low according to the percentages selecting the Zero % option in the table.

Table 2. Decisions teachers made when estimating marks for individual students prior to attending alignment meetings.*

The vast majority of teachers (88%+) indicated that no students in their class benefited from being awarded a higher grade because of needing a particular grade for a course (item a) or because of a fear that teachers in the school or in other schools would mark leniently (items b and c). That said, 18% of teachers took some of their students’ challenging circumstances outside of school into account in deciding to award a higher mark than they (the respondent) felt they would have achieved in the exam (item d). There is evidence in the data that decisions for individual students around grade boundaries were problematic for some teachers. For example, 17% of teachers indicated that they should have given a failing mark but did not in the case of 5% of their students (item e). In addition, 61% of teachers said they gave some of their students the benefit of the doubt and moved them above a grade boundary (item g). It is also worth noting in this case that almost 24% of teachers said they did this for 25% or more of their students. The data with respect to item f are particularly interesting. While two thirds of respondents indicated that the prospect of national standardisation did not impact on the decisions they made about their students, a third said they awarded higher marks to some of their students with that in mind (item f).

While almost two of every three teachers (62%) indicated that they had plenty of evidence to estimate a mark for almost all their students (90%+), a third also indicated that this was the case for 75% or less of their students (item j). Not surprisingly, given all these data, very few teachers (17%) indicated that estimating a mark was easy for almost all the students (item g).

The vast majority of teachers (77%) indicated that they were able to apply the DES guidelines strictly when estimating marks and ranks for almost all their students (90%+) (item i). However, consistent with other data in the table, it is no surprise that about one in every four respondents indicated that applying guidelines strictly in the case of some students was problematic. That said, the vast majority of teachers (92%) felt that the marks they awarded were fair in the case of 90% or more of their students (item k).

Data on the extent to which respondents agreed that different elements of the calculated grades process posed challenges for them are contained in Table 3. Note that challenges prior to, and during, the alignment meetings are presented separately and rank ordered from high to low agreement in relation to each statement (items 1a to 1c and items 2a to 2f, respectively).

Table 3. Teacher reflections on the process of estimating marks/ranks for students prior to and during the alignment meetings.*

While most teachers (70%) agreed that combining different types of assessment data to arrive at their initial marks/rankings was easy, almost a quarter disagreed (item 1a). Many (51%) also indicated that they found inconsistencies in their students’ performance over time difficult to reconcile (item 1b). More than a third agreed that they found it hard to remain unbiased when reaching decisions about their students (item 1c).

It is heartening to see that a very high percentage of respondents (89%) indicated that they found it easy to justify their decisions to colleagues during alignment meetings in their schools (item 2a). That said, almost a quarter of the respondents (23%) also agreed that they found it hard to voice concerns about how their colleagues arrived at their decisions (item 2d).

Respondents were divided on whether the estimation of marks and ranks differed among teachers in their school, with 38% agreeing it did and 45% disagreeing (item 2c). The data also provide evidence that marks awarded by some teachers initially changed following the alignment meetings, with 26% agreeing they awarded a higher mark (item 2b) and 17% a lower mark (item 2e). Very few teachers (13%) agreed that their students would have received a higher mark from a colleague (item 2f).

Discussion and conclusion, including future-facing recommendations

The data in this study make it clear that, following DES advice, a wide variety of evidence was used by teachers when making decisions about marks and ranks, and unsurprisingly, while some types of evidence were common to almost all teachers, it was also the case that teachers differed in what they used. Some had more experience teaching LC classes, for example, while others were able to draw on their experience of marking LC papers for the SEC. Some may have maintained systematic records of their students’ performance over fifth and sixth year up to lockdown, or worked in schools where systems of tracking were common, while others may have relied more on data from one-off exams such as the mocks. It is also clear that hard evidence such as a school’s performance data from prior years of the LC was used by some, while softer evidence related to personal characteristics such as impressions of how students were working prior to lockdown played a part for others. Over a quarter said they found it difficult to combine qualitative and quantitative data and one in two (approx.) agreed that they found it difficult to reconcile inconsistencies in student performance. Significantly, more than a third of teachers indicated that, for at least some of their students, they lacked what was described in the survey as ‘plenty of evidence.’

It seems clear that teachers faced a number of dilemmas when reviewing data to reach a conclusion about marks. The data here show that less than one in five found it easy. Decisions around grade boundaries, in particular, were problematic for many, as were judgements made in relation to students with challenging circumstances outside of school. While most were unaffected by worries about lenient marking by colleagues in their own or other schools, a third indicated that they were influenced to some degree by the fear that the national standardisation would bring their marks down. Alignment meetings also posed challenges, with a quarter agreeing that they changed their marks and found it hard to voice concerns about how others reached their decisions.

More broadly, the overwhelming majority of respondents in this study felt that they were fair to all or almost all of their students. However, it must be acknowledged that given the nature and variety of the evidence used, the fact that the DES guidelines may not have been applied strictly in some instances, and that efforts to align grades occurred within schools only, one can only surmise about the extent to which judgements were consistent across all schools. What is clear, however, is that the LC2020 outcomes, when they were published in September, looked very different to those from previous years and pointed to a clear gap between what was envisaged in the policy document and what teachers actually did.

Analysis conducted by the DES (2020c) showed that grades awarded by teachers were overestimated on all points in the achievement spectrum, with the percentage of H1s awarded in most subjects between two and three times higher than was observed in 2019. It was noted that aligning the 2020 distribution of calculated grades with distributions from prior years of the LC would have required that about 60% of higher level grades and 25% of ordinary level grades be reduced by one grade. Ultimately, following a government decision not to include school historical data, the standardisation process resulted in changes to 17% of grades. This still meant that, on average, students in 2020 received approximately 40 CAO points more than their counterparts from previous years, which, in turn, necessitated the creation of thousands of additional places in programmes across the third level sector.

While the issue of grade inflation was never overtly addressed in the guidelines document sent to schools in May, other calculated grades-related publications make it clear that the DES was fully cognizant of the research on teacher judgements and the potential for some teachers to overestimate their own students’ marks (see DES 2020a, 5–7). However, as data in this paper show, assessment decisions during the calculated grades process were not always made on the basis of policy guidelines but were often the result of teachers’ idiosyncratic use of evidence and approaches taken to estimate student marks and class ranks. It is important to point out here that no blame can be attributed to the DES or to teachers for whatever gaps resulted between the policy and practice of calculated grades. In the short time available, the DES produced an outstanding set of guidelines for schools. And it was noted that teachers engaged in the process with ‘the utmost integrity and professionalism’ (DES 2020c, 30). Both sides had the welfare of all LC2020 students at heart.

The fact that Irish post-primary teachers were involved in assessing their own students for certification purposes for the first time ever was a remarkable event. The DES can take comfort in the fact that the calculated grades process survived a legal challenge (O’Brien 2021) and, despite the many difficulties encountered, teachers played a pivotal role in ensuring that the vast majority of students were able to progress in their education and careers. Whether or not this has any long-term impact in terms of LC reform remains to be seen. A dual approach involving certified (calculated) grades and the traditional LC exam, both to be overseen by the SEC, is planned for LC2021. Once again, teachers will be cast as gate-keepers to interpret and implement whatever policy will be agreed to ensure the fairest possible outcomes for all students. Our hope is that this study, in shedding some light on what transpired in 2020, will be used to support planning for whatever assessment change is on the horizon in 2021 and beyond.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Additional information

Funding

This work was supported by Prometric Inc.

Notes on contributors

Audrey Doyle

Audrey Doyle is an assistant professor in the School of Policy and Practice in DCU. A former second-level principal of a large all-girls post-primary school in Dublin, she achieved her Ph.D. in Maynooth University in 2019. She now lectures on curriculum and assessment across a diversity of modules in DCU, contributing to the Masters in Leadership and the Doctorate in Education.

Zita Lysaght

Zita Lysaght is a member of the School of Policy and Practice and a Research Associate and member of the Advisory Board and Advisory Panel of CARPE at DCU. She coordinates and teaches classroom assessment and research methodology modules on undergraduate, masters and doctoral programmes and directs and supervises a range of research and doctoral projects.

Michael O'Leary

Michael O'Leary holds the Prometric Chair in Assessment at Dublin City University where he also directs the Centre for Assessment Research, Policy and Practice in Education (CARPE). He leads a programme of research at CARPE focused on assessment across all levels of education and in the workplace.

References