
Demythologising A level Exam Standards

Pages 875-906 | Received 01 Sep 2020, Accepted 28 Dec 2020, Published online: 06 Jan 2021
 

ABSTRACT

There are two major myths concerning A level exam standards in England. First, the Ancient Myth, which insists that standards were norm-referenced until the 1980s, when they transitioned to being criterion-referenced. Second, the Modern Myth, which insists that standards transitioned again, during the 2010s, to being based upon the comparable outcomes principle. The present paper debunks these myths, arguing that, except for the occasional use of comparable outcomes to bridge qualification reforms, A level standards have always been attainment-referenced, and that this has always been operationalised using a combination of methods, including both examiner judgement of exam performances and statistical expectations of cohort attainment. The paper also argues that what has changed significantly is the degree of confidence that the exam industry has placed in examiner judgement relative to statistical expectations, which has waxed and waned over time. When statistical expectations have prevailed, pass rates have tended to plateau, somewhat implausibly. When examiner judgement has prevailed, pass rates have tended to rise, also somewhat implausibly. These trends have given a false impression of principled transitions, which the paper dispels.

Acknowledgments

I am grateful to Dennis Opposs, Tim Oates, and an anonymous reviewer for helpful feedback on an earlier draft of this report; and to Cambridge Assessment for support in accessing documents and data from the UCLES archives.

Disclosure statement

No conflicts of interest to disclose.

Notes

1. Although the present report focuses on A level standards, most of its conclusions generalise to both O level and GCSE exams, which have operated in essentially the same way.

2. Pollard was actually discussing percentage requirements; so when referring to ‘number’ he was presumably assuming a similar cohort size.

3. For simplicity of exposition, the analysis in the present paper will be framed primarily in terms of the pass rate, i.e. the percentage of a cohort that is awarded grade E or above. This might be the cohort of candidates that sat a particular subject exam within a particular year within a particular board. Or it might be a higher-level cohort, e.g. the cohort of all candidates that sat a subject exam within a particular subject area, across all boards, within a particular year. Or it might even be at the highest level, i.e. the ‘cohort’ of all exams sat within a particular year, across all subject areas, across all boards. The sketch below makes these levels of aggregation concrete.
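
The following minimal Python sketch illustrates the pass-rate definition at the three levels of aggregation described above. The candidate records, field names, and figures are invented for illustration; they are not drawn from the paper's data.

```python
# Illustrative sketch only: records and field names are hypothetical.

PASSING_GRADES = {"A", "B", "C", "D", "E"}  # grade E or above counts as a pass

def pass_rate(candidates):
    """Percentage of a cohort awarded grade E or above."""
    if not candidates:
        return 0.0
    passes = sum(1 for c in candidates if c["grade"] in PASSING_GRADES)
    return 100.0 * passes / len(candidates)

# A hypothetical set of results (board, subject, year, grade).
results = [
    {"board": "UCLES", "subject": "Physics", "year": 1970, "grade": "B"},
    {"board": "UCLES", "subject": "Physics", "year": 1970, "grade": "U"},
    {"board": "JMB",   "subject": "Physics", "year": 1970, "grade": "E"},
    {"board": "JMB",   "subject": "History", "year": 1970, "grade": "A"},
]

# Board-level cohort: one subject, one year, one board.
board_cohort = [c for c in results
                if c["board"] == "UCLES" and c["subject"] == "Physics" and c["year"] == 1970]

# Subject-level cohort: one subject, one year, across all boards.
subject_cohort = [c for c in results if c["subject"] == "Physics" and c["year"] == 1970]

# Highest-level 'cohort': all exams sat in the year, all subjects, all boards.
national_cohort = [c for c in results if c["year"] == 1970]

print(pass_rate(board_cohort), pass_rate(subject_cohort), pass_rate(national_cohort))
```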

4. These data (and those relating to ) were collated from records in the Cambridge Assessment archives. They include candidates from the University of Cambridge Local Examinations Syndicate Summer examinations series (Home candidates only, Main syllabuses only).

5. Now, if norm-referencing were being applied as a matter of principle, then this ought, in theory, to be a perfectly straight line. In practice, though, it is impossible to engineer absolute precision when setting grade boundaries. For instance, if 72.3% of a subject cohort were to achieve a mark of 26 or more, while 67.3% of that cohort were to achieve a mark of 27 or more, then an exam board would no doubt designate 26 as the ‘70%’ pass mark, even though this would actually return a slightly higher percentage pass rate.
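
The selection rule described in this note can be sketched in a few lines of Python. The mark distribution below is invented; only the rule follows the note: the board designates the highest mark whose cumulative percentage still meets the nominal target, so the realised pass rate slightly overshoots it.

```python
# Hypothetical sketch of the boundary-setting arithmetic described in note 5.

def norm_referenced_boundary(cumulative_pct, target_pct):
    """Pick the highest mark at which the cumulative percentage of the cohort
    still meets or exceeds the target pass rate.

    cumulative_pct: dict mapping each mark to the percentage of the cohort
    achieving that mark or more.
    """
    eligible = [mark for mark, pct in cumulative_pct.items() if pct >= target_pct]
    return max(eligible) if eligible else min(cumulative_pct)

# As in the note: 72.3% achieve 26 or more, while 67.3% achieve 27 or more.
cumulative = {25: 78.1, 26: 72.3, 27: 67.3, 28: 61.0}

boundary = norm_referenced_boundary(cumulative, target_pct=70.0)
print(boundary)              # 26
print(cumulative[boundary])  # 72.3 -- slightly above the nominal 70% target
```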

6. See Newton (Citation2011) for other possible reasons, including misinterpreted recommendations from a Secondary Schools Examinations Council report.

7. Crofts was Secretary to the Joint Matriculation Board of the Northern Universities (JMB), from 1919 to 1941.

8. The precise details of how grade boundary marks are derived have differed over time and across exam boards. Differences across boards were more prevalent from the 1950s to the mid-1980s; after which, they were gradually brought into alignment, especially during the 1990s. However, the basic idea of empowering senior examiners to judge the quality of candidate performances was common across all boards (Robinson Citation2007; see Taylor and Opposs Citation2018 for a more up-to-date account).

9. I undertook this research as Director of the Cambridge Assessment Network, within the Assessment Research and Development Division of Cambridge Assessment.

10. The raw data were uploaded from photocopies of documents, prepared by the boards on an annual basis, which presented A level exam results, across all boards, broken down by gender. These documents, one for each year, were titled: ‘General Certificate of Education, Advanced level, Summer [year]’. Where alternative syllabuses were offered by a board, within a particular subject area, those results were aggregated. (The original documents were provided by the Cambridge Assessment Archive Team, 29 June 2011.)

11. Although these data come from a single exam board, similar trends can be seen at the national level, across all subjects and all exam boards (e.g. University of Buckingham Citation2010).

12. Alongside the debate over criterion-referencing, procedures for determining certain A level grade boundaries were changed, and brought into line across the boards. This change, which occurred in 1987, is often mistaken for the ‘transition’ to criterion-referencing (e.g. Shackleton Citation2014; Tattersall Citation2007; cf. Kingdon Citation1991; Newton Citation2011).

13. Although the scales tipped in favour of examiner judgement from the 1980s onwards, it is important to appreciate that the balance between judgement and statistics waxed and waned even during the ‘strangely escalating’ phase. For instance, the Chief Executive of the QCA believed that the A level exams crisis of 2002 was largely a consequence of failing to pay sufficient attention to examiner judgement. Subsequently, a spokesman for the QCA was quoted as saying: ‘When chairs of examiners look to set grade boundaries this year they will have to take account of a revised code of practice that clearly gives priority to examiner judgement not statistics.’ (Townsend and Bright Citation2003).

14. These practices evolved slightly differently across exam boards. For instance, although all boards set the pass/fail grade boundary using examiner judgement of performance evidence, this was not true of the other grade boundaries. While some of the boards set all grade boundaries using judgement, others reserved judgement for key grade boundaries (e.g. A/B, B/C and E/fail), and then ‘interpolated’ the remaining ones. These practices were aligned in 1987 (Kingdon Citation1991).

15. In 1944, Brereton was Assistant Secretary to UCLES. Jenkins was Secretary to the University of London University Entrance and School Examinations Council from 1945 to 1957. In 1953, Petch was Secretary to the JMB.

16. In 1980, Christopher was Secretary to the JMB.

17. This is a fairly recent consensus, formalised to facilitate the introduction of Curriculum 2000 A level exams. It is possible to find evidence of the principle having been applied prior to the turn of the century, although this evidence is limited. Indeed, insufficient understanding and use of this principle during the 1980s and 1990s is likely to have contributed significantly to grade inflation (Pollitt Citation1998; Newton Citation2020a, Citation2020b).

18. Nowadays, as noted earlier, the demographic composition of an exam cohort is judged in terms of a single prior attainment indicator, e.g. mean GCSE point score. The process is operationalised using prediction matrices.
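
As a rough illustration of the prediction-matrix idea, the sketch below weights a reference cohort's outcomes within each prior-attainment band by the current cohort's share of candidates in that band. This is a sketch under assumptions: the bands, reference outcomes, and cohort shares are invented for illustration and do not reproduce any published matrices.

```python
# Hypothetical prediction matrix: rows are prior-attainment bands (e.g. mean
# GCSE point-score bands); columns are cumulative percentages of the reference
# cohort achieving each grade or above within that band. All numbers invented.

reference_matrix = {
    #  band:      %A,   %B+,  %C+,  %D+,  %E+
    "high":   [45.0, 72.0, 90.0, 97.0, 99.5],
    "middle": [15.0, 38.0, 65.0, 85.0, 95.0],
    "low":    [ 3.0, 12.0, 30.0, 55.0, 78.0],
}

# Share of this year's subject entry falling in each prior-attainment band.
current_cohort_shares = {"high": 0.30, "middle": 0.50, "low": 0.20}

def predicted_outcomes(matrix, shares):
    """Weight each band's reference outcomes by the current cohort's composition."""
    n_grades = len(next(iter(matrix.values())))
    return [sum(shares[band] * matrix[band][g] for band in matrix)
            for g in range(n_grades)]

print(predicted_outcomes(reference_matrix, current_cohort_shares))
# e.g. predicted % achieving grade E or above:
#   0.30*99.5 + 0.50*95.0 + 0.20*78.0 = 92.95
```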

19. Examiner judgement still has a role, albeit a different one. It is used to check that applying the comparable outcomes principle does not lead to results that would lack credibility. If examiners for a particular subject exam were to raise credibility concerns, then this might necessitate a bespoke solution (Cresswell Citation2003).

20. Analyses by Benton (Citation2016) have helped to elucidate this matter. Whilst he acknowledged that comparable outcomes does make system improvement harder to track over time, he also explained how individual schools should still be able to demonstrate improvement; observing that grade boundaries under comparable outcomes are likely to be in the order of only a mark or so more severe than they would have been otherwise. Clearly, we must be careful not to mythologise the degree of impact of comparable outcomes.

21. See Coe (Citation2013) for a pessimistic, but not unreasonable answer.

22. This is why A levels changed from being ungraded (pass-fail) to being graded, in the early 1960s (Montgomery Citation1965).

Additional information

Funding

This work was supported by Ofqual.

Notes on contributors

Paul E. Newton

Dr Paul Newton is Research Chair at the Office of Qualifications and Examinations Regulation in England (Ofqual). Prior to joining Ofqual, Paul was Professor of Educational Assessment at the Institute of Education, University of London. Prior to that, he worked in a variety of assessment agencies, from the Associated Examining Board to Cambridge Assessment. Paul is a Fellow of the Association for Educational Assessment – Europe.
