Editorial

Systemic influences on standard setting in national examinations

The field of educational assessment has built strong technical foundations, but we can be myopic when the big education questions are asked. One hot topic is educational standards and how they are set. Standard setting is typically defined as ‘the process of establishing one or more cut scores on a test’ (Cizek & Bunch, Citation2007, p. 13). This process comes after test administration, so content standards have already been decided and the focus is on outcome standards. Much of the research on standard setting has concerned itself with different statistical or judgemental processes: their effects upon test outcomes, the robustness of the procedures and so on. There needs to be space for assessment researchers and practitioners to explore these important matters and to communicate their findings, however technical they may be. Notwithstanding this, the standard setting literature does not much address the broader societal concerns about assessment standards; it has a narrow focus. The wider issues pose questions for the assessment field about how technical assessment matters intersect with questions of curriculum, pedagogy, educational experiences, policy, politics, social justice, the formulation of people’s identities and so on.

A recent project documented standard setting approaches to high-stakes assessments in a range of countries (Baird et al., Citation2018). A surprising array of processes was used across jurisdictions. Further, whilst each process was accepted practice in the context in which it had been developed, features of it would often have been hugely problematical for use in other jurisdictions. For example, in France the Baccalauréat does not have a standard setting process, as defined above: it is simply marked on a scale from 1 to 20 (Gauthier, Citation2018). This is only possible because the selective universities also require students to pass preparation classes. In Georgia, norm-referencing has led to an adequate supply of higher education candidates, but the quality of their performances is low (Andguladze & Mindadze, Citation2018). In Chile, there has been a great deal of disquiet about the suitability of the examination system and its standards due to equity issues (Osses & Varas, Citation2018). Having described a range of systems, subsequent questions arise about why they are so different, how they developed, what makes them acceptable, what tensions arise in each system and why a single, best, most scientific model has not been adopted universally. After all, much of the literature on standard setting has concerned the accuracy of our various approaches. However, decisions about which approach should be used are evidently not only technical decisions. Standard setting practices must be acceptable to the stakeholders who buy into them. We observe that there are tensions about this acceptability in all systems, but there is typically a social compact around standard setting in large-scale examinations; where no such compact holds, systems are under reform. Indeed, in a number of cases, participants in the project had to withdraw because the examinations were undergoing reform and the changes were too politically sensitive to undergo broad scrutiny.

This Special Issue sets out to explore some of the systemic factors that shape standard setting practices and form part of the environment in which decisions are taken. There are likely to be many such factors, and this Issue is intended to provoke discussion about meta-issues in our field. Social and cultural factors are germane to what is deemed fair in standard setting. For example, the position of teachers in society shapes the acceptability of teacher-assessed standards. However, structural factors are also important, such as the ways in which schools are held accountable, how examinations are regulated and the use of assessment results in higher education. The purpose of the assessments is also shaped by the way in which societies are constructed, which in turn affects standard setting practices. Colonial and regional governance influences are also apparent in education and assessment systems, meaning that similarities in standard setting approaches are also in evidence. None of these factors dictates the shape of standard setting systems; they interact in creative and idiosyncratic ways, and in many societies there are political tensions which lead to constant pressures for reform of standard setting systems. This Issue does not attempt to tackle all structural factors or to provide a framework for thinking about them. Instead, the articles illustrate three systemic factors which influence how standard setting is conducted:

  • policy and politics (Cummings; Gray; Tong, Lee and Luo; this issue),

  • assessment paradigms (Bramley; Cummings, this issue) and

  • governance structures (Brunfaut and Harding; Opposs, Baird, Chankseliani, Stobart, Kaushik, Johnson and McManus, this issue).

In the lead article, Gray (this issue) addresses the prospects for evidence-based policy in standard setting, since it is a highly political area. Additionally, many assessment researchers are funded by organisations with vested, and often commercial, interests. This leaves individuals and organisations in a difficult place with regard to research on standard setting, since being very open can seriously undermine confidence in the system. Moreover, challenging policy through evidence can be unrewarding and career-limiting. In any case, what counts as ‘evidence’ requires interpretation, as we in the UK have witnessed dramatically during the current COVID-19 pandemic. Experts disagree and ultimately have different value lenses through which they interpret data.

Gray takes up the issue of positionality and models it in the style of the article, drawing upon her personal experience. She makes concrete, as well as contextualises, the theoretical debates that have been conducted in the literature on insider researchers. Exam board research is a distinctive context. In these ways, she connects the standard setting literature with wider debates on the policy influence of research. Those assessment researchers who see their jobs as science, with the rest being someone else’s role, may find this article uncomfortable reading. For others, the depiction of the issues that assessment insiders face may be liberating and the link to the broader literature instructive. She takes up notions of communicative space, boundary-spanning, relationship-building and reciprocity to propose a way forward for exam board researchers who want to influence policy. Four key suggestions are made, based upon previous literature and tailored to the situation of exam board insider researchers.

The level of co-construction with policy-makers that is proposed is observable in most successful policy-influencing relationships. However, these kinds of relationships are usually only possible between researchers and policy-makers when there are shared values. Thus, ideas about what counts as good standard setting procedures will be filtered through those lenses, and certain researchers may be elevated and discarded with the rise and fall of their contacts.

Cummings (this issue) also takes up the issue of policy, through the reforms that are shaping high-stakes assessment in Queensland, Australia. Queensland has been an often-cited case of high-stakes assessment based upon teacher judgement, but the latest reforms have moved the system to externally-set assessments combined with an element of teacher assessment. Historically, the Queensland assessment system had been inherited from the English model, with cohort-referenced standard setting in which fixed proportions gained each grade. A curriculum-based paradigm was adopted in the 1970s, when the teacher assessment model was introduced. This approach prioritises the assessment of curriculum, with a focus upon the operation of the qualification system as it plays out across the schools. Recent reforms have blended different paradigms, with a great deal of the discussion focusing upon psychometric views of standards and quality. Psychometric approaches prioritise technical definitions of assessment quality to a greater degree. Cummings questions whether such a blend is compatible at a theoretical level, and it will be interesting to see how acceptable it is, what compromises have to be made and what problems arise practically. However, as noted in Gray (this issue), it is entirely possible that a lot of these debates will be insider discussions. Isaacs and Gorgen (Citation2018) argued that paradigm change was highly unusual in high-stakes assessment systems. Cummings points to three conditions that she argues accompany paradigm change: dissatisfaction with the current model, the existence of an acceptable alternative paradigm and a majority in favour of change. It will be interesting to see whether these conditions resonate with assessment reform researchers in other settings and whether they can be refined and augmented by future research.

Continuing the theme of assessment paradigms, Bramley (this issue) focuses upon a specific paradigm – psychometrics – and discusses the extent to which metaphors underpin the logic of this approach. He contrasts the measurement of cognitive features of individuals with that of physical attributes in sport. Underlying this article is the nature of language, which Bramley accepts as vague and fuzzy, in line with Wittgenstein’s (Citation1953) views. He argues that attributes are given spatial features through the use of metaphors in psychometrics, which he concludes is useful and powerful, bringing clarity and order to an otherwise messy field. Like others, Bramley accepts that there is scope for other metaphors and perspectives. This is an argument from a psychometrics perspective which grapples with the underlying theoretical structure of that paradigm. Whether the author believes that this is a superior paradigm is left unstated. Contrasting ways of thinking were outlined in the Cummings article (this issue) and it would be interesting, for example, to know whether Queensland curriculum-based thinkers would agree with Bramley’s priorities. He states, for instance, that the conceptualisation of a standard in psychometrics as ‘a point on an abstract line’ is a big advantage of the measurement metaphor. Many educationalists would not recognise this, or the ensuing modelling and strictures, as the key to good assessment (see Baird & Black, Citation2013; Lawn, Citation2008). This article analyses the philosophy of psychometrics, and it also serves in this special issue to depict a particular way of thinking about assessment. Our paradigmatic views of assessment are one structural influence upon how standards are thought about and how processes are shaped.

The next two articles deal with the theme of governance structures, the first explicitly and the second through an empirical study conducted in reference to a standards framework constructed at European level. Opposs, Baird, Chankseliani, Stobart, Kaushik, McManus and Johnson (this issue) describe three different forms of governance for the commissioning of examinations in different countries and go on to predict the implications of these governance structures for standard setting practices. They note that typologies for exam systems have been problematical, as exam systems are idiosyncratic. The three governance structures are: nationalised, a commercial market and a quasi-market. The authors comment that these governance structures have been in plain sight, but there has been no identified work on their implications. Three prototypical case studies are presented to illustrate each governance structure: Ireland, the USA and India. The authors go on to classify the examination systems of 22 other jurisdictions, most of which are nationalised. The only example of a commercial market they could find was the USA, but perhaps further examples will be identified in future research. The authors discuss how stakeholder influence and power over standard setting choices might shift across the different governance structures.

Governance structures can extend well beyond the national and may be influenced by economic, colonial or political ties. Qualifications frameworks have been generated to align standards between qualifications within and across jurisdictions, often by assigning particular credit levels to qualifications. The Common European Framework of Reference (CEFR) is a Council of Europe criterion-referenced approach to defining the standard of language skills. Brunfaut and Harding (this issue) conducted a study of the standard setting sessions for national examinations in Luxembourg to investigate the use of these European standards in a local context. The high levels of language skill among test-takers posed challenges for standard setting, because the criteria are made concrete in local contexts. When students were familiar with a number of languages, vocabulary or grammar that might be deemed difficult for other groups was thought to be less so. Thus, there were ceiling effects and the test-takers were exceeding curricular targets. However, with a streamed schooling system, there were other contextual factors to be taken into account: the cohort was not homogeneous. Brunfaut and Harding’s article illustrates the tensions that standard setting judges face when trying to take into account the various expectations for the standards at local, national and international levels. These expectations are not always coincident.

The final paper in this issue returns to the theme of policy, giving an account of the 2012 reforms to standard setting in Hong Kong. Tong, Lee and Luo (this issue) explain how standards referencing has been adopted and adapted in this process. The Hong Kong Diploma of Secondary Education Examination was part of a somewhat radical reform to the education system more broadly, including the move from a three- to a four-year degree programme. These structural changes have large implications for the levels below, and the reforms were over ten years in discussion. A standards-referenced approach, within a largely psychometric paradigm, was adopted, but to cater for a highly selective tertiary education system an additional, norm-referenced grade (5**) was introduced for the top 10%. International benchmarking and pressure for high standards from the universities meant that the curriculum was very challenging, leading to a range of problems in the delivery of the new assessment system and its standards. It is instructive to read the aims of the curriculum reform in this article, as they are so familiar from assessment systems globally: preparation for the knowledge economy, emphasis on higher-order thinking skills, raising the quality of education and increased participation at tertiary level. Yet how these goals are enacted varies a great deal by context.

Standard setting systems are perplexing unless we are familiar with the context in which they have been generated. To work well, they must fit into the jigsaw created by the historical, political and educational structures and processes of which the assessment is part. This Special Issue has raised some of these issues, and the articles contrast strongly because of the different situations of the research. Although we are yet to see a homogenising effect of globalisation on this area, it is possible that the future will see more commonality as national ways of operating are transformed by global ideas and actors such as international organisations.

References

  • Andguladze, N., & Mindadze, I. (2018). Standard setting in Georgia: The unified national examinations. In J.-A. Baird, T. Isaacs, D. Opposs, & L. Gray (Eds.), Examination standards: How measures and meanings differ around the world (pp. 133–151). UCL IOE Press.
  • Baird, J., & Black, P. (2013). Test theories, educational priorities and reliability of public examinations in England. Research Papers in Education, Special Issue: The Reliability of Public Examinations, 28(1), 5–21. https://doi.org/10.1080/02671522.2012.754224
  • Baird, J.-A., Isaacs, T., Opposs, D., & Gray, L. (Eds.). (2018). Examination standards: How measures and meanings differ around the world. UCL IOE Press.
  • Cizek, G. J., & Bunch, M. J. (2007). What is standard setting? A guide to establishing and evaluating performance standards on tests. Sage Publications Inc.
  • Gauthier, R.-F. (2018). Standard setting in France: The baccalauréat. In J.-A. Baird, T. Isaacs, D. Opposs, & L. Gray (Eds.), Examination standards: How measures and meanings differ around the world (pp. 119–127). UCL IOE Press.
  • Isaacs, T., & Gorgen, K. (2018). Culture, context and controversy in setting national examination standards. In J.-A. Baird, T. Isaacs, D. Opposs, & L. Gray (Eds.), Examination standards: How measures and meanings differ around the world (pp. 307–330). UCL IOE Press.
  • Lawn, M. (Ed.). (2008). An Atlantic crossing? The work of the international examination inquiry, its researchers, methods and influence (Comparative histories of education). Symposium Books.
  • Osses, A., & Varas, M. L. (2018). Standard setting in Chile: The Prueba de Seleccion. In J.-A. Baird, T. Isaacs, D. Opposs, & L. Gray (Eds.), Examination standards: How measures and meanings differ around the world (pp. 78–95). UCL IOE Press.
  • Wittgenstein, L. (1953). Philosophical investigations (G. E. M. Anscombe, Trans.). John Wiley and Sons Ltd. (Reprinted 2001).
