Editorial

Does a test have to be fair to be valid?

In recent years, there has been increasing focus on the need to consider not only the validity and reliability of tests but also their fairness. Tests need to be fair to be valid: free from bias and, ideally, suitable for all test-takers regardless of their gender, race or socio-economic background. In the current regular issue of Assessment in Education, we publish a set of articles that shed light on fairness in assessment from different perspectives.

In the first article in this regular issue, Verhavert, Bouwer, Donche and De Maeyer present a meta-analysis of the results of 49 Comparative Judgement (CJ) assessments conducted between 2014 and 2016. The use of CJ as an assessment methodology is on the rise, as it offers an alternative to more conventional marking by comparing and rank ordering exam texts, videos and other student products on user-friendly technological platforms (McGrane, Humphry, & Heldsinger, 2018). Recently, we have also published several articles in this journal documenting its impact on the field of assessment and practice (Hopfenbeck, 2019). It has been argued that CJ can offer a more reliable assessment (Heldsinger & Humphry, 2010; Pollitt, 2012), although some scholars have questioned the high reliability found in some studies (Bramley & Vitello, 2019). The question of reliability is of particular importance for students sitting high-stakes tests, who need a fair and valid assessment, and it is one of the reasons why the current study by Verhavert et al. (this issue) is of interest.

The aim of their study was to investigate which assessment characteristics influence the level of reliability, with the goal of providing research and practice with guidelines for the use of CJ. Most of the CJ assessments in their study were conducted in higher education, while others were in primary and secondary education, as well as in research and selection, offering insights for researchers, educators and practitioners alike.

As discussed in previous research (Pollitt, 2012; Jones & Wheadon, 2015), the number of comparisons will impact the level of reliability when using CJ. In the present study, one of the main findings was that the number of comparisons per representation was the only characteristic that consistently affected reliability. Verhavert et al. (this issue) write that to reach a reliability of .70, between 10 and 14 comparisons per representation were needed; for a reliability of .90, 26 to 37 comparisons were needed. One of the authors’ recommendations for future work on CJ is therefore to consider carefully the number of comparisons per representation. Another future research direction would be to investigate further the differences found between expert and more novice assessors, as the authors have demonstrated that level of expertise affects how much effort assessors invest in the judgement. This also taps into the differences between holistic assessment, which CJ represents, and more analytic assessment, such as using rubrics. In high-stakes exam settings where students do not expect feedback on their assignment, a more holistic CJ approach might be a better, fairer and more reliable approach to assessment, a matter which has also been raised by McGrane et al. (2018).
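To make the relationship between comparisons and reliability concrete, the short simulation below, written for this editorial as a minimal sketch rather than the authors’ analysis code, generates pairwise judgements under a Bradley-Terry model and computes the Scale Separation Reliability (SSR) commonly reported in CJ studies. The latent quality distribution, the gradient-ascent estimation routine and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def simulate_ssr(n_reps=50, comps_per_rep=12, n_iter=500, lr=0.5):
    # True qualities of the representations on a latent logit scale (assumption).
    true_q = rng.normal(0.0, 1.5, n_reps)
    n = np.zeros((n_reps, n_reps))   # number of comparisons per pair
    w = np.zeros((n_reps, n_reps))   # w[i, j] = times i was judged better than j
    # Each comparison involves two representations, so this number of pairs
    # yields comps_per_rep comparisons per representation on average.
    for _ in range(n_reps * comps_per_rep // 2):
        i, j = rng.choice(n_reps, size=2, replace=False)
        n[i, j] += 1
        n[j, i] += 1
        p_win = 1.0 / (1.0 + np.exp(-(true_q[i] - true_q[j])))
        if rng.random() < p_win:
            w[i, j] += 1
        else:
            w[j, i] += 1
    # Estimate strengths by gradient ascent on the Bradley-Terry log-likelihood.
    theta = np.zeros(n_reps)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - theta[None, :])))
        grad = w.sum(axis=1) - (n * p).sum(axis=1)
        theta += lr * grad / np.maximum(n.sum(axis=1), 1.0)
        theta -= theta.mean()        # anchor the scale origin
    # SSR: the share of observed score variance not attributable to
    # estimation error, estimated from the Fisher information.
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - theta[None, :])))
    info = (n * p * (1.0 - p)).sum(axis=1)   # information per representation
    se2 = 1.0 / np.maximum(info, 1e-9)       # squared standard errors
    obs_var = theta.var(ddof=1)
    return (obs_var - se2.mean()) / obs_var

for c in (10, 14, 26, 37):
    print(f"{c:>2} comparisons per representation -> SSR = {simulate_ssr(comps_per_rep=c):.2f}")

Under these toy assumptions, SSR rises with the number of comparisons per representation in the broad pattern Verhavert et al. report, passing .70 at roughly 10 to 14 comparisons and approaching .90 only in the high twenties and thirties; the exact figures will of course depend on the assumed spread of true qualities.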

In the second article, Barrance (this issue) reports on a study that surveyed 1600 GCSE students in Northern Ireland and Wales about what affected their performance, drawing on the WISERD Education project, an annual, longitudinal, multi-cohort study surveying secondary school students in Wales. In addition, focus groups of GCSE students were recruited from five non-grammar schools and one grammar school in Northern Ireland (65 students) and from six schools in Wales (68 students). Since there are no grammar schools in Wales, the Welsh schools were sampled according to the proportion of students above and below the national free school meal average. It is worth noting that the researchers included a youth forum in their research project to receive advice on questions throughout the process, following the ideal from the United Nations Convention on the Rights of the Child (UNCRC) that ‘children have the right to have their views taken into account in decisions that affect them’ (Lundy & McEvoy, 2012).

Barrance (this issue) takes the position that ‘a test has to be fair to be valid’ (Xi, 2010), and that fairness can also be understood as ‘comparable validity for all individuals and groups’ (Willingham & Cole, 1997). She further argues that students’ perspectives and experiences of assessment can contribute to our understanding of the fairness of the task-taking and preparation stages of internal assessments, and the article therefore examines the fairness of internal assessment in relation to testing appropriate skills, the authenticity of students’ work and how the assessments facilitate ‘best performance’. More specifically, she investigates how fairness relates to controlled assessment, a form of assessment introduced as part of the GCSE reform in 2009 with the goal of including a type of internal assessment which shared features with examinations but was controlled and supervised by teachers internally at the schools.

One of the many interesting findings from this study is that over 70% of students reported receiving some form of support from their parents or carers when preparing for their controlled assessment. In other words, the notion that controlled assessment has removed the advantages some students enjoyed under coursework-based assessment does not necessarily hold. Another interesting finding suggested that students experience unfairness related to which teacher they have, as some teachers would help their students more than others, while other teachers would keep strictly to the regulations of controlled assessment and help students less. Students’ experiences of these issues included the administration of different timings, with some classes being given more time than others. Other issues related to the claim that a class had done their controlled assessment three times to bump up their grade.

Barrance concludes that students’ experience of controlled assessment varies considerably across schools, classrooms and contexts, which raises the question of fairness in relation to controlled assessment. Her suggestion that schools and teachers need to pay greater attention to the standardisation of administrative procedures around controlled assessment sounds like a fair and important place to start.

Rasooli, Zandi and DeLuca (this issue) further explore the concept of fairness through the lens of organisational justice theory. Acknowledging the increased focus on fairness as a core foundation of classroom assessment, the authors review current definitions and theories of classroom assessment fairness, providing a helpful overview of the field. They further discuss the use of the concept of fairness and how it is distinguished from the concept of justice.

In the final article in this issue, Nisbet and Shaw discuss the concept of fairness in assessment. They argue that there is a need for analytic work to inform the debate on fairness, as the concept has been viewed through a range of different lenses and traditions of thought. The authors outline different uses of fairness relevant to assessment and suggest a framework of questions. Drawing mostly upon North American studies, the authors also provide a historical overview of the use of fairness in the US, reminding readers that the word fairness first appeared in the 1974 edition of the Standards. In 1985, the fourth edition of the Standards contained the first full section dedicated to fairness issues, while the sixth and current edition (2014) has a prominent chapter on fairness. As the authors suggest, the current debate now often includes the ‘big three’ of reliability, validity and fairness, not only the first two. They suggest further work is needed to bring clarity to the debate around the different views and practices, including social justice. Allowing different scoring for groups identified by construct-irrelevant characteristics (e.g. SES) could be one example of where such analysis would be beneficial.

Finally, we are publishing a book review by Cohen (this issue) of The Handbook of Cognition and Assessment: Frameworks, Methodologies, and Applications, edited by André A. Rupp and Jacqueline P. Leighton. The handbook is organised into three sections, Frameworks, Methodologies and Applications, with a total of 52 authors contributing to its 23 chapters: ‘accomplished researchers and assessment practitioners, from testing organizations, universities and government agencies’ who ‘represent the multi-faceted work being conducted in North America’. Cohen recommends the Handbook as an excellent source of survey and introductory chapters that can be used in graduate courses in assessment or in in-service training workshops for testing agencies. Although he agrees with the editors that it is not possible to cover all nuances of cognition and assessment in a single handbook, Cohen writes that they have succeeded in providing a reasonable view of the overall landscape of the integration of cognition into assessment.

Disclosure statement

No potential conflict of interest was reported by the author.

References

  • Barrance, R. (2019). The fairness of internal assessment in the GCSE: The value of students’ accounts. Assessment in Education: Principles, Policy & Practice. doi:10.1080/0969594X.2019.1619514
  • Bramley, T., & Vitello, S. (2019). The effect of adaptivity on the reliability coefficient in adaptive comparative judgement. Assessment in Education: Principles, Policy & Practice, 1–16. doi:10.1080/0969594X.2017.1418734
  • Cohen, Y. (2019). The handbook of cognition and assessment: Frameworks, methodologies, and applications. Assessment in Education: Principles, Policy & Practice. doi:10.1080/0969594X.2019.1597679
  • Heldsinger, S. A., & Humphry, S. M. (2010). Using the method of pairwise comparison to obtain reliable teacher assessments. The Australian Educational Researcher, 37, 1–19.
  • Hopfenbeck, T. N. (2019). Writing assessment, comparative judgement and students’ evaluative expertise. Assessment in Education: Principles, Policy & Practice, 26(1), 1–5.
  • Jones, I., & Wheadon, C. (2015). Peer assessment using comparative and absolute judgement. Studies in Educational Evaluation, 47, 93–101.
  • Lundy, L., & McEvoy, L. (2012). Childhood, the United Nations convention on the rights of the child and research: What constitutes a ‘rights-based’ approach? In M. Freeman (Ed.), Law and childhood (pp. 75–91). Oxford: Oxford University Press.
  • McGrane, J. A., Humphry, S. M., & Heldsinger, S. (2018). Applying a Thurstonian, two-stage method in the standardized assessment of writing. Applied Measurement in Education, 31(4), 297–311.
  • Nisbet, I., & Shaw, S. D. (2019). Fair assessment viewed through the lenses of measurement theory. Assessment in Education: Principles, Policy & Practice. doi:10.1080/0969594X.2019.1586643
  • Pollitt, A. (2012). The method of adaptive comparative judgement. Assessment in Education: Principles, Policy & Practice, 19(3), 281–300.
  • Rasooli, A., Zandi, H., & DeLuca, C. (2019). Conceptualising fairness in classroom assessment: Exploring the value of organisational justice theory. Assessment in Education: Principles, Policy & Practice. doi:10.1080/0969594X.2019.1593105
  • Verhavert, S., Bouwer, R., Donche, V., & De Maeyer, S. (2019). A meta-analysis on the reliability of comparative judgement. Assessment in Education: Principles, Policy & Practice. doi:10.1080/0969594X.2019.1602027
  • Willingham, W., & Cole, N. (1997). Gender and fair assessment. London: Lawrence Erlbaum.
  • Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147–170.
