
Lower secondary school teachers’ arguments on the use of a 26-point grading scale and gender differences in use and perceptions

Pages 56-74 | Received 20 Jun 2023, Accepted 23 Feb 2024, Published online: 10 Mar 2024

ABSTRACT

This study explores lower secondary school teachers’ arguments and perceptions regarding the use of a 26-point grading scale (26-PGS), and gender differences in assessment practice. An explanatory sequential design was used. First, teachers’ (n = 6) assessment of students’ texts (n = 182) was analysed. In the subsequent phase, an open-ended questionnaire administered to teachers (n = 54) was conducted and analysed. The study revealed that the teachers perceive the 26-PGS as providing precision. Teachers highlight the significance of using the 26-PGS as an alternative assessment method, aiming to foster students’ growth, and as a message to motivate learning. In addition, gender disparities were found in teachers’ provision of grades and in their arguments for using a 26-PGS as part of their assessment practice. The study contributes to the existing literature by shedding light on teachers’ assessment practice and gender differences regarding the use of grading scales, and discusses potential challenges in educational contexts.

To ensure consistent grading standards, quality assurance procedures are needed (Brookhart & Guskey, Citation2019; Crisp, Citation2017; Welsh, Citation2019). Harlen (Citation2005) states that good assessment will mean good assessment of learning, not necessarily only for learning. However, assessment of students’ performance will be subject to some error and bias due to teachers’ teaching styles, values, and beliefs (McMillan, Citation2019). A large body of research has explored teachers’ grading practices, but less research has been conducted on teacher perceptions about grading (McMillan, Citation2019).

Assessment can have many purposes, but the two main purposes discussed in this article are summarising learning and helping students to learn (Harlen, Citation2005). Assessment, as an integrated part of teaching and students’ learning processes, should be perceived and experienced as formative if the aim is to enhance learning and teaching (Black & Wiliam, Citation2018; Brookhart, Citation2018). To unfold the complex layers of learning and assessment, teachers aim to use feedback that provides a better representation of students’ learning (Black & Wiliam, Citation1998; Brookhart, Citation2018). Brookhart (Citation2018, p. 72) emphasises ‘that feedback on summative assessment may be used strategically for specific, limited purposes’, for example, if students are given an opportunity to revise their work or retake a test. Still, the most appropriate role for feedback in a formative learning cycle is to provide information that can be acted upon (Brookhart, Citation2018). In the school context, assessment can be understood differently in terms of its purpose (McMillan, Citation2019). Indeed, the same information, gathered in the same way, is defined as formative if it will be used to enhance learning and teaching, or summative if it will be used for recording and reporting (Brookhart, Citation2018).

It is known that teachers’ judgemental feedback may influence students’ views of their capability and likelihood of succeeding (e.g. Black & Wiliam, Citation1998; Harlen & Deakin Crick, Citation2003). The impact of summative assessment on students’ motivation for learning can be both direct and indirect. A direct impact can occur when students receive low scores, which can induce test anxiety and decrease their self-esteem and perceptions of themselves as learners (Black et al., Citation2002; Thomas et al., Citation2017). Any negative impact on students’ motivation for learning is undesirable, especially at a time when the importance of learning to learn and lifelong learning is widely embraced (Bransford et al., Citation2000). Tests and the provision of grades can thus influence teachers’ classroom assessment, since the assessment might be interpreted by students as purely summative regardless of the teachers’ intention. These perceptions can be a result of teachers’ over-concern with performance rather than process (Harlen, Citation2005; McMillan, Citation2019). For summative assessment, common criteria need to be applied, so that achievement can be summarised in terms of levels or grades that have the same meaning for all involved assessors and students (Brookhart & Guskey, Citation2019).

An issue we raise in this paper is how lower secondary school teachers in Norway perceive and use grading scales (Brookhart & Guskey, Citation2019) for formative and summative assessment purposes. Teachers in Norway use a 26-point grading scale (26-PGS) as part of their assessment practice, but less is known about their arguments and perceptions for doing so. Norwegian students experience that fine distinction grades are given as rewards for high effort or punishment for low effort (Gamlem & Smith, Citation2013). Thus, there might be a need for more research that explores teachers’ argumentation for using finer distinctions in a grading scale; in addition, we will study whether there are any gender differences with regard to practice or argumentation.

Grading practices and teacher perceptions

Studies of grading practices and teachers’ perceptions have been conducted since at least the middle of the twentieth century (McMillan, Citation2019). Assessment decision-making is an integral part of teachers’ daily work. Issues related to trustworthiness have for decades been a major area of concern, particularly the variability and consistency of teacher judgement (McMillan, Citation2019; Phung & Michell, Citation2022; Prøitz, Citation2013). Judgement is in our study understood as appraisal: a decision concerning the value or quality of a performance or perceived competence, which applies regardless of assessment purpose, participants, or method (Phung & Michell, Citation2022). Brookhart and Guskey (Citation2019) argue that the reliability of grades is a foundational issue in grading policy and practice, and that former research on the reliability of grading offers several practical suggestions for today’s practice. The three most important relate to criteria, consistency, and categories in grading. Procedures for ensuring more dependable summative assessment will benefit the formative use, and the teacher’s understanding of the learning goals and the nature of progression in achieving them (Harlen, Citation2005). Assessment provided by teachers involves judgement and will therefore be subject to some error and bias (Björnsson & Skar, Citation2021; Brookhart & Guskey, Citation2019; McMillan, Citation2019). Error and bias can be found in teachers’ assessment for both summative and formative purposes (Harlen, Citation2005; McMillan, Citation2019). Newton (Citation2007) has argued that all judgements are, by nature, summative, even those made for formative purposes.

Assessment and grading in schools have long been a subject of profound interest (Brookhart & Guskey, Citation2019; McMillan, Citation2019; Smith & Dubbin, Citation1960). It is noteworthy that the ways in which teachers assess and portray student achievement hold the potential to influence academic trajectories and future opportunities (Brookhart & Guskey, Citation2019; Gamlem et al., Citation2023; McMillan, Citation2019). Grading, according to Brookhart (Citation2013), primarily entails attaching a numerical value or letter to an assignment, providing students with a basic indication of their performance in comparison to a predefined set of criteria. Grades are used to signify the extent of a student’s comprehension and mastery of the subject matter in documents created for summative purposes (Brookhart & Guskey, Citation2019). McMillan (Citation2019) explains that the processes by which teachers use information from assessments and other sources to determine and report student grades, whether on papers, unit tests, or semester reports, should be referred to as grading practices.

A challenge to the process of teacher grading is that assessment criteria are presented as written verbal descriptions of what students are expected to achieve for certain grades. Whilst these criteria can be well defined, there is scope for variation in interpretation (McMillan, Citation2019; Maxwell, Citation2010), thus inevitably involving an element of qualitative judgement (Brookhart & Guskey, Citation2019). Teacher grading raises certain issues that have been addressed, such as possible bias (intended or unintended) due to teachers’ relationships with their students. Researchers find that the reliability and validity of teacher assessments and grading practices vary between contexts and may be related to features such as school cultures and subjects (Prøitz, Citation2013), moderation procedures (Crisp, Citation2017), and the degree of specification of task and criteria (Harlen, Citation2005). Guskey and Link (Citation2019) found that teachers at all grade levels use cognitive-based evidence of student learning as the primary factor in determining students’ grades, but also that teachers across all grade levels additionally rely on behavioural non-cognitive factors (effort, attendance, class participation) in their grading practices. A study by Gustavsen and Ozsoy (Citation2017) found that teachers often evaluated students’ academic achievements based not only on their knowledge but also on their social skills. Previous research has found that teachers in elementary and secondary schools differ in their perspectives on the purposes of grading (McMillan, Citation2019). Elementary teachers were more likely to view grading as a process of communication with students and their parents, and as a possibility to differentiate grading for individual students. Secondary teachers believed that grading served classroom control and management functions, emphasising student behaviour and completion of work. McMillan (Citation2019) explains that grades are often viewed as a form of feedback to students. Further, teachers seem to believe that idiosyncratic and individualised grading practices are helpful to students, and that the use of academic enablers, such as effort and participation, reflects actual achievement (McMillan, Citation2019).

Grading scales in assessment practice

In a grading system, a set of predetermined categories that correspond in a clear and straightforward way to a specific range of achievements should be established (Sadler, Citation2009). There are several grading scales, for example, the 0% to 100% scale, the 5-point letter scale (A, B, C, D and F), or the 6-point grading scale (6-PGS) (1, 2, 3, 4, 5, 6). In schools and education, a variety of grading scales have been used, and in the early 20th century, schools began to shift from reporting percentages to the familiar letter-grade scale (Brookhart & Guskey, Citation2019; Guskey, Citation2015). In Norway, numerical grading scales are used in lower and upper secondary school in addition to feedback. Throughout the past decades, a 26-PGS has been applied by teachers in lower and upper secondary schools as a classroom assessment practice. For final assessment (exams), a 6-PGS must be used (from highest to lowest: 6, 5, 4, 3, 2 and 1, where 2 is the lowest passing grade) (Regulations to the Education Act, Citation2020). Figure 1 shows how the 6-PGS is developed into a grid of fine distinction grades, ending up with a 26-PGS.
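The grid in Figure 1 can be reconstructed programmatically. The sketch below rests on our own assumption about the exact intermediate steps, inferred from grades mentioned elsewhere in the article (e.g. 4+, 4/3, 3/4): each of the five gaps between adjacent whole grades holds four intermediate marks, which together with the six whole grades yields 26 points.

```python
# Hypothetical reconstruction of the 26-point grading scale (26-PGS).
# Assumption: each gap between adjacent whole grades x and x+1 contains
# four intermediate marks: x+, x/x+1, x+1/x, and (x+1)-.
def build_26_pgs():
    scale = []
    for low in range(1, 6):            # the five gaps: 1-2, 2-3, ..., 5-6
        high = low + 1
        scale.append(str(low))         # whole grade, e.g. "3"
        scale.append(f"{low}+")        # strong low grade, e.g. "3+"
        scale.append(f"{low}/{high}")  # between grades, leaning low, e.g. "3/4"
        scale.append(f"{high}/{low}")  # between grades, leaning high, e.g. "4/3"
        scale.append(f"{high}-")       # weak high grade, e.g. "4-"
    scale.append("6")                  # top whole grade
    return scale

scale = build_26_pgs()
print(len(scale))  # 26
```

Encoding these 26 steps as positions 1–26 would be one plausible way to obtain numerical means on this scale, though the article does not state its exact encoding.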

Figure 1. A 26-point grading scale developed based on a 6-point grading scale.


As far as we know, no research has been conducted on teachers’ use of the 26-PGS, but research has found that Norwegian students experience fine distinction grades as rewards for high effort or punishment for low effort, and that judgements differ based on teachers’ expectations (Gamlem & Smith, Citation2013).

A challenge for the use of a 26-PGS in Norway is that teachers seem to struggle to use even a 6-PGS reliably for exams; a 4-point grading scale has been found necessary to meet standards for inter-coder reliability (Björnsson & Skar, Citation2021). Brookhart and Guskey (Citation2019, pp. 28–29) claim that, ‘when it comes to teachers’ grading reliability, teachers should use grading scales with few categories that do not require fine distinctions that are likely to be inaccurate’. Brookhart and Guskey identify several factors in the grading process that teachers and schools can do something about: establishing clear criteria, taking steps to ensure consistency, and using scales that do not require finer distinctions than the teacher can reliably make.

Assessment and gender differences

Gender differences also seem to be an issue when discussing teachers’ assessment and grading practice, and the provision of grades (Doornkamp et al., Citation2022). Several previous studies on gender differences in teachers’ grading practices (Chávez & Mitchell, Citation2020; Doornkamp et al., Citation2022; Driessen & Van Langen, Citation2013; Gustavsen & Ozsoy, Citation2017; Protivínský & Münich, Citation2018) have generally found evidence of bias in the grading of student work based on the gender of the student. Gender bias has been found at all levels, including higher education institutions, secondary, and primary schools. In a study conducted among university students, female students were often perceived as more hardworking and cooperative, leading to them being assigned more challenging tasks and receiving higher scores (Rezai et al., Citation2022). This bias can result in the unfair evaluation of male students, impacting the overall fairness of assessments. Along similar lines, Terrier (Citation2020), in a study comparing biases in the grading of grade 6 students, found that teachers tend to grade girls higher, resulting in a significant effect on girls’ progress in Mathematics and French. In a more recent study, DiLiberto et al. (Citation2022) found that gender bias occurs in Italian teachers’ grading practices, with discrimination in favour of females. The results of this study also indicated that these biases occurred in both language and mathematics subjects.

Doornkamp et al. (Citation2022) find that disparities in grading occur predominantly due to teachers’ expectations and stereotypical beliefs. Doornkamp et al. revealed that teachers who had more balanced expectations of male and female students did not hold a preconceived notion that one gender is more talented than the other, and demonstrated a lower degree of bias in giving grades.

Research questions

An overarching aim of this study is to investigate teachers’ perceptions of and arguments for using a 26-PGS as part of their assessment practice, and whether there are any gender differences in assessment practice. We will answer the following research questions (RQ):

RQ 1: How does the use and distribution of a 26-PGS work among lower secondary school teachers across different drafts, and how might it vary between male and female teachers?

RQ 2: Is there a statistically significant difference in the mean grades received by female and male students across both the first and final drafts of the assessment?

RQ 3: What is lower secondary school teachers’ argumentation for using a fine distinction grading scale when assessing students’ work or performance, and are there gender differences?

Methodology

The study design has a mixed methods approach and can be defined as an explanatory sequential design study (Creswell & Plano Clark, Citation2011). In phase 1, quantitative data were collected in the form of students’ grades on a written assignment. The students were given an assignment in their English as a Second Language (ESL) class in which they were to draft a story titled A Hero. The students were informed that the assessment criteria would emphasise spelling, grammar, and punctuation. In addition, the text should have a title, introduction, main part, and ending. The students could choose their hero; thus, the assignment gave some autonomy to the students. They were told to write three drafts on their personal computers during one school day. The students were provided with feedback (from computer, teacher, and peers) on their drafts and discussed the provided feedback in groups before starting on the next draft. The first and final drafts were collected by the researchers, and the teachers were to individually assess the students’ texts based on the assessment criteria and a grading scale in the range 1–6. The teachers were told to give one grade for the first draft and one grade for the final draft. This assessment was conducted a couple of weeks after the writing process. An early-stage analysis revealed that many teachers, despite being asked to assess students’ texts using a grading scale ranging from 1 to 6, used a 26-PGS in their assessment. In phase 2, the aim was therefore to build a deeper understanding of teachers’ arguments for using such a fine distinction grading scale. Thus, in phase 2, more teachers from lower secondary schools were invited to answer questions on a digital questionnaire. The teachers were asked four open-ended questions regarding their use of a 26-PGS when assessing students’ assignments in school (see appendix).

Participants – recruitment

In phase 1 of the study, all lower secondary schools in a municipality in Norway were invited to participate with their 8th grade classes (students aged 13). All the schools gave consent; thus, we ended up with a total of three schools. The study was approved by NSD (The Norwegian Centre for Research Data), and informed consent was obtained. In each school, the two classes with the highest number of consents from teachers and students were drawn as participants for this study, giving a sample of six classes. The reason we opted to draw participants from the classes with the highest number of consents was that this inherently aligns with the principles of inclusivity and representation. In addition, conducting a writing assignment in the classes required consent.

In these six classes, there were 134 students, 94 of whom handed in a consent form signed by their parents, required due to the students’ age. This gave us a participation rate of 70.1%. All students participated in the writing assignment, but data were only collected from those who had given consent. The sample consisted of 41.8% male students (n = 38) and 58.2% female students (n = 53). The students’ mean age at the time of participation was 13.3 years. The six teachers assessing the students’ written assignments consisted of four female teachers and two male teachers.

In phase 2 of the study, an open-ended questionnaire was administered to obtain further insights into teachers’ assessment practices in using a 26-PGS. During this phase, a total of 12 lower secondary schools were contacted, of which 6 responded and gave consent. In total, the number of schools reached 8, incorporating 2 schools from the first phase of the study. The teachers who contributed to the questionnaire consisted of 2 teachers from the initial phase and 52 teachers from the subsequent phase, culminating in a total of 54 teachers (19 male and 33 female). This group consisted of teachers from grade 8 to grade 10. The schools are referred to in the results section as schools A–H.

Data collection

Data from phase 1 consist of 185 texts handed in by 94 students from six classes. These texts were divided between the six teachers, who assessed both the 1st and final drafts with a grade. Since students were to hand in two drafts of their writing assignment, only the students registered with two texts (1st and final draft) were sampled for further analysis (n = 182 texts; 91 students). In addition, the dataset from phase 2 consists of open-ended answers from 54 teachers (8 schools) regarding their assessment practice with the use of a 26-PGS.

Analysis

IBM SPSS (version 29.0) was used as a tool for the quantitative data in phase 1. The teachers’ grades of students’ texts were analysed using descriptive statistics. In this study, the predefined grading rubric was designed to encompass grades ranging from 1 to 6, in line with formal educational act regulations. However, the grading scale was expanded by the participating teachers to a more granular 26-PGS aligned with their classroom assessment practice (Gamlem & Smith, Citation2013). The reliability of the grading was assessed using Cronbach’s alpha, and the analysis yielded a coefficient of .804, suggesting a high level of internal consistency. This indicates that, despite the deviation from the original 6-PGS to the more elaborate 26-PGS, the grading demonstrates a commendable degree of reliability: the scores assigned by teachers across the first and final drafts are consistently aligned, which contributes positively to the overall reliability of the grading process. Further, an independent-samples t-test was conducted to analyse whether there were gender differences in the teachers’ provided grades, and whether there was a difference in the grades received by female and male students.
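As an illustration of the two statistical procedures named above, the sketch below computes Cronbach’s alpha and an independent-samples t-test. The scores, the group split, and the helper `cronbach_alpha` are our own constructions on synthetic data; the study itself ran these analyses in SPSS.

```python
import numpy as np
from scipy import stats

# Synthetic, purely illustrative scores (positions 1-26 on the 26-PGS);
# the study's real data are six teachers' grades of 91 students' drafts.
rng = np.random.default_rng(42)
first_draft = rng.integers(8, 20, size=91).astype(float)
final_draft = np.clip(first_draft + rng.normal(1.0, 2.0, size=91), 1, 26)

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

alpha = cronbach_alpha(np.column_stack([first_draft, final_draft]))

# Independent-samples t-test comparing two (here arbitrary) groups of
# final-draft grades, analogous to comparing grades by student gender.
t_stat, p_value = stats.ttest_ind(final_draft[:45], final_draft[45:])
print(f"alpha={alpha:.3f}, p={p_value:.3f}")
```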

The answers to the open-ended questions in phase 2 were analysed both quantitatively and qualitatively. A comprehensive two-stage process was employed to gain a deeper understanding of the responses provided by the teachers. The initial stage involved examination of the raw data, followed by a preliminary coding process applying a combination of deductive codes derived from the research questions and inductive codes derived from the data. The categorisation process involved a dynamic interplay between the data and the researcher’s active interpretation, as discussed by Saldaña (Citation2021): the data were reviewed, and patterns were identified. During this phase, responses were systematically categorised according to recurring themes and patterns. The teachers had been asked to provide their argumentation for using a 26-PGS when assessing students’ work or performance. Recurring instances related to the teachers’ arguments for their use of the grading scale were then sorted, based on thematic analysis (Braun & Clarke, Citation2006), into arguments for usage (5 categories: Assessment complexity, Students’ growth and achievement, Evaluation clarity, Student motivation and engagement, Alternative assessment) and perceptions of aspects that are crucial for usage (5 categories: Precision, Motivation and engagement, Goal attainment, Student effort and participation, Reflects competence).

To fortify the credibility of the coding, three researchers were included in the qualitative coding process, ensuring the inter-coder reliability of this research (O’Connor & Joffe, Citation2020). Upon completion of the coding phase, a deliberative coder meeting was organised. During this session, we engaged in a comprehensive analysis of the coding outcomes and, through open discourse, sought consensus on contentious coding instances. Upon completing the coding process, we sorted the codes by gender.

In the second stage of our analysis in phase 2, we conducted a quantitative analysis using SPSS. Specifically, we calculated the average mean scores for each of the identified themes, further dissecting the data based on gender differences. This allowed us to explore any variations in the themes among different genders.
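A minimal sketch of this second-stage tally, assuming hypothetical coded responses (the pairs below are invented; only the theme labels come from the article’s categories), groups codes by teacher gender and theme and computes each theme’s share:

```python
from collections import defaultdict

# Hypothetical coded responses: (teacher_gender, theme) pairs. The themes
# mirror the article's categories; the assignments here are illustrative only.
coded = [
    ("female", "Motivation and engagement"),
    ("female", "Precision"),
    ("male", "Precision"),
    ("male", "Goal attainment"),
    ("female", "Motivation and engagement"),
]

counts = defaultdict(lambda: defaultdict(int))
totals = defaultdict(int)
for gender, theme in coded:
    counts[gender][theme] += 1
    totals[gender] += 1

# Proportion of each gender's coded responses falling into each theme,
# comparable in spirit to the mean values reported in the results.
shares = {g: {t: n / totals[g] for t, n in themes.items()}
          for g, themes in counts.items()}
print(round(shares["female"]["Motivation and engagement"], 3))  # 0.667
```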

Results

Phase 1 – findings

The first research question of our study was: How does the use and distribution of a 26-PGS work among lower secondary school teachers across different drafts, and how might it vary between male and female teachers? Our findings indicate that both male and female teachers in lower secondary schools use a 26-PGS in their assessment practices, indicating a widespread practice across genders. However, there is a noticeable difference in how these grades are distributed between first and final drafts, with a higher number of grades from the 26-PGS used in the first draft. This may indicate that at the early stages of writing, teachers emphasise identifying areas for improvement. We found that the six teachers used grades in the range from 1 to 5 when assessing students’ 1st drafts; of these, 42.8% of the grades are finer distinctions from the 26-PGS. When assessing the final draft, the teachers used the entire range (1–6), and 36.2% of the grades are finer distinctions from the 26-PGS (see Figure 2).

Figure 2. Distribution of a 26-point grading scale and a 6-point grading scale.


Moreover, a finding is that there are differences in assessment practices between male and female teachers, including variations in the grades given for final drafts and the degree of grade changes between drafts (see Table 1). Results from the independent t-test show that male teachers give a slightly higher increase in students’ grades between the 1st and final draft than female teachers; this difference in grade increase is significant (p < 0.01). When examining whether the grades stay the same, female teachers have a higher mean (0.46) than male teachers (0.35), but there is no significant difference between male and female teachers in this category. Finally, female teachers tend to decrease student grades more than male teachers, with a mean of 0.20 as opposed to 0.05; this difference is statistically significant (p = 0.03).

Table 1. Teachers’ gender and mean values of given grades on students’ written assignments.

In our second research question we asked: Is there a statistically significant difference in the mean grades received by female and male students across both the first and final drafts of the assessment? The independent t-test shows that male students had an average score of 12.61, while female students had an average score of 14.13 on the first draft. For the final draft, male students had an average score of 13.87, while female students had an average score of 15.04 (see Table 2).

Table 2. Students’ gender and mean values of received grades on written assignments (n = 91).

The results revealed that male students’ grades increase more from the first to the final draft than female students’ grades, but the difference between genders is not statistically significant (p = 0.17). The data from this study also show a significant difference (p = 0.02) between genders in terms of whether the grade stays the same from the 1st to the final draft. Male students show a decline in grades from the 1st to the final draft compared to female students, but there is no significant difference between male and female students in this category.

Phase 2– findings

In the second phase of our study, we aimed to provide further explanation of teachers’ use of a 26-PGS by addressing the third research question: What is lower secondary school teachers’ argumentation for using a fine distinction grading scale when assessing students’ work or performance, and are there gender differences?

Results from the questionnaire indicate that the use of a 26-PGS is, to a high extent, part of teachers’ assessment practice. From the sample (8 lower secondary schools; 54 teachers), we found that 85% of the teachers affirmed that they and their colleagues use a 26-PGS when assessing students’ work (see Table 3). It is noteworthy that only 4 of the teachers answered that the use of a fine distinction grading scale is not part of their assessment practice, and these 4 teachers are from the same school, suggesting variation in the adoption of this assessment method. In addition, one teacher from the sample says he does not use it and is unsure whether it even is a practice at his school.

Table 3. 26-PGS practice.

Individual teachers might independently opt to use the 26-PGS to address exceptional circumstances or to ensure precision. However, we also found that the use of a 26-PGS is developed and used based on an assessment culture in the schools. In the questionnaire, teachers were asked to provide their argumentation for using a 26-PGS when assessing students’ work or performance. Our study identifies several arguments describing when the teachers use a 26-PGS (see Table 4).

Table 4. Analysis of the arguments for using a 26-point grading scale among teachers.

The most common argument found was that these fine distinctions provide evaluation clarity. The teachers noted, in their individual wording, that a 26-PGS provides greater precision, which ensures that the grades reflect a more accurate representation of students’ performance. Teachers also mentioned that the fine distinctions in a 26-PGS are used to indicate whether a student is close to moving up or down a performance level.

A fine distinct grading scale is often used to provide a more precise evaluation of student work. Other times it may be used to show that a student needs to work to maintain their grade, or to show that a student is approaching the next level. (School B, Teacher 6)

Second, teachers mentioned the use of a 26-PGS as an alternative assessment method. The teachers emphasise that students can stay engaged and motivated in the learning process if a fine distinction grading scale is used as formative assessment, encouraging students to continue working hard. The third prevalent argument is student growth and achievement. Teachers say that students will be accountable for their learning when they receive a score that makes their progress visible in addition to addressing their mistakes. The teachers claim this can enhance students’ effort to achieve their desired goals. Teachers’ responses on the use of a 26-PGS are also related to a fourth argument: assessment complexity. In determining the grade level of the student, the teachers pointed out that the use of a fine distinction grading scale offered a more comprehensive view of students’ performance. Finally, the fifth category of argument is built on the idea that use of a fine distinction grading scale might help to increase student motivation and engagement. Teachers argued that by using a fine distinction grading scale, students can see their progress and receive positive feedback, which can boost their confidence and encourage them to continue improving.

While some teachers argue that the use of a 26-PGS allows the grade distribution to reflect students’ real competence, some teachers stressed that it has negative consequences too. One of the primary reasons given by teachers for choosing not to use a fine distinction grading scale is to avoid confusion, and that improvements and progress in students’ work can instead be reflected through written feedback.

A 26-point grading scale is not precise. Students need to understand what they have done well and what can be improved, which I think is best communicated through words and indicators of goal achievement. (School H, Teacher 2)

Other reasons mentioned include that it was not customary practice at their school to use a 26-PGS, or a fine distinction grading scale at all, and that they follow the final exam format, which uses a 6-PGS.

When the teachers were asked which aspects they perceive as crucial in using a fine distinction grading scale (26-PGS), five aspects were mentioned (see Table 5).

Table 5. Teachers’ perceptions of aspects that are crucial in using a fine distinction grading scale.

The aspect described as most crucial by the teachers was precision. Teachers generally argued that significant variation exists within a single grade on a 6-point scale. Hence, in using a 26-PGS, teachers could more precisely reflect the range of student achievement (e.g. 4+, 4/3). The second aspect that was crucial for the teachers was motivation and engagement. Teachers justified the use of a 26-PGS by mentioning that it could enhance students’ motivation. Teachers also believed that when students are given the opportunity to discern strengths and weaknesses, their motivation and engagement in the process will increase. Both male and female teachers value precision as well as motivation and engagement as crucial aspects, but our analysis indicates that while male teachers found precision the most crucial aspect, the female teachers found motivation and engagement the most crucial.

Goal attainment was constructed as the third most prevalent aspect. Male teachers appear to emphasise goal attainment and precision more than female teachers. The teachers see this as an opportunity to communicate to students where they stand through a numerical value. It is not merely perceived as a representation of subject matter but also as encouragement for students to strive harder. The fourth aspect the teachers think is crucial for using a fine distinct grading scale is students’ effort and participation. The teachers explain that the indicators of a 26-PGS were used as a medium of communication to the students. Hence, teachers believe that using a plus (+) or minus (-) sign might boost students’ effort. In addition, the teachers explain that when a minus (-) is given, it is meant as a feedback message that students need to work harder to achieve a better grade. A grade such as 3/4 or 4/3 is also argued to be a means of boosting learners’ participation, indicating that the degree of effort and persistence might result in a decrease in the final grade (e.g. 3) or an increase (e.g. 4). Teachers also stressed that the use of a 26-PGS reflects competence and that the grades were given based on several criteria. Our analysis reveals that female teachers seem to give more weight to student effort and participation when using the grading scale compared to male teachers, as reflected in the higher mean value (0.21). This indicates that female teachers might be more likely to adjust grades to reward active participation and hard work. The aspect reflects competence was also mentioned as crucial for using a fine distinct grading scale, but this argument has less impact on the use of the 26-PGS for both genders.
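The symbols discussed above (plus and minus signs, split grades such as 3/4 and 4/3) imply an ordering finer than the underlying 6-point scale. As a purely illustrative sketch — the article does not specify any numeric coding, and the offsets below (±0.25 for plus/minus, ±0.4 for split grades) are our own assumptions — such symbols could be mapped to numbers for comparison or averaging:

```python
def grade_to_number(symbol: str) -> float:
    """Map a fine-grained grade symbol on a 1-6 scale to a numeric value.

    Coding assumptions (illustrative only, not the authors' scheme):
      '4+'  -> 4.25   (plus adds 0.25)
      '4-'  -> 3.75   (minus subtracts 0.25)
      '3/4' -> 3.4    (primary grade 3, leaning up towards 4)
      '4/3' -> 3.6    (primary grade 4, leaning down towards 3)
    """
    if "/" in symbol:
        # Split grade: first number is the primary grade, second the tendency.
        primary, secondary = (int(part) for part in symbol.split("/"))
        return primary + 0.4 if secondary > primary else primary - 0.4
    if symbol.endswith("+"):
        return int(symbol[:-1]) + 0.25
    if symbol.endswith("-"):
        return int(symbol[:-1]) - 0.25
    return float(symbol)
```

Under this (hypothetical) coding the ordering 3 < 3/4 < 4/3 < 4- < 4 < 4+ is preserved, which matches the teachers’ stated intent that 3/4 and 4/3 communicate different messages about where the final grade may land.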

Discussion

Teachers’ use of a 26-PGS and gender differences

Based on our data, it is evident that the 26-PGS is extensively used in lower secondary assessment practices across genders. When teachers (n = 54) in the second phase were asked whether they use the 26-PGS in their assessment practice, 85% confirmed they do, and an additional 5% answered that they sometimes do. However, male and female teachers hold different arguments and perceptions about the usage of a fine distinct grading scale in their assessment practice. McMillan (Citation2019, p. 103) explains that teachers’ grading practices ‘are best understood when teachers’ views towards the purpose, meaning, value, and consequences on student learning and motivation are considered’. We find that teachers’ perceptions and arguments about their grading practice provide a rationale and explanation for how grading, and variety in grading scales, does more than just document learning. A key element of assessment is to promote learning and motivation, and a ‘one-size-fits-all’ approach might not be in the best interest of student learning (McMillan, Citation2019) – and thus teachers might give a variety of reasons for using different aspects and scales in their grading based on their rationale. In our study, we find that male teachers mostly argue for the use of the 26-PGS as an alternative assessment practice, but also for evaluation clarity and due to assessment complexity. The female teachers argue that they mostly use the 26-PGS for evaluation clarity, students’ growth and achievement, and as alternative assessment. Former research has pointed to the fact that teachers’ subjective criteria are often used in grading, and that variability in grading practice occurs among teachers at the same school, and even within a single classroom (McMillan, Citation2019; Prøitz, Citation2013). Our study seems to align with these former findings, in addition to adding a perspective of gender differences among teachers.

In our study, we find that the female students receive higher grades in their first and last drafts than the males. These findings are in line with another study (Lievore & Triventi, Citation2023), which suggests that there are gender differences in grading, with girls receiving higher grades than boys when performance is assessed by their classroom teacher. In our study, male teachers give a slightly higher increase in grades between first and final draft than their female colleagues, but we cannot argue strongly about this due to the sample size. Overall, both the male and female teachers value motivation and engagement, and precision, as important aspects in using a 26-PGS. Research has shown that teachers’ assessment is not solely based on students’ academic abilities but also on non-cognitive factors such as social behaviours (DeVries et al., Citation2018; McMillan, Citation2019). Thus, our findings align with these former studies with regard to the arguments teachers provide for the use of a 26-PGS, where they, for example, value effort, participation, and motivation. Our findings also align with former research suggesting that teachers may subconsciously give a student a higher grade when they think that the student completed the task given in the classroom, or be inclined to give lower grades when they perceive that the students are inattentive or did not complete the task assigned (Isnawati & Saukah, Citation2017; Mullola et al., Citation2012). We know less about the teachers’ expectations, but this seems to be an implicit dimension of their assessment practice when using the 26-PGS, since the teachers find this grading scale crucial for aspects such as motivation and engagement, and effort and participation.

Teachers’ arguments and perceptions of using a 26-point grading scale

This study reveals a common pattern among teachers: they advocate the use of a fine distinct grading scale (e.g. 26-PGS) to accurately reflect students’ competence or performance. The teachers’ argument is that fine distinct grading scales offer a precision that distinguishes levels of achievement among students far better than a 6-point grading system with its limited granularity. In applying a 26-PGS in their formative assessment practices, teachers believed that it caters to the diverse learning needs of the students and encourages students to focus on their growth trajectories. This argument aligns with a study by Olmos (Citation2018), which stressed that teachers used a fine distinct grading scale to provide a clearer picture of students’ progress, enabling them to set realistic goals, monitor their own progress, and eventually increase motivation in their own learning. McMillan (Citation2019) states that there is little research that probes the nature of these teacher judgements to better understand how they support learning.

Our study found that teachers seem to assess students’ assignments based on what students are expected to achieve for certain grades. The criteria for assignments might be well defined, but the teachers still choose to rely additionally on behavioural non-cognitive factors in their grading practices and thus add criteria, which inevitably involves elements of qualitative judgement (Brookhart & Guskey, Citation2019). We find that there is scope for variation in interpretation when using a 26-PGS, based on teachers’ arguments for its usage and what they find to be crucial aspects for using this grading scale. This concern is also emphasised in former work by Brookhart and Guskey (Citation2019), McMillan (Citation2019), and Maxwell (Citation2010). For summative assessment, common criteria need to be applied, so that achievement can be summarised in terms of levels or grades that have the same meaning for all involved assessors and students (Brookhart & Guskey, Citation2019). While some teachers differ in the use of their criterion grading scale, such as percentages, points, letters, or numbers, when providing formative assessment, the end goal of the assessment is to communicate students’ achievements (Brookhart & Guskey, Citation2019). By providing the 26-PGS, teachers in our study consider that it will more likely foster a positive environment for learning. Still, providing clear common criteria and maintaining precision in grades is important (Brookhart & Guskey, Citation2019; Prøitz, Citation2013). We recall Harlen’s (Citation2005) point that good assessment will mean good assessment of learning, not necessarily only for learning. McMillan (Citation2019) states that teachers need to address and discuss what grades mean and how, for example, effort is used for grading and motivation.
Our study seems to reveal that the use of common criteria, and precision, is a challenge in teachers’ assessment practices – even though teachers claim that a 26-PGS provides precision and evaluation clarity. Brookhart and Guskey (Citation2019) have identified several factors in the grading process that teachers and schools can do something about: establish clear criteria, take steps to ensure consistency, and use scales that do not require finer distinctions than the teacher can reliably make. Teachers’ use of a 26-PGS in Norwegian lower secondary schools thus seems to face challenges in meeting this foundational issue for grading practice; in addition, Björnsson and Skar (Citation2021) found that examiner reliability was low even on a 6-PGS.

In our study, we have found that teachers argue for the use of a 26-PGS to bolster students’ motivation and engagement. Still, former research (e.g. Black et al., Citation2002; Thomas et al., Citation2017) has found that the impact of summative assessment on students’ motivation for learning can be both direct and indirect. Our study reveals that teachers seem to use a 26-PGS as a means of communicating their expectations regarding effort and persistence. Several of the teachers stressed that a 26-PGS can motivate students to improve their performance. Still, a few teachers stressed that the use of grading scales has negative consequences too. Out of the eight schools, only teachers from one school state that they and their colleagues do not use fine distinct grading scales (e.g. 26-PGS) as part of their assessment practice, since they believe grades have negative consequences for students’ learning. They state that feedback should be given as text instead of grades. Their arguments align with research stating that feedback as formative assessment is the most appropriate approach for learning (Black & Wiliam, Citation1998; Brookhart, Citation2018). It is also widely known that judgemental feedback may influence students’ views of their capability and likelihood of succeeding (e.g. Black & Wiliam, Citation1998; Harlen & Deakin Crick, Citation2003; Hattie & Timperley, Citation2007), and research has for decades stated that providing grades as classroom assessment has low impact when the aim is to support and enhance students’ learning (e.g. Butler, Citation1988; Butler & Nisan, Citation1986; Hattie & Timperley, Citation2007). One of the primary reasons given by the teachers who say they are not using a 26-PGS is to avoid confusion, and that improvements and progress in students’ work should be reflected through written feedback.

Grades and grading scales certainly have limitations, and they are neither inherently good nor bad for students. Grades are simply labels identifying various levels or categories of student performance (Brookhart, Citation2013). Many teachers face complexity in determining which level a student falls under, for example in the use of the 6-PGS (Björnsson & Skar, Citation2021; Brookhart & Guskey, Citation2019). Recognising the perplexity of assessment, teachers’ fundamental understanding of what assessment constitutes is essential. The 26-PGS presents a fine-meshed grid of rank-based assessment, thus also inviting students to compare and rank their own achievement level against their peers. Teachers with strong assessment literacy skills can make informed decisions on how to use and interpret the grading scale in a fair and transparent manner to mitigate any potential bias. However, teachers must ensure that students understand that grades do not reflect who you are as a student, but where you are in your learning journey. In addition, it should be clearly stated by the teacher that this ‘where’ is always temporary. In Butler’s (Citation1988) terms, grades must be task-involving rather than ego-involving. To ensure consistent grading standards, quality assurance procedures are needed (Brookhart & Guskey, Citation2019; Crisp, Citation2017; Welsh, Citation2019). Brookhart and Guskey (Citation2019) address the reliability of grades as a foundational issue in grading policy and practice. Their three most important practical suggestions to teachers and schools relate to criteria, consistency, and categories in grading – and an assessment practice in Norway using a 26-PGS thus seems to run counter to these suggestions.

Limitations and implications

This study has limitations that we would like to address. The small sample size, as well as the sampling method, must be mentioned. Due to the small number of participants, we want to emphasise that the results should be understood and argued on the basis of the design. The sample size also needs to be considered when arguing for the results in our study regarding teacher gender differences in grading practices.

Considering the implications, we argue for increased awareness of how teachers’ perceptions and expectations might give direction to grading practices, and that gender differences among teachers should also be considered when studying the reliability of assessment practices. Overall, our findings suggest that teachers’ grading practices are related to school culture, their expectations, assessment literacy, and individual beliefs about assessment, learning, and motivation – which aligns with former research by Brookhart and Guskey (Citation2019), Doornkamp et al. (Citation2022), and McMillan (Citation2019). Our study reveals a need to strengthen teachers’ assessment literacy, particularly their understanding of how their classroom assessment using grading scales poses reliability challenges, in addition to how it can affect students’ learning, motivation, and effort.


Acknowledgements

For the selection of data for phase 1 in this project, we thank our interdisciplinary research team from Østfold University College, Volda University College, Hypatia Learning.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Supplementary material

Supplemental data for this article can be accessed online at https://doi.org/10.1080/0969594X.2024.2325365

Additional information

Funding

This is a sub-project of the AI4AfL-project (2022–2025) that is funded by The Research Council of Norway, Grant number 326607.

Notes on contributors

Siv M. Gamlem

Siv M. Gamlem is Professor at the Faculty of Humanities and Teacher Education, institute of Pedagogy, Volda University College, Norway. Her research interests include assessment, feedback, learning processes, professional development, teacher education, and learning and teaching in digital environment and AIEd.

Meerita Segaran

Meerita Segaran is a PhD researcher at Østfold University College, Norway, in the Department of Pedagogy, Learning, and ICT. Her research is centered on artificial intelligence and education (AIEd), technology enhanced learning, assessment, feedback, and professional development.

Synnøve Moltudal

Synnøve Moltudal is Associate professor at the Faculty of Humanities and Teacher Education, institute of Pedagogy, Volda University College, Norway. Her research interests include teaching and learning in digital environments, feedback, assessment, professional development and classroom management.

References

  • Björnsson, J. K., & Skar, G. B. (2021). Sensorreliabilitet på skriftlig eksamen i videregående opplæring [Examiner reliability on written exams in upper secondary education]. University of Oslo. http://urn.nb.no/URN:NBN:no-88263
  • Black, P., Broadfoot, P., Daugherty, R., Gardner, J., Harlen, W., James, M., Stobart, G., & Wiliam, D. (2002). Testing, motivation and learning. University Faculty of Education.
  • Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principle, Policy & Practice, 5(1), 7–74. https://doi.org/10.1080/0969595980050102
  • Black, P., & Wiliam, D. (2018). Classroom assessment and pedagogy. Assessment in Education Principles, Policy & Practice, 25(6), 551–575. https://doi.org/10.1080/0969594X.2018.1441807
  • Bransford, J. D., Brown, A. L., & Cocking, R. R. (2000). How people learn. Brain, mind, experience, and school. National Academy Press.
  • Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. https://doi.org/10.1191/1478088706qp063oa
  • Brookhart, S. M. (2013). Grading. In J. H. McMillan (Ed.), SAGE handbook of research on classroom assessment (pp. 257–272). Sage Publications, Inc.
  • Brookhart, S. M. (2018). Summative and formative feedback. In A. L. Lipnevich & J. K. Smith (Eds.), The Cambridge handbook of instructional feedback (pp. 52–78). Cambridge University Press.
  • Brookhart, S. M., & Guskey, T. R. (2019). Reliability in grading and grading scales. In T. R. Guskey & S. M. Brookhart (Eds.), What we know about grading. What works, what doesn’t and what’s next (pp. 13–31). ASCD.
  • Butler, R. (1988). Enhancing and undermining intrinsic motivation: The effects of task-involving and ego-involving evaluation on interest and performance. British Journal of Educational Psychology, 58(1), 1–14. https://doi.org/10.1111/j.2044-8279.1988.tb00874.x
  • Butler, R., & Nisan, M. (1986). Effects of no feedback, task-related comments, and grades on intrinsic motivation and performance. Journal of Educational Psychology, 78(3), 210–216. https://doi.org/10.1037/0022-0663.78.3.210
  • Chávez, K., & Mitchell, K. M. W. (2020). Exploring bias in student evaluations: Gender, race, and ethnicity. PS: Political Science & Politics, 53(2), 270–274. https://doi.org/10.1017/S1049096519001744
  • Creswell, J. W., & Plano Clark, V. L. (2011). Choosing a mixed methods design. In V. L. Plano Clark & J. W. Creswell (Eds.), Designing and conducting mixed methods research (pp. 69–94). Sage.
  • Crisp, V. (2017). The judgement processes involved in the moderation of teacher assessed projects. Oxford Review of Education, 43(1), 19–37. https://doi.org/10.1080/03054985.2016.1232245
  • DeVries, J. M., Rathmann, K., & Gebhardt, M. (2018). How does social behavior relate to both grades and achievement scores? Frontiers in Psychology, 9, 857. https://doi.org/10.3389/fpsyg.2018.00857
  • DiLiberto, A., Casula, L., & Pau, S. (2022). Grading practices, gender bias and educational outcomes: Evidence from Italy. Education Economics, 30(5), 481–508. https://doi.org/10.1080/09645292.2021.2004999
  • Doornkamp, L., Van der Pol, L. D., Groeneveld, S., Mesman, J., Endendijk, J. J., & Groeneveld, M. G. (2022). Understanding gender bias in teachers’ grading: The role of gender stereotypical beliefs. Teaching and Teacher Education, 118, 103826. https://doi.org/10.1016/j.tate.2022.103826
  • Driessen, G., & Van Langen, A. (2013). Gender differences in primary and secondary education: Are girls really outperforming boys? International Review of Education, 59(1), 67–86. https://doi.org/10.1007/s11159-013-9352-6
  • Gamlem, S. M., & Smith, K. (2013). Student perceptions of classroom feedback. Assessment in Education Principles, Policy & Practice, 20(2), 150–169. https://doi.org/10.1080/0969594X.2012.749212
  • Gamlem, S. M., Smith, K., & Sandvik, L. V. (2023). Stakeholder opinions about cancelling exams in Norwegian upper secondary school during the pandemic, and its consequences – an illuminative study. Assessment Matters. https://doi.org/10.18296/am.0063
  • Guskey, T. R. (2015). On your mark: Challenging the conventions of grading and reporting. Solution Tree.
  • Guskey, T. R., & Link, L. J. (2019). Exploring the factors teachers consider in determining students’ grades. Assessment in Education: Principles, Policy & Practice, 26(3), 303–320. https://doi.org/10.1080/0969594X.2018.1555515
  • Gustavsen, A. M., & Ozsoy, G. (2017). Longitudinal relationship between social skills and academic achievement in a gender perspective. Cogent Education, 4(1), 1411035. https://doi.org/10.1080/2331186X.2017.1411035
  • Harlen, W. (2005). Teachers’ summative practices and assessment for learning – tensions and synergies. The Curriculum Journal, 16(2), 207–223. https://doi.org/10.1080/09585170500136093
  • Harlen, W., & Deakin Crick, R. (2003). Testing and motivation for learning. Assessment in Education, Principle, Policy & Practice, 10(2), 169–207. https://doi.org/10.1080/0969594032000121270
  • Hattie, J., & Timperley, H. (2007). The power of feedback. American Educational Research Association, 77(1), 81–112. https://doi.org/10.3102/003465430298487
  • Isnawati, I., & Saukah, A. (2017). Teachers’ grading decision making. TEFLIN Journal, 28(2), 155–169. https://doi.org/10.15639/teflinjournal.v28i2/155-169
  • Lievore, I., & Triventi, M. (2023). Do teacher and classroom characteristics affect the way in which girls and boys are graded? British Journal of Sociology of Education, 44(1), 97–122. https://doi.org/10.1080/01425692.2022.2122942
  • Maxwell, G. (2010). Moderation of student work by teachers. In International encyclopedia of education (3rd ed., pp. 457–463). Elsevier.
  • McMillan, J. (2019). Surveys of teachers’ grading practices and perceptions. In T. R. Guskey & S. M. Brookhart (Eds.), What we know about grading. What works, what doesn’t and what’s next (pp. 84–112). ASCD.
  • Mullola, S., Ravaja, N., Lipsanen, J., Alatupa, S., Hintsanen, M., Jokela, M., & Keltikangas-Järvinen, L. (2012). Gender differences in teachers’ perceptions of students’ temperament, educational competence, and teachability. The British Journal of Educational Psychology, 82(2), 185–206. https://doi.org/10.1111/j.2044-8279.2010.02017.x
  • Newton, P. E. (2007). Clarifying the purposes of educational assessment. Assessment in Education Principles, Policy & Practice, 14(2), 149–170. https://doi.org/10.1080/09695940701478321
  • O’Connor, C., & Joffe, H. (2020). Intercoder reliability in qualitative research: Debates and practical guidelines. International Journal of Qualitative Methods, 19, 1609406919899220. https://doi.org/10.1177/1609406919899220
  • Olmos, F. (2018). Impact of teacher grading scales and grading practices on student performance and motivation [Master’s thesis]. California State Polytechnic University.
  • Phung, D. V., & Michell, M. (2022). Inside teacher assessment decision-making: From judgement gestalts to assessment pathways. Frontiers in Education, 7, 830311. https://doi.org/10.3389/feduc.2022.830311
  • Prøitz, T. S. (2013). Variation in grading practice – subjects matter. Education Inquiry, 4(3), 555–572. https://doi.org/10.3402/edui.v4i3.22629
  • Protivínský, T., & Münich, D. (2018). Gender bias in teachers’ grading: What is in the grade. Studies in Educational Evaluation, 59, 141–149. https://doi.org/10.1016/j.stueduc.2018.07.006
  • Regulations to the Education Act. (2020). Forskrift til opplæringslova. Kapittel 3. Individuell vurdering i grunnskolen og i vidaregåande opplæring (nr. 1474) [Regulations to the Education Act. Chapter 3. Individual assessment in primary school and in upper secondary education]. Lovdata. https://lovdata.no/dokument/SF/forskrift/2006-06-23-724/KAPITTEL_5#KAPITTEL_5
  • Rezai, A., Namaziandost, E., Miri, M., & Kumar, T. (2022). Demographic biases and assessment fairness in classroom: Insights from Iranian university teachers. Language Testing in Asia, 12(1), 1–20. https://doi.org/10.1186/s40468-022-00157-6
  • Sadler, D. R. (2009). Indeterminacy in the use of preset criteria for assessment and grading. Assessment & Evaluation in Higher Education, 34(2), 159–179. https://doi.org/10.1080/02602930801956059
  • Smith, A. Z., & Dubbin, J. E. (1960). Marks and marking systems. In C. W. Harris (Ed.), Encyclopedia of educational research (3rd ed., pp. 783–791). Macmillan.
  • Terrier, C. (2020). Boys lag behind: How teachers’ gender biases affect student achievement. Economics of Education Review, 77, 101981. https://doi.org/10.1016/j.econedurev.2020.101981
  • Thomas, C. L., Cassady, J. C., & Heller, M. L. (2017). The influence of emotional intelligence, cognitive test anxiety, and coping strategies on undergraduate academic performance. Learning and Individual Differences, 55, 40–48. https://doi.org/10.1016/j.lindif.2017.03.001
  • Welsh, M. (2019). Standards-based grading. In T. R. Guskey & S. M. Brookhart (Eds.), What we know about grading. What works, what doesn’t and what’s next (pp. 113–144). ASCD.