Research Article

How might rubric-based observations better support teacher learning and development?

Pages 86-101 | Received 19 Apr 2023, Accepted 09 Jan 2024, Published online: 31 Jan 2024

ABSTRACT

Background

Many education systems internationally expect schools to participate in continuous instructional improvement programmes. One tool used within these processes is the structured, rubric-based classroom observation, focused on the evaluation of teaching. Such observations are a common feature of formative evaluation systems, teacher coaching programmes, within-school teacher collaborative structures, and other local, regional or national frameworks. However, a question arises as to how rubric-based observations may better support teacher learning and development.

Purpose and sources

Drawing on existing theoretical arguments and empirical work, this paper seeks to contribute to discussion about rubric-based classroom observation and its relationship with teacher learning and instructional improvement.

Main arguments

Observation rubrics can be regarded as summaries of a community’s understanding of good instruction. When generated in a way that makes this understanding accessible to teachers in the context of their own practice, they have the potential to place the rubric’s, and teachers’, understandings of good teaching ‘in conversation’ with each other. This could provide valuable opportunities for teachers to refine and expand their understandings of good instruction. Embedding rubric-based observations in school structures can, thus, facilitate continuous improvement efforts by better supporting teacher self-reflection, feedback, and collaboration. However, many uses of observation within school contexts tend to prioritise the rubric’s, rather than teachers’, understanding of good teaching. This risks turning observations from tools of learning into tools of judgement, disrupting the pathway through which they might support teacher learning and instructional improvement.

Conclusion

Our discussion draws attention to the potential benefits of, and the challenges involved in, using rubric-based observations to support teacher learning. It highlights factors that need consideration in efforts to leverage rubric-based observations to better promote continuous teacher learning, ultimately positively influencing student learning.

Introduction

Internationally, the significance of high-quality teaching and learning as a lever for sustainable development has been emphasised (Independent Group of Scientists appointed by the Secretary-General 2019). In order to support the delivery of quality education to students in the classroom, many education systems require schools to participate in continuous instructional improvement programmes. There is a need, therefore, for systems and tools (i.e. infrastructure) that can be used to help strengthen the instructional core of teaching and learning within educational systems (Cohen, Spillane, and Peurach 2018; Mehta and Fine 2015). One such tool is the structured, rubric-based classroom observation, focused on the evaluation of teaching. However, although rubric-based observations are a common feature of many local, regional or national school improvement frameworks, important questions arise as to how they may support teacher learning and development more comprehensively and meaningfully.

This paper considers how rubric-based observations could better serve as a tool for instructional improvement within existing school structures, in terms of their untapped potential, and in view of the challenges that occur in using them to support teacher learning. Rubric-based observations involve observationally measuring instruction based on a pre-defined rubric that lays out an explicit understanding of what good instruction looks like. Such observations have recently been embedded into many school structures, including teacher evaluation systems (Close, Amrein-Beardsley, and Collins 2020; Steinberg and Donaldson 2016), coaching and professional development programmes (e.g. Cohen et al. 2016; Kraft and Hill 2020), and teacher collaborative structures (e.g. professional learning communities; e.g. Jensen, Valdés, and Gallimore 2021). These structures utilise rubric-based observations to try to improve the quality and consistency of teacher learning and instructional improvement. The discussion in this paper explores the uses of rubric-based observations to support systematic school improvement and considers how such observations might be leveraged to better promote continuous teacher learning, ultimately positively influencing student learning.

Background

Definitions

Notions of good teaching, effective teaching, and teaching quality are variously defined and widely debated. Full discussion of the range of definitions and application of evaluative terminology is beyond the scope of the current paper: our purpose here is to explain how key ideas are understood in the context of our study. We distinguish between good teaching and effective teaching, in alignment with others (see both Bell and Kane 2022; Fenstermacher and Richardson 2005). Accordingly, we view teaching, in general, as the set of interactions between students, teachers, and content that supports students’ learning and development. Good teaching is understood as teaching conforming with any group’s (i.e. a community of practice’s) normative view of what teaching should be. It follows, then, that different groups (e.g. teacher communities, policymakers, research groups) may have different understandings of good teaching. Additionally, we assume that good teaching is often implicitly understood rather than explicitly articulated. Effective teaching is regarded as teaching that successfully supports student learning and development. Therefore, effective teaching emphasises the empirical impact of teaching, while good teaching emphasises the normativity of teaching, though the two are ideally aligned.

This paper focuses its attention on the notion of good teaching (as opposed to effective teaching or teaching quality) in acknowledgement of the multiplicity of understandings of how one should teach, and because the most effective practice for any given situation is likely to be goal dependent and may be empirically unknowable. A teacher’s understanding of good teaching guides the teacher’s instructional choices and decision-making. Teacher learning and instructional improvement, then, can be understood through the lens of expanding, refining and developing teachers’ understandings of good teaching. However, within the context of their daily working lives, teachers may not typically be exposed to alternative understandings of good teaching that would challenge them to expand and/or refine their own. A key point we make here is that exposure to explicit understandings of good teaching, whether or not those understandings align with teachers’ perspectives, can support teachers in developing and refining their understandings of good instruction.

The use of rubric-based observation in schools

The subsections below discuss three school-based structures that have used rubric-based observation to support instructional improvement: (1) school and teacher evaluation/inspection systems; (2) teacher coaching; and (3) teacher-peer collaboration. The choice of these particular structures is based on the authors’ own international experience of how school systems may use rubric-based observation tools. The literature we discuss has been intentionally selected to give a broad overview of how rubric-based observations have been built into some kinds of existing school structures: it does not set out to be a comprehensive review. For example, some school-based practices (e.g. action research, data-driven decision making) are not discussed here, although these specific practices could, and may, be built into teacher coaching and collaboration structures, along with the use of rubric-based observations.

School and teacher evaluation/inspection systems

School inspection and teacher evaluation play an integral part in education structures across many countries (Close, Amrein-Beardsley, and Collins 2020; Steinberg and Donaldson 2016; Taut and Rakoczy 2016; van de Grift 2007). Rubric-based observations have increasingly become a core part of these broader evaluation/inspection systems, as country inspectorates work to develop their own rubrics (van de Grift 2007) or adopt existing, research-based rubrics (Steinberg and Donaldson 2016). While the specifics of school and teacher evaluation systems vary widely across countries, the use of rubric-based observation can enable these systems to have a greater influence on teacher instruction. The underpinning idea is that formal adoption of a rubric sets forth a clear standard for teaching practice, which can promote fairness by making the standard public, and consistency by ensuring all evaluators use the same standard.

Beyond this formal role, rubric-based observations can support evaluation systems’ goals of improving the quality of teaching. For example, the use of rubrics in observations seems to help direct observers’ attention to tangible features of teaching, increasing the objectivity, specificity, and usefulness of feedback and making feedback more aligned to what teachers say they need (Kraft and Gilmour 2017; Liu et al. 2019; Sartain et al. 2011). Rubric-based observations can, too, help to keep the focus on teaching practice, in line with teacher preferences (Visone, Mongillo, and Liu 2022). At the same time, any real influence stemming from rubric-based observations depends heavily on the way the system is adopted and implemented, including how leaders understand the use of observations and the rubric within the broader system (Halverson, Kelley, and Kimball 2004; Rigby 2015).

Teacher coaching

Coaching is regarded as an empirically promising strategy for supporting teacher instructional improvement (Kraft, Blazar, and Hogan 2018). Furthermore, coaching appears to be a familiar practice in many schools: in an international study, on average, nearly 45% of lower secondary teachers across OECD countries reported experiencing, as part of a formal arrangement, self- and/or peer-observation and coaching in the year prior to being surveyed (OECD 2019). While the nature and scope of coaching can vary widely, the use of rubric-based observations in coaching is becoming increasingly common (e.g. Cohen et al. 2016; Hu and van Veen 2020; Jamil and Hamre 2018; Kraft and Hill 2020). This generally involves scoring an observed lesson using an observation rubric and providing teachers with feedback and guidance based on the scores from the observation, often combined with some teacher self-reflection on their own practice. The rubric serves as a valuable tool to structure coaching interactions and focus attention on specific, observable features of teachers’ instructional practice, while supporting teacher self-reflection and accurate self-assessment (Boston and Candela 2018; Kraft and Hill 2020). Coaching programmes can face substantial challenges in terms of time and scalability (Kraft, Blazar, and Hogan 2018), but some programmes are working to leverage virtual resources to meet these challenges and ensure the affordability of coaching for all teachers (Gregory et al. 2014).

Teacher-peer collaboration

There is growing interest in developing formal structures to support teacher peer collaboration (e.g. professional learning communities (Coburn and Russell 2022); quality teaching rounds (Gore et al. 2017); and teacher teams (Jensen, Valdés, and Gallimore 2021)). Broadly, such structures use teacher planning time to create opportunities for teachers to interact, with the goal of improving the quality of teaching (Coburn and Russell 2022). Research has highlighted several features of these structures that appear to support their effectiveness: time and support to meet regularly, opportunities to study teaching and learning (e.g. through observations of each other’s practice, examining student work), and routines to focus interactions on teaching (Bryk et al. 2010; Coburn and Russell 2022; Gallimore et al. 2009; Horn and Little 2010). It is noteworthy, though, that institutional pressures (e.g. teacher evaluation) often lead teachers to feel that collaboration structures are forced and disconnected from daily practice (Visone, Mongillo, and Liu 2022). Even when concentrated on teaching practice, teachers’ perceptions of teaching are often guided by, and may be restricted by, their current understandings of what constitutes good teaching (Gore et al. 2017; Horn and Little 2010). There is growing interest in adopting rubric-based observations within these structures to provide a shared frame of reference for teachers when discussing teaching (Gore et al. 2017; Jensen, Valdés, and Gallimore 2021; van Es and Sherin 2008).

Discussion

Underlying our discussion is the belief that teachers can benefit from having opportunities to engage with additional understandings of good teaching within the context of their own instructional practice. This benefit results from teachers being enabled to clarify, refine, and expand their own understanding of good teaching in relation to an alternative view. An overview of the processes commonly involved in rubric-based classroom observation is presented in Figure 1. In the subsections below, we discuss (1) how observation rubrics codify an understanding of good teaching and (2) how observation rubrics can be designed to support processes important for teacher learning. We argue that this makes rubric-based observations a potentially useful tool to help teachers develop their understandings of good instruction. This leads us to examine (3) how rubric-based observations might be embedded in existing school practices. We argue that, crucially, observations need to be integrated into larger school systems. Further, we identify and reflect on some of the challenges and pitfalls involved in leveraging rubric-based observation systems.

Figure 1. An overview of processes commonly involved in rubric-based classroom observation.

Notes: Observation scores are a description of quality mapped to the community’s vision of teaching; pieces of evidence are chunks of observable features of lessons that can be interpreted in light of one or more rubric-specified dimensions; evaluations of evidence are rubric-based interpretations of pieces of evidence; decomposition of teaching results from the rubric establishing discrete dimensions that receive scores and for which evidence is identified and interpreted. These decompositions support communication on practice by focusing on discrete dimensions of practice, concrete evidence for quality, and explicit evaluations of evidence.

How observation rubrics codify an understanding of good teaching

Observation rubrics are explicit summarisations of the rubric developers’ understanding of good instruction (Bell et al. 2019). They are designed such that one can score an observed lesson to obtain a set of evidence-based codes that serve to characterise the scored instruction, based on the understanding of good teaching contained in the rubric (Klette and Blikstad-Balas 2018). This gives observation rubrics a degree of explicitness that enables an understanding of good teaching to be linked directly to observable features of instruction. To do this, the rubric lays out a set of analytically distinct dimensions with levels of quality explicitly tied to observable features of instruction (Klette and Blikstad-Balas 2018). These dimensions can be helpful for teachers, in that they show the aspects of teaching that are important for the rubric-embedded understanding of good teaching. Dimensions are further broken down into performance levels that use observable features of teaching to describe varying levels of quality in each dimension. Thus, the rubric clarifies what teaching looks like at different levels of quality. In scoring the rubric, the observer identifies evidence relevant to each rubric dimension, evaluates that evidence, and determines the appropriate score for each dimension, which together creates a decomposition of practice based on the understanding of good teaching embedded in the rubric (as understood by the observer; see right part of Figure 1). This decomposition facilitates communication about the rubric’s view of good teaching.
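
To make this structure concrete, the following minimal sketch (in Python) models a rubric as a set of dimensions, each with performance levels tied to observable features, and shows how scoring yields the decomposition of practice described above (scores, evidence, and evaluations of evidence). The dimension names and level descriptors are invented for illustration only and do not correspond to any published instrument.

```python
from dataclasses import dataclass

@dataclass
class Dimension:
    """One analytically distinct aspect of teaching, with performance
    levels described in terms of observable features of instruction."""
    name: str
    levels: dict[int, str]  # score -> observable description of quality

@dataclass
class EvidenceRecord:
    """A chunk of observable lesson evidence, interpreted for one dimension."""
    dimension: str
    evidence: str    # what the observer saw or heard
    evaluation: str  # rubric-based interpretation of that evidence
    score: int       # performance level assigned for this dimension

# A hypothetical two-dimension rubric (invented descriptors).
rubric = [
    Dimension("Classroom discourse", {
        1: "Teacher talk dominates; student responses are single words.",
        2: "Some extended student talk; follow-up questions are rare.",
        3: "Students elaborate reasoning; teacher presses for justification.",
    }),
    Dimension("Use of representations", {
        1: "No representations, or representations unconnected to the idea.",
        2: "Representations used but links to concepts left implicit.",
        3: "Representations explicitly linked to the target concept.",
    }),
]

def decompose(observations: list[EvidenceRecord]) -> dict[str, dict]:
    """Group evidence records by dimension into the 'decomposition of
    practice' (scores, evidence, evaluations) discussed in the text."""
    decomposition: dict[str, dict] = {}
    for record in observations:
        entry = decomposition.setdefault(
            record.dimension, {"scores": [], "evidence": []}
        )
        entry["scores"].append(record.score)
        entry["evidence"].append((record.evidence, record.evaluation))
    return decomposition

# Example: one observer's records from a single observed lesson.
lesson_records = [
    EvidenceRecord("Classroom discourse",
                   "Teacher asked 'Why does that work?' after a student answer.",
                   "Press for justification; consistent with level 3.", 3),
    EvidenceRecord("Use of representations",
                   "Number line drawn but not referred to again.",
                   "Representation present, link left implicit; level 2.", 2),
]
print(decompose(lesson_records))
```

The point of the sketch is structural: the rubric fixes the dimensions and level descriptions in advance, while each observation produces evidence-linked scores that can be discussed dimension by dimension.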

In essence, the rubric provides teachers access to alternative understandings of good teaching (i.e. it serves as a boundary object; Smith 2014) and allows teachers to apply that understanding to their own teaching practice. That is, when used to score teachers’ own instruction, the decomposition of practice resulting from applying the rubric links teachers’ own instructional practice to an external understanding of good teaching. This can provide teachers, within the context of their own practice, with access to perspectives which they might otherwise not encounter. Where teachers’ understandings are already in alignment with the rubric, the decomposition of practice resulting from applying the rubric can support teachers in refining their perspective and calibrating their view of their own practice. Equally, where the rubric introduces new concepts or ideas, teachers may be exposed to alternative views that can expand their understanding of good teaching. Where teachers disagree with the rubric, the contrast between the teacher’s own and the rubric’s understandings of good instruction can serve as a tool to help teachers interrogate, refine, and develop their own understandings. In all cases, there is the potential for changes to occur, and for teachers to refine their conceptualisations of good instruction. Since teachers’ understanding of good instruction forms the basis for their pedagogical decision-making and choices, this has the potential to lead to meaningful changes in teaching practice.

It is worth bearing in mind, in this context, that studies on research utilisation suggest that teachers, and other practitioners, sometimes encounter barriers to accessing research, which makes it difficult for them to make use of research-based knowledge (Nelson, Leffler, and Hansen 2009). Given that many observation rubrics are developed from research-based knowledge, another potential benefit of rubric-based observations is that they can support the use of research-based knowledge by making research-based understandings of good teaching directly accessible to teachers within the context of their own practice.

How observation rubrics can support processes important for teacher learning

There are three basic processes that observation rubrics can contribute to: self-reflection, feedback, and joint inquiry. Self-reflection is an important feature of teacher education and professional teaching practice (Nelson and Sadler 2013). Rubric-based observations can facilitate self-reflection (Danielson 2007) through the rubric dimensions that highlight specific, tangible aspects of teaching to serve as targets for self-reflection and improvement efforts (Grossman and McDonald 2008). Further, the rubric’s performance levels and focus on observable evidence can support teachers in calibrating their understanding of their own strengths and weaknesses against an external standard (Kraft and Hill 2020). In fact, a major goal of incorporating rubric-based observations into coaching, professional development, and teacher evaluation efforts has been to provide a tool to support teacher self-reflection (e.g. Jamil and Hamre 2018).

Rubric-based observations can, too, support direct feedback to teachers about their teaching. This happens not only through the scoring of observed teaching into a set of codes that represent the rubric-embedded understanding of good teaching, but also through the conversations that teachers have with observers about the decompositions of practice that result from observations (i.e. scores, observable evidence, and evaluations of evidence; see Figure 1). The language contained in the rubric has the potential to help facilitate these conversations about teaching (Danielson 2007; Gore et al. 2017), which is reinforced by training videos that show exemplar cases of teaching in a dimension at different levels of quality (Archer et al. 2016). Teachers report desiring more frequent, specific, and evidence-based feedback (Liu et al. 2019). Rubric-based observations can assist in making feedback more specific (by focusing on specific dimensions) and evidence-based (since performance levels are rooted in observable evidence), thereby supporting teachers’ needs (Kraft and Gilmour 2017; Sartain et al. 2011).

Along with strengthening self-reflection and feedback, rubric-based observations can contribute to joint inquiry by groups of teachers (e.g. action research) by providing a focus for such inquiry (e.g. rubric-based dimensions; Neumerski et al. 2018), a common language to facilitate discussions of teaching (Danielson 2007; Gore et al. 2017; Grossman and McDonald 2008), and decompositions of practice that can be compared across observers (see Figure 1). This common language may be further enriched by video exemplars in observation rubric training, providing shared, tangible referents to support common interpretations of language (Klette and Blikstad-Balas 2018; Stecher et al. 2012).

Crucially, each of these three processes (i.e. self-reflection, feedback, and joint inquiry) requires rubric-based observations to be utilised in a way that moves away from the problematic idea of something being ‘done’ to teachers, so that teachers are positioned, instead, as active contributors. It must be borne in mind that aspects of the broader institutional and school context often complicate efforts to promote self-reflection, feedback and joint inquiry (e.g. Liu et al. 2019; Visone, Mongillo, and Liu 2022). Self-reflection, feedback, and joint inquiry all require significant amounts of teacher time and energy to be carried out in a meaningful and productive way. It is imperative, therefore, that the educational context, including school policy and leadership, is actively supportive, allowing this time to be focused on these important processes.

How rubric-based observations might be embedded in existing school practices

Up to this point, we have discussed how rubric-based observations are special codifications of a conceptualisation of good teaching. They are structured to facilitate applying this understanding to teachers’ instructional practice, providing teachers with near-to-practice access to the rubric’s understanding of good teaching. This can support teachers in refining and expanding their own understandings of what good instruction looks like. Such development of teachers’ understanding of good instruction, in turn, can support teachers’ instructional choices and decision-making. However, to take advantage of this possibility, teachers must be given the time and space necessary to engage with the understanding of good teaching which is embedded in an observation rubric. Embedding routines for rubric-based observation within existing school practices is one way that institutions may ensure that teachers receive the time and support necessary to engage in real learning and instructional improvement (Feldman and Pentland 2003). This embedding can help make sure that school structures are focused clearly and explicitly on the goal of improving instructional practice to better support student learning, thereby preventing drift that can pull teachers’ attention towards other matters (e.g. Visone, Mongillo, and Liu 2022).

As noted above, as part of our analysis we reviewed several approaches to embedding rubric-based observations. These included teacher and school evaluations, teacher coaching, and teacher collaboration structures. Many school systems already have structures that aim to provide teachers with time and provision, on an ongoing basis, for teacher self-reflection, feedback, and/or collaboration. It is evident that there is considerable variety in the nature and organisation of these structures. For example, some schools contain professional learning communities (PLCs), or groups of teachers who have shared planning time to collaborate on solving problems related to practice (Coburn and Russell 2022). PLCs can be structured in many ways (e.g. by grade level or subject), but often include specific routines to guide interactions (e.g. ‘Plan, Do, Analyze, Revise’ – Jensen, Valdés, and Gallimore 2021, 7) and an emphasis on some aspect or consequence of practice (e.g. data from tests, or student work). Teacher agency and commitment are vital for the success of PLCs (Coburn and Russell 2022). To embed rubric-based observations, teachers would need to be offered training in the observation rubric; for example, meetings could be organised around watching videos of teachers’ own practice, using the rubric as a common language and shared framework that provides an alternative understanding of good teaching for teachers to contemplate. Thus, via PLCs, teachers could focus on rich representations of their own practice that provide meaningful opportunities for professional discussions and exploration.

The diversity of school organisations and institutional contexts means that any attempt to embed the use of rubric-based observations into existing or new school structures would have to involve adapting plans carefully to align with the broader context around schools. The discussion in this paper is necessarily limited, in scope, to the main points that should be considered in adapting the use of rubric-based observations to the local context. In light of this, we feel it is useful to highlight several aspects that appear to be central to effective collaborative structures in schools: teacher commitment and agency; time and support to meet with regularity; opportunities to study teaching and learning (e.g. through observations of each other’s practice, examining student work); and routines to focus interactions on building understandings of teaching (Bryk et al. 2010; Coburn and Russell 2022; Gallimore et al. 2009). Rubric-based observations can help to create a close focus on teaching and learning and provide a cognitive scaffold to guide and direct interactions on teaching. Additionally, they may serve as an accessible link to new ideas and understandings of good teaching. This can assist in making sure that discussions of instruction do not remain bounded by teachers’ current understandings and perspectives (Horn and Little 2010). Other challenges, such as ways of creating time and space, or the specific nature of teacher collaboration, need to be worked through so that they can be accommodated within the existing school and institutional context. With conditions conducive to learning, embedding rubric-based observations within collaboration structures has the potential to support teachers in developing and refining their understanding of good instruction, which can ultimately lead to practice improvement that benefits student learning.

Challenges and pitfalls when using rubric-based observations

Integrating rubric-based observations into existing school structures is not necessarily a straightforward process. In addition to the continual challenge of securing time for development activities, appropriate ways need to be found to support teachers in building a strong grasp of the understanding of good teaching embedded in the rubric, since providing access to alternative understandings of good teaching is a key goal of rubric-based observations. If teachers were to form idiosyncratic or divergent interpretations of the rubric-embedded understanding of good teaching, it is likely that their current conceptualisations of good teaching would simply be reinforced. The opportunities to refine understanding, and to benefit from acquiring a shared language and lens for discussing teaching, would then be less likely to develop (since each teacher would form their own, isolated view of the meaning of a rubric’s terms and dimensions).

As studies in this field of inquiry make clear, concerns have been raised about pitfalls that can become apparent in using rubric-based observations to provide teacher feedback. These issues may stem from the multiple goals and priorities that practitioners are tasked with continuously balancing. Notably, some principals report artificially adjusting the scores they assign from observations, based on how they think teachers will respond to feedback. The reasons reported for doing this include maintaining positive relationships with teachers, promoting commitment to the observations, increasing teachers’ receptivity to feedback and/or supporting a positive working climate (Bell et al. 2015; Donaldson and Woulfin 2018; Halverson, Kelley, and Kimball 2004; Kraft and Gilmour 2016; Shaked 2018). In other words, principals worry that if teachers receive low scores from their observations, they may feel alienated from the entire observation process and be less likely to engage with the feedback which is provided along with the scores (Halverson, Kelley, and Kimball 2004; Kraft and Gilmour 2017). This seems to reflect a misperception that scores from observation systems are a direct assessment of the quality of instruction, as discussed further below. Principals may, therefore, provide inflated scores to motivate teachers to engage with the feedback (i.e. to avoid teachers feeling negatively evaluated). However, the practice of intentional score inflation undermines the potential usefulness of rubric-based observation, as distorting scores in this way risks obscuring the connection between the rubric-embedded understanding of good teaching and enacted instruction. This would make it difficult for teachers to learn from the understanding of good teaching embedded in the rubric. It further devalues teachers’ own understanding of good teaching by privileging the rubric’s perspective over teachers’ perspectives. In addition, it hinders the development of a common, shared understanding of the rubric that could facilitate instruction-focused discussion and collaboration among teachers. This cycle of events is not, though, inevitable: some schools are able to develop improvement cultures that support the use of observation scores for formative goals, even when scores are low (e.g. Marsh et al. 2017; Reinhorn, Johnson, and Simon 2017; see also Myung and Martinez 2013).
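
To illustrate how score inflation can obscure the link between the rubric-embedded understanding of good teaching and enacted instruction, the following short sketch (in Python) compares honest and inflated reporting. The score distribution and the inflation rule are invented for illustration; they are not drawn from the cited studies.

```python
import random
import statistics

random.seed(3)

def corr(x, y):
    """Pearson correlation (population form) between two score lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

# Hypothetical rubric scores (1-4) reflecting observed practice.
honest = [random.choice([1, 2, 2, 3, 3, 3, 4]) for _ in range(300)]

# Invented inflation rule: no lesson is reported below 3, mirroring the
# reported worry about alienating teachers with low scores.
inflated = [max(score, 3) for score in honest]

print(f"honest reporting:   r = {corr(honest, honest):.2f}")    # 1.00 by construction
print(f"inflated reporting: r = {corr(honest, inflated):.2f}")  # attenuated
```

Under this toy rule, reported scores retain information only about the highest-scoring lessons, so the mapping from enacted practice to the rubric's understanding of good teaching is largely lost for everyone else.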

Research has identified some issues that arise particularly when rubric-based observations are utilised in high-stakes ways (e.g. to make employment decisions in teacher evaluation or school inspection systems). Although we focus here on research from teacher-evaluation systems in the USA, we feel that these points apply more broadly and will resonate elsewhere, given that the fundamental problem of balancing and prioritising a wide variety of school goals and consequences is a familiar challenge across systems. When scores from rubric-based observations are high stakes for teachers, the scores given are heavily influenced by concerns related to these summative outcomes (Bell et al. 2015; Kraft and Gilmour 2017; Shaked 2018). These concerns range from avoiding triggering time-intensive consequences to protecting teachers with potential for growth. In other words, scores may be inflated or deflated based on the principal’s goals for the given teacher, such as coaching a teacher towards improvement (Drake et al. 2016; Halverson, Kelley, and Kimball 2004; Kimball and Milanowski 2009). It is recognised that there may be some benefits to these practices: for instance, they can lead to scores that better capture teachers’ overall contributions to schools, more stable teacher scores (Bell et al. 2015; Ho and Kane 2013), greater commitment (Bell et al. 2015; Mintrop et al. 2018), or higher job satisfaction (Koedel et al. 2017). However, it is also the case that this practice risks distorting the connections between the rubric-embedded understanding of good teaching and enacted practice, undermining the potential for teachers to engage in learning in terms of the alternative understandings of good teaching contained in the rubric.

The issues discussed above may stem from the many challenges schools face, including the need to balance goals and priorities. Misconceptions about the nature and significance of rubric-based observations are relevant here, as well. It is sometimes not well understood that rubrics are built to characterise typical teaching according to the rubric’s understanding of good teaching, and not the teaching that a specific teacher should provide on a specific day. Further, validation of rubrics focuses on validating estimates of classroom-level average teaching (e.g. Kane et al. 2012). It follows that it is an ecological fallacy (i.e. the assumption that features of the average/typical case apply to each individual case (Schwartz 1994)) to use rubrics (which characterise a normative view of the average case) to make judgements about the quality of a specific lesson (i.e. the individual case). Such a conclusion would require the highly questionable assumption that every instance of teaching should be the same, at least in light of rubric-defined dimensions. The implication here is that rubric-based scores are not intended to be used to evaluate the quality of enacted instruction for a given lesson, even if one takes the rubric-embedded understanding of good instruction as the criterion for quality. This aligns with the notion of good teaching as a contestable construct with multiple, only partially overlapping, understandings of good teaching across different groups. Scores from observation systems must, then, be accepted for what they are: a mapping of enacted instruction to a specific understanding of good instruction. It is only through interpreting scores within the context of the enacted instruction that a judgement of quality might be made, and it should be remembered that such judgements are inevitably contestable, depending on one’s understanding of good teaching.
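
The measurement point above can be made concrete with a small simulation. In the hedged sketch below (all variance components are invented for illustration), each teacher has a stable level of typical practice, while any single lesson’s score also carries substantial lesson-to-lesson variation; the average over several lessons tracks typical practice well, whereas a single-lesson score does not, which is why single-lesson judgements of quality are questionable.

```python
import random
import statistics

random.seed(7)

def corr(x, y):
    """Pearson correlation (population form) between two lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

def simulate(n_teachers=500, n_lessons=8, teacher_sd=1.0, lesson_sd=1.5):
    """Each teacher has a stable 'typical practice' level; each observed
    lesson adds independent lesson-to-lesson variation on top of it.
    (Variance components are invented for illustration only.)"""
    truths, singles, averages = [], [], []
    for _ in range(n_teachers):
        truth = random.gauss(0, teacher_sd)
        lessons = [truth + random.gauss(0, lesson_sd) for _ in range(n_lessons)]
        truths.append(truth)
        singles.append(lessons[0])                 # a single observed lesson
        averages.append(statistics.mean(lessons))  # classroom-level average
    return truths, singles, averages

truths, singles, averages = simulate()
print(f"single lesson vs. typical practice:        r = {corr(truths, singles):.2f}")
print(f"eight-lesson average vs. typical practice: r = {corr(truths, averages):.2f}")
```

With these invented parameters the single-lesson correlation is modest while the multi-lesson average is substantially higher, mirroring the argument that rubrics validated against classroom-level averages should not be read as verdicts on individual lessons.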

We suggest that a shift is needed in the understanding and use of rubric-based observations. The movement should be away from rubric-based observations being regarded as judgements of the quality of enacted instruction and towards the recognition of them as a cognitive tool to support the consideration of alternative understandings of good teaching. If observation systems are viewed, institutionally and individually, as judgements of teaching quality, there will remain a perceived need to adjust scores based on their impact. A shift would lead away from using rubric-based observations in ways that involve high-stakes consequences for teachers, as such uses are inherently judgemental. It highlights a need for professional understandings of the use and benefits of rubric-based observation scores to evolve.

Limitations and further research

This paper has focused on how rubric-based observations might best support teacher learning. As such, it was not seeking to develop new theory but, rather, to situate a discussion about the uses, value, and challenges associated with rubric-based observations. There are, inevitably, limitations of scope and approach, and areas that warrant further in-depth investigation. First, it has not been possible to examine, in depth, the implications that follow when rubric-based observations are based on problematic understandings of good teaching. In such situations, the features of observation systems supporting joint inquiry (e.g. common language, identifying observable behaviours) could still support structured interactions and conversations about what constitutes good practice, focusing on observable features of teaching. Related to this, it is noteworthy that a substantial minority of teachers may believe that the rubric currently used by their school promotes ineffective instruction (Stecher et al. 2012). Though outside the scope of the current paper, this is an important issue to acknowledge and explore, since it links to reasons why professionals may not feel receptive towards adopting a rubric’s understanding of good teaching.

In addition, it is important to bear in mind that research using rubric-based observations has identified rater error as a continual problem (Bell et al. 2018; White and Ronfeldt 2023). This points to a need for more research to explore how rater error influences practice-based uses of rubric-based observations. It could be the case, for example, that rater error hampers the potential for teachers to learn from the rubric’s understanding of good teaching, while creating barriers to joint inquiry, as rater error would indicate a lack of the intersubjectivity that facilitates smooth conversations (Jensen and White 2023). The difficulty of training raters to agree on rubric scores implies that teachers may need extensive support and plentiful opportunities for practice in applying the rubric in order to adopt the rubric’s understanding of good teaching. This could be facilitated through school structures that provide time and space for teacher interactions (see, e.g. Jensen, Valdés, and Gallimore 2021). Ideally, it would also be supported by rubric experts who can foster teachers’ deep understanding of the rubric’s conceptualisation of good teaching. It is clear that supporting teacher learning and instructional improvement requires both support systems and significant teacher efforts (Mehta and Fine 2015).
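
As a simple way of quantifying the intersubjectivity at stake, the sketch below (in Python, with invented scores) computes exact and adjacent agreement between two raters who score the same ten lessons on a single 1–4 rubric dimension; persistently low agreement would signal the kind of rater error discussed above. These two indices are common descriptive summaries only, not the fuller psychometric treatment used in the cited studies.

```python
def agreement(rater_a, rater_b):
    """Exact and adjacent (within one point) agreement between two raters
    scoring the same lessons on the same rubric dimension."""
    assert len(rater_a) == len(rater_b)
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent

# Invented scores for ten lessons on a 1-4 dimension (illustration only).
teacher_rater = [2, 3, 3, 4, 2, 1, 3, 2, 4, 3]
expert_rater  = [3, 3, 2, 4, 2, 2, 3, 3, 4, 2]

exact, adjacent = agreement(teacher_rater, expert_rater)
print(f"exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
```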

It has been noted, too, that the theoretical conceptualisations underpinning observation rubrics are often underdeveloped (Praetorius and Charalambous 2018). Often missing is important contextual information that identifies why, when, and how different dimensions support student learning (White et al. 2022). Such information could be very useful for educators, especially when they test out practices described in the rubric. Including this more detailed understanding of good teaching would help encourage teachers to develop and refine their understandings of good teaching, preventing rubrics from becoming a checklist of instructional practices for teachers to adopt (Cohen and Goldhaber 2016).

Conclusions

This paper offers a contribution to the discussion about rubric-based classroom observation and its relationship with teacher learning and instructional improvement. We have focused on the question of how rubric-based observations may better support teacher learning and development. Exploration of this area seems particularly pertinent given that teacher evaluation systems, school inspection systems, teacher coaching programmes and teacher collaboration structures are increasingly leveraging rubric-based observations to promote instructional improvement.

Our discussion draws attention to both the potential benefits and the challenges that are evident in the use of rubric-based observations to support teacher learning. In terms of benefits, these observations can allow teachers to engage with the rubric’s understanding of good teaching within the context of their own teaching practice. This, in turn, can help teachers to refine and expand their understanding of good teaching, which can lead to improved instructional choices and decision-making. However, to yield value from rubric-based observations, teachers need access to resources, including time and support, to engage with the conceptualisation of good teaching embedded in the rubric. This could be facilitated by the integration of rubric-based observations into existing school structures, such as collaborative structures, teacher coaching, or teacher evaluation systems. Our paper has also explored the issues that may surface when rubrics are utilised to serve as tools of judgement that arbitrate good teaching. This can lead to counter-productive practices that create disconnects between enacted practice and observation scores. It points to the need for using rubrics as tools to introduce teachers to alternative understandings of good teaching that can help refine and expand their own understandings, rather than using rubrics as tools of judgement.

It is clear that many education systems internationally require schools to participate in continuous instructional improvement programmes, in efforts to strengthen teaching and learning quality, and ultimately improve student outcomes. Our discussion suggests that for the rubric-based classroom observation to represent a truly useful tool in this endeavour, it is critically important to consider how rubric-based observations may support teacher learning and development more comprehensively and meaningfully.

Acknowledgements

The authors would like to acknowledge the SISCO research group at the University of Oslo for important feedback on an earlier draft of the article.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

Work on this paper was supported by grants from the Spencer Foundation and the W. T. Grant Foundation for a project entitled ‘Under Construction: The Rise, Spread, and Consequences of the Common Core State Standards Initiative in American Education’. All views expressed in this paper are those of the authors.

References

  • Archer, J., S. Cantrell, S. L. Holtzman, J. N. Joe, C. M. Tocci, and J. Wood. 2016. Better Feedback for Better Teaching: A Practical Guide to Improving Classroom Observations. San Francisco, CA: Jossey-Bass.
  • Bell, C. A., M. J. Dobbelaer, K. Klette, and A. Visscher. 2019. “Qualities of Classroom Observation Systems.” School Effectiveness and School Improvement 30 (1): 3–29. https://doi.org/10.1080/09243453.2018.1539014.
  • Bell, C. A., N. Jones, J. Lewis, Y. Qi, D. Kirui, L. Stickler, and S. Liu. 2015. “Understanding Consequential Assessment Systems of Teaching: Year 2 Final Report to Los Angeles Unified School District.” ETS. http://www.ets.org/Media/Research/pdf/RM-15-12.pdf.
  • Bell, C. A., N. D. Jones, Y. Qi, and J. M. Lewis. 2018. “Strategies for Assessing Classroom Teaching: Examining Administrator Thinking as Validity Evidence.” Educational Assessment 23 (4): 229–249. https://doi.org/10.1080/10627197.2018.1513788.
  • Bell, C., and M. Kane. 2022. “Formative and Summative Teacher Evaluation in Social Context.” In Teacher Evaluation Around the World: Experiences, Dilemmas and Future Challenges, edited by J. Manzi, Y. Sun, and M. R. García, 9–38. Cham: Springer International Publishing. Teacher Education, Learning Innovation and Accountability. https://doi.org/10.1007/978-3-031-13639-9_2.
  • Boston, M. D., and A. G. Candela. 2018. “The Instructional Quality Assessment as a Tool for Reflecting on Instructional Practice.” ZDM 50 (3): 427–444. https://doi.org/10.1007/s11858-018-0916-6.
  • Bryk, A. S., P. B. Sebring, E. Allensworth, S. Luppescu, and J. Q. Easton. 2010. Organizing Schools for Improvement: Lessons from Chicago. Chicago; London: University Of Chicago Press.
  • Close, K., A. Amrein-Beardsley, and C. Collins. 2020. “Putting Teacher Evaluation Systems on the Map: An Overview of States’ Teacher Evaluation Systems Post-Every Student Succeeds Act.” Education Policy Analysis Archives 28:58. https://doi.org/10.14507/epaa.28.5252.
  • Coburn, C., and J. Russell. 2022. Getting the Most Out of Professional Learning Communities and Coaching: Promoting Interactions That Support Instructional Improvement. Pittsburgh, PA: University of Pittsburgh, Learning Policy Center. https://www.sesp.northwestern.edu/docs/publications/7373301757c9ac9dd89f0.pdf.
  • Cohen, J. J., and D. Goldhaber. 2016. “Building a More Complete Understanding of Teacher Evaluation Using Classroom Observations.” Educational Researcher 45 (6): 378–387. https://doi.org/10.3102/0013189X16659442.
  • Cohen, J., L. Schuldt, L. Brown, and P. Grossman. 2016. “Leveraging Observation Tools for Instructional Improvement: Exploring Variability in Uptake of Ambitious Instructional Practices.” Teachers College Record 118 (11): 1–36. https://doi.org/10.1177/016146811611801105.
  • Cohen, D. K., J. P. Spillane, and D. J. Peurach. 2018. “The Dilemmas of Educational Reform.” Educational Researcher 47 (3): 204–212. https://doi.org/10.3102/0013189X17743488.
  • Danielson, C. 2007. Enhancing Professional Practice: A Framework for Teaching. 2nd ed. Alexandria, VA: Association for Supervision & Curriculum Development.
  • Donaldson, M. L., and S. L. Woulfin. 2018. “From Tinkering to Going “Rogue”: How Principals Use Agency When Enacting New Teacher Evaluation Systems.” Educational Evaluation and Policy Analysis 40 (4): 531–556. https://doi.org/10.3102/0162373718784205.
  • Drake, T. A., E. Goldring, J. A. Grissom, M. Cannata, C. Neumerski, M. Rubin, and P. Schuermann. 2016. “Development or Dismissal? Exploring Principals’ Use of Teacher Effectiveness Data.” In Improving Teacher Evaluation Systems: Making the Most of Multiple Measures, edited by J. A. Grissom and P. Youngs, 116–131. New York: Teachers College Press.
  • Feldman, M. S., and B. T. Pentland. 2003. “Reconceptualizing Organizational Routines as a Source of Flexibility and Change.” Administrative Science Quarterly 48 (1): 94–118. https://doi.org/10.2307/3556620.
  • Fenstermacher, G., and V. Richardson. 2005. “On Making Determinations of Quality in Teaching.” The Teachers College Record 107 (1): 186–213. https://doi.org/10.1111/j.1467-9620.2005.00462.x.
  • Gallimore, R., B. Ermeling, W. Saunders, and C. Goldenberg. 2009. “Moving the Learning of Teaching Closer to Practice: Teacher Education Implications of School‐Based Inquiry Teams.” The Elementary School Journal 109 (5): 537–553. https://doi.org/10.1086/597001.
  • Gore, J., M. Smith, J. Bowe, H. Ellis, and D. Lubans. 2017. “Effects of Professional Development on the Quality of Teaching: Results from a Randomised Controlled Trial of Quality Teaching Rounds.” Teaching and Teacher Education 68 (November): 99–113. https://doi.org/10.1016/j.tate.2017.08.007.
  • Gregory, A., J. P. Allen, A. Y. Mikami, C. A. Hafen, and R. C. Pianta. 2014. “Effects of a Professional Development Program on Behavioral Engagement of Students in Middle and High School.” Psychology in the Schools 51 (2): 143–163. https://doi.org/10.1002/pits.21741.
  • Grossman, P., and M. McDonald. 2008. “Back to the Future: Directions for Research in Teaching and Teacher Education.” American Educational Research Journal 45 (1): 184–205. https://doi.org/10.3102/0002831207312906.
  • Halverson, R. R., C. Kelley, and S. Kimball. 2004. “Implementing Teacher Evaluation Systems: How Principals Make Sense of Complex Artifacts to Shape Local Instructional Practice.” In Educational Administration, Policy, and Reform: Research and Measurement, edited by W. K. Hoy and C. Miskel, 53–188. Greenwich, CT: Information Age Publishing.
  • Ho, A. D., and T. J. Kane. 2013. The Reliability of Classroom Observations by School Personnel. MET Project. Bill & Melinda Gates Foundation. http://eric.ed.gov/?id=ED540957.
  • Horn, I. S., and J. Warren Little. 2010. “Attending to Problems of Practice: Routines and Resources for Professional Learning in Teachers’ Workplace Interactions.” American Educational Research Journal 47 (1): 181–217. https://doi.org/10.3102/0002831209345158.
  • Hu, Y., and K. van Veen. 2020. “How Features of the Implementation Process Shape the Success of an Observation-Based Coaching Program: Perspectives of Teachers and Coaches.” The Elementary School Journal 121 (2): 283–310. https://doi.org/10.1086/711070.
  • Independent Group of Scientists appointed by the Secretary-General. 2019. “Global Sustainable Development Report 2019: The Future is Now – Science for Achieving Sustainable Development.” New York: United Nations. https://sdgs.un.org/sites/default/files/2020-07/24797GSDR_report_2019.pdf.
  • Jamil, F. M., and B. K. Hamre. 2018. “Teacher Reflection in the Context of an Online Professional Development Course: Applying Principles of Cognitive Science to Promote Teacher Learning.” Action in Teacher Education 40 (2): 220–236. https://doi.org/10.1080/01626620.2018.1424051.
  • Jensen, B., G. Valdés, and R. Gallimore. 2021. “Teachers Learning to Implement Equitable Classroom Talk.” Educational Researcher 50 (8): 546–556. https://doi.org/10.3102/0013189X211014859.
  • Jensen, B., and M. White. 2023. “Classroom Observation Systems for Teacher Collaborative Learning: Reliability of Teacher Raters.” Presented at the American Educational Research Association Annual Conference, Chicago, IL.
  • Kane, T. J., D. O. Staiger, D. McCaffrey, S. Cantrell, J. Archer, S. Buhayar, and D. Parker. 2012. “Gathering Feedback for Teaching: Combining High-Quality Observations with Student Surveys and Achievement Gains.” Seattle, WA: Bill & Melinda Gates Foundation, Measures of Effective Teaching Project. http://eric.ed.gov/?id=ED540960.
  • Kimball, S. M., and A. T. Milanowski. 2009. “Examining Teacher Evaluation Validity and Leadership Decision Making within a Standards-Based Evaluation System.” Educational Administration Quarterly 45 (1): 34–70. https://doi.org/10.1177/0013161X08327549.
  • Klette, K., and M. Blikstad-Balas. 2018. “Observation Manuals as Lenses to Classroom Teaching: Pitfalls and Possibilities.” European Educational Research Journal 17 (1): 129–146. https://doi.org/10.1177/1474904117703228.
  • Koedel, C., J. Li, M. G. Springer, and L. Tan. 2017. “The Impact of Performance Ratings on Job Satisfaction for Public School Teachers.” American Educational Research Journal 54 (2): 241–278. https://doi.org/10.3102/0002831216687531.
  • Kraft, M. A., D. Blazar, and D. Hogan. 2018. “The Effect of Teacher Coaching on Instruction and Achievement: A Meta-Analysis of the Causal Evidence.” Review of Educational Research 88 (4): 549–588. https://doi.org/10.3102/0034654318759268.
  • Kraft, M. A., and A. F. Gilmour. 2016. “Can Principals Promote Teacher Development as Evaluators? A Case Study of Principals’ Views and Experiences.” Educational Administration Quarterly 52 (5): 711–753. https://doi.org/10.1177/0013161X16653445.
  • Kraft, M. A., and A. F. Gilmour. 2017. “Revisiting the Widget Effect: Teacher Evaluation Reforms and the Distribution of Teacher Effectiveness.” Educational Researcher 46 (5): 234–249. https://doi.org/10.3102/0013189X17718797.
  • Kraft, M. A., and H. C. Hill. 2020. “Developing Ambitious Mathematics Instruction Through Web-Based Coaching: A Randomized Field Trial.” American Educational Research Journal 57 (6): 2378–2414. https://doi.org/10.3102/0002831220916840.
  • Liu, Y., J. Visone, M. B. Mongillo, and P. Lisi. 2019. “What Matters to Teachers if Evaluation is Meant to Help Them Improve?” Studies in Educational Evaluation 61 (June): 41–54. https://doi.org/10.1016/j.stueduc.2019.01.006.
  • Marsh, J. A., S. Bush-Mecenas, K. O. Strunk, J. Arnold Lincove, and A. Huguet. 2017. “Evaluating Teachers in the Big Easy: How Organizational Context Shapes Policy Responses in New Orleans.” Educational Evaluation and Policy Analysis 39 (4): 539–570. https://doi.org/10.3102/0162373717698221.
  • Mehta, J., and S. Fine. 2015. “Bringing Values Back In: How Purposes Shape Practices in Coherent School Designs.” Journal of Educational Change 16 (4): 483–510. https://doi.org/10.1007/s10833-015-9263-3.
  • Mintrop, R., M. Ordenes, E. Coghlan, L. Pryor, and C. Madero. 2018. “Teacher Evaluation, Pay for Performance, and Learning Around Instruction: Between Dissonant Incentives and Resonant Procedures.” Educational Administration Quarterly 54 (1): 3–46. https://doi.org/10.1177/0013161X17696558.
  • Myung, J., and K. Martinez. 2013. Strategies for Enhancing the Impact of Post-Observation Feedback for Teachers. Stanford, CA: Carnegie Foundation for the Advancement of Teaching.
  • Nelson, S. R., J. C. Leffler, and B. A. Hansen. 2009. Toward a Research Agenda for Understanding and Improving the Use of Research Evidence. Portland, OR: Northwest Regional Educational Laboratory.
  • Nelson, F. L., and T. Sadler. 2013. “A Third Space for Reflection by Teacher Educators: A Heuristic for Understanding Orientations to and Components of Reflection.” Reflective Practice 14 (1): 43–57. https://doi.org/10.1080/14623943.2012.732946.
  • Neumerski, C. M., J. A. Grissom, E. Goldring, T. A. Drake, M. Rubin, M. Cannata, and P. Schuermann. 2018. “Restructuring Instructional Leadership: How Multiple-Measure Teacher Evaluation Systems are Redefining the Role of the School Principal.” The Elementary School Journal 119 (2): 270–297. https://doi.org/10.1086/700597.
  • OECD. 2019. TALIS 2018 Results (Volume I): Teachers and School Leaders as Lifelong Learners. TALIS. OECD. https://doi.org/10.1787/1d0bc92a-en.
  • Praetorius, A.-K., and C. Y. Charalambous. 2018. “Classroom Observation Frameworks for Studying Instructional Quality: Looking Back and Looking Forward.” ZDM 50 (3): 535–553. https://doi.org/10.1007/s11858-018-0946-0.
  • Reinhorn, S. K., S. Moore Johnson, and N. S. Simon. 2017. “Investing in Development: Six High-Performing, High-Poverty Schools Implement the Massachusetts Teacher Evaluation Policy.” Educational Evaluation and Policy Analysis 39 (3): 383–406. https://doi.org/10.3102/0162373717690605.
  • Rigby, J. G. 2015. “Principals’ Sensemaking and Enactment of Teacher Evaluation.” Journal of Educational Administration 53 (3): 374–392. https://doi.org/10.1108/JEA-04-2014-0051.
  • Sartain, L., S. R. Stoelinga, E. R. Brown, K. K. Matsko, F. K. Miller, C. E. Durwood, J. Y. Jiang, and D. Glazer. 2011. Rethinking Teacher Evaluation in Chicago. Chicago, IL: Consortium on Chicago School Research at the University of Chicago.
  • Schwartz, S. 1994. “The Fallacy of the Ecological Fallacy: The Potential Misuse of a Concept and the Consequences.” American Journal of Public Health 84 (5): 819–824. https://doi.org/10.2105/AJPH.84.5.819.
  • Shaked, H. 2018. “Why Principals Often Give Overly High Ratings on Teacher Evaluations.” Studies in Educational Evaluation 59 (December): 150–157. https://doi.org/10.1016/j.stueduc.2018.07.007.
  • Smith, M. S. 2014. “Tools as a Catalyst for Practitioners’ Thinking.” Mathematics Teacher Educator 3 (1): 3–7. https://doi.org/10.5951/mathteaceduc.3.1.0003.
  • Stecher, B., M. Garet, D. Holtzman, and L. Hamilton. 2012. “Implementing Measures of Teacher Effectiveness.” Phi Delta Kappan 94 (3): 39–43. https://doi.org/10.1177/003172171209400309.
  • Steinberg, M. P., and M. L. Donaldson. 2016. “The New Educational Accountability: Understanding the Landscape of Teacher Evaluation in the Post-NCLB Era.” Education Finance and Policy 11 (3): 1–40. https://doi.org/10.1162/EDFP_a_00186.
  • Taut, S., and K. Rakoczy. 2016. “Observing Instructional Quality in the Context of School Evaluation.” Learning and Instruction 46 (December): 45–60. https://doi.org/10.1016/j.learninstruc.2016.08.003.
  • van de Grift, W. 2007. “Quality of Teaching in Four European Countries: A Review of the Literature and Application of an Assessment Instrument.” Educational Research 49 (2): 127–152. https://doi.org/10.1080/00131880701369651.
  • van Es, E. A., and M. G. Sherin. 2008. “Mathematics Teachers’ ‘Learning to Notice’ in the Context of a Video Club.” Teaching and Teacher Education 24 (2): 244–276. https://doi.org/10.1016/j.tate.2006.11.005.
  • Visone, J. D., M. B. Mongillo, and Y. Liu. 2022. “Teachers’ Perceptions of Collaboration within an Evolving Teacher Evaluation Context.” Journal of Educational Change 23 (4): 421–450. https://doi.org/10.1007/s10833-021-09424-4.
  • White, M., J. Luoto, K. Klette, and M. Blikstad-Balas. 2022. “Bringing the Conceptualization and Measurement of Teaching into Alignment.” Studies in Educational Evaluation 75 (December): 101204. https://doi.org/10.1016/j.stueduc.2022.101204.
  • White, M., and M. Ronfeldt. 2023. Monitoring Rater Quality in Observational Systems: Issues Due to Unreliable Estimates of Rater Quality. Ann Arbor, MI: University of Michigan.