Abstract
For accreditation and programmatic decision making, education school administrators use inter-rater reliability analyses to judge the credibility of student–teacher assessments. Although weak levels of agreement between university-appointed supervisors and cooperating teachers are usually interpreted to mean that the assessment process is not being implemented with fidelity, it is unclear which part of the process contributes to weak agreement. Quantitative analysis of data (N = 230) from four semesters, together with matched-pair sampling (n = 14), countered weaknesses in the usual analysis methods. Using “data-driven decision making,” “personal practical knowledge,” and “evidence-based practice” as theoretical frameworks, university-based field instructors’ discussions about what accounted for the varying correlations were analyzed. Qualitative analysis of focus-group and questionnaire data found that field instructors (n = 7) assumed that divergent scores indicated weaknesses in the evaluation process and posited conflicting root causes. Inter-rater reliability analyses should include pair-wise sampling, so that weak and strong rates of agreement are unmasked and opportunities for meaningful data conversations become possible.
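The case for pair-wise sampling can be made concrete with a small sketch (the scores and rater pairs below are synthetic illustrations, not the study's data): a correlation pooled across all supervisor–teacher pairs can mask the fact that some pairs agree perfectly while others barely agree at all.

```python
# Sketch: per-pair agreement vs. a pooled correlation (synthetic data).

def pearson(xs, ys):
    # Pearson correlation computed directly from its definition.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def percent_agreement(xs, ys):
    # Proportion of rubric items on which the two raters gave identical scores.
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical rubric scores (1-4) on six items for three supervisor /
# cooperating-teacher pairs, each observing the same student teacher.
pairs = {
    "pair_A": ([3, 3, 4, 2, 3, 4], [3, 3, 4, 2, 3, 4]),  # perfect agreement
    "pair_B": ([2, 3, 3, 4, 2, 3], [3, 4, 4, 4, 3, 4]),  # consistent offset
    "pair_C": ([4, 2, 3, 3, 4, 2], [2, 4, 2, 4, 2, 4]),  # near-opposite scoring
}

for name, (sup, coop) in pairs.items():
    print(name,
          "agreement:", round(percent_agreement(sup, coop), 2),
          "r:", round(pearson(sup, coop), 2))

# Pooling all ratings into one correlation hides the per-pair spread above.
all_sup = [s for sup, _ in pairs.values() for s in sup]
all_coop = [c for _, coop in pairs.values() for c in coop]
print("pooled r:", round(pearson(all_sup, all_coop), 2))
```

In this sketch, pair_A agrees on every item while pair_C agrees on none, yet a single pooled statistic reports one middling number for the whole program, which is the masking problem that pair-wise sampling is meant to expose.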
Notes
1. The pairwise scores are recorded on the evaluation forms, which are based on Danielson's (1996) framework for student teaching (see Appendix for tool).