Abstract
A multilevel analysis approach was used to analyse students’ evaluation of teaching (SET). The low value of inter-rater reliability stresses that any solid conclusions on teaching cannot be made on the basis of single feedbacks. To assess a teacher’s general teaching effectiveness, one needs to evaluate four randomly chosen course implementations. Two implementations are needed when one course is evaluated, and if one implementation is evaluated, up to 15 feedbacks are needed. The stability of students’ ratings is very high, which reflects students’ stable rating criteria. There is an obvious rating paradox: from the student’s point of view, each rating is very precise, stable and justifiable, but from the teacher’s point of view a single feedback reflects the quality of teaching to just a moderate extent. Cross-hierarchical analysis reveals that there are large discrepancies between the uses of rating scales; some students are systematically more lenient in their rating whereas others are systematically more severe. The study also reveals that some courses are generally rated more favourably and that some courses are more suitable for certain teachers. Managers can thus improve the quality of teaching by finding the most suitable courses for each teacher.