ABSTRACT
Heightened accountability pressures and an increased emphasis on teaching quality have directed scholarly attention to scrutinizing instruction, particularly with respect to issues of validity and reliability. However, these attempts have largely been directed toward “core” content areas and have investigated generic or content-specific instructional aspects separately. In this exploratory study, we focused on a less explored area, physical education, and attended to both instructional aspects concurrently, examining whether the optimal lesson-rater combination needed to obtain reliable teaching-quality estimates differs depending on the type of instructional aspects considered. Data analysis of 147 lessons using generalizability theory suggested that either a 3-lesson-2-rater or a 4-lesson-1-rater combination yields sufficiently reliable estimates for nearly all dimensions examined (generic or content specific). Quality of student practice, however, required 11 lessons scored by 2 raters. These findings underline the importance of examining individual dimensions within a given observational instrument and the merit of carefully selecting and training raters.
Acknowledgments
The authors would like to thank the teachers who participated in the study and Professor Richard Shavelson for his support and consultation during data analysis.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes
1. For example, the Dynamic Model of Educational Effectiveness (Creemers & Kyriakides, 2008) has a distinct factor for time management and classifies classroom management as a different factor (classroom as a learning environment). Similarly, in the Classroom Assessment Scoring System (Pianta & Hamre, 2009), this distinction is manifested in the factors of behavioral management and productivity.
2. Rink’s classification includes the following categories of tasks: informing tasks (i.e., typically the first tasks in a task sequence; these tasks simply describe a skill or movement concept with their main focus being to help students just execute the skill); refining tasks (i.e., tasks that focus on improving the quality or the mechanics of the performance of the skill under consideration); extending tasks (i.e., tasks in which the level of difficulty or complexity of the conditions under which they are performed is gradually altered); and applying tasks (i.e., typically competitive tasks, which provide students with the opportunity to apply or practice the skills under consideration in situations similar to those in which they will be employed in games or other performance settings) (see Rink, 2010).
3. Starting from the academic year 2015–2016, PE is taught for three 40-min periods in Grades 5 and 6.
4. In Cyprus, primary-school PE lessons are taught by generalist teachers, as opposed to secondary-school PE lessons, which are taught by content specialists. Primary school teachers who opt to teach PE typically have a particular interest in this area (e.g., because of being athletes or participating in a sports team) and might have pursued additional PE training or Master’s degrees in PE.
5. Expressed in effect sizes, the differences in agreement between the raters’ scores and the master scores were very small (they ranged from 0.033 to 0.066).
6. This model had acceptable fit indices (χ² = 231.54, df = 140, χ²/df = 1.65; CFI = 0.96; RMSEA = 0.05 with its 95% CI bounds being 0.04 and 0.06); the loadings of the first-order factors ranged from 0.40 to 0.89, whereas the loadings of the second-order factors ranged from 0.32 to 0.87.
7. An alternative approach would be to use Chiu and Wolfe’s (2002) subdividing method, according to which the dataset can be partitioned into smaller mutually exclusive and exhaustive subsets that can be analyzed by using typical G-study designs; the variance components yielded from these individual mini G-studies can then be averaged. The approach undertaken herein is less demanding but also valid, although it yields a lower bound estimate of reliability (R. Shavelson, personal communication, February 2016). Because of the exploratory character of this study and its intention to compare the reliabilities of generic and content-specific instructional aspects, such lower bound estimates are still informative.
8. Because we used the average of the items within each dimension and hence each factor score was on a continuous scale, we employed the intra-class correlation (ICC) to check for inter-rater agreement while also correcting for chance. According to Cicchetti (1994, p. 284), ICC coefficients between 0.60 and 0.74 are good, whereas coefficients above 0.75 are considered excellent.
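As an illustration of the averaging step described in Note 7, the following minimal Python sketch averages variance components across mini G-studies and then projects a relative generalizability coefficient for a given lesson-rater combination. It is not the authors' analysis: the variance components are hypothetical numbers, the function names are invented for illustration, and the formula assumes a fully crossed teacher × lesson × rater design, which may differ from the design actually used in the study.

```python
from statistics import mean

# HYPOTHETICAL variance components from three mini G-studies (not the study's data).
# Keys: "t" = teacher, "tl" = teacher x lesson, "tr" = teacher x rater,
# "tlr_e" = three-way interaction confounded with residual error.
mini_g_studies = [
    {"t": 0.28, "tl": 0.22, "tr": 0.02, "tlr_e": 0.16},
    {"t": 0.30, "tl": 0.20, "tr": 0.03, "tlr_e": 0.15},
    {"t": 0.32, "tl": 0.18, "tr": 0.01, "tlr_e": 0.14},
]


def average_components(studies):
    """Average each variance component across the mini G-studies."""
    keys = studies[0].keys()
    return {key: mean(study[key] for study in studies) for key in keys}


def relative_g(components, n_lessons, n_raters):
    """Relative G coefficient, ASSUMING a fully crossed t x l x r design.

    Relative error variance is the sum of the teacher-related interaction
    components, each divided by the number of conditions averaged over.
    """
    error = (
        components["tl"] / n_lessons
        + components["tr"] / n_raters
        + components["tlr_e"] / (n_lessons * n_raters)
    )
    return components["t"] / (components["t"] + error)


averaged = average_components(mini_g_studies)

# Project reliability for a few lesson-rater combinations (a D-study style table).
for n_lessons, n_raters in [(1, 1), (3, 2), (4, 1)]:
    g = relative_g(averaged, n_lessons, n_raters)
    print(f"{n_lessons} lesson(s), {n_raters} rater(s): G = {g:.2f}")
```

With real variance components in place of the hypothetical ones, the same loop could be used to compare candidate lesson-rater combinations against a chosen reliability threshold.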
Additional information
Notes on contributors
Charalambos Y. Charalambous
Charalambos Y. Charalambous is an Assistant Professor in Educational Research and Evaluation at the Department of Education of the University of Cyprus. His research interests include measuring and understanding teaching quality and factors contributing to it, with a particular focus on teachers’ practices (generic and content specific) and teachers’ use of personal and systemic resources.
Ermis Kyriakides
Ermis Kyriakides holds a PhD in Physical Education from the Department of Education of the University of Cyprus. His research interests lie in the field of teaching effectiveness in physical education in primary schools. He is interested in exploring how teacher behaviors, and especially teaching practices (generic or content specific), can influence student learning in physical education.
Niki Tsangaridou
Niki Tsangaridou is a Professor in Physical Education at the Department of Education of the University of Cyprus. The focus of her research is on teacher reflection and reflective teaching, teacher education and learning to teach, and teaching practices in physical education.
Leonidas Kyriakides
Leonidas Kyriakides is Professor of Educational Research and Evaluation at the Department of Education of the University of Cyprus. His main research interests are in the area of school effectiveness and school improvement and especially in modelling the dynamic nature of educational effectiveness and in using research to promote quality and equity in education. Leonidas acted as chair of the AERA SIG on School Effectiveness and Improvement and of the EARLI SIG on Educational Effectiveness.