Methodological Studies

A Framework for Addressing Instrumentation Biases When Using Observation Systems as Outcome Measures in Instructional Interventions

Pages 162-188 | Received 17 May 2021, Accepted 21 Apr 2022, Published online: 17 Jun 2022

References

  • Barnett, S., Yarosz, D., Thomas, J., & Hornbeck, A. (2006). Educational effectiveness of a Vygotskian approach to preschool education: A randomized trial (pp. 1–37). National Institute for Early Education Research.
  • Bejar, I. (2012). Rater cognition: Implications for validity. Educational Measurement: Issues and Practice, 31(3), 2–9. https://doi.org/10.1111/j.1745-3992.2012.00238.x
  • Bell, C. A., Dobbelaer, M. J., Klette, K., & Visscher, A. (2019). Qualities of classroom observation systems. School Effectiveness and School Improvement, 30(1), 3–27. https://doi.org/10.1080/09243453.2018.1539014
  • Bell, C. A., Gitomer, D. H., McCaffrey, D. F., Hamre, B. K., Pianta, R. C., & Qi, Y. (2012). An argument approach to observation protocol validity. Educational Assessment, 17(2–3), 62–87. https://doi.org/10.1080/10627197.2012.715014
  • Bell, C. A., Jones, N. D., Qi, Y., & Lewis, J. M. (2018). Strategies for assessing classroom teaching: Examining administrator thinking as validity evidence. Educational Assessment, 23(4), 229–249. https://doi.org/10.1080/10627197.2018.1513788
  • Bell, C. A., Qi, Y., Croft, A. J., Leusner, D., McCaffrey, D. F., Gitomer, D. H., & Pianta, R. C. (2014). Improving observational score quality: Challenges in observer thinking. In T. J. Kane, K. A. Kerr, & R. C. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the measures of effective teaching project (pp. 50–97). Jossey-Bass.
  • Blyth, C. R. (1972). On Simpson’s paradox and the sure-thing principle. Journal of the American Statistical Association, 67(338), 364–366. https://doi.org/10.1080/01621459.1972.10482387
  • Boston, M. (2012). Assessing instructional quality in mathematics. The Elementary School Journal, 113(1), 76–104. https://doi.org/10.1086/666387
  • Brown, J. L., Jones, S. M., LaRusso, M. D., & Aber, J. L. (2010). Improving classroom quality: Teacher influences and experimental impacts of the 4Rs program. Journal of Educational Psychology, 102(1), 153–167. https://doi.org/10.1037/a0018160
  • Campbell, S., & Ronfeldt, M. (2018). Observational evaluation of teachers: Measuring more than we bargained for? American Educational Research Journal, 55(6), 1233–1267. https://doi.org/10.3102/0002831218776216
  • Cappella, E., Hamre, B. K., Kim, H. Y., Henry, D. B., Frazier, S. L., Atkins, M. S., & Schoenwald, S. (2012). Teacher consultation and coaching within mental health practice: Classroom and child effects in urban elementary schools. Journal of Consulting and Clinical Psychology, 80(4), 597–610. https://doi.org/10.1037/a0027725
  • Carlisle, J., Kelcey, B., Berebitsky, D., & Phelps, G. (2011). Embracing the complexity of instruction: A study of the effects of teachers’ instruction on students’ reading comprehension. Scientific Studies of Reading, 15(5), 409–439. https://doi.org/10.1080/10888438.2010.497521
  • Casabianca, J. M., Lockwood, J. R., & McCaffrey, D. F. (2015). Trends in classroom observation scores. Educational and Psychological Measurement, 75(2), 311–337. https://doi.org/10.1177/0013164414539163
  • Casabianca, J. M., McCaffrey, D. F., Gitomer, D. H., Bell, C. A., Hamre, B. K., & Pianta, R. C. (2013). Effect of observation mode on measures of secondary mathematics teaching. Educational and Psychological Measurement, 73(5), 757–783. https://doi.org/10.1177/0013164413486987
  • Charalambous, C. Y., & Praetorius, A.-K. (2020). Creating a forum for researching teaching and its quality more synergistically. Studies in Educational Evaluation, 67, 8. https://doi.org/10/gwsf
  • Cohen, J., Schuldt, L. C., Brown, L., & Grossman, P. (2016). Leveraging observation tools for instructional improvement: Exploring variability in uptake of ambitious instructional practices. Teachers College Record, 118, 1–36.
  • Curby, T. W., Stuhlman, M. W., Grimm, K., Mashburn, A., Chomat-Mooney, L., Downer, J., Hamre, B. K., & Pianta, R. C. (2011). Within-day variability in the quality of classroom interactions during third and fifth grade. The Elementary School Journal, 112(1), 16–37. https://doi.org/10.1086/660682
  • Danielson, C. (2007). Enhancing professional practice: A framework for teaching (2nd ed.). Association for Supervision & Curriculum Development.
  • Gallimore, R., & Santagata, R. (2006). Researching teaching: The problem of studying a system resistant to change. In Strengthening research methodology: Psychological measurement and evaluation (pp. 11–28). American Psychological Association. https://doi.org/10.1037/11384-001
  • Gore, J., Lloyd, A., Smith, M., Bowe, J., Ellis, H., & Lubans, D. (2017). Effects of professional development on the quality of teaching: Results from a randomised controlled trial of Quality Teaching Rounds. Teaching and Teacher Education, 68, 99–113. https://doi.org/10.1016/j.tate.2017.08.007
  • Graham, M., Milanowski, A. T., & Miller, J. (2012). Measuring and promoting inter-rater agreement of teacher and principal performance ratings (No. ED532068). Center for Educator Compensation Reform. http://eric.ed.gov/?id=ED532068
  • Gregory, A., Allen, J. P., Mikami, A. Y., Hafen, C. A., & Pianta, R. C. (2014). Effects of a professional development program on behavioral engagement of students in middle and high school. Psychology in the Schools, 51(2), 143–163. https://doi.org/10.1002/pits.21741
  • Grossman, P., Cohen, J. J., & Brown, L. (2014). Understanding instructional quality in English language arts. In Designing teacher evaluation systems: New guidance from the measures of effective teaching project (pp. 303–331). Jossey-Bass.
  • Grossman, P., Loeb, S., Cohen, J. J., & Wyckoff, J. (2013). Measure for measure: The relationship between measures of instructional practice in middle school English language arts and teachers’ value-added scores. American Journal of Education, 119(3), 445–470. https://doi.org/10.1086/669901
  • Hamre, B. K., Hatfield, B., Pianta, R. C., & Jamil, F. (2014). Evidence for general and domain-specific elements of teacher–child interactions: Associations with preschool children’s development. Child Development, 85(3), 1257–1274. https://doi.org/10.1111/cdev.12184
  • Hamre, B. K., Pianta, R. C., Mashburn, A., & Downer, J. T. (2012). Promoting young children’s social competence through the preschool PATHS curriculum and MyTeachingPartner professional development resources. Early Education & Development, 23(6), 809–832. https://doi.org/10.1080/10409289.2011.607360
  • Hiebert, J., Gallimore, R., Garnier, H., Givvin, K. B., Hollingsworth, H., Jacobs, J., Chui, A. M.-Y., Wearne, D., Smith, M., Kersting, N., Manaster, A., Tseng, E., Etterbeek, W., Manaster, C., Gonzales, P., & Stigler, J. (2003). Teaching mathematics in seven countries: Results from the TIMSS 1999 video study. National Center for Education Statistics. https://doi.org/10.1037/e610352011-003
  • Jaeger, R. M. (1993). Live vs. Memorex: Psychometric and practical issues in the collection of data on teachers’ performances in the classroom. Center for Educational Research and Evaluation, University of North Carolina. http://eric.ed.gov/?id=ED360325
  • Joe, J., Kosa, J., Tierney, J., & Tocci, C. (2013). Observer calibration. Teachscape.
  • Kane, T. J., Staiger, D. O., McCaffrey, D., Cantrell, S., Archer, J., Buhayar, S., & Parker, D. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Bill & Melinda Gates Foundation, Measures of Effective Teaching Project. http://eric.ed.gov/?id=ED540960
  • Kelcey, B., & Carlisle, J. (2013). Learning about teachers’ literacy instruction from classroom observations. Reading Research Quarterly, 48(3), 301–317. https://doi.org/10.1002/rrq.51
  • Kim, H., Cameron, C. E., Kelly, C. A., West, H., Mashburn, A. J., & Grissmer, D. W. (2019). Using an individualized observational measure to understand children’s interactions in underserved kindergarten classrooms. Journal of Psychoeducational Assessment, 37(8), 935–956. https://doi.org/10.1177/0734282918819579
  • Klette, K., & Blikstad-Balas, M. (2018). Observation manuals as lenses to classroom teaching: Pitfalls and possibilities. European Educational Research Journal, 17(1), 129–146. https://doi.org/10.1177/1474904117703228
  • Klette, K., Blikstad-Balas, M., & Roe, A. (2017). Linking instruction and student achievement. A research design for a new generation of classroom studies. Acta Didactica Norge, 11(3), 10. https://doi.org/10.5617/adno.4729
  • Kraft, M. A., & Hill, H. C. (2020). Developing ambitious mathematics instruction through web-based coaching: A randomized field trial. American Educational Research Journal, 57(6), 2378–2414. https://doi.org/10.3102/0002831220916840
  • LaVenia, M., Cohen-Vogel, L., & Lang, L. B. (2015). The common core state standards initiative: An event history analysis of state adoption. American Journal of Education, 121(2), 145–182. https://doi.org/10.1086/679389
  • Liu, S., Bell, C. A., Jones, N. D., & McCaffrey, D. F. (2019). Classroom observation systems in context: A case for the validation of observation systems. Educational Assessment, Evaluation and Accountability, 31(1), 61–95. https://doi.org/10.1007/s11092-018-09291-3
  • McCormick, M. P., Cappella, E., O’Connor, E. E., & McClowry, S. G. (2015). Social-emotional learning and academic achievement: Using causal methods to explore classroom-level mechanisms. AERA Open, 1(3), 2332858415603959. https://doi.org/10.1177/2332858415603959
  • Milanowski, A. (2017). Lower performance evaluation practice ratings for teachers of disadvantaged students: Bias or reflection of reality? AERA Open, 3(1), 2332858416685550. https://doi.org/10.1177/2332858416685550
  • NGA Center for Best Practices, & CCSSO. (2010). Common core state standards: English language arts. National Governors Association Center for Best Practices, Council of Chief State School Officers. http://www.corestandards.org/
  • Pianta, R. C., DeCoster, J., Cabell, S., Burchinal, M., Hamre, B. K., Downer, J., LoCasale-Crouch, J., Williford, A., & Howes, C. (2014). Dose–response relations between preschool teachers’ exposure to components of professional development and increases in quality of their interactions with children. Early Childhood Research Quarterly, 29(4), 499–508. https://doi.org/10.1016/j.ecresq.2014.06.001
  • Pianta, R. C., Hamre, B. K., & Mintz, S. L. (2010). CLASS upper elementary manual. Teachstone.
  • Plank, S. B., & Condliffe, B. (2013). Pressures of the season: An examination of classroom quality and high-stakes accountability. American Educational Research Journal, 50(5), 1152–1182. https://doi.org/10.3102/0002831213500691
  • Praetorius, A.-K., & Charalambous, C. Y. (2018). Classroom observation frameworks for studying instructional quality: Looking back and looking forward. ZDM, 50(3), 535–553. https://doi.org/10.1007/s11858-018-0946-0
  • Praetorius, A.-K., Klieme, E., Herbert, B., & Pinger, P. (2018). Generic dimensions of teaching quality: The German framework of Three Basic Dimensions. ZDM, 50(3), 407–426. https://doi.org/10.1007/s11858-018-0918-4
  • Praetorius, A.-K., Lenske, G., & Helmke, A. (2012). Observer ratings of instructional quality: Do they fulfill what they promise? Learning and Instruction, 22(6), 387–400. https://doi.org/10.1016/j.learninstruc.2012.03.002
  • Praetorius, A.-K., Pauli, C., Reusser, K., Rakoczy, K., & Klieme, E. (2014). One lesson is all you need? Stability of instructional quality across lessons. Learning and Instruction, 31, 2–12. https://doi.org/10.1016/j.learninstruc.2013.12.002
  • Reyes, M. R., Brackett, M. A., Rivers, S. E., White, M., & Salovey, P. (2012). Classroom emotional climate, student engagement, and academic achievement. Journal of Educational Psychology, 104(3), 700–713. https://doi.org/10.1037/a0027268
  • Rivers, S. E., Brackett, M. A., Reyes, M. R., Elbertson, N. A., & Salovey, P. (2013). Improving the social and emotional climate of classrooms: A clustered randomized controlled trial testing the RULER approach. Prevention Science: The Official Journal of the Society for Prevention Research, 14(1), 77–87. https://doi.org/10.1007/s11121-012-0305-2
  • Rowan, B., & White, M. (2022). The common core state standards initiative as an innovation network. American Educational Research Journal, 59(1), 73–111. https://doi.org/10.3102/00028312211006689
  • Schoenfeld, A. H. (2018). Video analyses for research and professional development: The teaching for robust understanding (TRU) framework. ZDM, 50(3), 491–506. https://doi.org/10.1007/s11858-017-0908-y
  • Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin.
  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
  • Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on Generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 12(6), 1123–1128. https://doi.org/10.1177/1745691617708630
  • Slavin, R., & Madden, N. A. (2011). Measures inherent to treatments in program effectiveness reviews. Journal of Research on Educational Effectiveness, 4(4), 370–380. https://doi.org/10.1080/19345747.2011.558986
  • Steinberg, M. P., & Garrett, R. (2016). Classroom composition and measured teacher performance: What do teacher observation scores really measure? Educational Evaluation and Policy Analysis, 38, 293–317. https://doi.org/10.3102/0162373715616249
  • Steinberg, M. P., & Sartain, L. (2015). Does teacher evaluation improve school performance? Experimental evidence from Chicago’s excellence in teaching project. Education Finance and Policy, 10(4), 535–572. https://doi.org/10.1162/EDFP_a_00173
  • Thierry, K. L., Vincent, R. L., & Norris, K. (2020). Teacher-level predictors of the fidelity of implementation of a social-emotional learning curriculum. Early Education and Development, 0(0), 1–15. https://doi.org/10.1080/10409289.2020.1849896
  • Tong, F., Tang, S., Irby, B. J., Lara-Alecio, R., Guerrero, C., & Lopez, T. (2019). A process for establishing and maintaining inter-rater reliability for two observation instruments as a fidelity of implementation measure: A large-scale randomized controlled trial perspective. Studies in Educational Evaluation, 62, 18–29. https://doi.org/10.1016/j.stueduc.2019.04.008
  • White, M. (2021, August 31). Moving from institutional practices to challenging the robustness of conclusions: The case of rater error [Invited Keynote]. QUINT PhD Summer Institute 2021.
  • White, M. (2022). A validity framework for the design and analysis of studies using standardized observation systems. In M. Blikstad-Balas, K. Klette, & M. Tengberg (Eds.), Ways of analyzing teaching quality: Potentials and pitfalls (pp. 89–120). Scandinavian University Press. https://doi.org/10.18261/9788215045054-2021
  • White, M. C. (2018). Rater performance standards for classroom observation instruments. Educational Researcher, 47(8), 492–501. https://doi.org/10.3102/0013189X18785623
  • White, M. C., Rowan, B., Alter, G., Blankenship, L., Greene, C., & Windisch, S. (2018). User guide to the measures of effective teaching longitudinal database (MET LDB). The University of Michigan.
  • White, M., Maher, B., & Rowan, B. (2022). Common core-related shifts in English language arts teaching from 2010 to 2018: A video study. The Elementary School Journal.
  • White, M., & Ronfeldt, M. (2021). Monitoring rater quality in observational systems: Issues due to unreliable estimates of rater quality. University of Michigan.
  • Wilhelm, A. G., Rouse, A. G., & Jones, F. (2018). Exploring differences in measurement and reporting of classroom observation inter-rater reliability. Practical Assessment, Research & Evaluation, 23(4), 16.
  • Woehr, D. J., & Huffcutt, A. I. (1994). Rater training for performance appraisal: A quantitative review. Journal of Occupational and Organizational Psychology, 67(3), 189–205. https://doi.org/10.1111/j.2044-8325.1994.tb00562.x