A Case Study of a Multi-Faceted Approach to Evaluating Teacher Candidate Ratings

References

  • Allen, D., & Tanner, K. (2006). Rubrics: Tools for making learning goals and evaluation criteria explicit for both teachers and learners. CBE Life Sciences Education, 5(3), 197–203. https://doi.org/10.1187/cbe.06-06-0168
  • Andrich, D. (1978). Application of a psychometric rating model to ordered categories which are scored with successive integers. Applied Psychological Measurement, 2(4), 581–594. https://doi.org/10.1177/014662167800200413
  • Barrett, S. (2001). The impact of training on rater variability. International Education Journal, 2(1), 49–58.
  • Bell, C. A., Gitomer, D. H., McCaffrey, D. F., Hamre, B. K., Pianta, R. C., & Qi, Y. (2012). An argument approach to observation protocol validity. Educational Assessment, 17(2–3), 62–87. https://doi.org/10.1080/10627197.2012.715014
  • Bergin, C., Wind, S. A., Grajeda, S., & Tsai, C.-L. (2017). Teacher evaluation: Are principals’ classroom observations accurate at the conclusion of training? Studies in Educational Evaluation, 55, 19–26. https://doi.org/10.1016/j.stueduc.2017.05.002
  • Bond, T. G., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences (3rd ed.). Routledge.
  • Bourke, T., Ryan, M., & Ould, P. (2018). How do teacher educators use professional standards in their practice? Teaching and Teacher Education, 75, 83–92. https://doi.org/10.1016/j.tate.2018.06.005
  • Bryant, C. L., Maarouf, S., Burcham, J., & Greer, D. (2016). The examination of a teacher candidate assessment rubric: A confirmatory factor analysis. Teaching and Teacher Education, 57, 79–96. https://doi.org/10.1016/j.tate.2016.03.012
  • Casabianca, J. M., McCaffrey, D. F., Gitomer, D. H., Bell, C. A., Hamre, B. K., & Pianta, R. C. (2013). Effect of observation mode on measures of secondary mathematics teaching. Educational and Psychological Measurement, 73(5), 757–783. https://doi.org/10.1177/0013164413486987
  • Cash, A. H., Hamre, B. K., Pianta, R. C., & Myers, S. S. (2012). Rater calibration when observational assessment occurs at large scale: Degree of calibration and characteristics of raters associated with calibration. Early Childhood Research Quarterly, 27(3), 529–542. https://doi.org/10.1016/j.ecresq.2011.12.006
  • Caughlan, S., & Jiang, H. (2014). Observation and teacher quality: Critical analysis of observational instruments in preservice teacher performance assessment. Journal of Teacher Education, 65(5), 375–388. https://doi.org/10.1177/0022487114541546
  • Choi, H., Benson, N. F., & Shudak, N. J. (2016). Assessment of teacher candidate dispositions: Evidence of reliability and validity. Teacher Education Quarterly, 43(3), 71–89.
  • Congdon, P. J., & McQueen, J. (2000). The stability of rater severity in large-scale assessment programs. Journal of Educational Measurement, 37(2), 163–178. https://doi.org/10.1111/j.1745-3984.2000.tb01081.x
  • Council of Chief State School Officers. (2011). Interstate Teacher Assessment and Support Consortium (InTASC) model core teaching standards: A resource for state dialogue. Council of Chief State School Officers.
  • Danielson, C. (1996). Enhancing professional practice: A framework for teaching. Association for Supervision and Curriculum Development.
  • Danielson, C. (2011). The framework for teaching evaluation instrument. The Danielson Group.
  • Darling-Hammond, L. (2006). Assessing teacher education: The usefulness of multiple measures for assessing program outcomes. Journal of Teacher Education, 57(2), 120–138. https://doi.org/10.1177/0022487105283796
  • Darling-Hammond, L. (2010). Teacher education and the American future. Journal of Teacher Education, 61(1–2), 35–47. https://doi.org/10.1177/0022487109348024
  • Darling-Hammond, L., & Cook-Harvey, C. M. (2018). Educating the whole child: Improving school climate to support student success. Learning Policy Institute.
  • De Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Publications.
  • Eisenman, G., Edwards, S., & Cushman, C. A. (2015). Bringing reality to classroom management in teacher education. Professional Educator, 39(1), 1–12.
  • Engelhard, G., & Wind, S. A. (2018). Invariant measurement with raters and rating scales: Rasch models for rater-mediated assessments. Routledge.
  • Graham, M., Milanowski, A., & Miller, J. (2012). Measuring and promoting inter-rater agreement of teacher and principal performance ratings. Center for Educator Compensation Reform.
  • Haj-Ali, R., & Feil, P. (2006). Rater reliability: Short- and long-term effects of calibration training. Journal of Dental Education, 70(4), 428–433. https://doi.org/10.1002/j.0022-0337.2006.70.4.tb04097.x
  • Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64. https://doi.org/10.3102/0013189X12437203
  • Ho, A. D., & Kane, T. J. (2013). The reliability of classroom observations by school personnel. Research paper. MET project. Bill & Melinda Gates Foundation. https://eric.ed.gov/?id=ED540957
  • Hoyt, W. T., & Kerns, M.-D. (1999). Magnitude and moderators of bias in observer ratings: A meta-analysis. Psychological Methods, 4(4), 403–424. https://doi.org/10.1037/1082-989X.4.4.403
  • Jackson, E. D., Kelsey, K. D., & Rice, A. H. (2018). A case study of technology mediated observation in pre-service teaching experiences for edTPA implementation. NACTA Journal, 62(1), 1–10.
  • Jones, E., & Bergin, C. (2019). Evaluating teacher effectiveness using classroom observations: A Rasch analysis of the rater effects of principals. Educational Assessment, 24(2), 91–118. https://doi.org/10.1080/10627197.2018.1564272
  • Knoch, U., Read, J., & von Randow, J. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12(1), 26–43. https://doi.org/10.1016/j.asw.2007.04.001
  • Ladd, K. L. (2000). A comparison of teacher education programs and graduates' perceptions of experiences. University of Missouri-Columbia.
  • Linacre, J. M. (1989). Many-facet Rasch measurement. MESA Press.
  • Linacre, J. M. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7, 328.
  • Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3(1), 85–106.
  • Linacre, J. M. (2020). Facets computer program for many-facet Rasch measurement (Version 3.83.4). Winsteps.com.
  • Linacre, J. M. (2021). Re: Minimum sample size for many-facets Rasch measurement [Discussion post]. Rasch Measurement Forum. https://raschforum.boards.net/thread/3521/minimum-sample-facet-rasch-measurement
  • Lumley, T., & McNamara, T. F. (1995). Rater characteristics and rater bias: Implications for training. Language Testing, 12(1), 54–71. https://doi.org/10.1177/026553229501200104
  • Mantzicopoulos, P., French, B. F., Patrick, H., Watson, J. S., & Ahn, I. (2018). The stability of kindergarten teachers’ effectiveness: A generalizability study comparing the framework for teaching and the classroom assessment scoring system. Educational Assessment, 23(1), 24–46. https://doi.org/10.1080/10627197.2017.1408407
  • Marzano, R. J. (2007). The art and science of teaching: A comprehensive framework for effective instruction. Association for Supervision and Curriculum Development.
  • Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/BF02296272
  • Murray, F. B. (2005). On building a unified system of accreditation in teacher education. Journal of Teacher Education, 56(4), 307–317. https://doi.org/10.1177/0022487105279842
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386–422.
  • Raczynski, K. R., Cohen, A. S., Engelhard, G., Jr., & Lu, Z. (2015). Comparing the effectiveness of self-paced and collaborative frame-of-reference training on rater accuracy in a large-scale writing assessment. Journal of Educational Measurement, 52(3), 301–318. https://doi.org/10.1111/jedm.12079
  • Raths, J., & Lyman, F. (2003). Summative evaluation of student teachers: An enduring problem. Journal of Teacher Education, 54(3), 206–216. https://doi.org/10.1177/0022487103054003003
  • Reagan, E. M., Terrell, D. G., Rogers, A. P., Schram, T., Tompkins, P., Ward, C., Birch, M. L., McCurdy, K., & McHale, G. (2019). Performance assessment for teacher candidate learning. Teacher Education Quarterly, 46(2), 114–141.
  • Sandholtz, J. H., & Shea, L. M. (2012). Predicting performance: A comparison of university supervisors’ predictions and teacher candidates’ scores on a teaching performance assessment. Journal of Teacher Education, 63(1), 39–50. https://doi.org/10.1177/0022487111421175
  • Wei, R. C., & Pecheone, R. L. (2010). Assessment for learning in preservice teacher education: Performance-based assessments. In M. M. Kennedy (Ed.), Teacher assessment and the quest for teacher quality (pp. 69–132). Jossey-Bass.
  • Wind, S. A. (2019). A nonparametric procedure for exploring differences in rating quality across test-taker subgroups in rater-mediated writing assessments. Language Testing, 36(4), 595–616. https://doi.org/10.1177/0265532219838014
  • Wind, S. A., & Jones, E. (2019). Not just generalizability: A case for multifaceted latent trait models in teacher observation systems. Educational Researcher, 48(8), 521–533. https://doi.org/10.3102/0013189X19874084
  • Wolfe, E. W. (2013). A bootstrap approach to evaluating person and item fit to the Rasch model. Journal of Applied Measurement, 14(1), 1–9.
  • Wu, M., & Adams, R. J. (2013). Properties of Rasch residual fit statistics. Journal of Applied Measurement, 14(4), 339–355.
  • Youngs, P., & Whittaker, A. (2015). The role of edTPA in assessing content-specific instructional practices. In P. Youngs & J. Grissom (Eds.), Improving teacher evaluation systems (pp. 89–101). Teachers College Press.
