Original Article

Validity of a Special Education Teacher Observation System

References

  • Adnot, M., Dee, T., Katz, V., & Wyckoff, J. (2017). Teacher turnover, teacher quality, and student achievement in DCPS. Educational Evaluation and Policy Analysis, 39(1), 54–76. doi:10.3102/0162373716663646
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Ark, T. K. (2015). Ordinal generalizability theory using an underlying latent variable framework (Doctoral dissertation). University of British Columbia. Retrieved from https://open.library.ubc.ca/cIRcle/collections/ubctheses/24/items/1.0166304
  • Bejar, I. I., Williamson, D. M., & Mislevy, R. J. (2006). Human scoring. In D. M. Williamson, I. I. Bejar,  & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 49–81). Mahwah, NJ: Erlbaum.
  • Bell, C. A., Yi, Q., Croft, A. J., Leusner, D., McCaffrey, D. F., Gitomer, D. H., & Pianta, R. C. (2014). Improving observational score quality: Challenges in observer thinking. In T. J. Kane, K. A. Kerr, & R. C. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the measures of effective teaching project (pp. 50–97). San Francisco, USA: Jossey-Bass.
  • Bell, C. A., Gitomer, D. H., McCaffrey, D. F., Hamre, B. K., Pianta, R. C., & Qi, Y. (2012). An argument approach to observation protocol validity. Educational Assessment, 17(2–3), 62–87. doi:10.1080/10627197.2012.715014
  • Bill and Melinda Gates Foundation. (2012). Gathering feedback for teaching: Combining high quality observations with student surveys and achievement gains. Seattle, WA: Author.
  • Blazar, D., Braslow, D., Charalambous, C. Y., & Hill, H. C. (2017). Attending to general and mathematics specific dimensions of teaching: Exploring factors across two observation instruments. Educational Assessment, 22(2), 71–94. doi:10.1080/10627197.2017.1309274
  • Casabianca, J. M., Lockwood, J. R., & McCaffrey, D. F. (2015). Trends in classroom observation scores. Educational and Psychological Measurement, 75(2), 311–337. doi:10.1177/0013164414539163
  • Cash, A. H., Hamre, B. K., Pianta, R. C., & Myers, S. S. (2012). Rater calibration when observational assessment occurs at large scale: Degree of calibration and characteristics of raters associated with calibration. Early Childhood Research Quarterly, 27(3), 529–542. doi:10.1016/j.ecresq.2011.12.006
  • Ciullo, S., Lembke, E. S., Carlisle, A., Thomas, C. N., Goodwin, M., & Judd, L. (2016). Implementation of evidence-based literacy practices in middle school response to intervention: An observation study. Learning Disability Quarterly, 39(1), 44–57. doi:10.1177/0731948714566120
  • Cook, D. A., Brydges, R., Ginsburg, S., & Hatala, R. (2015). A contemporary approach to validity arguments: A practical guide to Kane’s framework. Medical Education, 49, 560–575. doi:10.1111/medu.12678
  • Crawford, A. R., Johnson, E. S., Moylan, L. A., & Zheng, Y. (2018). Variance and reliability in a special educator evaluation instrument. Assessment for Effective Intervention. doi:10.1177/1534508418781010
  • Eckes, T. (2011). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments. Frankfurt, Germany: Peter Lang.
  • Engelhard, G., Jr. (2002). Monitoring raters in performance assessments. In G. Tindal & T. Haladyna (Eds.), Large-scale assessment programs for all students: Development, implementation, and analysis (pp. 261–287). Mahwah, NJ: Erlbaum.
  • Erlich, O., & Shavelson, R. (1978). The search for correlations between measures of teacher behavior and student achievement: Measurement problem, conceptualization problem, or both? Journal of Educational Measurement, 15, 77–89. doi:10.1111/jedm.1978.15.issue-2
  • Fuchs, D., Hendricks, E., Walsh, M. E., Fuchs, L. S., Gilbert, J. K., Zhang, T. W., … Peng, P. (2018). Evaluating a multidimensional reading comprehension program and reconsidering the lowly reputation of tests of near-transfer. Learning Disabilities Research & Practice, 33(1), 11–23.
  • Fulcher, G., Davidson, F., & Kemp, J. (2011). Effective rating scale development for speaking tests: Performance decision trees. Language Testing, 28, 5–29.
  • Gitomer, D., Bell, C., Qi, Y., McCaffrey, D., Hamre, B. K., & Pianta, R. C. (2014). The instructional challenge in improving teaching quality: Lessons from a classroom observation protocol. Teachers College Record, 116(6), 1–32.
  • Goe, L., Bell, C. A., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality. Retrieved from http://www.gtlcenter.org/
  • Hall, E. (2014). A framework to support the validation of educator evaluation systems. National Center for the Improvement of Educational Assessment. Retrieved from http://www.nciea.org
  • Herlihy, C., Karger, E., Pollard, C., Hill, H. C., Kraft, M. A., Williams, M., & Howard, S. (2014). State and local efforts to investigate the validity and reliability of scores from teacher evaluation systems. Teachers College Record, 116(1), 1–28.
  • Hill, H., & Grossman, P. (2013). Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems. Harvard Educational Review, 83(2), 371–384.
  • Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation systems and a case for the generalizability study. Educational Researcher, 41(2), 56–64.
  • Johnson, E. S., Crawford, A., Moylan, L. A., & Ford, J. W. (2016). Issues in evaluating special education teachers: Challenges and current perspectives. Texas Education Review, 4(1), 71–83.
  • Johnson, E. S., Crawford, A., Moylan, L. A., & Zheng, Y. (2016). Explicit instruction rubric technical manual. Boise, ID: Boise State University.
  • Johnson, E. S., Crawford, A. R., Moylan, L. A., & Zheng, Y. (2018). Using evidence-centered design to create a special educator observation system. Educational Measurement: Issues and Practice, 37(2), 35–44.
  • Johnson, E. S., Moylan, L. A., Crawford, A. R., & Zheng, Y. Z. (2019). Developing a comprehension instruction observation Rubric for special education teachers. Reading and Writing Quarterly, 35(2), 118–136.
  • Johnson, E. S., & Semmelroth, C. L. (2014). Special education teacher evaluation: Why it matters and what makes it challenging. Assessment for Effective Intervention, 39, 71–82.
  • Johnson, E. S., Zheng, Y., Crawford, A. R., & Moylan, L. A. (2018). Developing an explicit instruction special education teacher observation instrument. Journal of Special Education. doi:10.1177/0022466918796224
  • Jones, N. (2019, February). Observing special education teachers in high-stakes teacher evaluation systems. Presentation given at the Pacific Coast Research Conference, Coronado, CA.
  • Kane, M. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research & Perspective, 2(3), 135–170. doi:10.1207/s15366359mea0203_1
  • Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 17–64). Westport, CT: Praeger.
  • Kane, M. T. (2013). The argument-based approach to validation. School Psychology Review, 42(4), 448–457.
  • Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26(2), 275–304.
  • Linacre, J. M. (1994). Sample size and item calibration [or person measure] stability. Rasch Measurement Transactions, 7, 328.
  • Linacre, J. M. (2014). Facets 3.71.4 [Computer software]. Chicago, IL: Winsteps.com.
  • Mantzicopoulos, P., French, B. F., Patrick, H., Watson, J. S., & Ahn, I. (2018). The stability of kindergarten teachers’ effectiveness: A generalizability study comparing the framework for teaching and the classroom assessment scoring system. Educational Assessment, 23(1), 24–46.
  • Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.
  • Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-faceted Rasch measurement: Part I. Journal of Applied Measurement, 4, 386–422.
  • Norris, C. E., & Borst, J. D. (2007). An examination of the reliabilities of two choral festival adjudication forms. Journal of Research in Music Education, 55, 237–251.
  • Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York, NY: McGraw-Hill.
  • Shepard, L. (2012). Evaluating the use of tests to measure teacher effectiveness: Validity as a theory of action framework. A paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, British Columbia.
  • Smith, E. V., & Kulikowich, J. M. (2004). An application of generalizability theory and many-facet Rasch measurement using a complex problem-solving skills assessment. Educational and Psychological Measurement, 64(4), 617–639.
  • Webb, N. M., Shavelson, R. J., & Haertel, E. H. (2006). Reliability coefficients and generalizability theory. Handbook of Statistics, 26, 81–124.
  • Wigglesworth, G. (1993). Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing, 10(3), 305–319.
  • Wong, C., Odom, S. L., Hume, K. A., Cox, C. W., Fettig, A., Kucharczyk, S., … Schultz, T. R. (2015). Evidence-based practices for children, youth, and young adults with autism spectrum disorder: A comprehensive review. Journal of Autism and Developmental Disorders, 45, 1951–1966.
