Initial Considerations When Applying an Instructional Sensitivity Framework: Partitioning the Variation Between and Within Classrooms for Two Mathematics Assessments


  • Airasian, P. W., & Madaus, G. F. (1983). Linking testing and instruction: Policy issues. Journal of Educational Measurement, 20(2), 103–118. doi:10.1111/jedm.1983.20.issue-2
  • Bhola, D. S., Impara, J. C., & Buckendahl, C. W. (2003). Aligning tests with states’ content standards: Methods and issues. Educational Measurement: Issues and Practice, 22(3), 21–29. doi:10.1111/j.1745-3992.2003.tb00134.x
  • Bill and Melinda Gates Foundation. (2010a). Learning about teaching: Initial findings from the measures of effective teaching project. Seattle, WA: Author.
  • Bill and Melinda Gates Foundation. (2010b). Student assessments and the MET project. Seattle, WA: Author.
  • Bill and Melinda Gates Foundation. (2012). Gathering feedback for teaching: Combining high-quality observations with student surveys and achievement gains. Seattle, WA: Author.
  • Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data aggregation and analysis. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 349–381). San Francisco, CA: Jossey-Bass.
  • Burstein, L. (1983). A word about this issue [Editor’s note]. Journal of Educational Measurement, 20(2), 99–102.
  • Burstein, L. (1989). Conceptual considerations in instructionally sensitive assessments (CSE Technical Report 333). Center for Research on Evaluation, Standards, and Student Testing. Los Angeles, CA: University of California, Los Angeles.
  • Cantrell, S. M. (2012). The Measures of Effective Teaching Project: An experiment to build evidence and trust. Education Finance and Policy, 7(2), 203–218. doi:10.1162/EDFP_a_00062
  • Chen, G., Kirkman, B. L., Kanfer, R., Allen, D., & Rosen, B. (2007). A multilevel study of leadership, empowerment, and performance in teams. Journal of Applied Psychology, 92(2), 331–346. doi:10.1037/0021-9010.92.2.331
  • Cox, R. C., & Vargas, J. S. (1966). A comparison of item-selection techniques for norm-referenced and criterion-referenced tests. Paper presented at the annual conference of the National Council on Measurement in Education, Chicago, IL.
  • D’Agostino, J. V., Welsh, M. E., & Corson, N. M. (2007). Instructional sensitivity of a state standards-based assessment. Educational Assessment, 12(1), 1–22. doi:10.1080/10627190709336945
  • Donlon, T. F., & Fischer, F. E. (1968). An index of an individual’s agreement with group-determined item difficulties. Educational and Psychological Measurement, 28(1), 105–113. doi:10.1177/001316446802800110
  • Evans, F. R., & Pike, L. W. (1973). The effects of instruction for three mathematics item formats. Journal of Educational Measurement, 10(4), 257–271. doi:10.1111/jedm.1973.10.issue-4
  • Grossman, P., Cohen, J., Ronfeldt, M., & Lindsay, B. (2014). The test matters: The relationship between classroom observation scores and teacher value added on multiple types of assessment. Educational Researcher, 43(6), 293–303. doi:10.3102/0013189X14544542
  • Haladyna, T. M. (1974). Effects of different samples on item and test characteristics of criterion-referenced tests. Journal of Educational Measurement, 11(2), 93–99. doi:10.1111/jedm.1974.11.issue-2
  • Haladyna, T. M., & Roid, G. H. (1981). The role of instructional sensitivity in the empirical review of criterion-referenced test items. Journal of Educational Measurement, 18(1), 39–53. doi:10.1111/jedm.1981.18.issue-1
  • Hanna, G. S., & Bennett, J. A. (1984). Instructional sensitivity expanded. Educational and Psychological Measurement, 44(3), 583–596. doi:10.1177/0013164484443006
  • Hanson, R. A., McMorris, R. F., & Bailey, J. D. (1986). Differences in instructional sensitivity between item formats and between achievement test items. Journal of Educational Measurement, 23(1), 1–12. doi:10.1111/jedm.1986.23.issue-1
  • Harnisch, D. L. (1983). Item response patterns: Applications for educational practice. Journal of Educational Measurement, 20(2), 191–206. doi:10.1111/jedm.1983.20.issue-2
  • Harnisch, D. L., & Linn, R. L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18(3), 133–146. doi:10.1111/jedm.1981.18.issue-3
  • Herman, J. L., Webb, N. M., & Zuniga, S. A. (2007). Measurement issues in the alignment of standards and assessments. Applied Measurement in Education, 20(1), 101–126.
  • Hill, H. C., & Grossman, P. (2013). Learning from teacher observations: Challenges and opportunities posed by new teacher evaluation systems. Harvard Educational Review, 83(2), 371–384. doi:10.17763/haer.83.2.d11511403715u376
  • Ing, M. (2008). Using instructional sensitivity and instructional opportunities to interpret students’ mathematics performance. Journal of Educational Research and Policy Studies, 8, 23–43.
  • Ing, M. (2012). Using instructional sensitivity and instructional opportunities to interpret students’ mathematics performance. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, British Columbia, Canada.
  • Ing, M., & Webb, N. M. (2012). Characterizing mathematics classroom practice: Impact of observation and coding choices. Educational Measurement: Issues and Practice, 31, 14–26. doi:10.1111/emip.2012.31.issue-1
  • Inter-University Consortium for Political and Social Research. (2013). Year 1 section-level analytical file 4th–8th grade codebook (ICPSR 34309-0001). Ann Arbor, MI: Inter-University Consortium for Political and Social Research; University of Michigan.
  • Kane, M. T., & Brennan, R. L. (1980). Agreement coefficients as indices of dependability for domain-referenced tests. Applied Psychological Measurement, 4, 105–126. doi:10.1177/014662168000400111
  • Kane, T. J., Kerr, K. A., & Pianta, R. C. (Eds.). (2014). Designing teacher evaluation systems: New guidance from the measures of effective teaching project. San Francisco, CA: Jossey-Bass.
  • Li, M., Ruiz-Primo, M. A., & Wills, K. (2012). Comparing two study designs to estimate the instructional sensitivity of items. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, British Columbia, Canada.
  • Linn, R. L. (1983). Testing and instruction: Links and distinctions. Journal of Educational Measurement, 20(2), 179–189. doi:10.1111/jedm.1983.20.issue-2
  • Marsh, H. W., Lüdtke, O., Nagengast, B., Trautwein, U., Morin, A. J. S., Abduljabbar, A. S., & Köller, O. (2012). Classroom climate and contextual effects: Conceptual and methodological issues in the evaluation of group-level effects. Educational Psychologist, 47(2), 106–124. doi:10.1080/00461520.2012.670488
  • McCaffrey, D. F., Lockwood, J. R., Koretz, D., Louis, T. A., & Hamilton, L. (2004). Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics, 29(1), 67–101.
  • Mehrens, W. A., & Phillips, S. E. (1986). Detecting impacts of curricular differences in achievement test data. Journal of Educational Measurement, 23(3), 185–196. doi:10.1111/jedm.1986.23.issue-3
  • Mehrens, W. A., & Phillips, S. E. (1987). Sensitivity of item difficulties to curricular validity. Journal of Educational Measurement, 24(4), 357–370. doi:10.1111/jedm.1987.24.issue-4
  • Miller, M. D., & Linn, R. L. (1988). Invariance of item characteristic functions with variations in instructional coverage. Journal of Educational Measurement, 25(3), 205–219. doi:10.1111/jedm.1988.25.issue-3
  • Muthén, B. O. (1989). Using item-specific instructional information in achievement modeling. Psychometrika, 54(3), 385–396. doi:10.1007/BF02294624
  • Muthén, B. O. (1994). Instructionally sensitive psychometrics: Applications to the second international mathematics study. In I. Westbury, C. A. Ethington, L. A. Sosniak, & D. P. Baker (Eds.), In search of more effective mathematics instruction (pp. 293–324). Norwood, NJ: Ablex Publishing Corporation.
  • Muthén, B. O., Huang, L. C., Khoo, S. K., Goff, G. H., Novak, J. R., & Shin, J. C. (1995). Opportunity-to-learn effects on achievement: Analytical aspects. Educational Evaluation and Policy Analysis, 17(3), 371–403. doi:10.3102/01623737017003371
  • Muthén, B. O., Kao, C. F., & Burstein, L. (1991). Instructionally sensitive psychometrics: Application of a new IRT-based detection technique to mathematics achievement test items. Journal of Educational Measurement, 28(1), 1–22. doi:10.1111/j.1745-3984.1991.tb00340.x
  • Naumann, A., Fauth, B., Hochweber, J., & Klieme, E. (2012). Assessment of instructional sensitivity: Evaluations at test and item level. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, British Columbia, Canada.
  • Niemi, D., Wang, J., Steinberg, D. H., Baker, E. L., & Wang, H. (2007). Instructional sensitivity of a complex language arts performance assessment. Educational Assessment, 12(3&4), 215–237.
  • Polikoff, M. S. (2010). Instructional sensitivity as a psychometric property of assessments. Educational Measurement: Issues and Practice, 29(4), 3–14. doi:10.1111/emip.2010.29.issue-4
  • Polikoff, M. S. (2012). Evaluating the instructional sensitivity of four states’ student achievement tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, British Columbia, Canada.
  • Popham, W. J. (1971). Indices of adequacy for criterion-referenced test items. In W. J. Popham (Ed.), Criterion-referenced measurement: An introduction (pp. 79–98). Englewood Cliffs, NJ: Educational Technology Publications, Inc.
  • Popham, W. J. (2006). Determining the instructional sensitivity of accountability tests. Paper presented at the annual Large-Scale Assessment Conference, Council of Chief State School Officers. San Francisco, CA.
  • Popham, W. J. (2007). Instructional insensitivity of tests: Accountability’s dire drawback. Phi Delta Kappan, 89(2), 146–155. doi:10.1177/003172170708900211
  • Popham, W. J., & Husek, T. R. (1969). Implications of criterion-referenced measurement. Journal of Educational Measurement, 6(1), 1–9. doi:10.1111/j.1745-3984.1969.tb00654.x
  • Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage Publications.
  • Raudenbush, S. W., & Sadoff, S. (2008). Statistical inference when classroom quality is measured with error. Journal of Research on Educational Effectiveness, 1(2), 138–154. doi:10.1080/19345740801982104
  • Ridgway, J., Zawojewski, J. S., & Hoover, M. N. (2000). Problematising evidence-based policy and practice. Evaluation & Research in Education, 14, 181–192. doi:10.1080/09500790008666971
  • Rowan, B., Correnti, R., & Miller, R. (2002). What large-scale, survey research tells us about teacher effects on student achievement: Insights from the Prospects study of elementary schools. Teachers College Record, 104(8), 1525–1567. doi:10.1111/tcre.2002.104.issue-8
  • Ruiz-Primo, M. A., Li, M., Wills, K., Giamellaro, M., Lan, M.-C., Mason, H., & Sands, D. (2012). Developing and evaluating instructionally sensitive assessments in science. Journal of Research in Science Teaching, 49(6), 691–712. doi:10.1002/tea.v49.6
  • Ruiz-Primo, M. A., Shavelson, R. J., Hamilton, L., & Klein, S. (2002). On the evaluation of systemic science education reform: Searching for instructional sensitivity. Journal of Research in Science Teaching, 39(5), 369–393. doi:10.1002/(ISSN)1098-2736
  • Sato, T. (1975). The construction and interpretation of S-P tables. Tokyo, Japan: Meiji Tosho.
  • Tatsuoka, K., & Tatsuoka, M. M. (1980). Detection of aberrant response patterns and their effects on dimensionality (Research Report 80-4). Urbana, IL: University of Illinois, Computer-based Education Research Laboratory.
  • Webb, N. L. (2007). Issues related to judging the alignment of curriculum standards and assessments. Applied Measurement in Education, 20(1), 7–25. doi:10.1080/08957340709336728
  • Welsh, M. E., Eastwood, M., & D’Agostino, J. V. (2014). Conceptualizing teaching to the test under standards-based reform. Applied Measurement in Education, 27(2), 98–114. doi:10.1080/08957347.2014.880439
  • White, M., & Rowan, B. (2013). User guide to measure of effective teaching longitudinal database. Ann Arbor, MI: Inter-University Consortium for Political and Social Research; University of Michigan.