References
- Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36(5), 258–267. https://doi.org/10.3102/0013189X07306523
- Bloxham, S., Boyd, P., & Orr, S. (2011). Mark my words: The role of assessment criteria in UK higher education grading practices. Studies in Higher Education, 36(6), 655–670. https://doi.org/10.1080/03075071003777716
- Bloxham, S., den-Outer, B., Hudson, J., & Price, M. (2016). Let’s stop the pretence of consistent marking: Exploring the multiple limitations of assessment criteria. Assessment & Evaluation in Higher Education, 41(3), 466–481. https://doi.org/10.1080/02602938.2015.1024607
- Bouwer, R., Verhavert, S., Lesterhuis, M., van Gasse, R., Donche, V., & De Maeyer, S. (2017, November). Interpreting the validity of misfit statistics in comparative judgement. AEA Europe.
- Brimi, H. M. (2011). Reliability of grading high school work in English. Practical Assessment, Research & Evaluation, 16(17), 1–12. https://doi.org/10.7275/j531-fz38
- Brookhart, S. M. (2013). The use of teacher judgement for summative assessment in the USA. Assessment in Education: Principles, Policy & Practice, 20(1), 69–90. https://doi.org/10.1080/0969594X.2012.703170
- Brookhart, S. M., & Chen, F. (2015). The quality and effectiveness of descriptive rubrics. Educational Review, 67(3), 343–368. https://doi.org/10.1080/00131911.2014.929565
- Brookhart, S. M., Guskey, T. R., Bowers, A. J., McMillan, J. H., Smith, J. K., Smith, L. F., Stevens, M. T., & Welsh, M. E. (2016). A century of grading research: Meaning and value in the most common educational measure. Review of Educational Research, 86(4), 803–848. https://doi.org/10.3102/0034654316672069
- Brookhart, S. M., & Nitko, A. J. (2008). Assessment and grading in classrooms. Pearson.
- Duncan, C. R., & Noonan, B. (2007). Factors affecting teachers’ grading and assessment practices. The Alberta Journal of Educational Research, 53(1), 1–21.
- Eells, W. C. (1930). Reliability of repeated grading of essay type examinations. Journal of Educational Psychology, 21(1), 48–52. https://doi.org/10.1037/h0071103
- Guskey, T. R., & Brookhart, S. M. (2019). What we know about grading: What works, what doesn’t, and what’s next. ASCD.
- Harlen, W., & Deakin Crick, R. (2003). Testing and motivation for learning. Assessment in Education: Principles, Policy & Practice, 10(2), 169–207. https://doi.org/10.1080/0969594032000121270
- Isnawati, I., & Saukah, A. (2017). Teachers’ grading decision making. TEFLIN Journal, 28(2), 155–169. https://doi.org/10.15639/teflinjournal.v28i2/155-169
- Jönsson, A., & Balan, A. (2018). Analytic or holistic: A study of agreement between different grading models. Practical Assessment, Research & Evaluation, 23(12), 1–11. https://doi.org/10.7275/z3gm-fp34
- Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130–144. https://doi.org/10.1016/j.edurev.2007.05.002
- Korp, H. (2006). Lika chanser i gymnasiet? En studie om betyg, nationella prov och social reproduktion [Equal opportunities in upper-secondary school? A study about grades, national tests, and social reproduction] [Doctoral dissertation]. Malmö University.
- Kunnath, J. P. (2017). Teacher grading decisions: Influences, rationale, and practices. American Secondary Education, 45(3), 68–88.
- Malouff, J. M., & Thorsteinsson, E. B. (2016). Bias in grading: A meta-analysis of experimental research findings. Australian Journal of Education, 60(3), 245–256. https://doi.org/10.1177/0004944116664618
- McMillan, J. H. (2003). Understanding and improving teachers’ classroom assessment decision making: Implications for theory and practice. Educational Measurement: Issues and Practice, 22(4), 34–43. https://doi.org/10.1111/j.1745-3992.2003.tb00142.x
- Nijveldt, M. J. (2007). Validity in teacher assessment: An exploration of the judgement processes of assessors [Doctoral dissertation]. Leiden University.
- Panadero, E., & Jonsson, A. (2020). A critical review of the arguments against the use of rubrics. Educational Research Review, 30, 100329. https://doi.org/10.1016/j.edurev.2020.100329
- Parkes, J. (2013). Reliability in classroom assessment. In J. H. McMillan (Ed.), SAGE handbook of research on classroom assessment (pp. 107–123). SAGE.
- Pedulla, J. J., Abrams, L. M., Madaus, G. F., Russell, M. K., Ramos, M. A., & Miao, J. (2003). Perceived effects of state-mandated testing programs on teaching and learning: Findings from a national survey of teachers. National Board on Educational Testing and Public Policy.
- Randall, J., & Engelhard, G. (2008). Differences between teachers’ grading practices in elementary and middle schools. The Journal of Educational Research, 102(3), 175–186. https://doi.org/10.3200/JOER.102.3.175-186
- Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of Education, 13(2), 191–209. https://doi.org/10.1080/0305498870130207
- Sadler, D. R. (2005). Interpretations of criteria-based assessment and grading in higher education. Assessment & Evaluation in Higher Education, 30(2), 175–194. https://doi.org/10.1080/0260293042000264262
- Sadler, D. R. (2009). Indeterminacy in the use of preset criteria for assessment and grading. Assessment & Evaluation in Higher Education, 34(2), 159–179. https://doi.org/10.1080/02602930801956059
- Starch, D., & Elliott, E. C. (1912). Reliability of grading high-school work in English. The School Review, 20(7), 442–457. https://doi.org/10.1086/435971
- Starch, D., & Elliott, E. C. (1913a). Reliability of grading work in mathematics. The School Review, 21(4), 254–259. https://doi.org/10.1086/436086
- Starch, D., & Elliott, E. C. (1913b). Reliability of grading work in history. The School Review, 21(10), 676–681. https://doi.org/10.1086/436185
- Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4), 1–11. https://doi.org/10.7275/96jp-xz07
- Swedish National Agency for Education. (2019). Analyser av likvärdig betygssättning mellan elevgrupper och skolor [Analyses of equal grading between groups of students and schools] (Report No. 475).
- Tomas, C., Whitt, E., Lavelle-Hill, R., & Severn, K. (2019). Modelling holistic marks with analytic rubrics. Frontiers in Education, 4, Article 89. https://doi.org/10.3389/feduc.2019.00089