References
- Backhouse, J. (1976). Determination of grades for two groups sharing a common paper. Educational Research, 18(2), 126–137.
- Baird, J., Ahmed, A., Hopfenbeck, T., Brown, C., & Elliott, V. (2013). Research evidence relating to proposals for reform of the GCSE. Oxford: OUCEA, University of Oxford. Retrieved from http://oucea.education.ox.ac.uk/wordpress/wp-content/uploads/2013/04/WCQ-report-final.pdf
- Baird, J., Fearnley, A., Fowles, D., Jones, B., Morfidi, E., & White, D. (2001). Tiering in the GCSE: A study undertaken by AQA on behalf of the Joint Council for General Qualifications. London: Joint Council for General Qualifications.
- Baird, J., & Ireson, J. (2001). Teachers’ views on tiering and ability grouping at GCSE (AQA Internal Report). Guildford: AQA.
- Benton, T. (2013, September). Examining the impact of tiered examinations on the aspirations of young people. Paper presented at the British Educational Research Association Conference, Brighton. Retrieved from http://www.cambridgeassessment.org.uk/Images/145926-examining-the-impact-of-tiered-examinations-on-the-aspirations-of-young-people.pdf
- Brennan, R. (2001). An essay on the history and future of reliability from the perspective of replications. Journal of Educational Measurement, 38, 295–317.
- Center on International Education Benchmarking (CIEB). (2014). Top performing countries. Retrieved from http://www.ncee.org/programs-affiliates/center-on-international-education-benchmarking/top-performing-countries/
- Crooks, T., Kane, M., & Cohen, A. (1996). Threats to the valid use of assessments. Assessment in Education, 3, 265–286.
- Dorans, N., Moses, T., & Eignor, D. (2010). Principles and practices of test score equating. ETS. Retrieved from http://www.ets.org/Media/Research/pdf/RR-10-29.pdf
- Elwood, J. (2005). Gender and achievement: What have exams got to do with it? Oxford Review of Education, 31, 373–393.
- Elwood, J., & Murphy, P. (2002). Tests, tiers and achievement: Gender and performance at 16 and 14 in England. European Journal of Education, 37, 395–416.
- Gillborn, D., & Youdell, D. (2000). Rationing education: Policy, practice, reform and equity. Buckingham: Open University Press.
- Good, F., & Cresswell, M. (1988a). Grade awarding judgements in differentiated examinations. British Educational Research Journal, 14, 263–280.
- Good, F., & Cresswell, M. (1988b). Placing candidates who take differentiated papers on a common grade scale. Educational Research, 30, 177–189.
- Good, F., & Cresswell, M. (1988c). Can teachers enter candidates appropriately for examinations including differentiated papers? Educational Studies, 14, 289–297.
- Gove, M. (2013, February 6). Ofqual policy steer letter: Reforming key stage 4 qualifications. Letter to Ofqual. Retrieved from https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/278308/sos_ofqual_letter_060213.pdf
- Haertel, E. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: The American Council on Education/Praeger.
- Hamer, J., Murphy, R., Mitchell, T., Grant, A., & Smith, J. (2013). English Baccalaureate Certificate (EBC) proposals: Examining with and without tiers. London: Pearson. Retrieved from http://uk.pearson.com/content/dam/ped/pei/uk/pearson-uk/Documents/wcq/WCQ_Research_Tiering_in_GCSE_FINAL.pdf
- Harris, D. J. (1991). A comparison of Angoff's design I and design II for vertical equating using traditional and IRT methodology. Journal of Educational Measurement, 28, 221–235.
- He, Q. (2012). On-demand testing and maintaining standards for general qualifications in the UK using item response theory: Possibilities and challenges. Educational Research, 54, 89–112.
- He, Q., Glanville, M., Rodriguez-Jaramillo, L., & Opposs, D. (2013). Perceived impact of high stakes national tests and public examinations in England. Coventry: Ofqual.
- Ireson, J. (2008). Learners, learning and educational activity. Foundations and futures of education. Abingdon: Routledge.
- Ireson, J., & Hallam, S. (2009). Academic self-concepts in adolescence: Relations with achievement and ability grouping in schools. Learning and Instruction, 19, 201–213.
- Ireson, J., Hallam, S., & Hurley, C. (2005). What are the effects of ability grouping on GCSE attainment? British Educational Research Journal, 31, 443–458.
- Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: The American Council on Education/Praeger.
- Kielstra, P. (2012). The Learning Curve 2012 report: Lessons in country performance. London: Pearson. Retrieved from http://thelearningcurve.pearson.com/reports/the-learning-curve-report-2012
- Kielstra, P. (2014). The Learning Curve 2014 report: Education and skills for life. London: Pearson. Retrieved from http://thelearningcurve.pearson.com/reports/the-learning-curve-report-2014
- Kingdon, J., French, S., Pierce, G., & Woodthorpe, A. (1983). Awarding grades on differentiated papers in school examinations at 16 plus. Educational Research, 25, 220–229.
- Kolen, M., & Brennan, R. (2004). Test equating, scaling, and linking: Methods and practices. Berlin: Springer-Verlag.
- Kutnick, P., Sebba, J., Blatchford, P., Galton, M., & Thorp, J. (2005). The effects of pupil grouping: Literature review. Retrieved from https://www.education.gov.uk/publications/eOrderingDownload/RR688.pdf
- Lissitz, R., & Huynh, H. (2003). Vertical equating for state assessments: Issues and solutions in determination of adequate yearly progress and school accountability. Practical Assessment, Research & Evaluation, 8(10). Retrieved from http://PAREonline.net/getvn.asp?v=8&n=10
- Luecht, R., Brumfield, T., & Breithaupt, K. (2006). A testlet assembly design for adaptive multistage tests. Applied Measurement in Education, 19, 189–202.
- Luecht, R., & Sireci, S. (2011). A review of models for computer-based testing. New York: The College Board. Retrieved from http://research.collegeboard.org/sites/default/files/publications/2012/7/researchreport-2011-12-review-models-for-computer-based-testing.pdf
- McVittie, J. (2008). National qualifications: A short history. Glasgow: Scottish Qualifications Authority (SQA). Retrieved from http://www.sqa.org.uk/files_ccc/PNP_ResearchReport3_NationalQualificationsAShortHistory.pdf
- Mead, A. (2006). An introduction to multistage testing. Applied Measurement in Education, 19, 185–187.
- Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York, NY: The American Council on Education/MacMillan.
- Newton, P. (2012). Clarifying the consensus definition of validity. Measurement: Interdisciplinary Research and Perspectives, 10, 1–29.
- Oates, T. (2013). Tiering in GCSE – which structure holds most promise. Retrieved from http://www.cambridgeassessment.org.uk/ca/digitalAssets/207172_TieringInGCSE_TimOates_250413_FINAL.pdf
- Office for Standards in Education (Ofsted). (2012a). Mathematics: Made to measure. Retrieved from http://www.ofsted.gov.uk/resources/mathematics-made-measure
- Office for Standards in Education (Ofsted). (2012b). Moving English forward. Retrieved from http://www.ofsted.gov.uk/resources/moving-english-forward
- Office of Qualifications and Examinations Regulation (Ofqual). (2012). GCSE English 2012. Retrieved from http://www.ofqual.gov.uk/files/2012-11-02-gcse-english-final-report-and-appendices.pdf
- Office of Qualifications and Examinations Regulation (Ofqual). (2013). Design details of new GCSEs in England. Retrieved from http://ofqual.gov.uk/news/design-details-of-new-gcses-in-england/
- Pace, J. (2003, September 11–13). The differentiated paper system in Maltese SEC examination: Is it promoting quality, equity and fairness? Paper presented at the British Educational Research Association Annual Conference. Edinburgh: Heriot-Watt University. Retrieved from http://www.leeds.ac.uk/educol/documents/00003314.htm
- Reckase, M. (2010). Study of best practices for vertical scaling and standard setting with recommendations for FCAT 2.0. Florida: Florida Department of Education. Retrieved from https://www.fldoe.org/asp/k12memo/pdf/StudyBestPracticesVerticalScalingStandardSetting.pdf
- Stobart, G., White, J., Elwood, J., Hayden, M., & Mason, K. (1992). Differential performance in examinations at 16+: English and mathematics. London: SEAC.
- Tong, Y., & Kolen, M. (2008, March). Maintenance of vertical scales. Paper presented at the National Council on Measurement in Education Annual Conference in New York City. Retrieved from http://images.pearsonassessments.com/images/tmrs/tmrs_rg/MaintenanceofVerticalScales.pdf?WT.mc_id=TMRS_Maintenance_of_Vertical_Scales
- Universities and Colleges Admissions Service (UCAS). (2013). International qualifications – for entry to university or college in 2014. Retrieved from http://www.ucas.com/sites/default/files/international-qualifications-2014.pdf
- West, A., Edge, A., & Stokes, E. (1999). Secondary education across Europe: Curricula and school examination systems. London: Centre for Educational Research, London School of Economics and Political Science. Retrieved from http://www.leeds.ac.uk/educol/documents/00001195.htm
- Wheadon, C., & Béguin, A. (2010). Fears for tiers: Are candidates being appropriately rewarded for their performance in tiered examinations? Assessment in Education, 17, 287–300.
- Wheadon, C., Whitehouse, C., Spalding, V., Tremain, K., & Charman, M. (2009). Principles and practice of on-demand testing. Retrieved from http://www.ofqual.gov.uk/files/2009-01-principles-practice-on-demand-testing.pdf
- Wiliam, D. (1995). It will all end in tiers! British Journal of Curriculum and Assessment, 5(3), 21–24.
- Zheng, Y., Nozawa, Y., Gao, X., & Chang, H.-H. (2012). Multistage adaptive testing for a large-scale classification test: The designs, automated heuristic assembly, and comparison with other testing modes. ACT Research Reports 2012-6. Iowa City: ACT. Retrieved from http://files.eric.ed.gov/fulltext/ED542026.pdf