Predicting item difficulty of science national curriculum tests: the case of key stage 2 assessments

Pages 59-82 | Received 23 Dec 2015, Accepted 29 Aug 2016, Published online: 03 Nov 2016

