Original Articles

Examining the inseparability of content knowledge from LSP reading ability: an approach combining bifactor-multidimensional item response theory and structural equation modeling
