Language Assessment Quarterly Volume 19, 2022 - Issue 2

Research Article

Application of Bi-factor MIRT and Higher-order CDM Models to an In-house EFL Listening Test for Diagnostic Purposes

Shangchao Min, Institute of Applied Linguistics, Zhejiang University, Hangzhou, China

Hongwen Cai, Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou, China

Lianzhen He (corresponding author), Institute of Applied Linguistics, Zhejiang University, Hangzhou, China

Pages 189-213 | Published online: 30 Nov 2021

https://doi.org/10.1080/15434303.2021.1980571
References

Adams, R., & Wu, M. (Eds.). (2002). PISA 2000 technical report. OECD Publications. https://doi.org/10.1787/9789264199521-en
Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52(3), 317–332. https://doi.org/10.1007/BF02294359
Alderson, C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. Continuum.
Alderson, C. (2010). “Cognitive diagnosis and Q-matrices in language assessment”: A commentary. Language Assessment Quarterly, 7(1), 96–103. https://doi.org/10.1080/15434300903426748
Alderson, C., Brunfaut, T., & Harding, L. (2015). Towards a theory of diagnosis in second and foreign language assessment: Insights from professional practice across diverse fields. Applied Linguistics, 36(2), 236–260. https://doi.org/10.1093/applin/amt046
Alderson, C., & Huhta, A. (2005). The development of a suite of computer-based diagnostic tests based on the Common European Framework. Language Testing, 22(3), 301–320. https://doi.org/10.1191/0265532205lt310oa
Aryadoust, V. (2011). Application of the fusion model to while-listening performance tests. SHIKEN: JALT Testing & Evaluation SIG Newsletter, 15(2), 2–9. https://hosted.jalt.org/test/ary_2.htm
Aryadoust, V. (2021). A cognitive diagnostic assessment study of the listening test of the Singapore-Cambridge General Certificate of Education O-Level: Application of DINA, DINO, G-DINA, HO-DINA, and RRUM. International Journal of Listening, 35(1), 29–52. https://doi.org/10.1080/10904018.2018.1500915
Aryadoust, V., Foo, S., & Ng, L. Y. (2021). What can gaze behaviors, neuroimaging data, and test scores tell us about test method effects and cognitive load in listening assessments? Language Testing, 1–34. https://doi.org/10.1177/02655322211026876
Bolt, D. (2007). The present and future of IRT-based cognitive diagnostic models (ICDMs) and related methods. Journal of Educational Measurement, 44(4), 377–383. https://doi.org/10.1111/j.1745-3984.2007.00045.x
Bolt, D. (2019). Bifactor MIRT as an appealing and related alternative to CDMs in the presence of skill attribute continuity. In M. von Davier & Y.-S. Lee (Eds.), Handbook of diagnostic classification models (pp. 395–417). Springer.
Bolt, D., & Kim, J.-S. (2018). Parameter invariance and skill attribute continuity in the DINA model. Journal of Educational Measurement, 55(2), 264–280. https://doi.org/10.1111/jedm.12175
Bradshaw, L., & Madison, M. (2016). Invariance properties for general diagnostic classification models. International Journal of Testing, 16(2), 99–118. https://doi.org/10.1080/15305058.2015.1107076
Buck, G., & Tatsuoka, K. (1998). Application of the rule-space procedure to language testing: Examining attributes of a free response listening test. Language Testing, 15(2), 119–157. https://doi.org/10.1177/026553229801500201
Cai, L. (2010). A two-tier full-information item factor analysis model with applications. Psychometrika, 75(4), 581–612. https://doi.org/10.1007/s11336-010-9178-0
Cai, L., Thissen, D., & du Toit, S. (2011). IRTPRO user’s guide. Scientific Software International, Inc.
Cai, Y., Tu, D., & Ding, S. (2018). Theorems and methods of a complete Q Matrix with attribute hierarchies under restricted Q-matrix design. Frontiers in Psychology, 9, 1413. https://doi.org/10.3389/fpsyg.2018.01413
Carroll, J. B. (1972). Defining language comprehension. In R. O. Freedle & J. B. Carroll (Eds.), Language comprehension and the acquisition of knowledge (pp. 1–29). John Wiley.
Chen, J., de la Torre, J., & Zhang, Z. (2013). Relative and absolute fit evaluation in cognitive diagnosis modeling. Journal of Educational Measurement, 50(2), 123–140. https://doi.org/10.1111/j.1745-3984.2012.00185.x
Choi, H. J. (2010). A model that combines diagnostic classification assessment with mixture item response theory models [Unpublished doctoral dissertation]. University of Georgia. https://getd.libs.uga.edu/pdfs/choi_hye-jeong_201005_phd.pdf
Choi, I., & Papageorgiou, S. (2020). Evaluating subscore uses across multiple levels: A case of reading and listening subscores for young EFL learners. Language Testing, 37(2), 254–279. https://doi.org/10.1177/0265532219879654
de la Torre, J., & Douglas, J. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69(3), 333–353. https://doi.org/10.1007/BF02295640
de la Torre, J., & Lee, Y. S. (2010). A note on the invariance of the DINA model parameters. Journal of Educational Measurement, 47(1), 115–127. https://doi.org/10.1111/j.1745-3984.2009.00102.x
DeMars, C. E. (2013). A tutorial on interpreting bifactor model scores. International Journal of Testing, 13(4), 354–378. https://doi.org/10.1080/15305058.2013.799067
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Erlbaum.
Embretson, S. E., & Yang, X. (2013). A multicomponent latent trait model for diagnosis. Psychometrika, 78(1), 14–36. https://doi.org/10.1007/s11336-012-9296-y
Fan, J., & Yan, X. (2017). From test performance to language use: Using self-assessment to validate a high-stakes English proficiency test. The Asia-Pacific Education Researcher, 26(1–2), 61–73. https://doi.org/10.1007/s40299-017-0327-4
Field, J. (2008). Listening in the language classroom. Cambridge University Press.
Field, J. (2013). Cognitive validity. In L. Taylor & A. Geranpayeh (Eds.), Examining listening (pp. 77–151). Cambridge University Press.
Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the certificate in advanced English examination. Language Assessment Quarterly, 4(2), 190–222. https://doi.org/10.1080/15434300701375758
Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62(1), 79–95. https://doi.org/10.1348/000711007X248875
Harding, L., Alderson, C., & Brunfaut, T. (2015). Diagnostic assessment of reading and listening in a second or foreign language: Elaborating on diagnostic principles. Language Testing, 32(3), 317–336. https://doi.org/10.1177/0265532214564505
He, L., & Chen, D. (2017). Developing common listening ability scales for Chinese learners of English. Language Testing in Asia, 7(4), 1–12. https://doi.org/10.1186/s40468-017-0033-4
Henning, G. (1992). Dimensionality and construct validity of language tests. Language Testing, 9(1), 1–11. https://doi.org/10.1177/026553229200900102
Holzknecht, F., Eberharter, K., Kremmel, B., Zehentner, M., McCray, G., Konrad, E., & Spöttl, C. (2017). Looking into listening: Using eye-tracking to establish the cognitive validity of the Aptis Listening Test (ARAGs Research Reports). British Council. https://www.britishcouncil.org/sites/default/files/looking_into_listening.pdf
Jang, E. E. (2009). Cognitive diagnostic assessment of L2 reading comprehension ability: Validity arguments for applying Fusion Model to LanguEdge assessment. Language Testing, 26(1), 31–73. https://doi.org/10.1177/0265532208097336
Jang, E. E., Dunlop, M., Park, G., & Van Der Boom, E. H. (2015). How do young students with different profiles of reading skill mastery, perceived ability, and goal orientation respond to holistic diagnostic feedback? Language Testing, 32(3), 359–383. https://doi.org/10.1177/0265532215570924
Jang, E. E., Kim, H., Vincett, M., Barron, C., & Russell, B. (2019). Improving IELTS reading test score interpretations and utilisation through cognitive diagnosis model-based skill profiling (IELTS Research Reports Online Series No. 2). British Council, Cambridge Assessment English, and IDP. https://www.ielts.org/research/research-reports/online-series-2019-2
Javidanmehr, Z., & Sarab, A. M. R. (2019). Retrofitting non-diagnostic reading comprehension assessment: Application of the G-DINA model to a high stakes reading comprehension test. Language Assessment Quarterly, 16(3), 294–311. https://doi.org/10.1080/15434303.2019.1654479
Kim, S. Y., Lee, W. C., & Kolen, M. J. (2020). Simple-structure multidimensional item response theory equating for multidimensional tests. Educational and Psychological Measurement, 80(1), 91–125. https://doi.org/10.1177/0013164419854208
Kirkpatrick, R., Wang, C., Shin, C.-W., Chien, Y., & Goodman, J. (2013, April). Profile classification for cognitive diagnostic assessment: A simulation study [Conference paper presentation]. The 2013 Annual Meeting of the National Council on Measurement in Education, San Francisco, CA, United States.
Klem, M., Gustafsson, J.-E., & Hagtvet, B. (2015). The dimensionality of language ability in four-year-olds: Construct validation of a language screening tool. Scandinavian Journal of Educational Research, 59(2), 195–213. https://doi.org/10.1080/00313831.2014.904416
Kunnan, A. J., & Jang, E. E. (2009). Diagnostic feedback in language assessment. In M. H. Long & C. J. Doughty (Eds.), The handbook of language teaching (pp. 610–627). Wiley-Blackwell.
Lee, Y., & Sawaki, Y. (2009). Application of three cognitive diagnosis models to ESL reading and listening assessments. Language Assessment Quarterly, 6(3), 239–263. https://doi.org/10.1080/15434300903079562
Li, H., Hunter, C. V., & Lei, P.-W. (2016). The selection of cognitive diagnostic models for a reading comprehension test. Language Testing, 33(3), 391–409. https://doi.org/10.1177/0265532215590848
Li, X., & Wang, W.-C. (2015). Assessment of differential item functioning under cognitive diagnosis models: The DINA model example. Journal of Educational Measurement, 52(1), 28–54. https://doi.org/10.1111/jedm.12061
Liu, R., Huggins-Manley, A. C., & Bulut, O. (2018). Retrofitting diagnostic classification models to responses from IRT-based assessment forms. Educational and Psychological Measurement, 78(3), 357–383. https://doi.org/10.1177/0013164416685599
Liu, Y., Li, Z., & Liu, H. (2019). Reporting valid and reliable overall scores and domain scores using bi-factor model. Applied Psychological Measurement, 43(7), 562–576. https://doi.org/10.1177/0146621618813093
Ma, W., de la Torre, J., Sorrel, M., & Jiang, Z. (2020). Package ‘GDINA’. CRAN. https://cran.r-project.org/web/packages/GDINA/GDINA.pdf
Maydeu-Olivares, A., & Joe, H. (2014). Assessing approximate fit in categorical data analysis. Multivariate Behavioral Research, 49(4), 305–328. https://doi.org/10.1080/00273171.2014.911075
McNamara, T. (1996). Measuring second language performance. Longman.
Mellenbergh, G. L. (2019). Counteracting methodological errors in behavioral research. Springer.
Min, S., & He, L. (2014). Applying unidimensional and multidimensional item response theory models in testlet-based reading assessment. Language Testing, 31(4), 453–477. https://doi.org/10.1177/0265532214527277
Min, S., & Jiang, Z. (2020). 校本听力考试与《中国英语能力等级量表》对接研究 [Linking the listening subtest of an in-house English proficiency test to China’s Standards of English Language Ability (CSE)]. Foreign Language Education, 41(4), 47–51. https://www.cnki.com.cn/Article/CJFDTotal-TEAC202004009.htm
Mirzaei, A., Vincheh, M. H., & Hashemian, M. (2020). Retrofitting the IELTS reading section with a general cognitive diagnostic model in an Iranian EAP context. Studies in Educational Evaluation, 64, Article 100817. https://doi.org/10.1016/j.stueduc.2019.100817
Morin, A. J. S., Arens, A. K., & Marsh, H. W. (2016). A bifactor exploratory structural equation modeling framework for the identification of distinct sources of construct-relevant psychometric multidimensionality. Structural Equation Modeling: A Multidisciplinary Journal, 23(1), 116–139. https://doi.org/10.1080/10705511.2014.961800
Musek, J. (2017). The general factor of personality. Academic Press.
Neyman, J., & Pearson, E. S. (1992). On the problem of the most efficient tests of statistical hypotheses. In S. Kotz & N. L. Johnson (Eds.), Breakthroughs in statistics (pp. 73–108). Springer.
Pae, T. (2004). DIF for examinees with different academic backgrounds. Language Testing, 21(1), 53–73. https://doi.org/10.1191/0265532204lt274oa
Ranjbaran, F., & Alavi, S. M. (2017). Developing a reading comprehension test for cognitive diagnostic assessment: A RUM analysis. Studies in Educational Evaluation, 55, 167–179. https://doi.org/10.1016/j.stueduc.2017.10.007
Ravand, H., & Robitzsch, A. (2018). Cognitive diagnostic model of best choice: A study of reading comprehension. Educational Psychology, 38(10), 1255–1277. https://doi.org/10.1080/01443410.2018.1489524
Reckase, M. (1979). Unifactor latent trait models applied to multifactor tests: Results and implications. Journal of Educational and Behavioral Statistics, 4(3), 207–230. https://doi.org/10.3102/10769986004003207
Reise, S. P. (2012). The rediscovery of bifactor measurement models. Multivariate Behavioral Research, 47(5), 667–696. https://doi.org/10.1080/00273171.2012.715555
Reise, S. P., Cook, K. F., & Moore, T. M. (2014). Evaluating the impact of multidimensionality on unidimensional item response theory model parameters. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 13–40). Routledge.
Rupp, A. A. (2007). The answer is in the question: A guide for describing and investigating the conceptual foundations and statistical properties of cognitive psychometric models. International Journal of Testing, 7(2), 95–125. https://doi.org/10.1080/15305050701193454
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. Guilford Press.
Sawaki, Y., Kim, H. J., & Gentile, C. (2009). Q-Matrix construction: Defining the link between constructs and test items in large-scale reading and listening comprehension assessments. Language Assessment Quarterly, 6(3), 190–209. https://doi.org/10.1080/15434300902801917
Sawaki, Y., Stricker, L., & Oranje, A. (2009). Factor structure of the TOEFL Internet-based test. Language Testing, 26(1), 5–30. https://doi.org/10.1177/0265532208097335
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464. https://doi.org/10.1214/aos/1176344136
Sinharay, S., Puhan, G., Haberman, S. J., & Hambleton, R. K. (2019). Subscores: When to communicate them, what are their alternatives, and some recommendations. In D. Zapata-Rivera (Ed.), Score reporting research and applications (pp. 80–107). Routledge Taylor & Francis Group.
Sinharay, S., Puhan, G., & Haberman, S. J. (2010). Reporting diagnostic scores in educational testing: Temptations, pitfalls, and some solutions. Multivariate Behavioral Research, 45(3), 553–573. https://doi.org/10.1080/00273171.2010.483382
Song, M.-Y. (2008). Do divisible subskills exist in second language (L2) comprehension? A structural equation modeling approach. Language Testing, 25(4), 435–464. https://doi.org/10.1177/0265532208094272
Stout, W. (2007). Skills diagnosis using IRT-based continuous latent trait models. Journal of Educational Measurement, 44(4), 313–324. https://doi.org/10.1111/j.1745-3984.2007.00041.x
Templin, J., & Bradshaw, L. (2014). Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika, 79(2), 317–339. https://doi.org/10.1007/s11336-013-9362-0
Toprak, E., Aryadoust, V., & Goh, C. (2019). The log-linear cognitive diagnosis modeling (LCDM) in second language listening assessment. In V. Aryadoust & M. Raquel (Eds.), Quantitative data analysis for language assessment Volume II: Advanced methods (pp. 56–78). Routledge.
Urmston, A., Raquel, M., & Tsang, C. (2013). Diagnostic testing of Hong Kong tertiary students’ English language proficiency: The development and validation of DELTA. Hong Kong Journal of Applied Linguistics, 14(2), 60–82. https://www.academia.edu/13521940
Vandergrift, L. (2007). Recent developments in second and foreign language listening comprehension research. Language Teaching, 40(3), 191–210. https://doi.org/10.1017/S0261444807004338
Vandergrift, L., & Goh, C. C. M. (2012). Teaching and learning second language listening. Routledge.
von Davier, M., & Haberman, S. (2014). Hierarchical diagnostic classification models morphing into unidimensional ‘Diagnostic’ classification models – A commentary. Psychometrika, 79(2), 340–346. https://doi.org/10.1007/s11336-013-9363-z
Weeks, J. P. (2015). Multidimensional test linking. In S. P. Reise & D. A. Revicki (Eds.), Handbook of item response theory modeling: Applications to typical performance assessment (pp. 406–434). Routledge.
Yi, Y.-S. (2017). Probing the relative importance of different attributes in L2 reading and listening comprehension items: An application of cognitive diagnostic models. Language Testing, 34(3), 1–19. https://doi.org/10.1177/0265532216646141
Yu, X., Cheng, Y., & Chang, H.-H. (2019). Recent developments in cognitive diagnostic computerized adaptive testing (CD-CAT): A comprehensive review. In M. von Davier & Y.-S. Lee (Eds.), Handbook of diagnostic classification models (pp. 307–331). Springer.
Zhan, P., Jiao, H., Liao, D., & Li, F. (2019). A longitudinal higher-order diagnostic classification model. Journal of Educational and Behavioral Statistics, 44(3), 251–281. https://doi.org/10.3102/1076998619827593