
Language Assessment Quarterly Volume 19, 2022 - Issue 2


Research Article

Developing and Validating a Computerized Adaptive Testing System for Measuring the English Proficiency of Taiwanese EFL University Students

Heng-Tsung Danny Huang, National Taiwan University, Taipei, Taiwan

Shao-Ting Alan Hung, National Taiwan University of Science and Technology, Taipei, Taiwan

Hsiu-Yi Chao, National Taiwan Ocean University, Keelung, Taiwan

Jyun-Hong Chen, Soochow University, Taipei, Taiwan

Tsui-Peng Lin, National Sun Yat-sen University, Kaohsiung, Taiwan

Ching-Lin Shih, National Sun Yat-sen University, Kaohsiung, Taiwan (corresponding author)

Pages 162-188 | Published online: 24 Oct 2021

https://doi.org/10.1080/15434303.2021.1984490


References

Abidin, S. A. Z., & Jamil, A. (2015). Toward an English proficiency test for postgraduates in Malaysia. SAGE Open, 5(3), 1–10. https://doi.org/10.1177/2158244015597725
Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1–23. https://doi.org/10.1177/0146621697211001
AERA, APA, & NCME. (2014). Standards for educational and psychological testing. AERA.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge University Press.
Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford University Press.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Addison-Wesley.
Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. Van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). Springer.
Bortolotti, S. L. V., Tezza, R., de Andrade, D. F., Bornia, A. C., & De Sousa Junior, A. F. (2013). Relevance and advantages of using the item response theory. Quality and Quantity, 47(4), 2341–2360. https://doi.org/10.1007/s11135-012-9684-5
Brooks, L., & Swain, M. (2014). Contextualizing performances: Comparing performances during TOEFL iBT and real-life academic speaking activities. Language Assessment Quarterly, 11(4), 353–373. https://doi.org/10.1080/15434303.2014.947532
Brown, J. D. (1997). Computers in language testing: Present research and some future directions. Language Learning & Technology, 1(1), 44–59. https://scholarspace.manoa.hawaii.edu/bitstream/10125/25003/1/01_01_brown.pdf
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3(3), 296–322. https://doi.org/10.1111/j.2044-8295.1910.tb00207.x
Buck, G. (2001). Assessing listening. Cambridge University Press.
Bulut, O., & Kan, A. (2012). Application of computerized adaptive testing to entrance examination for graduate studies in Turkey. Eurasian Journal of Educational Research, 49, 61–80. https://files.eric.ed.gov/fulltext/EJ1059924.pdf
Burston, J., & Neophytou, M. (2014). Lessons learned in designing and implementing a computer-adaptive test for English. The EUROCALL Review, 22(2), 19–25. https://doi.org/10.4995/eurocall.2014.3632
Carlson, J. E., & von Davier, M. (2013). Item response theory. ETS R&D Scientific and Policy Contributions Series (ETS SPC–13–05). Educational Testing Service.
Chalhoub-Deville, M., & Deville, C. (1999). Computer adaptive testing in second language contexts. Annual Review of Applied Linguistics, 19, 273–299. https://doi.org/10.1017/S0267190599190147
Chapelle, C. A., Chung, Y., Hegelheimer, V., Pendar, N., & Xu, J. (2010). Towards a computer-delivered test of productive grammatical ability. Language Testing, 27(4), 443–469. https://doi.org/10.1177/0265532210367633
Chapelle, C. A., & Douglas, D. (2006). Assessing language through computer technology. Cambridge University Press.
Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3–13. https://doi.org/10.1111/j.1745-3992.2009.00165.x
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the test of English as a foreign language. Routledge.
Chapelle, C. A. (2011). Validation in language assessment. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. II, pp. 717–730). Routledge.
Chen, J. H., Chao, H. Y., & Chen, S. Y. (2020). A dynamic stratification method for improving trait estimation in computerized adaptive testing under item exposure control. Applied Psychological Measurement, 44(3), 182–196. https://doi.org/10.1177/0146621619843820
Chen, J., & Wang, L. (2010). Computer adaptive testing: A new trend in language testing. In International Conference on Artificial Intelligence and Education (ICAIE) (pp. 725–728). IEEE.
Choi, I., Sung, K., & Boo, J. (2003). Comparability of a paper-based language test and a computer-based language test. Language Testing, 20(3), 295–320. https://doi.org/10.1191/0265532203lt258oa
Choi, S. W., & King, D. R. (2015). R Package MAT: Simulation of multidimensional adaptive testing for dichotomous IRT models. Applied Psychological Measurement, 39(3), 239–240. https://doi.org/10.1177/0146621614567940
Chun-Shin Limited. (2019). 2018年臺灣大型企業人才國際化及外語職能管理調查報告 [An investigation of the foreign language competence of the staff members in large-sized enterprises in 2018]. Retrieved August 15, 2020, from http://www.toeic.com.tw/img_report_2019/2018report.pdf?fbclid=IwAR2s4lzrE01fVijx0LGIxNLnCBFhewmwnzVjhSDsy02sSJOQMBxX2dkzJvA
Chun-Shin Limited. (2020). Newsletter 55. Retrieved August 15, 2020, from http://www.toeic.com.tw/file/20069046.pdf?fbclid=IwAR2s4lzrE01fVijx0LGIxNLnCBFhewmwnzVjhSDsy02sSJOQMBxX2dkzJvA
Chun-Shin Limited. (n.d.). 大專技職院校英文能力畢業門檻 [English graduation benchmarks of Taiwanese universities]. Retrieved August 15, 2020, from http://www.toeic.com.tw/university/img_new/college.pdf
Cotos, E. (2011). Potential of automated writing evaluation feedback. CALICO Journal, 28(2), 420–459. https://doi.org/10.11139/cj.28.2.420-459
Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge University Press.
Crystal, D. (2003). English as a global language. Cambridge University Press.
Cumming, A. (2013). Validation of language assessments. In C. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 1–10). John Wiley and Sons.
Doong, S. H. (2009). A knowledge-based approach for item exposure control in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 34(4), 530–558. https://doi.org/10.3102/1076998609336667
Douglas, D. (2010). Understanding language testing. Hodder Education.
Duc, P. H. (2015, August 13–15). Building a computer-based model of assessment for writing skills. Paper presented at the 6th International Conference on TESOL, Ho Chi Minh City, Vietnam.
Dunkel, P. (1999). Considerations in developing or using second/foreign language proficiency computer-adaptive tests. Language Learning & Technology, 2(2), 77–93. https://doi.org/10.1025/25044
ETS. (2018). TOEIC listening & reading score descriptors. Retrieved June 24, 2020, from https://www.ets.org/s/toeic/pdf/listening-reading-score-descriptors.pdf
ETS. (2019). The importance of learning English. Retrieved August 15, 2020, from https://www.etsglobal.org/fr/en/blog/news/importance-of-learning-english
ETS. (2020). Performance descriptors for the TOEFL iBT Test. Retrieved June 20, 2020, from https://www.ets.org/s/toefl/pdf/pd-toefl-ibt.pdf
Fenwick, E. K., Loe, B. S., Khadka, J., Man, R. E., Rees, G., & Lamoureux, E. L. (2020). Optimizing measurement of vision-related quality of life: A computerized adaptive test for the impact of vision impairment questionnaire (IVI-CAT). Quality of Life Research, 29(3), 765–774. https://doi.org/10.1007/s11136-019-02354-y
Fulcher, G., & Davidson, F. (2007). Language testing and assessment. Routledge.
Fulcher, G. (2010). Practical language testing. Hodder Education.
Green, A. (2014). Exploring language assessment and testing: Language in action. Routledge.
Henning, G. (1984). Advantages of latent trait measurement in language testing. Language Testing, 1(2), 123–133. https://doi.org/10.1177/026553228400100201
Her, O.-S., Chou, C. P., Su, S.-W., Chiang, K.-H., & Chen, Y.-H. (2013). 我國大學英語畢業門檻政策之檢討 [A critical review of the English benchmark policy for graduation in Taiwan’s universities]. Educational Policy Forum, 16(3), 1–30. https://doi.org/10.3966/156082982013081603001
Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
Kane, M. (1992). An argument-based approach to validation. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319–342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Greenwood.
Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32(4), 371–397. https://doi.org/10.3102/1076998607302632
Koizumi, R., In’nami, Y., Asano, K., & Agawa, T. (2016). Validity evidence of Criterion® for assessing L2 writing proficiency in a Japanese university context. Language Testing in Asia, 6(5), 1–26. https://doi.org/10.1186/s40468-016-0027-7
Larson, J. W., & Madsen, H. S. (1985). Computerized adaptive language testing: Moving beyond computer-assisted testing. CALICO Journal, 2(3), 32–43. https://journals.equinoxpub.com/CALICO/article/viewFile/23643/19648
LTTC. (2016). GEPT level descriptors. Retrieved June 18, 2020, from https://www.lttc.ntu.edu.tw/E_LTTC/E_GEPT.htm
Melitz, J. (2016). English as a global language. In V. Ginsburgh & S. Weber (Eds.), The Palgrave handbook of economics and language (pp. 583–615). Palgrave Macmillan.
Meunier, L. E. (1994). Computer adaptive language tests (CALT) offer a great potential for functional testing. Yet, why don’t they? CALICO Journal, 11(4), 23–39. https://www.jstor.org/stable/24152755
MOE. (2004). 教育部未來四年施政主軸行動方案表 [MOE action plan for policy initiatives for the next four years]. www.edu.tw/userfiles/url/20120921102842/a931022.doc
Mulder, J., & Van Der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74(2), 273–296. https://doi.org/10.1007/s11336-008-9097-5
O’Sullivan, B. (2011). Language testing: Theories and practices. Palgrave Macmillan.
O’Sullivan, B. (2012). Assessment issues in languages for specific purposes. The Modern Language Journal, 96(s1), 71–88. https://doi.org/10.1111/j.1540-4781.2012.01298.x
O’Sullivan, B. (2014). Adapting tests to the local context. Plenary presentation at the 2nd British Council New Directions in English Language Assessment conference, Tokyo, Japan.
O’Sullivan, B. (2020). Foreword: Localization. In L. I. Su, C. J. Weir, & J. R. W. Wu (Eds.), English proficiency testing in Asia: A new paradigm bridging global and local contexts (pp. xiii–xxviii). Routledge.
Ockey, G. J. (2012). Item response theory. In G. Fulcher & F. Davidson (Eds.), Routledge handbook of language testing in a nutshell (pp. 336–349). Routledge, Taylor & Francis Group.
Pan, Y.-C., & Newfields, T. (2012). Tertiary EFL proficiency graduation requirements in Taiwan: A study of washback on learning. Electronic Journal of Foreign Language Teaching, 9(1), 108–122. https://e-flt.nus.edu.sg/wp-content/uploads/2020/09/v9n12012/pan.pdf
Pan, Y., & Roever, C. (2016). Consequences of test use: A case study of employers’ voice on the social impact of English certification exit requirements in Taiwan. Language Testing in Asia, 6(6), 1–21. https://doi.org/10.1186/s40468-016-0029-5
Price, G. (2014). English for all? Neoliberalism, globalization, and language policy in Taiwan. Language in Society, 43(5), 567–589. https://doi.org/10.1017/S0047404514000566
Rasch, G. (1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests. Nielsen & Lydiche.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. The University of Chicago Press.
Rezaie, M., & Golshan, M. (2015). Computer adaptive test (CAT): Advantages and limitations. International Journal of Educational Investigations, 2(5), 128–137. http://www.ijeionline.com/attachments/article/42/IJEI_Vol.2_No.5_2015-5-11.pdf
Robitzsch, A., Kiefer, T., & Wu, M. (2020). TAM: Test Analysis Modules. R package version 3.5-19. https://CRAN.R-project.org/package=TAM
Ross, S. J. (2008). Language testing in Asia: Evolution, innovation, and policy challenges. Language Testing, 25(1), 5–13. https://doi.org/10.1177/0265532207083741
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331–354. https://doi.org/10.1007/BF02294343
Shih, C.-M. (2012). Policy analysis of the English graduation benchmark in Taiwan. Perspectives in Education, 30(3), 60. https://journals.ufs.ac.za/index.php/pie/article/view/1770
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3(3), 271. https://doi.org/10.1111/j.2044-8295.1910.tb00206.x
Suvorov, R., & Hegelheimer, V. (2014). Computer-assisted language testing. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 593–613). Wiley-Blackwell.
Tang, C.-J. (2011). 英語畢業門檻考試對大學生英語學習的影響 [The impact of university exit exams on students’ English learning experience]. Foreign Language Studies, 14, 1–24. https://doi.org/10.30404/FLS.201106_(14).0001
Tao, Y.-H., Wu, Y.-L., & Chang, H.-Y. (2008). A practical computer adaptive testing model for small-scale scenarios. Educational Technology & Society, 11(3), 259–274. https://www.jstor.org/stable/jeductechsoci.11.3.259
Urquhart, A. H., & Weir, C. J. (1998). Reading in a second language: Process, product, and practice. Longman.
Vongpumivitch, V. (2012). English-as-a-Foreign-Language assessment in Taiwan. Language Assessment Quarterly, 9(1), 1–10. https://doi.org/10.1080/15434303.2012.649592
Wagner, E. (2020). Duolingo English Test, revised version July 2019. Language Assessment Quarterly, 17(3), 300–315. https://doi.org/10.1080/15434303.2020.1771343
Wilson, M., Allen, D. D., & Li, J. C. (2006). Improving measurement in health education and health behavior research using item response modeling: Comparison with the classical test theory approach. Health Education Research, 21(suppl_1), i19–i32. https://doi.org/10.1093/her/cyl053
Wu, J. R. W. (2020). Introduction. In L. I. Su, C. J. Weir, & J. R. W. Wu (Eds.), English proficiency testing in Asia: A new paradigm bridging global and local contexts (pp. 1–8). Routledge.
Xi, X. (2010). Automated scoring and feedback systems: Where are we and where are we heading? Language Testing, 27(3), 291–300. https://doi.org/10.1177/0265532210364643
Xi, X. (2008). Methods of test validation. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (Vol. 7, pp. 177–196). Springer Science and Business Media LLC.
Yeom, S., & Jun, H. (2020). Young Korean EFL learners’ reading and test-taking strategies in a paper and a computer-based reading comprehension tests. Language Assessment Quarterly, 17(3), 282–299. https://doi.org/10.1080/15434303.2020.1731753
Young, R., Shermis, M. D., Brutten, S. R., & Perkins, K. (1996). From conventional to computer-adaptive testing of ESL reading comprehension. System, 24(1), 23–40. https://doi.org/10.1016/0346-251X(95)00051-K
Yu, G., & Zhang, J. (2017). Computer-based English language testing in China: Present and future. Language Assessment Quarterly, 14(2), 177–188. https://doi.org/10.1080/15434303.2017.1303704
