References
- Abidin, S. A. Z., & Jmail, A. (2015). Toward an English proficiency test for postgraduates in Malaysia. SAGE Open, 5(3), 1–10. https://doi.org/10.1177/2158244015597725
- Adams, R. J., Wilson, M., & Wang, W. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21(1), 1–23. https://doi.org/10.1177/0146621697211001
- AERA, APA, & NCME. (2014). Standards for educational and psychological testing. AERA.
- Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge University Press.
- Bachman, L., & Palmer, A. (1996). Language testing in practice. Oxford University Press.
- Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397–472). Addison-Wesley.
- Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). Springer.
- Bortolotti, S. L. V., Tezza, R., de Andrade, D. F., Bornia, A. C., & De Sousa Junior, A. F. (2013). Relevance and advantages of using the item response theory. Quality and Quantity, 47(4), 2341–2360. https://doi.org/10.1007/s11135-012-9684-5
- Brooks, L., & Swain, M. (2014). Contextualizing performances: Comparing performances during TOEFL iBT and real-life academic speaking activities. Language Assessment Quarterly, 11(4), 353–373. https://doi.org/10.1080/15434303.2014.947532
- Brown, J. D. (1997). Computers in language testing: Present research and some future directions. Language Learning & Technology, 1(1), 44–59. https://scholarspace.manoa.hawaii.edu/bitstream/10125/25003/1/01_01_brown.pdf
- Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3(3), 296–322. https://doi.org/10.1111/j.2044-8295.1910.tb00207.x
- Buck, G. (2001). Assessing listening. Cambridge University Press.
- Bulut, O., & Kan, A. (2012). Application of computerized adaptive testing to entrance examination for graduate studies in Turkey. Eurasian Journal of Educational Research, 49, 61–80. https://files.eric.ed.gov/fulltext/EJ1059924.pdf
- Burston, J., & Neophytou, M. (2014). Lessons learned in designing and implementing a computer-adaptive test for English. The EUROCALL Review, 22(2), 19–25. https://doi.org/10.4995/eurocall.2014.3632
- Carlson, J. E., & von Davier, M. (2013). Item response theory. ETS R&D Scientific and Policy Contributions Series (ETS SPC–13–05). Educational Testing Service.
- Chalhoub-Deville, M., & Deville, C. (1999). Computer adaptive testing in second language contexts. Annual Review of Applied Linguistics, 19, 273–299. https://doi.org/10.1017/S0267190599190147
- Chapelle, C. A., Chung, Y., Hegelheimer, V., Pendar, N., & Xu, J. (2010). Towards a computer-delivered test of productive grammatical ability. Language Testing, 27(4), 443–469. https://doi.org/10.1177/0265532210367633
- Chapelle, C. A., & Douglas, D. (2006). Assessing language through computer technology. Cambridge University Press.
- Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3–13. https://doi.org/10.1111/j.1745-3992.2009.00165.x
- Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the test of English as a foreign language. Routledge.
- Chapelle, C. A. (2011). Validation in language assessment. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (Vol. II, pp. 717–730). Routledge.
- Chen, J. H., Chao, H. Y., & Chen, S. Y. (2020). A dynamic stratification method for improving trait estimation in computerized adaptive testing under item exposure control. Applied Psychological Measurement, 44(3), 182–196. https://doi.org/10.1177/0146621619843820
- Chen, J., & Wang, L. (2010). Computer adaptive testing: A new trend in language testing. In International Conference on Artificial Intelligence and Education (ICAIE) (pp. 725–728). IEEE.
- Choi, I., Sung, K., & Boo, J. (2003). Comparability of a paper-based language test and a computer-based language test. Language Testing, 20(3), 295–320. https://doi.org/10.1191/0265532203lt258oa
- Choi, S. W., & King, D. R. (2015). R Package MAT: Simulation of multidimensional adaptive testing for dichotomous IRT models. Applied Psychological Measurement, 39(3), 239–240. https://doi.org/10.1177/0146621614567940
- Chun-Shin Limited. (2019). 2018年臺灣大型企業人才國際化及外語職能管理調查報告 [An investigation of the foreign language competence of the staff members in large-sized enterprises in 2018]. Retrieved August 15, 2020, from http://www.toeic.com.tw/img_report_2019/2018report.pdf?fbclid=IwAR2s4lzrE01fVijx0LGIxNLnCBFhewmwnzVjhSDsy02sSJOQMBxX2dkzJvA
- Chun-Shin Limited. (2020). Newsletter 55. Retrieved August 15, 2020, from http://www.toeic.com.tw/file/20069046.pdf?fbclid=IwAR2s4lzrE01fVijx0LGIxNLnCBFhewmwnzVjhSDsy02sSJOQMBxX2dkzJvA
- Chun-Shin Limited. (n.d.). 大專技職院校英文能力畢業門檻 [English graduation benchmarks of Taiwanese universities]. Retrieved August 15, 2020, from http://www.toeic.com.tw/university/img_new/college.pdf
- Cotos, E. (2011). Potential of automated writing evaluation feedback. CALICO Journal, 28(2), 420–459. https://doi.org/10.11139/cj.28.2.420-459
- Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge University Press.
- Crystal, D. (2003). English as a global language. Cambridge University Press.
- Cumming, A. (2013). Validation of language assessments. In C. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 1–10). John Wiley and Sons.
- Doong, S. H. (2009). A knowledge-based approach for item exposure control in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 34(4), 530–558. https://doi.org/10.3102/1076998609336667
- Douglas, D. (2010). Understanding language testing. Hodder Education.
- Duc, P. H. (2015, August 13–15). Building a computer-based model of assessment for writing skills. Paper presented at the 6th International Conference on TESOL, Ho Chi Minh City, Vietnam.
- Dunkel, P. (1999). Considerations in developing or using second/foreign language proficiency computer-adaptive tests. Language Learning & Technology, 2(2), 77–93. https://doi.org/10.1025/25044
- ETS. (2018). TOEIC listening & reading score descriptors. Retrieved June 24, 2020, from https://www.ets.org/s/toeic/pdf/listening-reading-score-descriptors.pdf
- ETS. (2019). The importance of learning English. Retrieved August 15, 2020, from https://www.etsglobal.org/fr/en/blog/news/importance-of-learning-english
- ETS. (2020). Performance descriptors for the TOEFL iBT Test. Retrieved June 20, 2020, from https://www.ets.org/s/toefl/pdf/pd-toefl-ibt.pdf
- Fenwick, E. K., Loe, B. S., Khadka, J., Man, R. E., Rees, G., & Lamoureux, E. L. (2020). Optimizing measurement of vision-related quality of life: A computerized adaptive test for the impact of vision impairment questionnaire (IVI-CAT). Quality of Life Research, 29(3), 765–774. https://doi.org/10.1007/s11136-019-02354-y
- Fulcher, G., & Davidson, F. (2007). Language testing and assessment. Routledge.
- Fulcher, G. (2010). Practical language testing. Hodder Education.
- Green, A. (2014). Exploring language assessment and testing: Language in action. Routledge.
- Henning, G. (1984). Advantages of latent trait measurement in language testing. Language Testing, 1(2), 123–133. https://doi.org/10.1177/026553228400100201
- Her, O.-S., Chou, C. P., Su, S.-W., Chiang, K.-H., & Chen, Y.-H. (2013). 我國大學英語畢業門檻政策之檢討 [A critical review of the English benchmark policy for graduation in Taiwan’s universities]. Educational Policy Forum, 16(3), 1–30. https://doi.org/10.3966/156082982013081603001
- Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
- Kane, M. (1992). An argument-based approach to validation. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
- Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319–342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
- Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Greenwood.
- Kane, M. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
- Kim, S., & Kolen, M. J. (2007). Effects on scale linking of different definitions of criterion functions for the IRT characteristic curve methods. Journal of Educational and Behavioral Statistics, 32(4), 371–397. https://doi.org/10.3102/1076998607302632
- Koizumi, R., In’nami, Y., Asano, K., & Agawa, T. (2016). Validity evidence of Criterion® for assessing L2 writing proficiency in a Japanese university context. Language Testing in Asia, 6(5), 1–26. https://doi.org/10.1186/s40468-016-0027-7
- Larson, J. W., & Madsen, H. S. (1985). Computerized adaptive language testing: Moving beyond computer-assisted testing. CALICO Journal, 2(3), 32–43. https://journals.equinoxpub.com/CALICO/article/viewFile/23643/19648
- LTTC. (2016). GEPT level descriptors. Retrieved June 18, 2020, from https://www.lttc.ntu.edu.tw/E_LTTC/E_GEPT.htm
- Melitz, J. (2016). English as a global language. In V. Ginsburgh & S. Weber (Eds.), The Palgrave handbook of economics and language (pp. 583–615). Palgrave Macmillan.
- Meunier, L. E. (1994). Computer adaptive language tests (CALT) offer a great potential for functional testing. Yet, why don’t they? CALICO Journal, 11(4), 23–39. https://www.jstor.org/stable/24152755
- MOE. (2004). 教育部未來四年施政主軸行動方案表 [MOE action plan for policy initiatives for the next four years]. www.edu.tw/userfiles/url/20120921102842/a931022.doc
- Mulder, J., & van der Linden, W. J. (2009). Multidimensional adaptive testing with optimal design criteria for item selection. Psychometrika, 74(2), 273–296. https://doi.org/10.1007/s11336-008-9097-5
- O’Sullivan, B. (2011). Language testing: Theories and practices. Palgrave Macmillan.
- O’Sullivan, B. (2012). Assessment issues in languages for specific purposes. The Modern Language Journal, 96(s1), 71–88. https://doi.org/10.1111/j.1540-4781.2012.01298.x
- O’Sullivan, B. (2014). Adapting tests to the local context. Plenary presentation at the 2nd British Council New Directions in English Language Assessment conference, Tokyo, Japan.
- O’Sullivan, B. (2020). Foreword: Localization. In L. I. Su, C. J. Weir, & J. R. W. Wu (Eds.), English proficiency testing in Asia: A new paradigm bridging global and local contexts (pp. xiii–xxviii). Routledge.
- Ockey, G. J. (2012). Item response theory. In G. Fulcher & F. Davidson (Eds.), Routledge handbook of language testing in a nutshell (pp. 336–349). Routledge, Taylor & Francis Group.
- Pan, Y.-C., & Newfields, T. (2012). Tertiary EFL proficiency graduation requirements in Taiwan: A study of washback on learning. Electronic Journal of Foreign Language Teaching, 9(1), 108–122. https://e-flt.nus.edu.sg/wp-content/uploads/2020/09/v9n12012/pan.pdf
- Pan, Y., & Roever, C. (2016). Consequences of test use: A case study of employers’ voice on the social impact of English certification exit requirements in Taiwan. Language Testing in Asia, 6(6), 1–21. https://doi.org/10.1186/s40468-016-0029-5
- Price, G. (2014). English for all? Neoliberalism, globalization, and language policy in Taiwan. Language in Society, 43(5), 567–589. https://doi.org/10.1017/S0047404514000566
- Rasch, G. (1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests. Nielsen & Lydiche.
- Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. The University of Chicago Press.
- Rezaie, M., & Golshan, M. (2015). Computer adaptive test (CAT): Advantages and limitations. International Journal of Educational Investigations, 2(5), 128–137. http://www.ijeionline.com/attachments/article/42/IJEI_Vol.2_No.5_2015-5-11.pdf
- Robitzsch, A., Kiefer, T., & Wu, M. (2020). TAM: Test Analysis Modules. R package version 3.5-19. https://CRAN.R-project.org/package=TAM
- Ross, S. J. (2008). Language testing in Asia: Evolution, innovation, and policy challenges. Language Testing, 25(1), 5–13. https://doi.org/10.1177/0265532207083741
- Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61(2), 331–354. https://doi.org/10.1007/BF02294343
- Shih, C.-M. (2012). Policy analysis of the English graduation benchmark in Taiwan. Perspectives in Education, 30(3), 60. https://journals.ufs.ac.za/index.php/pie/article/view/1770
- Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3(3), 271. https://doi.org/10.1111/j.2044-8295.1910.tb00206.x
- Suvorov, R., & Hegelheimer, V. (2014). Computer-assisted language testing. In A. J. Kunnan (Ed.), The companion to language assessment (pp. 593–613). Wiley-Blackwell.
- Tang, C.-J. (2011). 英語畢業門檻考試對大學生英語學習的影響 [The impact of university exit exams on students’ English learning experience]. Foreign Language Studies, 14, 1–24. https://doi.org/10.30404/FLS.201106_(14).0001
- Tao, Y.-H., Wu, Y.-L., & Chang, H.-Y. (2008). A practical computer adaptive testing model for small-scale scenarios. Educational Technology & Society, 11(3), 259–274. https://www.jstor.org/stable/jeductechsoci.11.3.259
- Urquhart, A. H., & Weir, C. J. (1998). Reading in a second language: Process, product, and practice. Longman.
- Vongpumivitch, V. (2012). English-as-a-Foreign-Language assessment in Taiwan. Language Assessment Quarterly, 9(1), 1–10. https://doi.org/10.1080/15434303.2012.649592
- Wagner, E. (2020). Duolingo English Test, revised version July 2019. Language Assessment Quarterly, 17(3), 300–315. https://doi.org/10.1080/15434303.2020.1771343
- Wilson, M., Allen, D. D., & Li, J. C. (2006). Improving measurement in health education and health behavior research using item response modeling: Comparison with the classical test theory approach. Health Education Research, 21(suppl_1), i19–i32. https://doi.org/10.1093/her/cyl053
- Wu, J. R. W. (2020). Introduction. In L. I. Su, C. J. Weir, & J. R. W. Wu (Eds.), English proficiency testing in Asia: A new paradigm bridging global and local contexts (pp. 1–8). Routledge.
- Xi, X. (2008). Methods of test validation. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (Vol. 7, pp. 177–196). Springer Science and Business Media LLC.
- Xi, X. (2010). Automated scoring and feedback systems: Where are we and where are we heading? Language Testing, 27(3), 291–300. https://doi.org/10.1177/0265532210364643
- Yeom, S., & Jun, H. (2020). Young Korean EFL learners’ reading and test-taking strategies in a paper and a computer-based reading comprehension tests. Language Assessment Quarterly, 17(3), 282–299. https://doi.org/10.1080/15434303.2020.1731753
- Young, R., Shermis, M. D., Brutten, S. R., & Perkins, K. (1996). From conventional to computer-adaptive testing of ESL reading comprehension. System, 24(1), 23–40. https://doi.org/10.1016/0346-251X(95)00051-K
- Yu, G., & Zhang, J. (2017). Computer-based English language testing in China: Present and future. Language Assessment Quarterly, 14(2), 177–188. https://doi.org/10.1080/15434303.2017.1303704