REFERENCES
- Alderson , J. C. 2000 . Assessing reading , Cambridge, , UK : Cambridge University Press .
- Bachman , L. F. , Lynch , B. and Mason , M. 1995 . Investigating variability in tasks and rater judgements in a performance test of foreign language speaking . Language Testing , 12 : 238 – 257 .
- Bachman , L. F. and Palmer , A. S. 1996 . Language testing in practice: Designing and developing useful language tests , Oxford, , UK : Oxford University Press .
- Bachman , L. F. and Palmer , A. S. 2010 . Language assessment in practice: Developing language assessments and justifying their use in the real world , Oxford, , UK : Oxford University Press .
- Borg , S. 2003 . Teacher cognition in language teaching: A review of research on what language teachers think, know, believe, and do . Language Teaching , 36 : 81 – 109 .
- Brown , A. 2000 . An investigation of the rating process in the IELTS oral interview . IELTS Research Reports , 3 : 49 – 85 .
- Buck , G. 2001 . Assessing listening , Cambridge, , UK : Cambridge University Press .
- Carr , N. 2008 . “ Decisions about automated scoring: What they mean for our constructs ” . In Towards adaptive CALL: Natural language processing for diagnostic language assessment , Edited by: Chapelle , C. A. , Chung , Y. R. and Xu , J. 82 – 101 . Ames, IA : Iowa State University .
- Carr , N. T. , Pan , M. and Xi , X. Construct refinement and automated scoring in Web-based testing . Symposium paper presented at the 24th Annual Language Testing Research Colloquium . Hong Kong. December .
- Cumming , A. , Kantor , R. and Powers , D. 2002 . Decision making while rating ESL/EFL writing tasks: A descriptive framework . The Modern Language Journal , 86 : 67 – 96 .
- Davies , A. 2001 . The logic of testing languages for specific purposes . Language Testing , 18 : 133 – 147 .
- Douglas , D. 2000 . Assessing language for specific purposes , Cambridge, , UK : Cambridge University Press .
- Eckes , T. 2008 . Rater types in writing performance assessments: A classification approach to rater variability . Language Testing , 25 : 155 – 185 .
- Elder , C. , Harding , L. and Knoch , U. 2009 . OET reading revision study , Melbourne, , Australia : Language Testing Research Centre, University of Melbourne . Final report
- Gass , S. and Mackey , A. 2000 . Stimulated recall methodology and second language research , Mahwah, NJ : Erlbaum .
- Harding , L. and Ryan , K. 2009 . Decision making in marking open-ended listening test items: The case of the OET . Spaan Fellow Working Papers in Second or Foreign Language Assessment , 7 : 99 – 114 .
- Kane , M. 1992 . An argument-based approach to validity . Psychological Bulletin , 112 : 527 – 535 .
- Kane , M. 2006 . “ Validation ” . In Educational measurement , 4th , Edited by: Brennan , R. L. Washington, DC : American Council on Education/Praeger .
- Knoch , U. Investigating the effectiveness of individualized feedback for rating behaviour: A longitudinal study . Paper presented at the 31st Annual Language Testing Research Colloquium . Denver, Colorado. March .
- Lumley , T. and Brown , A. 1996 . Specific-purpose language performance tests: Task and interaction . Australian Review of Applied Linguistics , Series S ( 13 ) : 105 – 136 .
- Lumley , T. , Lynch , B. and McNamara , T. 1994 . A new approach to standard-setting in language assessment . Melbourne Papers in Language Testing , 3 ( 2 ) : 19 – 40 .
- Lumley , T. and McNamara , T. 1995 . Rater characteristics and rater bias: Implications and training . Language Testing , 12 : 54 – 71 .
- May , L. 2006 . An examination of rater orientations on a paired candidate discussion task through stimulated verbal recall . Melbourne Papers in Language Testing , 11 ( 1 ) : 29 – 51 .
- McNamara , T. F. 1990a . Assessing the second language proficiency of health professionals , Melbourne, , Australia : The University of Melbourne . Unpublished doctoral dissertation
- McNamara , T. F. 1990b . Item Response Theory and the validation of an ESP test for health professionals . Language Testing , 7 : 52 – 76 .
- McNamara , T. F. 1991 . Test dimensionality: IRT analysis of an ESP listening test . Language Testing , 8 : 139 – 159 .
- McNamara , T. F. 1996 . Measuring second language performance , London, , UK : Addison Wesley Longman .
- McNamara , T. and Lumley , T. 1997 . The effect of interlocutor and assessment mode variables in overseas assessments of speaking skills in occupational settings . Language Testing , 14 : 140 – 156 .
- Milanovic , M. , Saville , N. and Shen , S. A study of the decision-making behaviour of composition markers . Performance testing, cognition and assessment: Selected papers from the 15th Language Testing Research Colloquium, Studies in Language Testing 3 . Edited by: Milanovic , M. and Saville , N. pp. 92 – 114 . Cambridge, , UK : Cambridge University Press .
- Orr , M. 2002 . The FCE Speaking test: Using rater reports to help interpret test scores . System , 30 : 143 – 154 .
- Ryan , K. 2007 . Assessing the OET: The nurses' perspective , Melbourne, , Australia : The University of Melbourne . Unpublished master's thesis
- Weir , C. J. 1993 . Understanding and developing language tests , New York, NY : Prentice Hall .
- Xi , X. 2008 . “ Methods of test validation ” . In Encyclopedia of language and education: Language testing and assessment , 2nd , Edited by: Shohamy , E. and Hornberger , N. H. Vol. 7 , 177 – 196 . New York, NY : Springer .