Abstract
The study utilised a fine-grained diagnostic checklist to assess first-year undergraduates in Hong Kong and evaluated its validity and usefulness for diagnosing academic writing in English. Ten English language instructors marked 472 academic essays with the checklist. They also agreed on a Q-matrix, which specified the relationships between the checklist items and five writing subskills. This conceptual Q-matrix was refined iteratively by fitting a psychometric model (i.e. the reduced reparameterised unified model) to empirical data (i.e. the checklist marks) through the computer program Arpeggio Suite. The final Q-matrix was found to be valid and useful; it had far fewer parameters but greater power to discriminate between masters and non-masters of academic writing skills. The study found that cognitive diagnostic model (CDM)-based skill diagnosis could identify strengths and weaknesses in the five writing subskills for students across three proficiency levels and could provide richer and finer-grained information than the traditional raw-score approach. However, limitations and caveats of the CDM approach were also observed, which warrant further investigation of its application in assessing writing.
Acknowledgment
The author would like to thank her colleagues from the Center for Language in Education at the Education University of Hong Kong for participating in this project. She would also like to thank the anonymous reviewers and the editors for their critical comments and suggestions. Any remaining flaws in this article are hers.
Notes
1. Using Rasch model notation, the three-facet Rasch model used to evaluate rater reliability is stated as ln [Pnij/(1 − Pnij)] = Bn − Di − Cj, where Bn = person ability, Di = item difficulty, Cj = rater severity and Pnij = the probability that person n on item i is judged as ‘Yes’ by rater j. The model specification for the FACETS software is: Model = ?, ?, ?, 1–35, D.
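The three-facet model above can be sketched computationally. The following is an illustrative snippet (not part of the study's analysis, and not the FACETS software itself) showing how the probability of a ‘Yes’ judgement follows from the logit equation ln [Pnij/(1 − Pnij)] = Bn − Di − Cj; all parameter values are hypothetical and expressed in logits.

```python
import math

def rasch_yes_probability(ability: float, difficulty: float, severity: float) -> float:
    """Return Pnij, the probability that rater j judges person n as 'Yes'
    on item i under the three-facet Rasch model."""
    logit = ability - difficulty - severity  # Bn - Di - Cj
    return 1.0 / (1.0 + math.exp(-logit))   # inverse-logit transform

# Hypothetical example: an able student, a moderately difficult item,
# and a slightly severe rater.
p = rasch_yes_probability(ability=1.0, difficulty=0.5, severity=0.2)
print(round(p, 3))  # logit = 0.3, so p ≈ 0.574
```

As the sketch makes clear, a more severe rater (larger Cj) lowers the probability of a ‘Yes’ in exactly the same way as a more difficult item, which is what allows FACETS to estimate rater severity on the same logit scale as ability and difficulty.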
2. For a detailed explanation of G, H and R, interested readers can refer to Eckes (2011, pp. 44–45). All three indices are measures of the variance of rater severity.