ABSTRACT
In the present study, we investigated the effect of test format on oral performance in terms of test scores and discourse features (accuracy, fluency, and complexity). We also explored how the scores obtained on different test formats relate to these features. To this end, 23 Iranian EFL learners completed three test formats: a monologue, an interview, and a group oral test. Four raters rated the recorded performances holistically and analytically. The results of Friedman and Wilcoxon signed-rank tests indicated that the participants obtained the highest scores on the group oral test, followed by the monologue and the interview, although the difference between the group oral test and the monologue was not statistically significant. Analysis of the produced discourse also indicated significant differences among the three test formats: the most accurate production was found on the group oral test, and the most complex production occurred in the monologue. Further analysis indicated that accuracy features were significantly related to both analytic and holistic ratings in all the test formats. The study highlights the complex relationship between the features of the discourse produced on a test task and the factors raters consider in rating.
Acknowledgments
The authors thank the raters who participated in this study and the anonymous reviewers and editors for their constructive and insightful comments on earlier versions of the article.
Notes
1 Cohen’s (1988) criteria for effect sizes: .1 = small effect, .3 = medium effect, and .5 = large effect.
2 Cohen’s (1988) guidelines to interpret correlation: .10–.29 = small, .30–.49 = medium, .50–1.0 = large.
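For readers who want to apply the benchmarks in the two notes above, the bands can be written as a small lookup. This is only an illustrative sketch, not the authors' analysis code: the function name `cohen_label` is hypothetical, the cutoffs are treated as lower bounds (so a value such as .295 is labeled small), and the label "negligible" for magnitudes below .10 is an assumption, since the notes do not name that range.

```python
def cohen_label(value: float) -> str:
    """Label the magnitude of a correlation or r-type effect size.

    Bands follow the notes above: .10 = small, .30 = medium, .50 = large.
    The sign is ignored because only the magnitude is being classified.
    """
    magnitude = abs(value)
    if magnitude > 1.0:
        raise ValueError("expected a value between -1 and 1")
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    # Assumption: the notes give no label below .10.
    return "negligible"


if __name__ == "__main__":
    # Illustrative inputs only; these are not values reported in the study.
    for r in (0.05, 0.10, -0.35, 0.50):
        print(r, cohen_label(r))
```

Treating each cutoff as a lower bound avoids leaving gaps between the two-decimal ranges listed in note 2.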