Contextualizing Performances: Comparing Performances During TOEFL iBT™ and Real-Life Academic Speaking Activities: Language Assessment Quarterly: Vol 11, No 4

Abstract

In this study we compare test takers’ performance on the Speaking section of the TOEFL iBT™ with their performances during their real-life academic studies. Thirty international graduate students from mixed language backgrounds in two different disciplines (Sciences and Social Sciences) responded to two independent and four integrated speaking tasks of the TOEFL iBT and participated in semistructured interviews. For the real-life academic contexts, we recorded the performances of our participants in one in-class and one out-of-class speaking activity. On the basis of an analysis of the participants’ speaking (examining grammatical, discourse, and lexical features), we demonstrate that their performances across contexts overlap in some respects and differ distinctly in others. Our findings both support and raise questions about the extrapolation inference claim of the validity argument of the Speaking section of the TOEFL iBT.

Notes

¹ The TOEFL iBT Speaking tasks were scored by six ETS raters, who each scored answers to two different prompts. Each task was scored by two different raters. Of the total rating decisions, in only two instances was adjudication necessary.

² Although yep, yeah, and yes can have a range of intended meanings and discourse functions, we did not make a distinction between literal and intended meanings of these utterances; all were classified as the literal meaning of yes.

³ We decided to use contexts instead of types of activity (presentations vs. group discussions) to report our results. However, it should be noted that we ran each analysis by activity type, and our results revealed the same patterns.

⁴ Following Field (2009), we used Pearson’s correlation coefficient r as a measure of effect size, with an r of 0 meaning there is no effect and an r of 1 meaning there is a perfect effect. Following Cohen (1992), Field suggests that r = .10 is a small effect; r = .30 is a medium effect; and r = .50 is a large effect (Field, 2009, p. 57).

⁵ As explained earlier, we decided to use the clause as the common denominator for our analyses. However, clauses based on the AS-unit may still raise the issue of comparability, though to a lesser degree than AS-units, across the three contexts. The out-of-class context in particular produced a great number of very short independent subclausal units (counted as one-clause AS-units) consisting of only one or two words (e.g., short answers, such as yes or sure in spoken interaction). These clauses obviously have little room for errors. This type of clause never occurred in the SSTiBT context, and this may have been a major factor contributing to the significantly higher grammatical accuracy in the out-of-class context, as seen above. To examine whether this trend would still hold for longer AS-units, we selected AS-units from each context that were three, four, or five clauses long, aggregated all the errors occurring in those clauses, and calculated the average number of errors per clause. The same trend held: the participants made .443 errors per clause in the SSTiBT, .257 in the in-class context, and .215 in the out-of-class context.

⁶ Our measure of informal language refers to colloquial use of language such as the use of like as a filler in conversation.

⁷ Our findings from the grammar measures show a clear pattern of decreasing syntactic complexity and increasing grammatical accuracy moving from the SSTiBT to the out-of-class context. To some, this pattern may imply cognitive trade-offs between syntactic complexity and grammatical accuracy (Skehan, 1998). However, as with other measures in our study, syntactic complexity and grammatical accuracy may have been affected by a complex interplay of different aspects of the context (both cognitive and affective). Therefore, the grammatical findings should not be taken to suggest a simple inverse relationship in which complexity and accuracy compete for attentional resources.
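
As an illustration of the calculations described in Notes 4 and 5, the following is a minimal sketch, not the authors' actual procedure. It assumes each AS-unit is recorded with a clause count and an error count; the names `ASUnit`, `errors_per_clause`, and `effect_size_label` are hypothetical, and the "negligible" label for values of r below .10 is an assumption that does not appear in the notes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ASUnit:
    """One AS-unit (hypothetical record layout, assumed for illustration)."""
    clauses: int  # number of clauses in the unit
    errors: int   # grammatical errors counted in the unit


def errors_per_clause(units: Iterable[ASUnit],
                      min_clauses: int = 3,
                      max_clauses: int = 5) -> Optional[float]:
    """Pool errors over AS-units of 3 to 5 clauses and divide by their clauses.

    Returns None when no unit falls in the requested length range.
    """
    selected = [u for u in units if min_clauses <= u.clauses <= max_clauses]
    total_clauses = sum(u.clauses for u in selected)
    if total_clauses == 0:
        return None
    return sum(u.errors for u in selected) / total_clauses


def effect_size_label(r: float) -> str:
    """Label |r| using the benchmarks quoted in Note 4 (.10, .30, .50)."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # assumed label; the notes do not name this range


# Made-up data for demonstration only; these are not the study's counts.
sample = [ASUnit(1, 0), ASUnit(3, 2), ASUnit(4, 1), ASUnit(5, 0), ASUnit(6, 9)]
rate = errors_per_clause(sample)
```

In this sketch the one-clause and six-clause units are excluded, so very short answers such as yes or sure cannot inflate the accuracy figure for a context.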
