Original Articles

Toward Spoken Human–Computer Tutorial Dialogues

Pages 289-323 | Published online: 15 Dec 2010
 

Abstract

Oral discourse is the primary form of human–human communication; hence, computer interfaces that communicate via unstructured spoken dialogues will presumably provide a more efficient, meaningful, and naturalistic interaction experience. Within the context of learning environments, there are theoretical positions supporting a speech facilitation hypothesis, which predicts that spoken tutorial dialogues will increase learning more than typed dialogues. We evaluated this hypothesis in an experiment in which 24 participants learned computer literacy via a spoken and a typed conversation with AutoTutor, an intelligent tutoring system with conversational dialogues. The results indicated that (a) enhanced content coverage was achieved in the spoken condition; (b) learning gains for both modalities were on par with each other and greater than those of a no-instruction control; (c) although speech recognition errors were unrelated to learning gains, they were linked to participants' evaluations of the tutor; (d) participants adjusted their conversational styles when speaking compared to typing; (e) semantic and statistical natural language understanding approaches to comprehending learners' responses were more resilient to speech recognition errors than syntactic and symbolic approaches; and (f) simulated speech recognition errors had differential impacts on the fidelity of different semantic algorithms. We discuss the impact of our findings on the speech facilitation hypothesis and on human–computer interfaces that support spoken dialogues.
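To illustrate finding (e), the following toy sketch shows why a bag-of-words cosine match degrades gracefully under recognition errors while an exact symbolic match fails outright. The strings and the plain word-count cosine measure are illustrative assumptions; the paper's actual semantic algorithms (e.g., LSA) are more sophisticated.

# Hedged illustration, not the paper's actual algorithms: a soft
# vector-space match gives partial credit under recognition errors,
# whereas an exact symbolic match returns all-or-nothing.
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

ideal = "the hard drive stores data magnetically"
asr   = "the hard drive store data magnetic"   # simulated recognition errors

print(cosine(ideal, ideal))  # 1.0 on a perfect transcript
print(cosine(ideal, asr))    # ~0.67: partial credit despite two word errors
print(ideal == asr)          # exact symbolic match: False, no credit at all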

Notes

1. WER and WRR are standard metrics for assessing the reliability of automatic speech recognition systems. WER = (S + D + I)/N, where S, D, and I are the number of substitutions, deletions, and insertions in the automatically recognized text (with errors) when compared to the ideal text (no errors) of N words. WRR = 1 – WER.
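For concreteness, here is a minimal sketch, in Python, of how WER and WRR can be computed with a standard word-level edit-distance alignment; the function name and example strings are illustrative, not taken from the study.

# Sketch of the WER/WRR computation from Note 1, assuming a standard
# word-level Levenshtein alignment and a non-empty reference transcript.
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via dynamic programming over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution or match
            dele = d[i - 1][j] + 1                              # deletion
            ins = d[i][j - 1] + 1                               # insertion
            d[i][j] = min(sub, dele, ins)
    return d[len(ref)][len(hyp)] / len(ref)

# Example: one substitution and one deletion over four reference words.
wer = word_error_rate("the cat sat on", "the bat on")
wrr = 1 - wer
print(f"WER = {wer:.2f}, WRR = {wrr:.2f}")  # WER = 0.50, WRR = 0.50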

2. In this and subsequent analyses, the interaction order (spoken then typed vs. typed then spoken) was included as a between-subjects factor. However, the main effect for interaction order was never significant, nor were the two-way interactions between interaction order and other variables. Therefore, speaking first and then typing, versus typing first and then speaking, had no impact on the dependent variables.
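As a sketch of how such an order check might be run (this is not the authors' analysis code; the pingouin library, the toy data, and the column names are assumptions), a mixed-design ANOVA with interaction order as the between-subjects factor could look like:

# Hedged sketch of the counterbalancing check described in Note 2.
import pandas as pd
import pingouin as pg

# Long-format toy data: one row per participant x modality condition.
df = pd.DataFrame({
    "subject":  [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "modality": ["spoken", "typed"] * 6,                    # within-subjects
    "order":    ["spoken_first"] * 6 + ["typed_first"] * 6, # between-subjects
    "gain":     [.42, .40, .35, .33, .38, .41, .30, .29, .36, .37, .33, .31],
})

# Mixed ANOVA: learning gain by modality (within) and order (between).
aov = pg.mixed_anova(data=df, dv="gain", within="modality",
                     subject="subject", between="order")
print(aov[["Source", "F", "p-unc"]])  # a nonsignificant order row mirrors Note 2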

3. p < .05 in this and subsequent analyses unless explicitly noted.

Acknowledgments. We thank our research colleagues in the Emotive Computing Group and the Tutoring Research Group at the University of Memphis (http://emotion.autotutor.org). Special thanks to Jeremiah Sullins and O'meed Entezari for their valuable contributions to this study.

Support. This research was supported by the National Science Foundation (REC 0106965, ITR 0325428, and HCC 0834847). Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of NSF.

HCI Editorial Record. Received September 8, 2008. Revision received June 17, 2009. Accepted by John Anderson. Final manuscript received December 11, 2009. — Editor
