Original Article

Discovering Cues to Error Detection in Speech Recognition Output: A User-Centered Approach

Pages 237-270 | Published online: 08 Dec 2014

References

  • Anderberg, M.R. Cluster Analysis for Applications. New York: Academic Press, 1973.
  • Arnold, S.C.; Mark, L.; and Goldthwaite, J. Programming by voice, VocalProgramming. In M. Tremaine, E. Cole, and E. Mynatt (eds.), Proceedings of the Fourth International ACM Conference on Assistive Technologies. New York: ACM Press, 2000, pp. 149-155.
  • Bain, K.; Basson, S.H.; and Wald, M. Speech recognition in university classrooms: Liberated learning project. In V.L. Hanson and J.A. Jacko (eds.), Proceedings of the Fifth International ACM Conference on Assistive Technologies. New York: ACM Press, 2002, pp. 192-196.
  • Brill, E.; Florian, R.; Henderson, J.C.; and Mangu, L. Beyond n-grams: Can linguistic sophistication improve language modeling? In C. Boitet and P. Whitelock (eds.), Proceedings of the Thirty-Sixth Annual Meeting on Association for Computational Linguistics. Morristown, NJ: Association for Computational Linguistics, 1998, pp. 186-190.
  • Carpenter, P.; Jin, C.; Wilson, D.; Zhang, R.; Bohus, D.; and Rudnicky, A.I. Is this conversation on track? In P. Dalsgaard, B. Lindberg, H. Benner, and Z. Tan (eds.), Proceedings of the Seventh European Conference on Speech Communication and Technology. Bonn, Germany: International Speech Communication Association, 2001, pp. 2121-2124.
  • Chase, L. Error-Responsive Feedback Mechanisms for Speech Recognizers. Ph.D. dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 1997.
  • Chase, L. Word and acoustic confidence annotation for large vocabulary speech recognition. In G. Kokkinakis, N. Fakotakis, and E. Dermatas (eds.), Proceedings of the Fifth European Conference on Speech Communication and Technology. Bonn, Germany: International Speech Communication Association, 1997, pp. 815-818.
  • Deng, L., and Huang, X. Challenges in adopting speech recognition. Communications of the ACM, 47, 1 (January 2004), 69-75.
  • Duchateau, J.; Demuynck, K.; and Wambacq, P. Confidence scoring based on backward language models. In F.J. Taylor, J. Principe, and H. Bourlard (eds.), 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. Los Alamitos, CA: IEEE Computer Society Press, 2002, pp. 221-224.
  • Ein-Dor, P., and Spiegler, I. Natural language access to multiple databases: A model and a prototype. Journal of Management Information Systems, 12, 1 (Summer 1995), 171-197.
  • Ericsson, K.A., and Simon, H.A. Protocol Analysis: Verbal Reports as Data. Cambridge, MA: MIT Press, 1993.
  • Feng, J., and Sears, A. Using confidence scores to improve hands-free speech based navigation in continuous dictation systems. ACM Transactions on Computer-Human Interaction, 11, 4 (December 2004), 329-356.
  • Furui, S. Automatic speech recognition and its application to information extraction. In R. Dale and K. Church (eds.), Proceedings of the Thirty-Seventh Annual Meeting of the Association for Computational Linguistics. Morristown, NJ: Association for Computational Linguistics, 1999, pp. 11-20.
  • Gauvain, J.-L., and Lamel, L. Large vocabulary speech recognition based on statistical methods. In W. Chou and B.H. Juang (eds.), Pattern Recognition in Speech and Language Processing. Boca Raton, FL: CRC Press, 2003, pp. 149-189.
  • Gillick, L.; Ito, Y.; and Young, J. A probabilistic approach to confidence estimation and evaluation. In M.K. Lang and H. Hoge (eds.), 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2. Los Alamitos, CA: IEEE Computer Society Press, 1997, pp. 879-882.
  • Hagen, A.; Connors, D.A.; and Pellom, B.L. The analysis and design of architecture systems for speech recognition on modern handheld-computing devices. In R. Gupta and Y. Nakamura (eds.), Proceedings of the First IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. New York: ACM Press, 2003, pp. 65-70.
  • Hernandez-Abrego, G., and Marino, J.B. Contextual confidence measures for continuous speech recognition. In H. Abut and L. Onural (eds.), 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3. Los Alamitos, CA: IEEE Computer Society Press, 2000, pp. 1803-1806.
  • Higgins, E.L., and Raskind, M.H. Speaking to read: The effects of continuous vs. discrete speech recognition systems on the reading and spelling of children with learning disabilities. Journal of Special Education Technology, 15, 1 (Winter 2000) (available at jset.unlv.edu/15.1/higgins/first.html).
  • Hoffman, T. Speech recognition powers utility's customer service. ComputerWorld, September 12, 2005 (available at www.computerworld.com/managementtopics/management/helpdesk/story/0,10801,104535,00.html).
  • Kemp, T., and Schaaf, T. Estimating confidence using word lattices. In G. Kokkinakis, N. Fakotakis, and E. Dermatas (eds.), Proceedings of the Fifth European Conference on Speech Communication and Technology. Bonn, Germany: International Speech Communication Association, 1997, pp. 827-830.
  • Krahmer, E.; Swerts, M.; Theune, M.; and Weegels, M. Error detection in spoken human-machine interaction. International Journal of Speech Technology, 4, 1 (March 2001), 19-30.
  • Lai, J., and Vergo, J. MedSpeak: Report creation with continuous speech recognition. In S. Pemberton (ed.), Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York: ACM Press, 1997, pp. 431-438.
  • Levine, H.G., and Rossmoore, D. Diagnosing the human threats to information technology implementation: A missing factor in systems analysis illustrated in a case study. Journal of Management Information Systems, 10, 2 (Fall 1993), 55-74.
  • Liu, Y. Structural Event Detection for Rich Transcription of Speech. Ph.D. dissertation, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, 2004.
  • Lubert, J.; Kotler, A.; Shein, F.; and Tam, C. Speech recognition. SNOW, Toronto, ON, 1998 (available at snow.utoronto.ca/best/special/speechrecognition.html).
  • Maison, B., and Gopinath, R. Robust confidence annotation and rejection for continuous speech recognition. In V.J. Mathews and A. Swindlehurst (eds.), 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. Los Alamitos, CA: IEEE Computer Society Press, 2001, pp. 389-392.
  • Mangu, L., and Padmanabhan, M. Error corrective mechanisms for speech recognition. In V.J. Mathews and A. Swindlehurst (eds.), 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. Los Alamitos, CA: IEEE Computer Society Press, 2001, pp. 29-32.
  • Mann, W.C., and Thompson, S.A. Rhetorical structure theory: A theory of text organization. In L. Polanyi (ed.), The Structure of Discourse. Norwood, NJ: Ablex, 1987, pp. 85-96.
  • Mao, J.-Y., and Benbasat, I. The use of explanations in knowledge-based systems: Cognitive perspectives and a process-tracing analysis. Journal of Management Information Systems, 17, 2 (Fall 2000), 153-180.
  • McTear, M.F. Spoken dialogue technology: Enabling the conversational user interface. ACM Computing Surveys, 34, 1 (March 2002), 90-169.
  • Nunamaker, J.F., Jr.; Konsynski, B.R.; Chen, M.; Vinze, A.S.; King, D.R.; and Heltne, M.M. Knowledge-based systems support for information centers. Journal of Management Information Systems, 5, 1 (Summer 1988), 6-24.
  • Pao, C.; Schmid, P.; and Glass, J. Confidence scoring for speech understanding systems. In R.H. Mannell and J. Robert-Ribes (eds.), Proceedings of the Fifth International Conference on Spoken Language Processing. Canberra: Australian Speech Science and Technology Association, 1998, pp. 815-818.
  • Pradhan, S.S., and Ward, W.H. Estimating semantic confidence for spoken dialogue systems. In F.J. Taylor, J. Principe, and H. Bourlard (eds.), 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. Los Alamitos, CA: IEEE Computer Society Press, 2002, pp. 233-236.
  • Ringger, E.K., and Allen, J.F. Error correction via a post-processor for continuous speech recognition. In M.H. Hayes and M.A. Clements (eds.), 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. Los Alamitos, CA: IEEE Computer Society Press, 1996, pp. 427-430.
  • Robertson, J.; Wong, W.Y.; Chung, C.; and Kim, D.K. Automatic speech recognition for generalised time based media retrieval and indexing. In W. Effelsberg and B.C. Smith (eds.), Proceedings of the Sixth ACM International Conference on Multimedia. New York: ACM Press, 1998, pp. 241-246.
  • San-Segundo, R.; Pellom, B.; Hacioglu, K.; Ward, W.; and Pardo, J.M. Confidence measures for spoken dialogue systems. In V.J. Mathews and A. Swindlehurst (eds.), 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. Los Alamitos, CA: IEEE Computer Society Press, 2001, pp. 393-396.
  • Sarikaya, R.; Gao, Y.; and Picheny, M. Word level confidence measurement using semantic features. In W. Siu, A.G. Constantinides, and Y. Chan (eds.), 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1. Los Alamitos, CA: IEEE Computer Society Press, 2003, pp. 604-607.
  • Sarma, A., and Palmer, D.D. Context-based speech recognition error detection and correction. In J.B. Hirschberg, S. Dumais, D. Marcu, and S. Roukos (eds.), Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics 2004: Short Papers. East Stroudsburg, PA: Association for Computational Linguistics, 2004, pp. 85-88.
  • Sears, A.; Feng, J.; Oseitutu, K.; and Karat, C.-M. Hands-free speech-based navigation during dictation: Difficulties, consequences, and solutions. Human-Computer Interaction, 18, 3 (2003), 229-257.
  • Sears, A.; Karat, C.-M.; Oseitutu, K.; Karimullah, A.; and Feng, J. Productivity, satisfaction, and interaction strategies of individuals with spinal cord injuries and traditional users interacting with speech recognition software. Universal Access in the Information Society, 1, 1 (June 2001), 4-15.
  • Skantze, G., and Edlund, J. Early error detection on word level. In B. Milner (ed.), Proceedings of COST278 and ISCA Tutorial and Research Workshop on Robustness Issues in Conversational Interaction. Bonn, Germany: International Speech Communication Association, 2004 (available at www.isca-speech.org/archive/robust2004/rob4_17.html).
  • Suhm, B.; Myers, B.; and Waibel, A. Multimodal error correction for speech user interfaces. ACM Transactions on Computer-Human Interaction, 8, 1 (March 2001), 60-98.
  • Weintraub, M.; Beaufays, F.; Rivlin, Z.; Konig, Y.; and Stolcke, A. Neural-network based measures of confidence for word recognition. In M.K. Lang and H. Hoge (eds.), 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2. Los Alamitos, CA: IEEE Computer Society Press, 1997, pp. 887-890.
  • Wendemuth, A.; Rose, G.; and Dolfing, J.G.A. Advances in confidence measures for large vocabulary. In D. Cochran and A. Spanias (eds.), 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2. Los Alamitos, CA: IEEE Computer Society Press, 1999, pp. 705-708.
  • Wessel, F.; Schluter, R.; and Ney, H. Using posterior probabilities for improved speech recognition. In H. Abut and L. Onural (eds.), 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3. Los Alamitos, CA: IEEE Computer Society Press, 2000, pp. 1587-1590.
  • Wessel, F.; Schluter, R.; Macherey, K.; and Ney, H. Confidence measures for large vocabulary continuous speech recognition. IEEE Transactions on Speech and Audio Processing, 9, 3 (March 2001), 288-298.
  • Young, S.R. Detecting misrecognitions and out-of-vocabulary words. In 1994 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2. Los Alamitos, CA: IEEE Computer Society Press, 1994, pp. 21-24.
  • Zhang, D., and Adipat, B. Challenges, methodologies, and issues in the usability testing of mobile applications. International Journal of Human-Computer Interaction, 18, 3 (July 2005), 293-308.
  • Zhang, R., and Rudnicky, A.I. Word level confidence annotation using combinations of features. In P. Dalsgaard, B. Lindberg, H. Benner, and Z. Tan (eds.), Proceedings of the Seventh European Conference on Speech Communication and Technology. Bonn, Germany: International Speech Communication Association, 2001, pp. 2105-2108.
  • Zhou, L.; Shi, Y.; Feng, J.; and Sears, A. Data mining for detecting errors in dictation speech recognition. IEEE Transactions on Speech and Audio Processing, 13, 5 (September 2005), 681-688.
  • Zhou, Z., and Meng, H. A two-level schema for detecting recognition errors. In S.H. Kim, S. Lee, Y. Oh, and Y. Lee (eds.), Proceedings of the Eighth International Conference on Spoken Language Processing. Bonn, Germany: International Speech Communication Association, 2004, pp. 449-452.
