Full Papers

Optical laser microphone for human-robot interaction: speech recognition in extremely noisy service environments

Pages 304–317 | Received 29 Apr 2021, Accepted 08 Dec 2021, Published online: 13 Jan 2022

References

  • Nakadai K, Okuno HG, Mizumoto T. Development, deployment and applications of robot audition open source software HARK. J Robot Mechatron. 2017;29(1):16–25.
  • Rothberg S, Allen M, Castellini P, et al. An international review of laser Doppler vibrometry: making light work of vibration measurement. Opt Lasers Eng. 2017;99:11–22.
  • Bicen B, Jolly S, Jeelani K, et al. Integrated optical displacement detection and electrostatic actuation for directional optical microphones with micromachined biomimetic diaphragms. IEEE Sens J. 2009;9(12):1933–1941.
  • Leclère Q, Laulagnet B. Nearfield acoustic holography using a laser vibrometer and a light membrane. J Acoust Soc Am. 2009;126(3):1245–1249.
  • Chen Y, Wu F, Shuai W, et al. Kejia robot–an attractive shopping mall guider. In: International Conference on Social Robotics. Springer; 2015. p. 145–154.
  • Novoa J, Wuth J, Escudero JP, et al. DNN-HMM based automatic speech recognition for HRI scenarios. In: Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction. Association for Computing Machinery; 2018. p. 150–159.
  • Lee SC, Wang JF, Chen MH. Threshold-based noise detection and reduction for automatic speech recognition system in human-robot interactions. Sensors. 2018;18(7). DOI:https://doi.org/10.3390/s18072068.
  • Suzuki M, Honjo T. Spot-forming method by using two shotgun microphones. In: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Hong Kong; 2015. p. 188–191.
  • Valin J, Yamamoto S, Rouat J, et al. Robust recognition of simultaneous speech by a mobile robot. IEEE Trans Robot. 2007;23(4):742–752.
  • Lim JS, Oppenheim AV. Enhancement and bandwidth compression of noisy speech. Proc IEEE. 1979;67(12):1586–1604.
  • Boll S. A spectral subtraction algorithm for suppression of acoustic noise in speech. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, Washington, DC, USA; Vol. 4; 1979. p. 200–203.
  • Berouti M, Schwartz R, Makhoul J. Enhancement of speech corrupted by acoustic noise. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, Washington, DC, USA; Vol. 4; 1979. p. 208–211.
  • Hansen P, Jensen S. Subspace-based noise reduction for speech signals via diagonal and triangular matrix decompositions: survey and analysis. EURASIP J Adv Signal Process. 2007;2007:092953. DOI:https://doi.org/10.1155/2007/92953.
  • Nakadai K, Okuno HG. Robot audition and computational auditory scene analysis. Adv Intell Syst. 2020;2(9):Article ID 2000050.
  • Nakadai K, Lourens T, Okuno H, et al. Active audition for humanoid. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, Austin, USA; 2000. p. 832–839.
  • Takeda R, Nakadai K, Takahashi T, et al. Efficient blind dereverberation and echo cancellation based on independent component analysis for actual acoustic signals. Neural Comput. 2012;24:234–272.
  • Lotter T, Vary P. Noise reduction by joint maximum a posteriori spectral amplitude and phase estimation with super-Gaussian speech modelling. In: European Signal Processing Conference, Vienna, Austria; 2004. p. 1457–1460.
  • Krawczyk M, Gerkmann T. STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement. IEEE/ACM Trans Audio Speech Lang Process. 2014;22(12):1931–1940.
  • Li K, Lee C. A deep neural network approach to speech bandwidth expansion. In: IEEE International Conference on Acoustics, Speech and Signal Processing, South Brisbane, Australia; 2015. p. 4395–4399.
  • Rethage D, Pons J, Serra X. A Wavenet for speech denoising. In: IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, Canada; 2018. p. 5069–5073.
  • Aygün H, Apolskis A. The quality and reliability of the mechanical stethoscopes and laser Doppler vibrometer (LDV) to record tracheal sounds. Appl Acoust. 2020;161:Article ID 107159.
  • Ismail HM, Pretty CG, Signal MK, et al. Laser Doppler vibrometer validation of an optical flow motion tracking algorithm. Biomed Signal Process Control. 2019;49:322–327.
  • Malekjafarian A, Martinez D, O'Brien EJ. The feasibility of using laser Doppler vibrometer measurements from a passing vehicle for bridge damage detection. Shock Vib. 2018;2018. DOI:https://doi.org/10.1155/2018/9385171.
  • Chen DM, Xu Y, Zhu W. Identification of damage in plates using full-field measurement with a continuously scanning laser Doppler vibrometer system. J Sound Vib. 2018;422:542–567.
  • Oikawa Y, Goto M, Ikeda Y, et al. Sound field measurements based on reconstruction from laser projections. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, USA; Vol. 4; 2005. p. 661–664.
  • Morita N, Nogami H, Hayashida Y, et al. Development of a miniaturized laser Doppler velocimeter for use as a slip sensor for robot hand control. In: IEEE International Conference on Micro Electro Mechanical Systems, Estoril, Portugal; 2015. p. 748–751.
  • Margerit P, Gobin T, Lebée A, et al. The robotized laser Doppler vibrometer: on the use of an industrial robot arm to perform 3D full-field velocity measurements. Opt Lasers Eng. 2021;137:Article ID 106363.
  • Jungbluth J, Siedentopp K, Krieger R, et al. Combining virtual and robot assistants – a case study about integrating Amazon's Alexa as a voice interface in robotics. In: Robotix-Academy Conference for Industrial Robotics, Luxembourg; 2018. p. 1–5.
  • Yamamoto T, Takagi Y, Ochiai A, et al. Human support robot as research platform of domestic mobile manipulator. In: RoboCup 2019: Robot World Cup XXIII. Springer International Publishing; 2019. p. 457–465.
  • Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA; 2016. p. 779–788.
  • Zhang X, Wang D. Boosting contextual information for deep neural network based voice activity detection. IEEE/ACM Trans Audio Speech Lang Process. 2016;24(2):252–264.
  • Eyben F, Weninger F, Squartini S, et al. Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies. In: IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada; 2013. p. 483–487.
  • Zazo R, Sainath TN, Simko G, et al. Feature learning with raw-waveform CLDNNs for voice activity detection. In: Interspeech, San Francisco, USA; 2016. p. 3668–3672.
  • Kim J, Hahn M. Voice activity detection using an adaptive context attention model. IEEE Signal Process Lett. 2018;25(8):1181–1185.
  • Wiseman J. py-webrtcvad. 2016. [last accessed: 2021 Sep 23]. Available from: https://github.com/wiseman/py-webrtcvad/.
  • Rix A, Beerends J, Hollier M, et al. Perceptual evaluation of speech quality (PESQ) – a new method for speech quality assessment of telephone networks and codecs. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, USA; Vol. 2; 2001. p. 749–752.
  • Taal C, Hendriks R, Heusdens R, et al. A short-time objective intelligibility measure for time-frequency weighted noisy speech. In: IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, USA; 2010. p. 4214–4217.
  • Jackson GM, Leventhall G. Household appliance noise. Appl Acoust. 1975;8:101–118.