Research Article

EchoTap: Non-Verbal Sound Interaction with Knock and Tap Gestures

Received 06 Dec 2023, Accepted 22 Apr 2024, Published online: 03 Jun 2024

References

  • Aggarwal, S., & Sharma, S. (2021). Voice based deep learning enabled user interface design for smart home application system [Paper presentation]. 2021 2nd International Conference on Communication, Computing and Industry 4.0 (C2I4) (pp. 1–6), Bangalore, India. https://doi.org/10.1109/C2I454156.2021.9689435
  • Ahmed, E., Yaqoob, I., Gani, A., Imran, M., & Guizani, M. (2016). Internet-of-things-based smart environments: State of the art, taxonomy, and open research challenges. IEEE Wireless Communications, 23(5), 10–16. https://doi.org/10.1109/MWC.2016.7721736
  • Antonacci, F., Prandi, G., Bernasconi, G., Galli, R., & Sarti, A. (2009). Audio-based object recognition system for tangible acoustic interfaces [Paper presentation]. 2009 IEEE International Workshop on Haptic Audio Visual Environments and Games (pp. 123–128), Lecco, Italy.
  • Bazilinskyy, P., & de Winter, J. (2015). Auditory interfaces in automated driving: An international survey. PeerJ Computer Science, 1(8), e13. https://doi.org/10.7717/peerj-cs.13
  • Becker, V., Fessler, L., & Sörös, G. (2019). GestEar: Combining audio and motion sensing for gesture recognition on smartwatches [Paper presentation]. Proceedings of the 2019 ACM International Symposium on Wearable Computers (pp. 10–19). Association for Computing Machinery (ACM), New York, NY. https://doi.org/10.1145/3341163.3347735
  • Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., & Wellekens, C. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49(10–11), 763–786. https://doi.org/10.1016/j.specom.2007.02.006
  • Brade, M., Kammer, D., Keck, M., & Groh, R. (2010). Immersive data grasping using the eXplore table. Proceedings of the Fifth International Conference on Tangible, Embedded, and Embodied Interaction (pp. 419–420). Association for Computing Machinery (ACM).
  • Brazil, E. (2009). A review of methods and frameworks for sonic interaction design: Exploring existing approaches. International Symposium on Computer Music Modeling and Retrieval (pp. 41–67). Springer.
  • Brooke, J. (1996). SUS: A “quick and dirty” usability scale. In Usability evaluation in industry (pp. 189–194). CRC Press. https://doi.org/10.1201/9781498710411-35
  • Chauhan, J., Hu, Y., Seneviratne, S., Misra, A., Seneviratne, A., & Lee, Y. (2017). BreathPrint: Breathing acoustics-based user authentication. Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (pp. 278–291). Association for Computing Machinery (ACM).
  • Chen, K., Du, X., Zhu, B., Ma, Z., Berg-Kirkpatrick, T., & Dubnov, S. (2022). HTS-AT: A hierarchical token-semantic audio transformer for sound classification and detection. ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 646–650). Institute of Electrical and Electronics Engineers (IEEE).
  • Chen, T., Xu, L., Xu, X., & Zhu, K. (2021). GestOnHMD: Enabling gesture-based interaction on low-cost VR head-mounted display. IEEE Transactions on Visualization and Computer Graphics, 27(5), 2597–2607. https://doi.org/10.1109/TVCG.2021.3067689
  • Chen, W., Sun, Q., Chen, X., Xie, G., Wu, H., & Xu, C. (2021). Deep learning methods for heart sounds classification: A systematic review. Entropy, 23(6), 667. https://doi.org/10.3390/e23060667
  • Cheng, P., & Roedig, U. (2022). Personal voice assistant security and privacy—a survey. Proceedings of the IEEE, 110(4), 476–507. https://doi.org/10.1109/JPROC.2022.3153167
  • Cokelek, M., Imamoglu, N., Ozcinar, C., Erdem, E., & Erdem, A. (2021). Leveraging frequency based salient spatial sound localization to improve 360° video saliency prediction [Paper presentation]. 2021 17th International Conference on Machine Vision and Applications (MVA) (pp. 1–5). Online. https://doi.org/10.23919/MVA51890.2021.9511406
  • Corbett, E., & Weber, A. (2016). What can I say? Addressing user experience challenges of a mobile voice user interface for accessibility. Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services (pp. 72–82). Association for Computing Machinery (ACM).
  • Csapó, Á., & Wersényi, G. (2013). Overview of auditory representations in human-machine interfaces. ACM Computing Surveys (CSUR), 46(2), 1–23. https://doi.org/10.1145/2543581.2543586
  • Donato, B. D., Dewey, C., & Michailidis, T. (2020). Human-sound interaction: Towards a human-centred sonic interaction design approach. Proceedings of the 7th International Conference on Movement and Computing (pp. 1–4). Association for Computing Machinery (ACM).
  • Duraibi, S. (2020). Voice biometric identity authentication model for IoT devices. International Journal of Security, Privacy and Trust Management (IJSPTM), 9(1–2), 1–10. https://doi.org/10.5121/ijsptm.2020.9201
  • Dybkjær, L., & Bernsen, N. O. (2000). Usability issues in spoken dialogue systems. Natural Language Engineering, 6(3–4), 243–271. https://doi.org/10.1017/S1351324900002461
  • Franinovic, K., & Serafin, S. (2013). Sonic interaction design. MIT Press.
  • Gaikwad, S. K., Gawali, B. W., & Yannawar, P. (2010). A review on speech recognition technique. International Journal of Computer Applications, 10(3), 16–24. https://doi.org/10.5120/1462-1976
  • Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., & Ritter, M. (2017). Audio set: An ontology and human-labeled dataset for audio events [Paper presentation]. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 776–780), New Orleans, LA. https://doi.org/10.1109/ICASSP.2017.7952261
  • Geronazzo, M., & Serafin, S. (2023). Sonic interactions in virtual environments. Springer Nature.
  • Gong, T., Cho, H., Lee, B., & Lee, S. J. (2019). Knocker: Vibroacoustic-based object recognition with smartphones. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(3), 1–21. https://doi.org/10.1145/3351240
  • Grumiaux, P. A., Kitić, S., Girin, L., & Guérin, A. (2022). A survey of sound source localization with deep learning methods. The Journal of the Acoustical Society of America, 152(1), 107–151. https://doi.org/10.1121/10.0011809
  • Hanifa, R. M., Isa, K., & Mohamad, S. (2021). A review on speaker recognition: Technology and challenges. Computers & Electrical Engineering, 90, 107005. https://doi.org/10.1016/j.compeleceng.2021.107005
  • Harrison, C., Schwarz, J., & Hudson, S. E. (2011). TapSense: Enhancing finger interaction on touch surfaces. Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (pp. 627–636). Association for Computing Machinery (ACM).
  • Horn, M., & Bers, M. (2019). Tangible computing. The Cambridge handbook of computing education research (Vol. 1, pp. 663–678). Cambridge University Press.
  • Ishii, H. (2007). Tangible user interfaces. The human-computer interaction handbook (pp. 495–514). CRC Press.
  • Ismail, S., Siddiqi, I., & Akram, U. (2018). Localization and classification of heart beats in phonocardiography signals—a comprehensive review. EURASIP Journal on Advances in Signal Processing, 2018(1), 27. https://doi.org/10.1186/s13634-018-0545-9
  • Jeong, J. Y., Kim, J. H., Yoon, H. Y., & Jeong, J. W. (2021). Knock&Tap: Classification and localization of knock and tap gestures using deep sound transfer learning [Paper presentation]. Companion Publication of the 2021 International Conference on Multimodal Interaction (pp. 1–6), Montréal, QC. https://doi.org/10.1145/3461615.3485428
  • Jiang, W., Yu, D., Irlitti, A., Goncalves, J., Kostakos, V., & He, X. (2023). Knock the reality: Virtual interface registration in mixed reality [Paper presentation]. 2023 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (pp. 615–616), Shanghai, China. https://doi.org/10.1109/VRW58643.2023.00150
  • Jylhä, A. (2011). Sonic gestures as input in human-computer interaction: Towards a systematic approach. Proceedings of SMC 2011, the 8th Sound and Music Computing Conference. Zenodo.
  • Jylhä, A., & Erkut, C. (2009). A hand clap interface for sonic interaction with the computer [Paper presentation]. CHI’09 Extended Abstracts on Human Factors in Computing Systems (pp. 3175–3180), Boston, MA. https://doi.org/10.1145/1520340.1520452
  • Kabir, M. M., Mridha, M. F., Shin, J., Jahan, I., & Ohi, A. Q. (2021). A survey of speaker recognition: Fundamental theories, recognition methods and opportunities. IEEE Access, 9, 79236–79263. https://doi.org/10.1109/ACCESS.2021.3084299
  • Katsuragawa, K., Kamal, A., Liu, Q. F., Negulescu, M., & Lank, E. (2019). Bi-level thresholding: Analyzing the effect of repeated errors in gesture input. ACM Transactions on Interactive Intelligent Systems (TiiS), 9(2–3), 1–30. https://doi.org/10.1145/3181672
  • Kim, Y., Reza, M., McGrenere, J., & Yoon, D. (2021). Designers characterize naturalness in voice user interfaces: Their goals, practices, and challenges [Paper presentation]. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1–13). Online. https://doi.org/10.1145/3411764.3445579
  • Kong, Q., Cao, Y., Iqbal, T., Wang, Y., Wang, W., & Plumbley, M. D. (2020). PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2880–2894. https://doi.org/10.1109/TASLP.2020.3030497
  • Koo, J., Choi, S., & Hwang, S. (2024). Generalized outlier exposure: Towards a trustworthy out-of-distribution detector without sacrificing accuracy. Neurocomputing, 577, 127371. https://doi.org/10.1016/j.neucom.2024.127371
  • Lafreniere, B., Jonker, T. R., Santosa, S., Parent, M., Glueck, M., Grossman, T., Benko, H., & Wigdor, D. (2021). False positives vs. false negatives: The effects of recovery time and cognitive costs on input error preference. The 34th Annual ACM Symposium on User Interface Software and Technology (pp. 54–68). Association for Computing Machinery (ACM).
  • Lane, N. D., Georgiev, P., & Qendro, L. (2015). DeepEar: Robust smartphone audio sensing in unconstrained acoustic environments using deep learning. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 283–294). Association for Computing Machinery (ACM).
  • Latifi, S., & Torres-Reyes, N. (2019). Audio enhancement and synthesis using generative adversarial networks: A survey. International Journal of Computer Applications, 182(35), 27. https://doi.org/10.5120/ijca2019918334
  • Liu, S., Keren, G., Parada-Cabaleiro, E., & Schuller, B. (2021). N-HANS: A neural network-based toolkit for in-the-wild audio enhancement. Multimedia Tools and Applications, 80, 28365–28389. https://doi.org/10.1007/s11042-021-11080-y
  • Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., & Guo, B. (2021). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10012–10022). Institute of Electrical and Electronics Engineers (IEEE).
  • Lopes, P., Jota, R., & Jorge, J. A. (2011). Augmenting touch interaction through acoustic sensing [Paper presentation]. Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces (pp. 53–56). https://doi.org/10.1145/2076354.2076364
  • Loshchilov, I., & Hutter, F. (2019, May 6–9). Decoupled weight decay regularization [Paper presentation]. 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA.
  • Luger, E., & Sellen, A. (2016). “Like having a really bad PA”: The gulf between user expectation and experience of conversational agents. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5286–5297). Association for Computing Machinery (ACM).
  • Luo, G., Yang, P., Chen, M., & Li, P. (2020). HCI on the table: Robust gesture recognition using acoustic sensing in your hand. IEEE Access, 8, 31481–31498. https://doi.org/10.1109/ACCESS.2020.2973305
  • Merrill, D., Raffle, H., & Aimi, R. (2008). The sound of touch: Physical manipulation of digital sound. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 739–742). Association for Computing Machinery (ACM).
  • Mo, S., & Tian, Y. (2023). Audio-visual grouping network for sound localization from mixtures. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10565–10574). Institute of Electrical and Electronics Engineers (IEEE).
  • Mollyn, V., Arakawa, R., Goel, M., Harrison, C., & Ahuja, K. (2023). IMUPoser: Full-body pose estimation using IMUs in phones, watches, and earbuds [Paper presentation]. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (pp. 1–12), Hamburg, Germany. https://doi.org/10.1145/3544548.3581392
  • Nassif, A. B., Shahin, I., Attili, I., Azzeh, M., & Shaalan, K. (2019). Speech recognition using deep neural networks: A systematic review. IEEE Access, 7, 19143–19165. https://doi.org/10.1109/ACCESS.2019.2896880
  • Nees, M. A., & Walker, B. N. (2011). Auditory displays for in-vehicle technologies. Reviews of Human Factors and Ergonomics, 7(1), 58–99. https://doi.org/10.1177/1557234X11410396
  • Nogueira, A. F. R., Oliveira, H. S., Machado, J. J., & Tavares, J. M. R. (2022). Sound classification and processing of urban environments: A systematic literature review. Sensors, 22(22), 8608. https://doi.org/10.3390/s22228608
  • Oh, Y., Schwalm, M., & Kalpin, N. (2022). Multisensory benefits for speech recognition in noisy environments. Frontiers in Neuroscience, 16, 1031424. https://doi.org/10.3389/fnins.2022.1031424
  • Ono, M., Shizuki, B., & Tanaka, J. (2013). Touch & activate: Adding interactivity to existing objects using active acoustic sensing. Proceedings of the 26th Annual ACM Symposium on User Interface Software and Technology (pp. 31–40). Association for Computing Machinery (ACM).
  • Park, D. S., Chan, W., Zhang, Y., Chiu, C. C., Zoph, B., Cubuk, E. D., & Le, Q. V. (2019). SpecAugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779.
  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., … Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32.
  • Pearl, C. (2016). Designing voice user interfaces: Principles of conversational experiences. O’Reilly Media, Inc.
  • Poslad, S. (2011). Ubiquitous computing: Smart devices, environments and interactions. John Wiley & Sons.
  • Pyae, A., & Joelsson, T. N. (2018). Investigating the usability and user experiences of voice user interface: A case of Google Home smart speaker. Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct (pp. 127–131). Association for Computing Machinery (ACM).
  • Rascon, C., & Meza, I. (2017). Localization of sound sources in robotics: A review. Robotics and Autonomous Systems, 96, 184–210. https://doi.org/10.1016/j.robot.2017.07.011
  • Rocchesso, D., Serafin, S., Behrendt, F., Bernardini, N., Bresin, R., Eckel, G., Franinovic, K., Hermann, T., Pauletto, S., Susini, P., & Visell, Y. (2008). Sonic interaction design: Sound, information and experience. CHI’08 Extended Abstracts on Human Factors in Computing Systems (pp. 3969–3972). Association for Computing Machinery (ACM).
  • Roth, J., Liu, X., Ross, A., & Metaxas, D. (2013). Biometric authentication via keystroke sound [Paper presentation]. 2013 International Conference on Biometrics (ICB) (pp. 1–8), Madrid, Spain. https://doi.org/10.1109/ICB.2013.6613015
  • Ryu, S., & Kim, S. C. (2020). Impact sound-based surface identification using smart audio sensors with deep neural networks. IEEE Sensors Journal, 20(18), 10936–10944. https://doi.org/10.1109/JSEN.2020.2993321
  • Schiettecatte, B., & Vanderdonckt, J. (2008). AudioCubes: A distributed cube tangible interface based on interaction range for sound design. Proceedings of the 2nd International Conference on Tangible and Embedded Interaction (pp. 3–10). Association for Computing Machinery (ACM).
  • Seeed Studio. (2023). ReSpeaker Core v2.0. https://wiki.seeedstudio.com/ReSpeaker_Core_v2.0/ (accessed October 31, 2023).
  • Seng, K. P., Ang, L. M., Peter, E., & Mmonyi, A. (2023). Machine learning and AI technologies for smart wearables. Electronics, 12(7), 1509. https://doi.org/10.3390/electronics12071509
  • Serafin, S., Geronazzo, M., Erkut, C., Nilsson, N. C., & Nordahl, R. (2018). Sonic interactions in virtual reality: State of the art, current challenges, and future directions. IEEE Computer Graphics and Applications, 38(2), 31–43. https://doi.org/10.1109/MCG.2018.193142628
  • Shi, L., Ashoori, M., Zhang, Y., & Azenkot, S. (2018). Knock knock, what’s there: Converting passive objects into customizable smart controllers. Proceedings of the 20th International Conference on Human-Computer Interaction with Mobile Devices and Services (pp. 1–13). Association for Computing Machinery (ACM).
  • Shin, J., Lee, S., Gong, T., Yoon, H., Roh, H., Bianchi, A., & Lee, S. J. (2022). MyDJ: Sensing food intakes with an attachable on your eyeglass frame. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1–17). Association for Computing Machinery (ACM).
  • Shneiderman, B., Plaisant, C., Cohen, M., Jacobs, S., Elmqvist, N., & Diakopoulos, N. (2016). Designing the user interface: Strategies for effective human-computer interaction. Pearson.
  • Singh, N., Agrawal, A., & Khan, R. (2018). Voice biometric: A technology for voice based authentication. Advanced Science, Engineering and Medicine, 10(7–8), 754–759. https://doi.org/10.1166/asem.2018.2219
  • Soro, A. (2012). Gestures and cooperation: Considering non-verbal communication in the design of interactive spaces. Università degli Studi di Cagliari.
  • Sterkenburg, J., Landry, S., & Jeon, M. (2019). Design and evaluation of auditory-supported air gesture controls in vehicles. Journal on Multimodal User Interfaces, 13, 55–70. https://doi.org/10.1007/s12193-019-00298-8
  • Tomlinson, B. J., Walker, B. N., & Moore, E. B. (2020). Auditory display in interactive science simulations: Description and sonification support interaction and enhance opportunities for learning [Paper presentation]. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1–12), O’ahu, Hawaii. https://doi.org/10.1145/3313831.3376886
  • Tripathi, A. M., & Mishra, A. (2021). Environment sound classification using an attention-based residual neural network. Neurocomputing, 460, 409–423. https://doi.org/10.1016/j.neucom.2021.06.031
  • Ullmer, B., Shaer, O., Mazalek, A., & Hummels, C. (2022). Weaving fire into form: Aspirations for tangible and embodied interaction. Morgan & Claypool.
  • Winters, R. M., Tomlinson, B. J., Walker, B. N., & Moore, E. B. (2019). Sonic interaction design for science education. Ergonomics in Design, 27(1), 5–10. https://doi.org/10.1177/1064804618797399
  • Wu, H., & Wang, J. (2016). A visual attention-based method to address the Midas touch problem existing in gesture-based interaction. The Visual Computer, 32, 123–136. https://doi.org/10.1007/s00371-014-1060-0
  • Xu, C., Li, Z., Zhang, H., Rathore, A. S., Li, H., Song, C., Wang, K., & Xu, W. (2019). WaveEar: Exploring a mmWave-based noise-resistant speech sensing for voice-user interface. Proceedings of the 17th Annual International Conference on Mobile Systems, Applications, and Services (pp. 14–26). Association for Computing Machinery (ACM).
  • Xu, X., Gong, J., Brum, C., Liang, L., Suh, B., Gupta, S. K., Agarwal, Y., Lindsey, L., Kang, R., Shahsavari, B., Nguyen, T., Nieto, H., Hudson, S. E., Maalouf, C., Mousavi, J. S., & Laput, G. (2022). Enabling hand gesture customization on wrist-worn devices [Paper presentation]. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1–19). https://doi.org/10.1145/3491102.3501904
