Research Article

Learning Bidirectional Action-Language Translation with Limited Supervision and Testing with Incongruent Input

Article: 2179167 | Received 23 Dec 2022, Accepted 02 Feb 2023, Published online: 22 Feb 2023
