Research Article

Learning Bidirectional Action-Language Translation with Limited Supervision and Testing with Incongruent Input

Article: 2179167 | Received 23 Dec 2022, Accepted 02 Feb 2023, Published online: 22 Feb 2023
