Survey Paper

Survey on frontiers of language and robotics

, , , , , , , , & show all
Pages 700–730 | Received 06 Mar 2019, Accepted 02 Jun 2019, Published online: 24 Jun 2019

References

  • Mavridis N. A review of verbal and non-verbal human–robot interactive communication. Rob Auton Syst. 2015;63:22–35.
  • Kanda T, Ishiguro H, Imai M, et al. Body movement analysis of human–robot interaction. International Joint Conference on Artificial Intelligence (IJCAI); Acapulco, Mexico; Vol. 3; 2003. p. 177–182.
  • Okuno Y, Kanda T, Imai M, et al. Providing route directions: design of robot's utterance, gesture, and timing. ACM/IEEE International Conference on Human Robot Interaction; San Diego, California, USA; 2009. p. 53–60.
  • Admoni H, Scassellati B. Social eye gaze in human-robot interaction: a review. J Hum Rob Interact. 2017;6(1):25–63.
  • Mutlu B, Yamaoka F, Kanda T, et al. Nonverbal leakage in robots: communication of intentions through seemingly unintentional behavior. ACM/IEEE International Conference on Human Robot Interaction; San Diego, California, USA; 2009. p. 69–76.
  • Nakadai K, Takahashi T, Okuno HG, et al. Design and implementation of robot audition system 'HARK' – open source software for listening to three simultaneous speakers. Adv Robot. 2010;24(5–6):739–761.
  • Kostavelis I, Gasteratos A. Semantic mapping for mobile robotics tasks: a survey. Rob Auton Syst. 2015;66:86–103.
  • Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015;115(3):211–252.
  • Noda K, Yamaguchi Y, Nakadai K, et al. Audio-visual speech recognition using deep learning. Appl Intell. 2015;42(4):722–737.
  • Siagian C, Itti L. Rapid biologically-inspired scene classification using features shared with visual attention. IEEE Trans Pattern Anal Mach Intell. 2007;29(2):300–312.
  • Wu J, Rehg JM. CENTRIST: a visual descriptor for scene categorization. IEEE Trans Pattern Anal Mach Intell. 2010;33(8):1489–1501.
  • Iwahashi N. Language acquisition through a human–robot interface by combining speech, visual, and behavioral information. Inf Sci. 2003;156:109–121.
  • Iwahashi N. Interactive learning of spoken words and their meanings through an audio-visual interface. IEICE Trans Inf Syst. 2008;2:312–321.
  • Hatori J, Kikuchi Y, Kobayashi S, et al. Interactively picking real-world objects with unconstrained spoken language instructions. IEEE International Conference on Robotics and Automation (ICRA); Brisbane, Australia; 2018. p. 3774–3781.
  • Anderson P, Wu Q, Teney D, et al. Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments. IEEE/CVF Conference on Computer Vision and Pattern Recognition; Salt Lake City, Utah; 2018. p. 3674–3683.
  • Hermann KM, Hill F, Green S, et al. Grounded language learning in a simulated 3D world. CoRR; 2017. abs/1706.06551.
  • Taniguchi T, Nagai T, Nakamura T, et al. Symbol emergence in robotics: a survey. Adv Robot. 2016;30(11–12):706–728.
  • Taniguchi T, Ugur E, Hoffmann M, et al. Symbol emergence in cognitive developmental systems: a survey. IEEE Trans Cogn Dev Syst. 2018. doi: 10.1109/TCDS.2018.2867772
  • Iwahashi N. A method for forming mutual beliefs for communication through human–robot multi-modal interaction. Proceedings of the Fourth SIGdial Workshop on Discourse and Dialogue; Sapporo, Japan; 2003. p. 79–86.
  • Harnad S. The symbol grounding problem. Phys D. 1990;42(1):335–346.
  • Plunkett K, Sinha C, Moller MF, et al. Symbol grounding or the emergence of symbols? vocabulary growth in children and a connectionist net. Conn Sci. 1992;4(3–4):293–312.
  • Steels L. The symbol grounding problem has been solved, so what's next? Symbols and embodiment: debates on meaning and cognition. Oxford, UK: Oxford University Press; 2008. p. 223–244.
  • Lakoff G, Johnson M. Philosophy in the flesh. Vol. 4. New York, USA: Basic Books; 1999.
  • Gibbs RW Jr, Lima PLC, Francozo E. Metaphor is grounded in embodied experience. J Pragmat. 2004;36(7):1189–1210.
  • Feldman J. From molecule to metaphor: a neural theory of language. Cambridge, MA: MIT Press; 2008.
  • Huang PY, Liu F, Shiang SR, et al. Attention-based multimodal neural machine translation. Proceedings of the First Conference on Machine Translation (WMT16); Berlin, Germany; Vol. 2; 2016. p. 639–645.
  • Kiros R, Salakhutdinov R, Zemel RS. Unifying visual-semantic embeddings with multimodal neural language models. Preprint; 2014. arXiv:1411.2539.
  • Vinyals O, Toshev A, Bengio S, et al. Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans Pattern Anal Mach Intell. 2016;39(4):652–663.
  • Antol S, Agrawal A, Lu J, et al. VQA: visual question answering. Proceedings of the IEEE International Conference on Computer Vision; Santiago, Chile; 2015. p. 2425–2433.
  • Jamone L, Ugur E, Cangelosi A, et al. Affordances in psychology, neuroscience and robotics: a survey. IEEE Trans Cogn Dev Syst. 2018;10(1):4–25.
  • Savage J, Rosenblueth DA, Matamoros M, et al. Semantic reasoning in service robots using expert systems. Rob Auton Syst. 2019;114:77–92.
  • Horn A. On sentences which are true of direct unions of algebras. J Symbolic Logic. 1951;16(1):14–21.
  • Hobbs JR, Stickel ME, Appelt DE, et al. Interpretation as abduction. Artif Intell. 1993;63(1–2):69–142.
  • Gelfond M, Lifschitz V. The stable model semantics for logic programming. In: Kowalski R, Bowen K, editors. Proceedings of International Logic Programming Conference and Symposium. MIT Press; 1988. p. 1070–1080.
  • Sato T. A statistical learning method for logic programs with distributional semantics. The 12th International Conference on Logic Programming; Tokyo; 1995. p. 715–729.
  • Muggleton S. Stochastic logic programs. Adv Induct Logic Program. 1996;32:254–264.
  • De Raedt L, Kimmig A, Toivonen H. ProbLog: a probabilistic prolog and its application in link discovery. International Joint Conference on Artificial Intelligence; 2007. p. 2468–2473.
  • Richardson M, Domingos P. Markov logic networks. Mach Learn. 2006;62(1–2):107–136.
  • Bach SH, Broecheler M, Huang B, et al. Hinge-loss Markov random fields and probabilistic soft logic. J Mach Learn Res (JMLR). 2017;18:1–67.
  • Van Gelder A, Ross KA, Schlipf JS. The well-founded semantics for general logic programs. J ACM. 1991;38(3):619–649.
  • Fierens D, Van den Broeck G, Renkens J, et al. Inference and learning in probabilistic logic programs using weighted Boolean formulas. Theory Pract Log Program. 2015;15:358–401.
  • Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems; Stateline, Nevada, USA; 2013. p. 3111–3119.
  • Pennington J, Socher R, Manning CD. GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing; Doha, Qatar; 2014. p. 1532–1543.
  • Kiros R, Zhu Y, Salakhutdinov RR, et al. Skip-thought vectors. Advances in Neural Information Processing Systems; Montreal, Canada; 2015. p. 3294–3302.
  • Conneau A, Kiela D, Schwenk H, et al. Supervised learning of universal sentence representations from natural language inference data. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; Copenhagen, Denmark; 2017. p. 670–680.
  • Cohen WW. TensorLog: a differentiable deductive database. CoRR; 2016. abs/1605.06523.
  • Lewis M, Steedman M. Combined distributional and logical semantics. Trans Assoc Comput Linguist. 2013;1:179–192.
  • Wang WY, Cohen WW. Learning first-order logic embeddings via matrix factorization. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence; New York, USA; 2016. p. 2132–2138.
  • Bowman SR, Potts C, Manning CD. Learning distributed word representations for natural logic reasoning; 2014. p. 10–13.
  • Tian R, Okazaki N, Inui K. Learning semantically and additively compositional distributional representations. Annual Meeting of the Association for Computational Linguistics; 2016. p. 1277–1287.
  • Yanaka H, Mineshima K, Martínez-Gómez P, et al. Determining semantic textual similarity using natural deduction proofs. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing; Copenhagen, Denmark; 2017 Sep. Association for Computational Linguistics. p. 681–691.
  • Rocktäschel T, Riedel S. End-to-end differentiable proving. Advances in Neural Information Processing Systems; Long Beach, CA, USA; 2017. p. 3788–3800.
  • Modi A. Event embeddings for semantic script modeling. Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning; Berlin, Germany; 2016. p. 75–83.
  • Cai H, Zheng VW, Chang KCC. A comprehensive survey of graph embedding: problems, techniques, and applications. IEEE Trans Knowl Data Eng. 2018;30:1616–1637.
  • Wang Q, Mao Z, Wang B, et al. Knowledge graph embedding: a survey of approaches and applications. IEEE Trans Knowl Data Eng. 2017;29(12):2724–2743.
  • Weber N, Balasubramanian N, Chambers N. Event representations with tensor-based compositions. CoRR; 2017. abs/1711.07611.
  • Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems 26 (NIPS); Stateline, Nevada, USA; 2013. p. 2787–2795.
  • Jurafsky D, Martin JH. Speech and language processing. 2nd ed. Upper Saddle River, NJ: Prentice Hall; 2008. (Prentice hall series in artificial intelligence).
  • Kong L, Rush AM, Smith NA. Transforming dependencies into phrase structures. Annual Conference of the North American Chapter of the Association for Computational Linguistics; Denver, CO, USA; 2015. p. 788–798.
  • Steedman M. The syntactic process. Cambridge, MA: MIT Press; 2000.
  • Shindo H, Miyao Y, Fujino A, et al. Bayesian symbol-refined tree substitution grammars for syntactic parsing. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics; Jeju Island, Korea; 2012. p. 440–448.
  • Matsuzaki T, Miyao Y, Tsujii J. Probabilistic CFG with latent annotations. Annual Meeting of the Association for Computational Linguistics; Ann Arbor, Michigan, USA; 2005. p. 75–82.
  • Klein D, Manning C. Corpus-based induction of syntactic structure: models of dependency and constituency. Annual Meeting of the Association for Computational Linguistics; Barcelona, Spain; 2004. p. 478–485.
  • Headden WP III, Johnson M, McClosky D. Improving unsupervised dependency parsing with richer contexts and smoothing. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Boulder, CO, USA; 2009. p. 101–109.
  • Spitkovsky VI, Alshawi H, Jurafsky D. From baby steps to leapfrog: how “less is more” in unsupervised dependency parsing. Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Los Angeles, USA; 2010. p. 751–759.
  • Jiang Y, Han W, Tu K. Unsupervised neural dependency parsing. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing; Texas, USA; 2016. p. 763–771.
  • Manning CD, Schütze H. Foundations of statistical natural language processing. Cambridge, MA: MIT Press; 1999.
  • Johnson M, Griffiths T, Goldwater S. Bayesian inference for PCFGs via Markov chain Monte Carlo. Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference; Rochester, NY, USA; 2007. p. 139–146.
  • Pate JK, Johnson M. Grammar induction from (lots of) words alone. International Conference on Computational Linguistics; Osaka, Japan; 2016. p. 23–32.
  • Levy RP, Reali F, Griffiths TL. Modeling the effects of memory on human online sentence processing with particle filters. Advances in Neural Information Processing Systems 21; Vancouver, BC, Canada; 2009. p. 937–944.
  • Hockenmaier J, Steedman M. Generative models for statistical parsing with combinatory categorial grammar. Annual Meeting of the Association for Computational Linguistics; Philadelphia, Pennsylvania, USA; 2002. p. 335–342.
  • Bisk Y, Hockenmaier J. An HDP model for inducing combinatory categorial grammars. Trans Assoc Comput Linguist. 2013;1:75–88.
  • Teh YW, Jordan MI, Beal MJ, et al. Hierarchical Dirichlet processes. J Amer Statist Assoc. 2006;101(476):1566–1581.
  • Liang P, Petrov S, Jordan M, et al. The infinite PCFG using hierarchical Dirichlet processes. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL); Prague, Czech Republic; 2007. p. 688–697.
  • Martínez-Gómez P, Mineshima K, Miyao Y, et al. ccg2lambda: a compositional semantics system. ACL-2016 System Demonstrations; 2016. p. 85–90.
  • Bansal M, Matuszek C, Andreas J, et al. Proceedings of the first workshop on language grounding for robotics; 2017. Available from: https://robonlp2017.github.io.
  • Poon H. Grounded unsupervised semantic parsing. Annual Meeting of the Association for Computational Linguistics; Sofia, Bulgaria; 2013. p. 933–943.
  • Poon H, Domingos P. Unsupervised semantic parsing. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing; Singapore; 2009. p. 1–10.
  • Socher R, Karpathy A, Le QV, et al. Grounded compositional semantics for finding and describing images with sentences. Trans Assoc Comput Linguist. 2014;2(1):207–218.
  • Vinyals O, Toshev A, Bengio S, et al. Show and tell: a neural image caption generator. IEEE/CVF Conference on Computer Vision and Pattern Recognition; Boston, MA, USA; 2015. p. 3156–3164.
  • Xu K, Ba J, Kiros R, et al. Show, attend and tell: neural image caption generation with visual attention. International Conference on Machine Learning (ICML); Lille, France; 2015. p. 2048–2057.
  • Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions. IEEE/CVF Conference on Computer Vision and Pattern Recognition; Boston, MA, USA; 2015. p. 3128–3137.
  • Thomason J, Sinapov J, Mooney RJ, et al. Guiding exploratory behaviors for multi-modal grounding of linguistic descriptions. AAAI; 2018.
  • Amiri S, Wei S, Zhang S, et al. Multi-modal predicate identification using dynamically learned robot controllers. Proceedings of the 27th International Joint Conference on Artificial Intelligence; Stockholm, Sweden; 2018. p. 4638–4645.
  • Attamimi M, Ando Y, Nakamura T, et al. Learning word meanings and grammar for verbalization of daily life activities using multilayered multimodal latent Dirichlet allocation and Bayesian hidden Markov models. Adv Robot. 2016;30(11–12):806–824.
  • Aly A, Taniguchi T, Mochihashi D. A probabilistic approach to unsupervised induction of combinatory categorial grammar in situated human-robot interaction. IEEE-RAS 18th International Conference on Humanoid Robots; Beijing, China; 2018. p. 1–9.
  • Radden G, Dirven R. Cognitive English grammar. Vol. 2. Amsterdam, The Netherlands: John Benjamins Publishing; 2007.
  • Taylor JR. Linguistic categorization. Oxford, UK: Oxford University Press; 2003.
  • Croft W, Cruse DA. Cognitive linguistics. Cambridge, UK: Cambridge University Press; 2004.
  • Gumperz JJ, Levinson SC. Rethinking linguistic relativity. Curr Anthropol. 1991;32(5):613–623.
  • Winston ME, Chaffin R, Herrmann D. A taxonomy of part–whole relations. Cogn Sci. 1987;11(4):417–444.
  • Fillmore CJ. An alternative to checklist theories of meaning. Annual Meeting of the Berkeley Linguistics Society; Vol. 1; 1975. p. 123–131.
  • Fillmore CJ. Frame semantics. Seoul: Hanshin Publishing Co.; 1982. p. 111–137.
  • Dove G. Thinking in words: language as an embodied medium of thought. Top Cogn Sci. 2014;6(3):371–389.
  • Cangelosi A, Stramandinoli F. A review of abstract concept learning in embodied agents and robots. Philos Trans R Soc B. 2018;373(1752):20170131.
  • Utsumi A. A distributional semantic model of visually indirect grounding for abstract words. Proceedings of NIPS 2018, Workshop on Visually Grounded Interaction and Language (ViGIL); Montreal, Canada; 2018.
  • Barsalou LW. Ad hoc categories. Mem Cognit. 1983;11(3):211–227.
  • Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems; Stateline, Nevada, USA; 2012. p. 1097–1105.
  • Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. European Conference on Computer Vision; Springer; 2014. p. 818–833.
  • Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations; San Diego, CA; 2015.
  • He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, Nevada; 2016. p. 770–778.
  • Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning. AAAI; Vol. 4; 2017. p. 12.
  • Fergus R, Perona P, Zisserman A. Object class recognition by unsupervised scale-invariant learning. IEEE Conference on Computer Vision and Pattern Recognition; Madison, Wisconsin, USA; Vol. 2; 2003. p. 264–271.
  • Sivic J, Russell BC, Efros AA, et al. Discovering object categories in image collections. IEEE International Conference on Computer Vision; Beijing, China; 2005. p. 17–20.
  • Fei-Fei L. A Bayesian hierarchical model for learning natural scene categories. IEEE Conference on Computer Vision and Pattern Recognition; San Diego, CA, USA; 2005. p. 524–531.
  • Wang C, Blei D, Fei-Fei L. Simultaneous image classification and annotation. IEEE Conference on Computer Vision and Pattern Recognition; Miami Beach, FL, USA; 2009. p. 1903–1910.
  • Krause A, Perona P, Gomes RG. Discriminative clustering by regularized information maximization. Advances in Neural Information Processing Systems; Vancouver, Canada; 2010. p. 775–783.
  • Zhu JY, Wu J, Xu Y, et al. Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans Pattern Anal Mach Intell. 2015;37(4):862–875.
  • Smith L, Gasser M. The development of embodied cognition: six lessons from babies. Artif Life. 2005;11(1–2):13–29.
  • Wermter S, Weber C, Elshaw M, et al. Towards multimodal neural robot learning. Rob Auton Syst. 2004;47(2):171–175.
  • Ridge B, Skocaj D, Leonardis A. Self-supervised cross-modal online learning of basic object affordances for developmental robotic systems. IEEE International Conference on Robotics and Automation; Anchorage, Alaska, USA; 2010. p. 5047–5054.
  • Ogata T, Nishide S, Kozima H, et al. Inter-modality mapping in robot with recurrent neural network. Pattern Recognit Lett. 2010;31(12):1560–1569.
  • Lallee S, Dominey PF. Multi-modal convergence maps: from body schema and self-representation to mental imagery. Adapt Behav. 2013;21(4):274–285.
  • Mangin O, Oudeyer PY. Learning semantic components from subsymbolic multimodal perception. IEEE Third Joint International Conference on Development and Learning and Epigenetic Robotics; Osaka, Japan; 2013. p. 1–7.
  • Mangin O, Filliat D, Bosch LT, et al. MCA-NMF: multimodal concept acquisition with non-negative matrix factorization. PLoS ONE. 2015;10(10):e0140732.
  • Chen Y, Filliat D. Cross-situational noun and adjective learning in an interactive scenario. Joint IEEE International Conference on Development and Learning and Epigenetic Robotics; Providence, Rhode Island, USA; 2015. p. 129–134.
  • Yürüten O, Şahin E, Kalkan S. The learning of adjectives and nouns from affordance and appearance features. Adapt Behav. 2013;21(6):437–451.
  • Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.
  • Nakamura T, Nagai T, Iwahashi N. Multimodal object categorization by a robot. IEEE/RSJ International Conference on Intelligent Robots and Systems; San Diego, CA; 2007. p. 2415–2420.
  • Taniguchi A, Taniguchi T, Inamura T. Spatial concept acquisition for a mobile robot that integrates self-localization and unsupervised word discovery from spoken sentences. IEEE Trans Cogn Dev Syst. 2016;8(4):285–297.
  • Taniguchi A, Hagiwara Y, Taniguchi T, et al. Online spatial concept and lexical acquisition with simultaneous localization and mapping. IEEE/RSJ International Conference on Intelligent Robots and Systems; Vancouver, BC, Canada; 2017. p. 811–818.
  • Nakamura T, Araki T, Nagai T, et al. Grounding of word meanings in LDA-based multimodal concepts. Adv Robot. 2012;25:2189–2206.
  • Barsalou LW. Perceptual symbol systems. Behav Brain Sci. 1999;22:577–660.
  • Bergen BK. Louder than words: the new science of how the mind makes meaning. New York, NY: Basic Books; 2012.
  • Nishihara J, Nakamura T, Nagai T. Online algorithm for robots to learn object concepts and language model. IEEE Trans Cogn Dev Syst. 2017;9(3):255–268.
  • Mochihashi D, Yamada T, Ueda N. Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling. Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing; Singapore; Vol. 1; 2009. p. 100–108.
  • Yan Z, Zhang H, Piramuthu R, et al. HD-CNN: hierarchical deep convolutional neural networks for large scale visual recognition. Proceedings of the IEEE International Conference on Computer Vision; Santiago, Chile; 2015. p. 2740–2748.
  • Guo Y, Liu Y, Bakker EM, et al. CNN-RNN: a large-scale hierarchical image classification framework. Multimed Tools Appl. 2018;77(8):10251–10271.
  • Blei D, Griffiths T, Jordan M. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J ACM. 2010;57(2):7.
  • Ando Y, Nakamura T, Araki T, et al. Formation of hierarchical object concept using hierarchical latent Dirichlet allocation. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Tokyo, Japan; 2013. p. 2272–2279.
  • Hagiwara Y, Inoue M, Kobayashi H, et al. Hierarchical spatial concept formation based on multimodal information for human support robots. Front Neurorobot. 2018 Mar;12(11):1–16. doi: 10.3389/fnbot.2018.00011
  • Felzenszwalb PF, Girshick RB, McAllester D, et al. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell. 2010;32(9):1627–1645.
  • Chen X, Shrivastava A, Gupta A. NEIL: extracting visual knowledge from web data. IEEE International Conference on Computer Vision; Sydney, Australia; 2013. p. 1409–1416.
  • Taigman Y, Yang M, Ranzato M, et al. DeepFace: closing the gap to human-level performance in face verification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; Columbus, Ohio, USA; 2014. p. 1701–1708.
  • Cao Z, Simon T, Wei SE, et al. Realtime multi-person 2D pose estimation using part affinity fields. IEEE Conference on Computer Vision and Pattern Recognition; Honolulu, HI, USA; Vol. 1; 2017. p. 7.
  • Yang S, Luo P, Loy CC, et al. Faceness-net: face detection through deep facial part responses. IEEE Trans Pattern Anal Mach Intell. 2018;40(8):1845–1859. doi: 10.1109/TPAMI.2017.2738644.
  • Lakoff G, Johnson M. Metaphors we live by. Chicago, IL: University of Chicago Press; 1980.
  • Fauconnier G, Turner M. The way we think: conceptual blending and the mind's hidden complexities. New York, NY: Basic Books; 2008.
  • Grady J. Foundations of meaning: primary metaphors and primary scenes. 1997. Retrieved from https://escholarship.org/uc/item/3g9427m2
  • Deignan A. Cobuild guides to English 7: metaphor. New York, NY: Harper Collins; 1995.
  • Sommer E, Weiss D. Metaphors dictionary. Cambridge, UK: Cambridge University Press; 2001.
  • Wilkinson D. Concise thesaurus of traditional English metaphors. London, UK: Routledge; 2013.
  • Seto KI, Takeda K, Yamaguchi H, et al. Shogakukan dictionary of English lexical polysemy. Tokyo, Japan: Shogakukan; 2007.
  • Petruck MR. Introduction to metanet. Constr Frames. 2017;8(2):133–140.
  • Fillmore CJ, Johnson CR, Petruck MR. Background to FrameNet. Int J Lexicogr. 2003;16(3):235–250.
  • Steen GJ, Dorst AG, Herrmann JB, et al. A method for linguistic metaphor identification: from MIP to MIPVU. Vol. 14. Amsterdam, The Netherlands: John Benjamins Publishing; 2010.
  • Makino S, Oka M. A bilingual dictionary of English and Japanese metaphors. Tokyo, Japan: Kuroshio Shuppan; 2017.
  • Kövecses Z. Metaphor and emotion: language, culture, and body in human feeling. Cambridge, UK: Cambridge University Press; 2003.
  • Kövecses Z. Metaphor in culture: universality and variation. Cambridge, UK: Cambridge University Press; 2005.
  • Rizzolatti G. The mirror neuron system and its function in humans. Anat Embryol (Berl). 2005;210(5–6):419–421.
  • Rizzolatti G, Fadiga L, Gallese V, et al. Premotor cortex and the recognition of motor actions. Cogn Brain Res. 1996;3(2):131–141.
  • Lee K, Ognibene D, Chang HJ, et al. STARE: spatio-temporal attention relocation for multiple structured activities detection. IEEE Trans Image Process. 2015;24(12):5916–5927.
  • Fujita K. A prospect for evolutionary adequacy: merge and the evolution and development of human language. Biolinguistics. 2009;3(2–3):128–153.
  • Jamone L, Ugur E, Cangelosi A, et al. Affordances in psychology, neuroscience, and robotics: a survey. IEEE Trans Cogn Dev Syst. 2018;10(1):4–25.
  • Min H, Yi C, Luo R, et al. Affordance research in developmental robotics: a survey. IEEE Trans Cogn Dev Syst. 2016;8(4):237–255.
  • Horton TE, Chakraborty A, Amant RS. Affordances for robots: a brief survey. AVANT. 2012 Dec;3:70–84.
  • Gibson JJ. The ecological approach to visual perception. Boston, MA: Houghton Mifflin; 1979.
  • Şahin E, Çakmak M, Doğar MR, et al. To afford or not to afford: a new formalization of affordances toward affordance-based robot control. Adapt Behav. 2007;15(4):447–472.
  • Stoytchev A. Behavior-grounded representation of tool affordances. Proceedings of IEEE International Conference on Robotics and Automation; Barcelona, Spain; 2005. p. 3071–3076.
  • Stoytchev A. Learning the affordances of tools using a behavior-grounded approach. In: Rome E, Hertzberg J, Dorffner G, editors. Towards affordance-based robot control. Berlin, Heidelberg: Springer; 2008. p. 140–158.
  • Nakamura T, Nagai T. Forming object concept using Bayesian network. 2010 Aug.
  • Argall BD, Chernova S, Veloso M, et al. A survey of robot learning from demonstration. Rob Auton Syst. 2009;57(5):469–483.
  • Billard A, Calinon S, Dillmann R, et al. Robot programming by demonstration. Berlin, Heidelberg: Springer; 2008. p. 1371–1394.
  • Chernova S, Thomaz AL. Robot learning from human teachers. Synth Lect Artif Intell Mach Learn. 2014;8(3):1–121.
  • Nakamura T, Nagai T, Mochihashi D, et al. Segmenting continuous motions with hidden semi-Markov models and Gaussian processes. Front Neurorobot. 2017 Dec;11. doi: 10.3389/fnbot.2017.00067
  • Taniguchi T, Nagasaka S, Nakashima R. Nonparametric Bayesian double articulation analyzer for direct language acquisition from continuous speech signals. IEEE Trans Cogn Dev Syst. 2016;8(3):171–185.
  • Taniguchi T, Nagasaka S. Double articulation analyzer for unsegmented human motion using Pitman-Yor language model and infinite hidden Markov model. IEEE/SICE International Symposium on System Integration; Kyoto, Japan; 2011. p. 250–255.
  • Schaal S. Is imitation learning the route to humanoid robots? Trends Cogn Sci. 1999;3:233–242.
  • Yokoya R, Ogata T, Tani J, et al. Experience based imitation using RNNPB. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems; Beijing, China; 2006. p. 3669–3674.
  • Ho J, Ermon S. Generative adversarial imitation learning. In: Lee DD, Sugiyama M, Luxburg UV, et al., editors. Advances in neural information processing systems 29. Curran Associates, Inc.; 2016. p. 4565–4573.
  • Kaelbling LP, Littman ML, Moore AP. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–285.
  • Arulkumaran K, Deisenroth MP, Brundage M, et al. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38.
  • Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–533.
  • Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning. CoRR; 2016. abs/1509.02971.
  • Levine S, Finn C, Darrell T, et al. End-to-end training of deep visuomotor policies. J Mach Learn Res (JMLR). 2016 Jan;17(1):1334–1373.
  • Hermann KM, Hill F, Green S, et al. Grounded language learning in a simulated 3D world. CoRR; 2017. abs/1706.06551.
  • Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: applications on robotics. J Intell Rob Syst. 2017;86(2):153–173.
  • Greenfield PM. Language, tools and brain: the ontogeny and phylogeny of hierarchically organized sequential behavior. Behav Brain Sci. 1991;14(04):531–551.
  • Fujita K. Facing the logical problem of language evolution (L. Jenkins, Variation and universals in biolinguistics). English Linguist. 2007;24:78–108.
  • Pulvermüller F. The syntax of action. Trends Cogn Sci. 2014;18(5):219–220.
  • Garagnani M, Shastri L, Wendelken C. A connectionist model of planning via back-chaining search. Proceedings of the Annual Meeting of the Cognitive Science Society; California, USA; Vol. 24; 2002.
  • Plappert M, Mandery C, Asfour T. Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks. Rob Auton Syst. 2018 Nov;109:13–26.
  • Yamada T, Matsunaga H, Ogata T. Paired recurrent autoencoders for bidirectional translation between robot actions and linguistic descriptions. IEEE Rob Autom Lett. 2018;3(4):3441–3448.
  • Austin J. How to do things with words. Cambridge, MA: Harvard University Press; 1962.
  • Grice P. Studies in the way of words. Cambridge, MA: Harvard University Press; 1989.
  • Sperber D, Wilson D. Relevance: communication and cognition. Cambridge, MA: Harvard University Press; 1986.
  • Winograd T, Flores F. Understanding computers and cognition. New York: Ablex Publishing Corporation; 1986.
  • Heidegger M. Being and time. Oxford, UK: Blackwell; 1927.
  • Searle JR. Mind: a brief introduction. Oxford, UK: Oxford University Press; 2004.
  • Lakoff G. Women, fire, and dangerous things: what categories reveal about the mind. Chicago, IL: The University of Chicago Press; 1987.
  • Langacker R. Concept, image, and symbol: the cognitive basis of grammar. Berlin: Mouton de Gruyter; 1991.
  • Iwahashi N. Robots that learn language: developmental approach to human-machine conversations. Symbol grounding and beyond. Rome, Italy: Springer; 2006. p. 143–167.
  • Iwahashi N. Robots that learn language: a developmental approach to situated human-robot conversations. Human–robot interaction. I-Tech; 2007. Chapter 5. doi: 10.5772/5188
  • Iwahashi N, Sugiura K, Taguchi R, et al. Robots that learn to communicate: a developmental approach to personally and physically situated human-robot conversations. AAAI Fall Symposium: Dialog with Robots; 2010.
  • Sugiura K, Iwahashi N, Kashioka H, et al. Object manipulation dialogue by estimating utterance understanding probability in a robot language acquisition framework (in Japanese). J Rob Soc Japan. 2010;28(8):978–988.
  • Sugiura K, Iwahashi N, Kawai H, et al. Situated spoken dialogue with robots using active learning. Adv Robot. 2011;25(17):2207–2232.
  • Nakamura T, Attamimi M, Sugiura K, et al. An extended mobile manipulation robot learning novel objects. J Intell Rob Syst. 2012;66(1):187–204.
  • Araki T, Nakamura T, Nagai T, et al. Online learning of concepts and words using multimodal LDA and hierarchical Pitman-Yor language model. IEEE/RSJ International Conference on Intelligent Robots and Systems; Vilamoura, Algarve, Portugal; 2012. p. 1623–1630.
  • Nakamura T, Nagai T, Funakoshi K, et al. Mutual learning of an object concept and language model based on MLDA and NPYLM. IEEE/RSJ International Conference on Intelligent Robots and Systems; Chicago, IL, USA; 2014. p. 600–607.
  • Nakamura T, Nagai T, Taniguchi T. Serket: an architecture for connecting stochastic models to realize a large-scale cognitive model. Front Neurorobot. 2018;12. doi: 10.3389/fnbot.2018.00025
  • Hauser MD, Chomsky N, Fitch WT. The faculty of language: what is it, who has it, and how did it evolve? Science. 2002;298(5598):1569–1579.
  • Elman JL, Bates EA, Johnson MH. Rethinking innateness: a connectionist perspective on development. Vol. 10. Cambridge, MA: MIT Press; 1998. ISBN:026255030X, 9780262550307
  • Wu Y, Schuster M, Chen Z, et al. Google's neural machine translation system: bridging the gap between human and machine translation. Preprint; 2016. arXiv:1609.08144.
  • Lowe R, Pow N, Serban I, et al. Incorporating unstructured textual knowledge sources into neural dialogue systems. Neural Information Processing Systems Workshop on Machine Learning for Spoken Language Understanding; Montreal, Quebec, Canada; 2015.
  • Tomasello M. Cooperation and communication in the 2nd year of life. Child Dev Perspect. 2007;1(1):8–12.
  • Carpenter M, Tomasello M, Striano T. Role reversal imitation and language in typically developing infants and children with autism. Infancy. 2005;8(3):253–278.
  • Iwahashi N. Robots that learn language: a developmental approach to situated human-robot conversations. INTECH Open Access Publisher; 2007. ISBN:9783902613134
  • Halliday MAK. Collected works of M. A. K. Halliday. Napa Valley, CA: Continuum; 2009.
  • Halliday MAK. Halliday: system and function in language: selected papers. Oxford: Oxford University Press; 1977.
  • Halliday MAK. Language as social semiotic: the social interpretation of language and meaning. London: Edward Arnold; 1978.
  • Malinowski B. The meaning of meaning. London: Kegan Paul; 1923.
  • Halliday MAK, Hasan R. Language, context, and text: aspects of language in a social-semiotic perspective. Oxford: Oxford University Press; 1991.
  • Maturana H, Varela F. Autopoiesis and cognition: the realization of the living. Dordrecht: Reidel; 1980. p. 2–62.
  • Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition; Miami, Florida; 2009. p. 248–255.
  • Torralba A, Fergus R, Freeman W. 80 Million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans Pattern Anal Mach Intell. 2008;30(11):1958–1970.
  • Russell BC, Torralba A, Murphy KP, et al. LabelMe: a database and web-based tool for image annotation. Int J Comput Vis. 2008 May;77(1–3):157–173.
  • Lin TY, Zitnick CL, Dollár P. Microsoft COCO: common objects in context. 2014. p. 1–15.
  • Quattoni A, Torralba A. Recognizing indoor scenes. 2009 IEEE Conference on Computer Vision and Pattern Recognition; Miami, Florida; 2009. p. 413–420.
  • Xiao J, Hays J, Ehinger KA, et al. SUN database: large-scale scene recognition from abbey to zoo. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; San Francisco, CA; 2010. p. 3485–3492.
  • Zhou B, Lapedriza A, Xiao J, et al. Learning deep features for scene recognition using places database. Advances in Neural Information Processing Systems (NIPS); Montréal, Canada; 2014. p. 487–495.
  • Plappert M, Mandery C, Asfour T. The KIT motion-language dataset. Big Data. 2016;4(4):236–252.
  • Takano W. Learning motion primitives and annotative texts from crowd-sourcing. ROBOMECH J. 2015;2(1):1–9.
  • Regneri M, Rohrbach M, Wetzel D, et al. Grounding action descriptions in videos. Trans Assoc Comput Linguist. 2013;1:25–36.
  • Rohrbach A, Rohrbach M, Qiu W, et al. Coherent multi-sentence video description with variable level of detail. 2014. p. 184–195. (Lecture notes in computer science; Vol. 8753).
  • Sigurdsson GA, Varol G, Wang X, et al. Hollywood in homes: crowdsourcing data collection for activity understanding. 2016. p. 510–526. (Lecture notes in computer science; Vol. 9905).
  • Abu-El-Haija S, Kothari N, Lee J, et al. YouTube-8M: a large-scale video classification benchmark. CoRR; 2016. abs/1609.08675.
  • Agrawal A, Lu J, Antol S, et al. VQA: visual question answering. p. 1–25.
  • Tapaswi M, Zhu Y, Stiefelhagen R, et al. MovieQA: understanding stories in movies through question-answering. IEEE Conference on Computer Vision and Pattern Recognition; Las Vegas, NV; 2016. p. 4631–4640.
  • Das A, Datta S, Gkioxari G, et al. Embodied question answering. IEEE/CVF Conference on Computer Vision and Pattern Recognition; Salt Lake City, Utah; 2018. p. 1–10.
  • Hermann KM, Hill F, Green S, et al. Grounded language learning in a simulated 3D world. CoRR; 2017. abs/1706.06551.
  • MacMahon M, Stankiewicz B, Kuipers B. Walk the talk: connecting language, knowledge, and action in route instructions. AAAI Conference on Artificial Intelligence (AAAI); Boston, MA; 2006. p. 1475–1482.
  • Mei H, Bansal M, Walter MR. Listen, attend, and walk: neural mapping of navigational instructions to action sequences. AAAI Conference on Artificial Intelligence (AAAI); Phoenix, AZ; 2016.
  • de Vries H, Shuster K, Batra D, et al. Talk the walk: navigating New York city through grounded dialogue. CoRR; 2018. abs/1807.03367.
  • Inamura T, Mizuchi Y. Robot competition to evaluate guidance skill for general users in VR environment. International Conference on Human-Robot Interaction; Daegu, Korea; 2019.
  • Beattie C, Leibo JZ, Teplyashin D, et al. DeepMind Lab. CoRR; 2016. abs/1612.03801.
  • Brockman G, Cheung V, Pettersson L, et al. OpenAI Gym; 2016.
  • Brodeur S, Perez E, Anand A, et al. HoME: a household multimodal environment. CoRR; 2017. abs/1711.11017.
  • Kolve E, Mottaghi R, Gordon D, et al. AI2-THOR: an interactive 3D environment for visual AI. CoRR; 2017. abs/1712.05474.
  • Savva M, Chang AX, Dosovitskiy A, et al. MINOS: multimodal indoor simulator for navigation in complex environments. CoRR; 2017. abs/1712.03931.
  • Orkin J, Roy D. The restaurant game: learning social behavior and language from thousands of players online. J Game Dev (JOGD). 2007;3(1):39–60.
  • Breazeal C, Depalma N, Orkin J, et al. Crowdsourcing human-robot interaction: new methods and system evaluation in a public environment. J Hum Rob Interact. 2013;2(1):82–111.
  • Inamura T, Shibata T, Sena H, et al. Simulator platform that enables social interaction simulation – SIGVerse: SocioIntelliGenesis simulator. IEEE/SICE International Symposium on System Integration; Sendai, Japan; 2010. p. 212–217.
  • Mizuchi Y, Inamura T. Cloud-based multimodal human-robot interaction simulator utilizing ROS and unity frameworks. IEEE/SICE International Symposium on System Integration; Taipei, Taiwan; 2017. p. 948–955.
  • Quigley M, Conley K, Gerkey BP, et al. ROS: an open-source Robot Operating System. ICRA Workshop on Open Source Software; Kobe, Japan; 2009.
  • Van Der Zant T, Iocchi L. RoboCup@Home: adaptive benchmarking of robot bodies and minds. 2011. p. 214–225. (Lecture notes in computer science; Vol. 7072).
  • Dinan E, Logacheva V, Malykh V, et al. The second conversational intelligence challenge (ConvAI2). CoRR; 2019. abs/1902.00098.
  • Hori C, Perez J, Higashinaka R, et al. Overview of the sixth dialog system technology challenge: DSTC6. Comput Speech Lang. 2019;55:1–25.