Research Article

Multi Spatial Relation Detection in Images

References

  • Abdi, H., & Williams, L. J. (2010). Tukey’s honestly significant difference (HSD) test. In Salkind, N., (Ed.), Encyclopedia of research design (pp. 1–5). Sage Publications, Inc., Thousand Oaks, CA. doi:10.4135/9781412961288
  • Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7), e0130140. doi:10.1371/journal.pone.0130140
  • Ball, K., Smith, D., Ellison, A., & Schenk, T. (2009). Both egocentric and allocentric cues support spatial priming in visual search. Neuropsychologia, 47(6), 1585–1591. Perception and Action. doi:10.1016/j.neuropsychologia.2008.11.017
  • Belz, A., Muscat, A., Aberton, M., & Benjelloun, S. (2015). Describing spatial relationships between objects in images in English and French. In Proceedings of the 4th Workshop on Vision and Language (pp. 104–113). Lisbon, Portugal. Association for Computational Linguistics. doi:10.18653/v1/W15-2816
  • Belz, A., Muscat, A., Anguill, P., Sow, M., Vincent, G., & Zinessabah, Y. (2018). SpatialVOC2K: A multilingual dataset of images with annotations and features for spatial relations between objects. In Proceedings of the 11th International Conference on Natural Language Generation (pp. 140–145). Tilburg University, The Netherlands. Association for Computational Linguistics. doi:10.18653/v1/W18-6516
  • Birmingham, B., & Muscat, A. (2017). The use of object labels and spatial prepositions as keywords in a web-retrieval-based image caption generation system. In Proceedings of the Sixth Workshop on Vision and Language, 11–20, Valencia, Spain. Association for Computational Linguistics.
  • Birmingham, B., & Muscat, A. (2019). Clustering-based model for predicting multi-spatial relations in images. In Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics, ICINCO 2019, Volume 2, Prague, Czech Republic, July 29-31, 2019, 147–156.
  • Birmingham, B., Muscat, A., & Belz, A. (2018). Adding the third dimension to spatial relation detection in 2D images. In Proceedings of the 11th International Conference on Natural Language Generation, 146–151, Tilburg University, The Netherlands. Association for Computational Linguistics.
  • Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford University Press, Inc., USA.
  • Bobicev, V., & Sokolova, M. (2017). Inter-annotator agreement in sentiment analysis: Machine learning perspective. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017 (pp. 97–102). Varna, Bulgaria. INCOMA Ltd. doi:10.26615/978-954-452-049-6_015
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324
  • Cangelosi, A., Coventry, K. R., Rajapakse, R., Joyce, D., Bacon, A., Richards, L., & Newstead, S. N. (2005). Grounding language in perception: A connectionist model of spatial terms and vague quantifiers. In Cangelosi, A., Bugmann, G., & Borisyuk, R., (Eds.), Modeling language, cognition and action: Proceedings of the 9th Neural Computation and Psychology Workshop (pp. 47–56). World Scientific, Singapore.
  • Carlson-Radvansky, L. A., & Logan, G. D. (1997). The influence of reference frame selection on spatial template construction. Journal of Memory and Language, 37(3), 411–437. doi:10.1006/jmla.1997.2519
  • Carlson-Radvansky, L. A., & Radvansky, G. A. (1996). The influence of functional relations on spatial term selection. Psychological Science, 7(1), 56–60. doi:10.1111/j.1467-9280.1996.tb00667.x
  • Chai, J. Y., Fang, R., Liu, C., & She, L. (2016). Collaborative language grounding toward situated human-robot dialogue. AI Magazine, 37(4), 32–45. doi:10.1609/aimag.v37i4.2684
  • Chesterton, A. (2017). An image of a car parked close to a street post and a garage. How close can you legally park to a driveway or corner?, 14 Mar 2017. Online; retrieved June 8, 2020.
  • Clark, H. H. (1973). Space, time, semantics, and the child. In Moore, T. E. (Ed.), Cognitive development and the acquisition of language (pp. 27–63). Academic Press, New York.
  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. doi:10.1177/001316446002000104
  • Coventry, K. R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S., Joyce, D., & Richards, L. V. (2005). Spatial prepositions and vague quantifiers: Implementing the functional geometric framework. In C. Freksa, M. Knauff, B. Krieg-Brückner, B. Nebel, & T. Barkowsky (Eds.), Spatial cognition IV. reasoning, action, interaction (pp. 98–110). Berlin, Heidelberg: Springer Berlin Heidelberg.
  • Coventry, K. R., & Garrod, S. C. (2004). Saying, seeing and acting: The psychological semantics of spatial prepositions. Psychology Press, London.
  • Coventry, K. R., Prat-Sala, M., & Richards, L. (2001). The interplay between geometry and function in the comprehension of over, under, above, and below. Journal of Memory and Language, 44(3), 376–398. doi:10.1006/jmla.2000.2742
  • Dai, B., Zhang, Y., & Lin, D. (2017). Detecting visual relationships with deep relational networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3298–3308). Los Alamitos, CA, USA. IEEE Computer Society. doi:10.1109/CVPR.2017.352
  • Day, W. H., & Edelsbrunner, H. (1984). Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification, 1(1), 7–24. doi:10.1007/BF01890115
  • Delgado, R., Tibau, X. A., & Gu, Q. (2019). Why Cohen’s kappa should be avoided as performance measure in classification. PLOS ONE, 14(9), e0222916. doi:10.1371/journal.pone.0222916
  • Dembczyński, K., Waegeman, W., Cheng, W., & Hüllermeier, E. (2012). On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1–2), 5–45. doi:10.1007/s10994-012-5285-8
  • Dobnik, S. (2009). Teaching mobile robots to use spatial words. PhD thesis, University of Oxford.
  • Dobnik, S., & Kelleher, J. (2014). Exploration of functional semantics of prepositions from corpora of descriptions of visual scenes. In Proceedings of the Third Workshop on Vision and Language (pp. 33–37). Dublin, Ireland. Dublin City University and the Association for Computational Linguistics.
  • Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2008). The PASCAL visual object classes challenge 2008 (VOC2008) results. http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html.
  • Farrugia, G., & Muscat, A. (2020). Explaining spatial relation detection using layerwise relevance propagation. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP (pp. 378–385). Valletta, Malta. INSTICC, SciTePress.
  • Fasola, J., & Matarić, M. J. (2012). Using spatial language to guide and instruct robots in household environments. In AAAI Fall Symposium: Robots Learning Interactively from Human Teachers. Arlington, VA.
  • Ghanimifard, M., & Dobnik, S. (2019). What goes into a word: Generating image descriptions with top-down spatial knowledge. In Proceedings of the 12th International Conference on Natural Language Generation, 540–551, Tokyo, Japan. Association for Computational Linguistics.
  • Girden, E. R. (1992). ANOVA: Repeated measures. Number 84. Sage Publications Inc., Newbury Park, CA.
  • Gottschall, J. (2008). Literature, science, and a new humanities. Cognitive Studies in Literature Performance. Palgrave Macmillan, New York.
  • Grubinger, M., Clough, P., Müller, H., & Deselaers, T. (2006). The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In Proceedings of the International Conference on Language Resources and Evaluation (LREC) (pp. 13–23), Genoa, Italy.
  • Herskovits, A. (1980). On the spatial uses of prepositions. In Proceedings of the 18th Annual Meeting on Association for Computational Linguistics, ACL ’80, 1–5, USA. Association for Computational Linguistics.
  • Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241–254. doi:10.1007/BF02289588
  • Kelleher, J. D., & Costello, F. J. (2009). Applying computational models of spatial prepositions to visually situated dialog. Computational Linguistics, 35(2), 271–306. doi:10.1162/coli.06-78-prep14
  • Kelleher, J. D., & Kruijff, G.M. (2005). A context-dependent algorithm for generating locative expressions in physically situated environments. In Wilcock, G., Jokinen, K., Mellish, C., & Reiter, E., (Eds.), Proceedings of the Tenth European Workshop on Natural Language Generation, ENLG 2005, Aberdeen, UK. Association for Computational Linguistics.
  • Kelleher, J. D., Ross, R. J., Sloan, C., & Namee, B. M. (2011). The effect of occlusion on the semantics of projective spatial terms: A case study in grounding language in perception. Cognitive Processing, 12(1), 95–108. doi:10.1007/s10339-010-0380-x
  • Kim, J., Misu, T., Chen, Y. T., Tawari, A., & Canny, J. (2019). Grounding human-to-vehicle advice for self-driving vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 10591–10599), Long Beach, CA, USA.
  • Kim, J., Rohrbach, A., Darrell, T., Canny, J., & Akata, Z. (2018). Textual explanations for self-driving vehicles. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 563–578), Munich, Germany.
  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Bengio, Y., & LeCun, Y., (Eds.), 3rd International Conference on Learning Representations (ICLR), Conference Track Proceedings, San Diego, CA, USA.
  • Kordjamshidi, P., & Moens, M. F. (2015). Global machine learning for spatial ontology population. Journal of Web Semantics, 30, 3–21. Semantic Search. doi:10.1016/j.websem.2014.06.001
  • Kordjamshidi, P., van Otterlo, M., & Moens, M.F. (2017). Spatial role labeling annotation scheme. In Ide, N. & Pustejovsky, J., (Eds.), Handbook of Linguistic Annotation (pp. 1025–1052). Springer Nature, Dordrecht, The Netherlands. doi:10.1007/978-94-024-0881-2_38
  • Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., Berg, A. C., & Berg, T. L. (2013). Baby talk: Understanding and generating image descriptions. IEEE Transactions on Pattern Analysis & Machine Intelligence, 35(12), 2891–2903. doi:10.1109/TPAMI.2012.162
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. doi:10.2307/2529310
  • Logan, G. D., & Sadler, D. D. (1996). A computational analysis of the apprehension of spatial relations (pp. 493–529). Cambridge, MA, US: The MIT Press.
  • Lu, C., Krishna, R., Bernstein, M., & Fei-Fei, L. (2016). Visual relationship detection with language priors. In Leibe, B., Matas, J., Sebe, N. & Welling, M., (Eds.), Computer Vision – ECCV 2016 (pp. 852–869), Amsterdam, The Netherlands. Springer International Publishing.
  • Martinez, G. C., Cangelosi, A., & Coventry, K. R. (2001). A hybrid neural network and virtual reality system for spatial language processing. In IJCNN’01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222), volume 1 (pp. 16–21), Washington, DC, USA. doi:10.1109/IJCNN.2001.938984
  • McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. doi:10.11613/BM.2012.031
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z. & Weinberger, K. Q., (Eds.), Proceedings of the 26th International Conference on Neural Information Processing Systems, volume 2, NIPS'13 (pp. 3111–3119), Red Hook, NY, USA. Curran Associates Inc.
  • Murphy, E., & Ciszewska-Carr, J. (2005). Sources of difference in reliability: Identifying sources of difference in reliability in content analysis of online asynchronous discussions. International Review of Research in Open and Distance Learning, 6(2), 108–119. doi:10.19173/irrodl.v6i2.233
  • Muscat, A., & Belz, A. (2017). Learning to generate descriptions of visual data anchored in spatial relations. IEEE Computational Intelligence Magazine, 12(3), 29–42. doi:10.1109/MCI.2017.2708559
  • Muscat, A., & Gatt, A. (2018). Predicting visual spatial relations in the Maltese language. In Farrugia, C. B., (Ed.), Junior College multi-disciplinary conference: research, practice and collaboration: Breaking Barriers: Conference Proceedings (pp. 414–450). Msida, Malta: University of Malta Junior College.
  • Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Moschitti, A., Pang, B., & Daelemans, W., (Eds.), Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543), Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. doi:10.1007/BF00116251
  • Rahgooy, T., Manzoor, U., & Kordjamshidi, P. (2018). Visually guided spatial relation extraction from text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 788–794, New Orleans, Louisiana. Association for Computational Linguistics.
  • Ramisa, A., Wang, J., Lu, Y., Dellandrea, E., Moreno-Noguer, F., & Gaizauskas, R. (2015). Combining geometric, textual and visual features for predicting prepositions in image descriptions. In Proc. 20th Conf. on Empirical Methods in Natural Language Processing (EMNLP), 214–220, Lisbon, Portugal.
  • Regier, T., & Carlson, L. A. (2001). Grounding spatial language in perception: An empirical and computational investigation. Journal of Experimental Psychology: General, 130(2), 273–298. doi:10.1037/0096-3445.130.2.273
  • Retz-Schmidt, G. (1988). Various views on spatial prepositions. AI Magazine, 9(2), 95–95.
  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407. doi:10.1214/aoms/1177729586
  • Sadeghi, M. A., & Farhadi, A. (2011). Recognition using visual phrases. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR'11 (pp. 1745–1752), Washington, DC, USA. IEEE Computer Society. doi:10.1109/CVPR.2011.5995711
  • Tellex, S., Gopalan, N., Kress-Gazit, H., & Matuszek, C. (2020). Robots that use language. Annual Review of Control, Robotics, and Autonomous Systems, 3(1), 25–55. doi:10.1146/annurev-control-101119-071628
  • Regier, T., Carlson, L., & Corrigan, B. (2005). Attention in spatial language: Bridging geometry and function, chapter 9 (pp. 191–204). Oxford: Oxford University Press.
  • Thrun, S. (2008). Robotics and cognitive approaches to spatial mapping, chapter simultaneous localization and mapping.
  • Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13. doi:10.4018/jdwm.2007070101
  • Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114. doi:10.2307/3001913
  • Tversky, B. (1993). Cognitive maps, cognitive collages, and spatial mental models. In A. U. Frank & I. Campari (Eds.), Spatial Information Theory A Theoretical Basis for GIS (pp. 14–24). Berlin, Heidelberg: Springer Berlin Heidelberg.
  • Uwamariya, N. (2019). An image of a blind man crossing the road. How do visually impaired people live in their surroundings?, 3 Oct 2019. Online; retrieved June 8, 2020.
  • Yang, Y., Teo, C., Daumé, H., III, & Aloimonos, Y. (2011). Corpus-guided sentence generation of natural images. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 444–454, Edinburgh, Scotland, UK. Association for Computational Linguistics.
  • Yu, R., Li, A., Morariu, V. I., & Davis, L. S. (2017). Visual relationship detection with internal and external linguistic knowledge distillation. In 2017 IEEE International Conference on Computer Vision (ICCV) (pp. 1068–1076), Los Alamitos, CA, USA. IEEE Computer Society. doi:10.1109/ICCV.2017.121
  • Zhang, M., & Zhou, Z. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837. doi:10.1109/TKDE.2013.39