Spatial Cognition & Computation

An Interdisciplinary Journal

Volume 22, 2022 - Issue 3-4: Speaking of Location: Understanding, Interpreting and Generating Natural Language

Research Article

Multi Spatial Relation Detection in Images

Brandon Birmingham (corresponding author), Department of Communications and Computer Engineering, University of Malta, Msida, Malta

https://orcid.org/0000-0002-3006-3526

Adrian Muscat, Department of Communications and Computer Engineering, University of Malta, Msida, Malta

https://orcid.org/0000-0002-9157-2818

Pages 293-327 | Published online: 04 Aug 2021

https://doi.org/10.1080/13875868.2021.1957897

References

Abdi, H., & Williams, L. J. (2010). Tukey’s honestly significant difference (HSD) test. In N. Salkind (Ed.), Encyclopedia of research design (pp. 1–5). Thousand Oaks, CA: Sage Publications. doi:10.4135/9781412961288
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE, 10(7), e0130140. doi:10.1371/journal.pone.0130140
Ball, K., Smith, D., Ellison, A., & Schenk, T. (2009). Both egocentric and allocentric cues support spatial priming in visual search. Neuropsychologia, 47(6), 1585–1591 [Special issue: Perception and Action]. doi:10.1016/j.neuropsychologia.2008.11.017
Belz, A., Muscat, A., Aberton, M., & Benjelloun, S. (2015). Describing spatial relationships between objects in images in English and French. In Proceedings of the 4th Workshop on Vision and Language (pp. 104–113). Lisbon, Portugal: Association for Computational Linguistics. doi:10.18653/v1/W15-2816
Belz, A., Muscat, A., Anguill, P., Sow, M., Vincent, G., & Zinessabah, Y. (2018). SpatialVOC2K: A multilingual dataset of images with annotations and features for spatial relations between objects. In Proceedings of the 11th International Conference on Natural Language Generation (pp. 140–145). Tilburg University, The Netherlands: Association for Computational Linguistics. doi:10.18653/v1/W18-6516
Birmingham, B., & Muscat, A. (2017). The use of object labels and spatial prepositions as keywords in a web-retrieval-based image caption generation system. In Proceedings of the Sixth Workshop on Vision and Language (pp. 11–20). Valencia, Spain: Association for Computational Linguistics.
Birmingham, B., & Muscat, A. (2019). Clustering-based model for predicting multi-spatial relations in images. In Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2019), Volume 2 (pp. 147–156). Prague, Czech Republic, July 29–31, 2019.
Birmingham, B., Muscat, A., & Belz, A. (2018). Adding the third dimension to spatial relation detection in 2D images. In Proceedings of the 11th International Conference on Natural Language Generation (pp. 146–151). Tilburg University, The Netherlands: Association for Computational Linguistics.
Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press.
Bobicev, V., & Sokolova, M. (2017). Inter-annotator agreement in sentiment analysis: Machine learning perspective. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP 2017) (pp. 97–102). Varna, Bulgaria: INCOMA Ltd. doi:10.26615/978-954-452-049-6_015
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. doi:10.1023/A:1010933404324
Cangelosi, A., Coventry, K. R., Rajapakse, R., Joyce, D., Bacon, A., Richards, L., & Newstead, S. N. (2005). Grounding language in perception: A connectionist model of spatial terms and vague quantifiers. In A. Cangelosi, G. Bugmann, & R. Borisyuk (Eds.), Modeling language, cognition and action: Proceedings of the 9th Neural Computation and Psychology Workshop (pp. 47–56). Singapore: World Scientific.
Carlson-Radvansky, L. A., & Logan, G. D. (1997). The influence of reference frame selection on spatial template construction. Journal of Memory and Language, 37(3), 411–437. doi:10.1006/jmla.1997.2519
Carlson-Radvansky, L. A., & Radvansky, G. A. (1996). The influence of functional relations on spatial term selection. Psychological Science, 7(1), 56–60. doi:10.1111/j.1467-9280.1996.tb00667.x
Chai, J. Y., Fang, R., Liu, C., & She, L. (2016). Collaborative language grounding toward situated human-robot dialogue. AI Magazine, 37(4), 32–45. doi:10.1609/aimag.v37i4.2684
Chesterton, A. (2017, March 14). How close can you legally park to a driveway or corner? [Image of a car parked close to a street post and a garage]. Retrieved June 8, 2020.
Clark, H. H. (1973). Space, time, semantics, and the child. In T. E. Moore (Ed.), Cognitive development and the acquisition of language (pp. 27–63). New York: Academic Press.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. doi:10.1177/001316446002000104
Coventry, K. R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S., Joyce, D., & Richards, L. V. (2005). Spatial prepositions and vague quantifiers: Implementing the functional geometric framework. In C. Freksa, M. Knauff, B. Krieg-Brückner, B. Nebel, & T. Barkowsky (Eds.), Spatial cognition IV: Reasoning, action, interaction (pp. 98–110). Berlin, Heidelberg: Springer.
Coventry, K. R., & Garrod, S. C. (2004). Saying, seeing and acting: The psychological semantics of spatial prepositions. London: Psychology Press.
Coventry, K. R., Prat-Sala, M., & Richards, L. (2001). The interplay between geometry and function in the comprehension of over, under, above, and below. Journal of Memory and Language, 44(3), 376–398. doi:10.1006/jmla.2000.2742
Dai, B., Zhang, Y., & Lin, D. (2017). Detecting visual relationships with deep relational networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3298–3308). Los Alamitos, CA, USA: IEEE Computer Society. doi:10.1109/CVPR.2017.352
Day, W. H., & Edelsbrunner, H. (1984). Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification, 1(1), 7–24. doi:10.1007/BF01890115
Delgado, R., Tibau, X. A., & Gu, Q. (2019). Why Cohen’s kappa should be avoided as performance measure in classification. PLOS ONE, 14(9), e0222916. doi:10.1371/journal.pone.0222916
Dembczyński, K., Waegeman, W., Cheng, W., & Hüllermeier, E. (2012). On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1–2), 5–45. doi:10.1007/s10994-012-5285-8
Dobnik, S. (2009). Teaching mobile robots to use spatial words (Doctoral dissertation). University of Oxford.
Dobnik, S., & Kelleher, J. (2014). Exploration of functional semantics of prepositions from corpora of descriptions of visual scenes. In Proceedings of the Third Workshop on Vision and Language (pp. 33–37). Dublin, Ireland: Dublin City University and the Association for Computational Linguistics.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2008). The PASCAL visual object classes challenge 2008 (VOC2008) results. Retrieved from http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html
Farrugia, G., & Muscat, A. (2020). Explaining spatial relation detection using layerwise relevance propagation. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Volume 5: VISAPP (pp. 378–385). Valletta, Malta: INSTICC, SciTePress.
Fasola, J., & Matarić, M. J. (2012). Using spatial language to guide and instruct robots in household environments. In AAAI Fall Symposium: Robots Learning Interactively from Human Teachers. Arlington, VA.
Ghanimifard, M., & Dobnik, S. (2019). What goes into a word: Generating image descriptions with top-down spatial knowledge. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 540–551). Tokyo, Japan: Association for Computational Linguistics.
Girden, E. R. (1992). ANOVA: Repeated measures (No. 84). Newbury Park, CA: Sage Publications.
Gottschall, J. (2008). Literature, science, and a new humanities. Cognitive Studies in Literature and Performance. New York: Palgrave Macmillan.
Grubinger, M., Clough, P., Müller, H., & Deselaers, T. (2006). The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In Proceedings of the International Conference on Language Resources and Evaluation (LREC) (pp. 13–23). Genoa, Italy.
Herskovits, A. (1980). On the spatial uses of prepositions. In Proceedings of the 18th Annual Meeting of the Association for Computational Linguistics (ACL ’80) (pp. 1–5). Association for Computational Linguistics.
Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241–254. doi:10.1007/BF02289588
Kelleher, J. D., & Costello, F. J. (2009). Applying computational models of spatial prepositions to visually situated dialog. Computational Linguistics, 35(2), 271–306. doi:10.1162/coli.06-78-prep14
Kelleher, J. D., & Kruijff, G.-J. M. (2005). A context-dependent algorithm for generating locative expressions in physically situated environments. In G. Wilcock, K. Jokinen, C. Mellish, & E. Reiter (Eds.), Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG 2005). Aberdeen, UK: Association for Computational Linguistics.
Kelleher, J. D., Ross, R. J., Sloan, C., & Mac Namee, B. (2011). The effect of occlusion on the semantics of projective spatial terms: A case study in grounding language in perception. Cognitive Processing, 12(1), 95–108. doi:10.1007/s10339-010-0380-x
Kim, J., Misu, T., Chen, Y. T., Tawari, A., & Canny, J. (2019). Grounding human-to-vehicle advice for self-driving vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 10591–10599). Long Beach, CA, USA.
Kim, J., Rohrbach, A., Darrell, T., Canny, J., & Akata, Z. (2018). Textual explanations for self-driving vehicles. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 563–578). Munich, Germany.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Y. Bengio & Y. LeCun (Eds.), 3rd International Conference on Learning Representations (ICLR 2015), Conference Track Proceedings. San Diego, CA, USA.
Kordjamshidi, P., & Moens, M.-F. (2015). Global machine learning for spatial ontology population. Journal of Web Semantics, 30, 3–21 [Special issue: Semantic Search]. doi:10.1016/j.websem.2014.06.001
Kordjamshidi, P., van Otterlo, M., & Moens, M.-F. (2017). Spatial role labeling annotation scheme. In N. Ide & J. Pustejovsky (Eds.), Handbook of linguistic annotation (pp. 1025–1052). Dordrecht, The Netherlands: Springer Nature. doi:10.1007/978-94-024-0881-2_38
Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., Berg, A. C., & Berg, T. L. (2013). Baby talk: Understanding and generating image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12), 2891–2903. doi:10.1109/TPAMI.2012.162
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. doi:10.2307/2529310
Logan, G. D., & Sadler, D. D. (1996). A computational analysis of the apprehension of spatial relations (pp. 493–529). Cambridge, MA, USA: The MIT Press.
Lu, C., Krishna, R., Bernstein, M., & Fei-Fei, L. (2016). Visual relationship detection with language priors. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision – ECCV 2016 (pp. 852–869). Amsterdam, The Netherlands: Springer International Publishing.
Martinez, G. C., Cangelosi, A., & Coventry, K. R. (2001). A hybrid neural network and virtual reality system for spatial language processing. In IJCNN’01: International Joint Conference on Neural Networks, Proceedings (Cat. No. 01CH37222) (Vol. 1, pp. 16–21). Washington, DC, USA. doi:10.1109/IJCNN.2001.938984
McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. doi:10.11613/BM.2012.031
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS ’13), Volume 2 (pp. 3111–3119). Red Hook, NY, USA: Curran Associates Inc.
Murphy, E., & Ciszewska-Carr, J. (2005). Sources of difference in reliability: Identifying sources of difference in reliability in content analysis of online asynchronous discussions. International Review of Research in Open and Distance Learning, 6(2), 108–119. doi:10.19173/irrodl.v6i2.233
Muscat, A., & Belz, A. (2017). Learning to generate descriptions of visual data anchored in spatial relations. IEEE Computational Intelligence Magazine, 12(3), 29–42. doi:10.1109/MCI.2017.2708559
Muscat, A., & Gatt, A. (2018). Predicting visual spatial relations in the Maltese language. In C. B. Farrugia (Ed.), Junior College multi-disciplinary conference: Research, practice and collaboration: Breaking barriers: Conference proceedings (pp. 414–450). Msida, Malta: University of Malta Junior College.
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In A. Moschitti, B. Pang, & W. Daelemans (Eds.), Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Stroudsburg, PA, USA: Association for Computational Linguistics.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106. doi:10.1007/BF00116251
Rahgooy, T., Manzoor, U., & Kordjamshidi, P. (2018). Visually guided spatial relation extraction from text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers) (pp. 788–794). New Orleans, Louisiana: Association for Computational Linguistics.
Ramisa, A., Wang, J., Lu, Y., Dellandrea, E., Moreno-Noguer, F., & Gaizauskas, R. (2015). Combining geometric, textual and visual features for predicting prepositions in image descriptions. In Proceedings of the 20th Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 214–220). Lisbon, Portugal.
Regier, T., & Carlson, L. A. (2001). Grounding spatial language in perception: An empirical and computational investigation. Journal of Experimental Psychology: General, 130(2), 273–298. doi:10.1037/0096-3445.130.2.273
Retz-Schmidt, G. (1988). Various views on spatial prepositions. AI Magazine, 9(2), 95–95.
Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), 400–407. doi:10.1214/aoms/1177729586
Sadeghi, M. A., & Farhadi, A. (2011). Recognition using visual phrases. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’11) (pp. 1745–1752). Washington, DC, USA: IEEE Computer Society. doi:10.1109/CVPR.2011.5995711
Tellex, S., Gopalan, N., Kress-Gazit, H., & Matuszek, C. (2020). Robots that use language. Annual Review of Control, Robotics, and Autonomous Systems, 3(1), 25–55. doi:10.1146/annurev-control-101119-071628
Regier, T., Carlson, L., & Corrigan, B. (2005). Attention in spatial language: Bridging geometry and function (Chapter 9, pp. 191–204). Oxford: Oxford University Press.
Thrun, S. (2008). Simultaneous localization and mapping. In Robotics and cognitive approaches to spatial mapping.
Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13. doi:10.4018/jdwm.2007070101
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114. doi:10.2307/3001913
Tversky, B. (1993). Cognitive maps, cognitive collages, and spatial mental models. In A. U. Frank & I. Campari (Eds.), Spatial information theory: A theoretical basis for GIS (pp. 14–24). Berlin, Heidelberg: Springer.
Uwamariya, N. (2019, October 3). How do visually impaired people live in their surroundings? [Image of a blind man crossing the road]. Retrieved June 8, 2020.
Yang, Y., Teo, C., Daumé III, H., & Aloimonos, Y. (2011). Corpus-guided sentence generation of natural images. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 444–454). Edinburgh, Scotland, UK: Association for Computational Linguistics.
Yu, R., Li, A., Morariu, V. I., & Davis, L. S. (2017). Visual relationship detection with internal and external linguistic knowledge distillation. In 2017 IEEE International Conference on Computer Vision (ICCV) (pp. 1068–1076). Los Alamitos, CA, USA: IEEE Computer Society. doi:10.1109/ICCV.2017.121
Zhang, M.-L., & Zhou, Z.-H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837. doi:10.1109/TKDE.2013.39
