Research Article

Geographical and linguistic perspectives on developing geoparsers with generic resources

Received 04 Nov 2023, Accepted 14 Jun 2024, Published online: 30 Jun 2024
