Special Section: Social Media and Tracking Data

A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements

Pages 714-738 | Received 12 Aug 2017, Accepted 27 Mar 2018, Published online: 13 Apr 2018


  • Aburizaiza, A.O. and Rice, M.T., 2016. Geospatial footprint library of geoparsed text from geocrowdsourcing. Spatial Information Research, 24 (4), 409–420. doi:10.1007/s41324-016-0042-x
  • Adams, B. and Janowicz, K., 2012. On the geo-indicativeness of non-georeferenced text. In: Proceedings of the International Conference on Web and Social Media (ICWSM). Palo Alto, CA: AAAI Press, 375–378.
  • Amitay, E., et al., 2004. Web-a-where: geotagging web content. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. New York: ACM, 273–280.
  • Awamura, T., et al., 2015. Location name disambiguation exploiting spatial proximity and temporal consistency. In: SocialNLP 2015@ NAACL. Stroudsburg, PA: ACL, 1–9.
  • Brown, G., 2015. Engaging the wisdom of crowds and public judgement for land use planning using public participation geographic information systems. Australian Planner, 52 (3), 199–209. doi:10.1080/07293682.2015.1034147
  • Buscaldi, D. and Rosso, P., 2008. A conceptual density-based approach for the disambiguation of toponyms. International Journal of Geographical Information Science, 22 (3), 301–313. doi:10.1080/13658810701626251
  • Cope, A. and Kelso, N., 2015. Who’s on first. Mapzen Blog. Available from: https://mapzen.com/blog/who-s-on-first [Accessed 05 September 2017].
  • Daiber, J., et al., 2013. Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of the 9th International Conference on Semantic Systems. ACM, 121–124.
  • DeLozier, G., et al., 2016. Creating a novel geolocation corpus from historical texts. In: Proceedings of The 10th Linguistic Annotation Workshop. Stroudsburg, PA: Association for Computational Linguistics, 188–198.
  • DeLozier, G., Baldridge, J., and London, L., 2015. Gazetteer-independent toponym resolution using geographic word profiles. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). Palo Alto, CA: AAAI Press, 2382–2388.
  • Duckham, M., et al., 2008. Efficient generation of simple polygons for characterizing the shape of a set of points in the plane. Pattern Recognition, 41 (10), 3224–3236. doi:10.1016/j.patcog.2008.03.023
  • Finkel, J.R., Grenager, T., and Manning, C., 2005. Incorporating non-local information into information extraction systems by Gibbs sampling. In: Proceedings of the 43rd annual meeting on association for computational linguistics. Stroudsburg, PA: Association for Computational Linguistics, 363–370.
  • Forney, G.D., 1973. The Viterbi algorithm. Proceedings of the IEEE, 61 (3), 268–278. doi:10.1109/PROC.1973.9030
  • Gelernter, J., et al., 2013. Automatic gazetteer enrichment with user-geocoded data. In: Proceedings of the Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information. New York: ACM, 87–94.
  • Gelernter, J. and Mushegian, N., 2011. Geo-parsing messages from microtext. Transactions in GIS, 15 (6), 753–773. doi:10.1111/tgis.2011.15.issue-6
  • Girardin, F., et al., 2008. Digital footprinting: uncovering tourists with user-generated content. IEEE Pervasive Computing, 7 (4). doi:10.1109/MPRV.2008.71
  • Goodchild, M.F. and Hill, L.L., 2008. Introduction to digital gazetteer research. International Journal of Geographical Information Science, 22 (10), 1039–1044. doi:10.1080/13658810701850497
  • Greene, R.P. and Pick, J.B., 2012. Exploring the urban community: a GIS approach. Upper Saddle River, NJ: Prentice Hall.
  • Gregory, I., et al., 2015. Geoparsing, GIS, and textual analysis: current developments in spatial humanities research. International Journal of Humanities and Arts Computing, 9 (1), 1–14. doi:10.3366/ijhac.2015.0135
  • Grothe, C. and Schaab, J., 2009. Automated footprint generation from geotags with kernel density estimation and support vector machines. Spatial Cognition & Computation, 9 (3), 195–211. doi:10.1080/13875860903118307
  • Hecht, B. and Raubal, M., 2008. GeoSR: geographically explore semantic relations in world knowledge. In: The European Information Society: Taking Geoinformation Science One Step Further. Berlin: Springer, 95–113.
  • Hill, L.L., 2000. Core elements of digital gazetteers: placenames, categories, and footprints. In: International Conference on Theory and Practice of Digital Libraries. Berlin: Springer, 280–290.
  • Hollenstein, L. and Purves, R., 2010. Exploring place through user-generated content: using Flickr tags to describe city cores. Journal of Spatial Information Science, 2010 (1), 21–48.
  • Hu, Y., et al., 2015. A multistage collaborative 3D GIS to support public participation. International Journal of Digital Earth, 8 (3), 212–234. doi:10.1080/17538947.2013.866172
  • Hu, Y., Janowicz, K., and Prasad, S., 2014. Improving Wikipedia-based place name disambiguation in short texts using structured data from DBpedia. In: Proceedings of the 8th workshop on geographic information retrieval. New York: ACM, 1–8.
  • Inkpen, D., et al., 2015. Location detection and disambiguation from Twitter messages. Journal of Intelligent Information Systems, 49 (2), 1–17.
  • Intagorn, S. and Lerman, K., 2011. Learning boundaries of vague places from noisy annotations. In: Proceedings of the 19th ACM SIGSPATIAL international conference on advances in geographic information systems. New York: ACM, 425–428.
  • Janée, G., Frew, J., and Hill, L.L., 2004. Issues in georeferenced digital libraries. D-Lib Magazine, 10 (5), 1082–9873.
  • Janowicz, K. and Keßler, C., 2008. The role of ontology in improving gazetteer interaction. International Journal of Geographical Information Science, 22 (10), 1129–1157. doi:10.1080/13658810701851461
  • Jarvis, R.A., 1973. On the identification of the convex hull of a finite set of points in the plane. Information Processing Letters, 2 (1), 18–21. doi:10.1016/0020-0190(73)90020-3
  • Jones, C.B., et al., 2008. Modelling vague places with knowledge from the Web. International Journal of Geographical Information Science, 22 (10), 1045–1065. doi:10.1080/13658810701850547
  • Jones, C.B. and Purves, R.S., 2008. Geographical information retrieval. International Journal of Geographical Information Science, 22 (3), 219–228. doi:10.1080/13658810701626343
  • Kar, B., et al., 2016. Public participation GIS and participatory GIS in the era of GeoWeb. The Cartographic Journal, 53 (4), 296–299. doi:10.1080/00087041.2016.1256963
  • Karimzadeh, M., et al., 2013. GeoTxt: a web API to leverage place references in text. In: Proceedings of the 7th workshop on geographic information retrieval. New York: ACM, 72–73.
  • Keßler, C., et al., 2009b. Bottom-up gazetteers: learning from the implicit semantics of geotags. In: International Conference on GeoSpatial Semantics. Berlin: Springer, 83–102.
  • Keßler, C., Janowicz, K., and Bishr, M., 2009a. An agenda for the next generation gazetteer: geographic information contribution and retrieval. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. New York: ACM, 91–100.
  • Ladra, S., et al., 2008. A toponym resolution service following the OGC WPS standard. In: International Symposium on Web and Wireless Geographical Information Systems. Berlin: Springer, 75–85.
  • Lafferty, J., et al., 2001. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning. Burlington, MA: Morgan Kaufmann Publishers, 282–289.
  • Larson, R.R., 1996. Geographic information retrieval and spatial browsing. In: Geographic information systems and libraries: patrons, maps, and spatial information [papers presented at the 1995 Clinic on Library Applications of Data Processing], 10–13 April 1995, Urbana-Champaign, IL.
  • Lehmann, J., et al., 2015. DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 6 (2), 167–195.
  • Leidner, J.L., 2008. Toponym resolution in text: annotation, evaluation and applications of spatial grounding of place names. Irvine, CA: Universal-Publishers.
  • Leidner, J.L. and Lieberman, M.D., 2011. Detecting geographical references in the form of place names and associated spatial natural language. SIGSPATIAL Special, 3 (2), 5–11. doi:10.1145/2047296
  • Li, H., et al., 2002. Location normalization for information extraction. In: Proceedings of the 19th international conference on Computational linguistics-Volume 1. Stroudsburg, PA: Association for Computational Linguistics, 1–7.
  • Li, L. and Goodchild, M.F., 2012. Constructing places from spatial footprints. In: Proceedings of the 1st ACM SIGSPATIAL international workshop on crowdsourced and volunteered geographic information. New York: ACM, 15–21.
  • Lieberman, M.D. and Samet, H., 2011. Multifaceted toponym recognition for streaming news. In: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. New York: ACM, 843–852.
  • Lieberman, M.D., Samet, H., and Sankaranarayanan, J., 2010. Geotagging with local lexicons to build indexes for textually-specified spatial data. In: 2010 IEEE 26th International Conference on Data Engineering (ICDE). New York: IEEE, 201–212.
  • Lingad, J., Karimi, S., and Yin, J., 2013. Location extraction from disaster-related microblogs. In: Proceedings of the 22nd international conference on world wide web. New York: ACM, 1017–1020.
  • Liu, Y., et al., 2014. Analyzing Relatedness by Toponym Co-Occurrences on Web Pages. Transactions in GIS, 18 (1), 89–107. doi:10.1111/tgis.12023
  • Madden, D.J., 2017. Pushed off the map: toponymy and the politics of place in New York City. Urban Studies, Online First. doi:10.1177/0042098017700588
  • McCurley, K.S., 2001. Geospatial mapping and navigation of the web. In: Proceedings of the 10th international conference on World Wide Web. New York: ACM, 221–229.
  • McKenzie, G., et al., 2015. POI pulse: A multi-granular, semantic signature–based information observatory for the interactive visualization of big geosocial data. Cartographica: the International Journal for Geographic Information and Geovisualization, 50 (2), 71–85. doi:10.3138/cart.50.2.2662
  • McKenzie, G. and Adams, B., 2017. Juxtaposing thematic regions derived from spatial and platial user-generated content. In: Proceedings of the 13th International Conference on Spatial Information Theory. Wadern, Germany: Schloss Dagstuhl.
  • Medway, D. and Warnaby, G., 2014. What’s in a name? Place branding and toponymic commodification. Environment and Planning A, 46 (1), 153–167. doi:10.1068/a45571
  • Melo, F. and Martins, B., 2017. Automated geocoding of textual documents: A survey of current approaches. Transactions in GIS, 21 (1), 3–38. doi:10.1111/tgis.2017.21.issue-1
  • Molla, D. and Karimi, S., 2014. Overview of the 2014 ALTA shared task: identifying expressions of locations in tweets. In: Australasian Language Technology Association Workshop 2014. Stroudsburg, PA: ACL, 151.
  • Montello, D.R., et al., 2003. Where’s downtown?: behavioral methods for determining referents of vague spatial queries. Spatial Cognition & Computation, 3 (2–3), 185–204. doi:10.1080/13875868.2003.9683761
  • Overell, S. and Rüger, S., 2008. Using co-occurrence models for placename disambiguation. International Journal of Geographical Information Science, 22 (3), 265–287. doi:10.1080/13658810701626236
  • Paradesi, S.M., 2011. Geotagging tweets using their content. In: FLAIRS Conference. Palo Alto, CA: AAAI Press.
  • Rattenbury, T., Good, N., and Naaman, M., 2007. Towards automatic extraction of event and place semantics from flickr tags. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and development in information retrieval. New York: ACM, 103–110.
  • Rice, M.T., et al., 2012. Supporting accessibility for blind and vision-impaired people with a localized gazetteer and open source geotechnology. Transactions in GIS, 16 (2), 177–190. doi:10.1111/j.1467-9671.2012.01318.x
  • Rinner, C. and Bird, M., 2009. Evaluating community engagement through argumentation maps—a public participation GIS case study. Environment and Planning B: Planning and Design, 36 (4), 588–601. doi:10.1068/b34084
  • Salvini, M.M. and Fabrikant, S.I., 2016. Spatialization of user-generated content to uncover the multirelational world city network. Environment and Planning B: Planning and Design, 43 (1), 228–248. doi:10.1177/0265813515603868
  • Santos, J., Anastácio, I., and Martins, B., 2015. Using machine learning methods for disambiguating place references in textual documents. GeoJournal, 80 (3), 375–392. doi:10.1007/s10708-014-9553-y
  • Sheather, S.J. and Jones, M.C., 1991. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B (Methodological), 53 (3), 683–690.
  • Southall, H., 2014. Rebuilding the Great Britain Historical GIS, Part 3: integrating qualitative content for a sense of place. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 47 (1), 31–44. doi:10.1080/01615440.2013.847774
  • Stokes, N., et al., 2008. An empirical study of the effects of NLP components on Geographic IR performance. International Journal of Geographical Information Science, 22 (3), 247–264. doi:10.1080/13658810701626210
  • Tjong Kim Sang, E. and De Meulder, F., 2003. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4. Stroudsburg, PA: Association for Computational Linguistics, 142–147.
  • Twaroch, F.A. and Jones, C.B., 2010. A web platform for the evaluation of vernacular place names in automatically constructed gazetteers. In: Proceedings of the 6th Workshop on Geographic Information Retrieval. New York: ACM, 14.
  • Twaroch, F.A., Jones, C.B., and Abdelmoty, A.I., 2009. Acquisition of vernacular place names from web sources. In: I. King and R. Baeza-Yates, eds. Weaving services and people on the world wide web. Berlin: Springer, 195–214.
  • Vasardani, M., Winter, S., and Richter, K.F., 2013. Locating place names from place descriptions. International Journal of Geographical Information Science, 27 (12), 2509–2532. doi:10.1080/13658816.2013.785550
  • Wallgrün, J.O., et al., 2018. GeoCorpora: building a corpus to test and train microblog geoparsers. International Journal of Geographical Information Science, 32 (1), 1–29. doi:10.1080/13658816.2017.1368523
  • Zhang, W. and Gelernter, J., 2014. Geocoding location expressions in Twitter messages: A preference learning method. Journal of Spatial Information Science, 2014 (9), 37–70.
  • Zhu, R., et al., 2016. Spatial signatures for geographic feature types: examining gazetteer ontologies using spatial statistics. Transactions in GIS, 20 (3), 333–355. doi:10.1111/tgis.12232
