Special Section: Crowdsourced Geospatial Data Quality

Crowdsourced geospatial data quality: challenges and future directions

Pages 1588-1593 | Received 04 Mar 2019, Accepted 06 Mar 2019, Published online: 23 May 2019

Introduction

A decade ago, Volunteered Geographic Information (VGI) was identified as a new source of information that would blur the traditional boundary between the producers and consumers of data (Goodchild 2007). This form of information has been recognised under multiple names, including crowdsourced geospatial data (Heipke 2010) and user-generated geographic content (Fast and Rinner 2014), to name but a few. Many applications and services benefit from user-generated content contributed by a wide range of users through crowdsourcing projects. VGI has made it possible for a much wider group of contributors to create and share geographical information. Despite the success and popularity of many VGI projects, such as OpenStreetMap (OSM), researchers continue to question the reliability and fitness for use of crowdsourced data (Haklay et al. 2010, Koukoletsos et al. 2012, Arsanjani et al. 2015, Foody et al. 2015, Salk et al. 2015, Basiri et al. 2016a, Senaratne et al. 2017).

The fact that VGI is contributed by members of the public, some with little experience or expertise in geospatial data, may have contributed to the perception that this data source is unreliable. Such issues have impeded the adoption of crowdsourced geospatial data in several projects. While one can argue about the importance of individuals and their levels of expertise based on the concepts of 'the wisdom of the crowd' and collective decision making, some have questioned the representativeness, i.e. the structure of the crowd and the 'power of the elites', in many crowdsourcing projects (Leszczynski and Elwood 2015, Ballatore and De Sabbata 2018).

This special section of the International Journal of Geographical Information Science looks at the challenges and future directions of crowdsourced geospatial data, with particular attention to issues stemming from the data quality and biases of VGI. This editorial highlights how these issues are discussed and addressed by the articles of the special section, how the papers engage with emerging technologies, concepts, platforms, debates, methodologies and techniques within VGI, and how they suggest future research directions. The special section gathers papers on the topics of crowdsourced geospatial data quality (Ballatore and Arsanjani 2018), thematic uncertainty and consistency across data sources (Hervey and Kuhn 2018), spatial biases (Millar et al. 2018), trust issues within VGI (Severinsen et al. 2019), and contributors' behaviour and interactions (Truong et al. 2018).

Crowdsourced data quality challenges

VGI data quality issues

Crowdsourced geographic data quality has been a core focus of much research; Fonte et al. (2015), Senaratne et al. (2017), Fonte et al. (2017), Basiri et al. (2016a), Antoniou and Skopeliti (2015), Goodchild and Li (2012) and several other studies have reviewed VGI quality assessment and assurance methods. There are several ways to classify the quality assessment methods, but the following categories are commonly mentioned in the literature under different titles: (a) comparing data against 'authoritative' spatial data (Koukoletsos et al. 2012, Dorn et al. 2015); (b) using user-defined and/or machine-learnt rules and patterns to check entries (Neis et al. 2012, Jilani et al. 2013, Ali and Schmid 2014, Basiri et al. 2016a, 2016b, Leibovici et al. 2017); and (c) gatekeeping and weighting users' entries, e.g. with respect to their experience, expertise, proximity, number of entries, history and changesets (McGreavy et al. 2017, Ciampaglia et al. 2018). A better understanding of the quality of VGI may help the adoption of crowdsourced geospatial data in projects where the perception of unreliability currently impedes it. To address issues of trust, transparency and reliability, Truong et al. (2018) looked at contributors' behaviour and their interactions: they characterised the behaviour of contributors to OpenStreetMap (OSM) through a multigraph approach that reproduces contributors' interactions in a more comprehensive way. Ballatore and Arsanjani (2018) looked at the origin and development of Wikimapia and discussed several aspects of the project, including its intellectual property arrangements and strategies for quality management. Hervey and Kuhn (2018) explored the uncertainty of locational data obtained from social networks; they presented a taxonomy of things that can be located from social network posts and a means of describing them to users. Severinsen et al. (2019) presented a formulaic model that addresses VGI quality issues by quantifying trust in VGI: their 'VGTrust' model assesses information about a data author, together with the spatial and temporal trust associated with the data they create, to produce an overall VGTrust rating metric.
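To make the idea of a combined trust rating concrete, the sketch below shows one way author, spatial and temporal components could be aggregated into a single score. This is an illustration only: the component names, the weights and the linear aggregation are our assumptions for demonstration, not the published VGTrust formulation, which is defined in Severinsen et al. (2019).

```python
# Illustrative sketch only: the actual VGTrust model is given in
# Severinsen et al. (2019); the component names and weights below are
# assumptions chosen for demonstration, not the published formulation.

from dataclasses import dataclass


@dataclass
class Contribution:
    author_trust: float    # e.g. derived from the author's edit history, in [0, 1]
    spatial_trust: float   # e.g. agreement with nearby features, in [0, 1]
    temporal_trust: float  # e.g. recency of the observation, in [0, 1]


def overall_trust(c: Contribution, weights=(0.4, 0.3, 0.3)) -> float:
    """Combine the three trust components into a single rating.

    The linear weighting is a placeholder; any monotone aggregation
    (e.g. a product or a learnt model) could be substituted.
    """
    w_a, w_s, w_t = weights
    return w_a * c.author_trust + w_s * c.spatial_trust + w_t * c.temporal_trust


if __name__ == "__main__":
    edit = Contribution(author_trust=0.9, spatial_trust=0.7, temporal_trust=0.5)
    print(f"overall trust: {overall_trust(edit):.2f}")  # prints 0.72
```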

VGI biases

While the quality issues of crowdsourced data have been studied widely, the identification and estimation of biases in crowdsourced projects have not received the same attention. This is due mainly to the lack of availability of (geo-)demographic data about the contributors, which is either unrecorded (e.g. in OpenStreetMap) or inaccessible due to commercial interests (e.g. in the now defunct Google MapMaker). Understanding the impacts of demographic biases on crowdsourced maps is therefore challenged by this lack of data about the data (Mullen et al. 2015, Haklay 2016, Basiri et al. 2018, Gardner and Mooney 2018, Gardner et al. 2018). Millar et al. (2018) looked at such biases in the context of citizen science monitoring programs; their study focuses on natural and demographic biases related to the location, accessibility, size and general attractiveness of lakes in Ontario.
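As a concrete illustration of how such a spatial bias might be quantified, the minimal sketch below compares an accessibility proxy (a hypothetical distance to the nearest road) between volunteer-monitored sites and the full population of sites. This is an assumption-laden illustration of the general idea, not the method used by Millar et al. (2018).

```python
# A minimal sketch, assuming per-site accessibility scores are available
# (here, distance to the nearest road in km); the data are hypothetical
# and this is not the analysis of Millar et al. (2018), only an
# illustration of quantifying selection bias in monitored sites.

import statistics


def selection_bias(all_scores: list[float], monitored: list[float]) -> float:
    """Difference in mean accessibility between monitored sites and all
    sites; values far from zero indicate the monitored sample is biased
    towards more (or less) accessible locations."""
    return statistics.mean(monitored) - statistics.mean(all_scores)


if __name__ == "__main__":
    # Hypothetical road distances (km) for all lakes vs volunteer-monitored lakes.
    all_lakes = [0.2, 0.5, 1.0, 3.0, 8.0, 15.0, 22.0, 40.0]
    monitored_lakes = [0.2, 0.5, 1.0, 3.0]
    print(f"bias: {selection_bias(all_lakes, monitored_lakes):.2f} km")
    # Negative value: monitored lakes are systematically closer to roads.
```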

Any VGI project is biased in one or more ways (Basiri et al. 2018). At first glance, it seems that all data contributed through VGI projects are 'voluntary response samples', which are always biased as they include only people who have chosen to volunteer (DeMaio 1980), whereas a random sample would need to include people whether or not they choose to volunteer (Goyder 1986). Thus inferences from a voluntary response sample are not as credible as conclusions based on a random sample of the entire population. While crowdsourcing projects are technically open to the whole population, and of course anyone should be able to contribute, recent studies (Mullen et al. 2015, Yang et al. 2016, Zhu et al. 2017, Gardner et al. 2018) have shown that even the most popular crowdsourced projects, such as OSM, are biased by the contribution patterns of their contributors, i.e. a small percentage of the community contributes the greatest proportion of activity (the 'long tail effect' or 90-9-1 rule (Haklay 2016)). Ballatore and Arsanjani (2018) studied the popularity of Wikimapia using behavioural data from Google Trends and compared the geography of interest in Wikimapia with that in OpenStreetMap, from temporal and spatial perspectives. They found that while OpenStreetMap attracts more interest in high-income countries, Wikimapia emerges as relatively more popular in low- and middle-income countries, countering the received notion of VGI as a Global North phenomenon. By virtue of this skewed pattern of participation, we might therefore question the use of the terms 'crowd' and 'public' in many crowdsourcing and public participatory projects. This is aside from projects which, by their nature, require a relatively high level of experience or access to particular resources, or limit participation to a specific geography or time interval (Morschheuser et al. 2018).
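The skewed contribution pattern described above can be measured directly from an edit log. The following minimal sketch, using hypothetical data rather than real OSM history, computes the share of all edits made by the top 1% of contributors; in projects following the 90-9-1 pattern this share is very large.

```python
# A minimal sketch of measuring the 'long tail' (90-9-1) contribution
# pattern from an edit log; the log below is hypothetical, not real
# OpenStreetMap data.

from collections import Counter


def top_share(edits_per_user: Counter, top_fraction: float = 0.01) -> float:
    """Fraction of all edits made by the top `top_fraction` of contributors."""
    counts = sorted(edits_per_user.values(), reverse=True)
    k = max(1, int(len(counts) * top_fraction))
    return sum(counts[:k]) / sum(counts)


if __name__ == "__main__":
    # Hypothetical edit log: one user id per edit, heavily skewed.
    log = ["u1"] * 900 + ["u2"] * 50 + [f"u{i}" for i in range(3, 53)]
    edits = Counter(log)
    print(f"{top_share(edits, 0.01):.0%} of edits by the top 1% of users")  # 90%
```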

In addition to voluntary response bias, the volunteers, as individuals, can differ in the quality of their judgement and decision making (Hammond 2000). Their decisions, opinions and preferences may be significantly represented in, and/or influence, their contributions (e.g. the data). Although there are arguments based on the concept of 'the wisdom of the crowd' that attempt to discount or counterbalance the impacts of individuals' biases on the collective decision, there are two challenges to this notion. Firstly, the representativeness, i.e. the structure of the crowd and the 'power of the elites', in many crowdsourcing projects has been questioned (Comber et al. 2016, See et al. 2013). For example, both Elwood (2010) and Leszczynski and Elwood (2015) have problematised participation biases in VGI on the grounds of a failure of crowdsourced mapping projects to represent the interests of the wider public, specifically those of women. Similarly, Ballatore and De Sabbata (2018) have explored the extent to which VGI is representative of the wider population of the geospatial units it represents.

Representation could therefore be an issue in terms of biases; however, some believe that the super-active contributors are experts and that it is better to leave some decisions in their hands. While Giles (2005) and Rajagopalan et al. (2011) showed that collective decision making can be more accurate than experts' assessments, accuracy does not capture all aspects of quality and might not even be loosely correlated with potential bias. In terms of biases, Greenstein and Zhu (2017) found that the knowledge produced by the crowd is not necessarily less biased than the knowledge produced by experts. Ciampaglia et al. (2018) confirmed this using Wikipedia content; however, they found that both biases and data quality could be moderated if substantial revision and supervision (by gatekeepers) were implemented.

The second challenge to the notion of the 'wisdom of the crowd' is that the process in many VGI projects is based not on collective decision making but on crowd 'participation'. The difference is relatively implicit but highly important: the participants do not vote for or against every single decision or entry, and a collection of individual decisions does not necessarily amount to collective decision making. The wisdom of the crowd may therefore not be relevant to such projects, as individual bias can remain at the micro-level. Because the crowd makes decisions individually in a participatory project, the results of an individual's contributions can be biased. For these projects, the maxim 'given enough eyeballs, all bugs are shallow' (Raymond 1998) is no longer valid, as there are not enough revisions or votes for each piece of information contributed by volunteers. Ciampaglia et al. (2018) found that crowdsourced content can also produce a large sample with a great variety of biased opinions.
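The distinction between participation and collective decision making can be made concrete with a toy simulation. Under the assumption that each volunteer is independently correct with probability 0.7 (a number chosen purely for illustration, not drawn from the cited studies), items decided by a single contributor retain that individual error rate, whereas items decided by a 25-voter majority are almost always correct.

```python
# A toy simulation, not from the papers cited: it contrasts 'participation'
# (each item entered by a single, possibly erring volunteer) with genuine
# collective decision making (many independent votes per item), showing
# why the 'wisdom of the crowd' argument does not transfer to the former.

import random

random.seed(42)
P_CORRECT = 0.7        # assumed probability an individual volunteer is right
N_ITEMS = 10_000
VOTERS_PER_ITEM = 25


def participation_accuracy() -> float:
    """Each item is decided by one volunteer: error stays at the individual level."""
    return sum(random.random() < P_CORRECT for _ in range(N_ITEMS)) / N_ITEMS


def voting_accuracy() -> float:
    """Each item is decided by majority vote: independent errors average out."""
    correct = 0
    for _ in range(N_ITEMS):
        votes = sum(random.random() < P_CORRECT for _ in range(VOTERS_PER_ITEM))
        correct += votes > VOTERS_PER_ITEM / 2
    return correct / N_ITEMS


if __name__ == "__main__":
    print(f"participation: {participation_accuracy():.3f}")  # ~0.70
    print(f"majority vote: {voting_accuracy():.3f}")         # ~0.98
```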

Future directions

The papers in this special section have looked at some of the challenges and issues of crowdsourced geographic data, including data quality, biases and trust, and have provided solutions that either address these issues or improve our understanding of their implications. The focus of research on VGI appears to be moving towards: the structure of the 'crowd' and volunteers' (geo-)demographic biases; the impact of such biases on different VGI projects; ways to promote diversity within contributor communities; addressing the issues of transparency and trust while protecting contributors' privacy; and the intellectual property of crowdsourced data and projects. Future research seems to look beyond VGI as merely a way to create maps, treating it instead as a complex yet more democratic, reproducible, open and reliable system that engages society and promotes diversity, collaboration and wider engagement.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • Ali, A.L. and Schmid, F., 2014. Data quality assurance for volunteered geographic information. In: International Conference on Geographic Information Science. Cham: Springer, 126–141.
  • Antoniou, V. and Skopeliti, A., 2015. Measures and indicators of VGI quality: an overview. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2, 345. doi:10.5194/isprsannals-II-3-W5-345-2015
  • Arsanjani, J.J., Mooney, P.A., and Schauss, A., 2015. Quality assessment of the contributed land use information from OpenStreetMap versus authoritative datasets. In: J. Jokar Arsanjani, et al., eds. OpenStreetMap in GIScience, experiences, research, and applications (LNCS). Berlin Heidelberg: Springer, 37–58.
  • Ballatore, A. and Arsanjani, J.J., 2018. Placing Wikimapia: an exploratory analysis. International Journal of Geographical Information Science, 1–18. doi:10.1080/13658816.2018.1463441
  • Ballatore, A. and De Sabbata, S., 2018. Charting the geographies of crowdsourced information in Greater London. In: Proceedings of the AGILE Conference, Lund, Sweden.
  • Basiri, A., et al., 2016a. Quality assessment of OpenStreetMap data using trajectory mining. International Journal of Geospatial Information Science, 19 (1), 56–68.
  • Basiri, A., Amirian, P., and Mooney, P., 2016b. Using crowdsourced trajectories for automated OSM data entry approach. Sensors, 16 (9), 1510. doi:10.3390/s16091510
  • Basiri, A., Haklay, M., and Gardner, Z., 2018. The impact of biases in the crowdsourced trajectories on the output of data mining processes. Association of Geographic Information Laboratories in Europe (AGILE). http://www.cs.nuim.ie/~pmooney/vgi-alive2018/papers/1.3.pdf
  • Ciampaglia, G., et al., 2018. How algorithmic popularity bias hinders or promotes quality. Scientific Reports, 8 (1), 15951. doi:10.1038/s41598-018-34203-2
  • Comber, A., et al., 2016. Crowdsourcing: it matters who the crowd are. The impacts of between group variations in recording land cover. PloS one, 11 (7), e0158329. doi:10.1371/journal.pone.0158329
  • DeMaio, T.J., 1980. Refusals: who, where and why. Public Opinion Quarterly, 44 (2), 223–233. doi:10.1086/268586
  • Dorn, H., Törnros, T., and Zipf, A., 2015. Quality evaluation of VGI using authoritative data—a comparison with land use data in Southern Germany. ISPRS International Journal of Geo-Information, 4 (3), 1657–1671. doi:10.3390/ijgi4031657
  • Elwood, S., 2010. Geographic information science: emerging research on the societal implications of the geographical web. Progress in Human Geography, 34 (3), 349–357. doi:10.1177/0309132509340711
  • Fast, V. and Rinner, C., 2014. A systems perspective on volunteered geographic information. ISPRS International Journal of Geo-Information, 3 (4), 1278–1292. doi:10.3390/ijgi3041278
  • Fonte, C.C., et al., 2017. Assessing VGI data quality. In: G. Foody, et al., eds. Mapping and the Citizen Sensor. London: Ubiquity Press, 137–163.
  • Fonte, C.C., et al., 2015. Usability of VGI for validation of land cover maps. International Journal of Geographical Information Science, 29 (7), 1269–1291. doi:10.1080/13658816.2015.1018266
  • Foody, G.M., et al., 2015. Accurate attribute mapping from volunteered geographic information: issues of volunteer quantity and quality. The Cartographic Journal, 52 (4), 336–344. doi:10.1179/1743277413Y.0000000070
  • Gardner, Z., et al., 2018. Gender differences in OSM activity, editing and tagging. In: Proceedings of the GISRUK 2018 Conference, Leicester, 17–20 April.
  • Gardner, Z. and Mooney, P., 2018. Investigating gender differences in OpenStreetMap activities in Malawi: a small case-study. In: Proceedings of the AGILE Conference, Lund, Sweden, 12–15 June.
  • Giles, J., 2005. Internet encyclopaedias go head to head. Nature, 438 (7070), 900–901. doi:10.1038/438900a
  • Goodchild, M.F., 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69 (4), 211–221. doi:10.1007/s10708-007-9111-y
  • Goodchild, M.F. and Li, L., 2012. Assuring the quality of volunteered geographic information. Spatial Statistics, 1, 110–120. doi:10.1016/j.spasta.2012.03.002
  • Goyder, J., 1986. Surveys on surveys: limitations and potentialities. Public Opinion Quarterly, 50 (1), 27–41. doi:10.1086/268957
  • Greenstein, S. and Zhu, F., 2017. Do experts or crowd-based models produce more bias? Evidence from Encyclopædia Britannica and Wikipedia. MIS Quarterly, forthcoming.
  • Haklay, M., et al., 2010. How many volunteers does it take to map an area well? The validity of Linus’ law to volunteered geographic information. The Cartographic Journal, 47 (4), 315–322. doi:10.1179/000870410X12911304958827
  • Haklay, M., 2016. Why is participation inequality important? London: Ubiquity Press. https://www.ubiquitypress.com/site/chapters/10.5334/bax.c/download/243/
  • Hammond, K.R., 2000. Coherence and correspondence theories in judgment and decision making. Judgment and Decision Making: an Interdisciplinary Reader, 53–65.
  • Heipke, C., 2010. Crowdsourcing geospatial data. ISPRS Journal of Photogrammetry and Remote Sensing, 65(6), 550–557. doi:10.1016/j.isprsjprs.2010.06.005
  • Hervey, T. and Kuhn, W., 2018. Using provenance to disambiguate locational references in social network posts. International Journal of Geographical Information Science, 1–18. doi:10.1080/13658816.2018.1459627
  • Jilani, M., Corcoran, P., and Bertolotto, M., 2013. Automated quality improvement of road network in OpenStreetMap. In Agile Workshop (action and interaction in volunteered geographic information). 19. https://pdfs.semanticscholar.org/f57e/61c9ec141196e9229bfd117518067e3d412a.pdf .
  • Koukoletsos, T., Haklay, M., and Ellul, C., 2012. Assessing data completeness of VGI through an automated matching procedure for linear data. Transactions in GIS, 16 (4), 477–498. doi:10.1111/j.1467-9671.2012.01304.x
  • Leibovici, D., et al., 2017. On data quality assurance and conflation entanglement in crowdsourcing for environmental studies. ISPRS International Journal of Geo-Information, 6 (3), 78. doi:10.3390/ijgi6030078
  • Leszczynski, A. and Elwood, S., 2015. Feminist geographies of new spatial media. The Canadian Geographer, 59 (1), 12–28. doi:10.1111/cag.12093
  • McGreavy, B., et al., 2017. The power of place in citizen science. Maine Policy Review, 26.2, 94–95.
  • Millar, E.E., Hazell, E.C., and Melles, S.J., 2018. The 'cottage effect' in citizen science? Spatial bias in aquatic monitoring programs. International Journal of Geographical Information Science, 1–21.
  • Morschheuser, B., Hamari, J., and Maedche, A., 2018. Cooperation or competition–when do people contribute more? A field experiment on gamification of crowdsourcing. International Journal of Human-Computer Studies. doi:10.1016/j.ijhcs.2018.10.001
  • Mullen, W.F., et al., 2015. Assessing the impact of demographic characteristics on spatial error in volunteered geographic information features. GeoJournal, 80 (4), 587–605. doi:10.1007/s10708-014-9564-8
  • Neis, P., Goetz, M., and Zipf, A., 2012. Towards automatic vandalism detection in OpenStreetMap. ISPRS International Journal of Geo-Information, 1 (3), 315–332. doi:10.3390/ijgi1030315
  • Rajagopalan, M.S., et al., 2011. Patient-oriented cancer information on the internet: a comparison of wikipedia and a professionally maintained database. Journal of Oncology Practice, 7 (5), 319–323. doi:10.1200/JOP.2010.000209
  • Raymond, E., 1998. The cathedral and the bazaar. First Monday. Available from: http://tinyurl.com/bqfy3s [Accessed May 2018]. doi:10.5210/fm.v3i2.578
  • Salk, C.F., et al., 2015. Assessing quality of volunteer crowdsourcing contributions: lessons from the cropland capture game. International Journal of Digital Earth, 2015, 1–17.
  • See, L., et al., 2013. Comparing the quality of crowdsourced data contributed by expert and non-experts. PloS one, 8 (7), e69958. doi:10.1371/journal.pone.0069958
  • Senaratne, H., et al., 2017. A review of volunteered geographic information quality assessment methods. International Journal of Geographical Information Science, 31 (1), 139–167. doi:10.1080/13658816.2016.1189556
  • Severinsen, J., de Roiste, M., Reitsma, F., and Hartato, E., 2019. VGTrust: measuring trust for volunteered geographic information. International Journal of Geographical Information Science, 1–19. doi:10.1080/13658816.2019.1572893
  • Truong, Q.T., De Runz, C., and Touya, G., 2018. Analysis of collaboration networks in OpenStreetMap through weighted social multigraph mining. International Journal of Geographical Information Science, 1–32.
  • Yang, A., et al., 2016. Temporal analysis on contribution inequality in OpenStreetMap: a comparative study for four countries. ISPRS International Journal of Geo-Information, 5 (1), 5.
  • Zhu, D., et al., 2017. Inferring spatial interaction patterns from sequential snapshots of spatial distributions. International Journal of Geographical Information Science, 32 (4), 783–805. doi:10.1080/13658816.2017.1413192
