Search in:

Geo-spatial Information Science Volume 20, 2017 - Issue 3: Special Issue: Volunteered Geographic Information (VGI)-Analytics. Guest Editors: Peter Mooney, Alexander Zipf, Jamal Jokar Arsanjani, and Hartwig H. Hochmair

Submit an article Journal homepage

Open access

2,354

Views

CrossRef citations to date

Altmetric

Listen

Articles

Repurposing a deep learning network to filter and classify volunteered photographs for land cover and land use characterization

Lukasz TracewskiSchool of Engineering and Applied Science, Aston University, Birmingham, UK

http://orcid.org/0000-0002-4778-4266

Lucy BastinSchool of Engineering and Applied Science, Aston University, Birmingham, UKCorrespondence[email protected]

http://orcid.org/0000-0003-1321-0800

Cidalia C. FonteDepartment of Mathematics, INESC Coimbra, University of Coimbra, Coimbra, Portugal

http://orcid.org/0000-0001-9408-8100

Pages 252-268 | Received 08 Apr 2017, Accepted 19 Jun 2017, Published online: 18 Sep 2017

Cite this article
https://doi.org/10.1080/10095020.2017.1373955
CrossMark

In this article

Abstract
1. Introduction
2. Related work
3. Specific approach for this study
4. Conclusions
Funding
Notes on contributors
Acknowledgements
References

Full Article
Figures & data
References
Citations
Metrics
Licensing
Reprints & Permissions
View PDF PDF

Abstract

This paper extends recent research into the usefulness of volunteered photos for land cover extraction, and investigates whether this usefulness can be automatically assessed by an easily accessible, off-the-shelf neural network pre-trained on a variety of scene characteristics. Geo-tagged photographs are sometimes presented to volunteers as part of a game which requires them to extract relevant facts about land use. The challenge is to select the most relevant photographs in order to most efficiently extract the useful information while maintaining the engagement and interests of volunteers. By repurposing an existing network which had been trained on an extensive library of potentially relevant features, we can quickly carry out initial assessments of the general value of this approach, pick out especially salient features, and identify focus areas for future neural network training and development. We compare two approaches to extract land cover information from the network: a simple post hoc weighting approach accessible to non-technical audiences and a more complex decision tree approach that involves training on domain-specific features of interest. Both approaches had reasonable success in characterizing human influence within a scene when identifying the land use types (as classified by Urban Atlas) present within a buffer around the photograph’s location. This work identifies important limitations and opportunities for using volunteered photographs as follows: (1) the false precision of a photograph’s location is less useful for identifying on-the-spot land cover than the information it can give on neighbouring combinations of land cover; (2) ground-acquired photographs, interpreted by a neural network, can supplement plan view imagery by identifying features which will never be discernible from above; (3) when dealing with contexts where there are very few exemplars of particular classes, an independent a posteriori weighting of existing scene attributes and categories can buffer against over-specificity.

Keywords:

Land cover
land use
volunteered geographic information (VGI)
photograph
convolutional neural network
machine learning

1. Introduction

In recent years, there has been an explosion in the popularity and prevalence of spatial data generation by citizens, through active collection initiatives such as OpenStreetMap, games and citizen science projects which tackle a wide range of topics, such as invasive species (Delaney et al. Citation2008), disaster response (Goodchild and Glennon Citation2010), cropland expansion (Fritz et al. Citation2012) and election violence (Meier Citation2008). This proliferation of data co-creation has been facilitated by the availability of cheaper sensors and GPS in smartphones. Web 2.0 technologies facilitate sharing, co-editing and online quality assessment of the generated information. Hand in hand with this active data generation is a rapid increase in the volume of voluntarily published resources, such as photos and reviews which are associated with some sort of locational information. Many terms have been coined to describe these types of data generated by the public but the term we will use to describe this particular mix of actively and passively published spatially referenced data is volunteered geographic information (VGI) (Goodchild Citation2007). One of the phenomena that may be mapped using such VGI sources (potentially in combination with more authoritative data) is land cover/land use. Geo-tagged photographs published to libraries, such as Flickr and Panoramio, are being increasingly investigated as potential sources of information in this context (Antoniou et al. Citation2016). If salient features can be identified and the position of the photographer is relatively certain, a subset of such photos may be useful for verifying and validating land cover/land use maps, and identifying changes in the landscape such as disturbance and vegetation change. In some cases, photographs may be presented to volunteers as part of a game (e.g. MissingMaps), which requires gamers to interpret relatively complex photographs to extract relevant facts about phenomena such as disturbance, agricultural practices or settlements (Fritz et al. Citation2012). Thus far, the primary source of imagery for such applications has been aerial or satellite photography (e.g. detailed imagery from DigitalGlobe is used in the Tomnod platform) which offers a plan view of the ground. However, there is a potential role for volunteered photographs taken for entirely different purposes and published in repositories (e.g. Flickr) to fill the gaps, where no aerial imagery is available and to add significant value in terms of identifying key landscape features. The key challenge when exploiting such a vast and heterogeneous data source is to identify the most relevant and useful photographs from the deluge of available candidates, in order to most efficiently extract useful information while maintaining the engagement and interest of any volunteers assisting with classification. One option for automating this filtering process is to apply machine learning approaches such as deep learning to the content and metadata of the photographs, in combination with a user-defined set of priorities which define fitness for a particular purpose. The priorities of the original photographer submitting photos to Flickr will rarely align with those of a scientist trying to repurpose the image, so it is vital to identify the most salient images to avoid being swamped by irrelevant information.

This paper extends recent research into the usefulness of volunteered photos for land cover extraction, and investigates whether this usefulness can be automatically assessed in order to best focus citizen science efforts. We revisit a set of photos harvested from the web which were assessed for their usefulness by experts (Antoniou et al. Citation2016) and evaluates the degree to which the rule-based classification from the experts can be replicated by a neural network on the basis of the features identified in the photos. By repurposing an existing network which had been trained on an extensive library of potentially relevant features, we were able to carry out initial assessments of the general value of this approach, pick out features which were especially salient, and identify focus areas for future neural network training and development for this specific purpose. This approach also allowed us to test methods accessible to a general audience without specialized development and coding expertise.

2. Related work

2.1. Land cover and land use mapping—the context

Land cover or land use mapping is usually performed through the classification of satellite imagery. A variety of nomenclatures can be used in Land cover and land use mapping, some corresponding to land cover data and some also including land use information, such as CORINE Land Cover (European Environmental Agency Citation2006) or the Global Monitoring for Environment and Security Urban Atlas (UA) (European Environmental Agency Citation2012). The classification of land cover corresponds to the biophysical cover of the earth’s surface, while land use is associated with arrangements, activities and inputs humans undertake for related to a certain type of land cover. The identification of different types of land cover using satellite imagery is easier than the identification of land use, since the latter often does not correspond to characteristics easily identifiable in aerial or satellite imagery using just reflectance. For example, a region covered with grass may correspond to several types of land use, such as sport fields, public or private parks or natural grassland. The opposite can also happen—a land use class, such as recreation areas, often includes several types of land cover.

Information about land use can be relatively easily provided by volunteers or be often easily identifiable in photographs taken at the earth’s surface. Therefore, information provided by volunteers may be valuable, either when they are asked to identify land use classes directly, such as when creating vector data corresponding to land use information in OpenStreetMap, or by using photographs taken by the citizens. In the second context, the volunteers do not provide the land use/cover classes directly, but provide the data from which it can be extracted. The use of different nomenclatures raises problems when several sources of land use/land cover data are to be compared or combined. This requires the establishment of a mapping between nomenclatures. Even though this mapping is not always easy, several harmonization mappings are available for different nomenclatures (Arnold et al. Citation2013; Fonte et al. Citation2017).

2.1.1. Volunteered photos—the context

A thorough summary of online photo repositories and their protocols can be found in Antoniou et al. (Citation2016). Some (currently relatively small) repositories focus specifically on land cover and land use; among these are the Degree Confluence Project and the Field Photo Library (Xiao et al. Citation2011). These data sources are well used for environmental modelling and validation (Foody and Boyd Citation2012; Iwao et al. Citation2006; Leinenkugel et al. Citation2014). These data can be assumed to be of interest for the purposes of this research. Therefore, we focus on expanding and supplementing this resource by repurposing and filtering public photographs from other domains, exploiting online repositories such as Flickr, Panoramio, Geograph and Instagram. The above-mentioned repositories host billions of photographs. These content and metadata is increasingly being analyzed to draw inferences about human social behaviour, tourism and the urban environment. The pool of photographs is rapidly expanding, with around 2 million public photographs a day being uploaded to Flickr (Franck Citation2016) and 58 million per day to Instagram (StatisticBrain Citation2016). Naturally, as alternative photo publishing platforms within a commercial market ecosystem, the repositories differ in their focus and the dominant themes of the pictures published. A rough idea of these differences can be gained by investigating the optional text tags with which images are annotated. In terms of the images published which are shared publicly.

2.1.2. Flickr

Flickr leans towards the art and landscape side of photography, with numerous comments and discussions centred on the techniques used to capture or process images: trending tags often include topics such as ‘depth of field’ and ‘exposure’. Because of this focus, many of the submitted photographs address landscapes: the all-time most popular image tags include ‘sunset’, ‘water’, ‘sky’, ‘nature’, and ‘tree’. While these themes would appear very relevant to the recognition of landscape features and land cover, it is important to remember that these landscapes are frequently long shots that give little information about the location of the photographer (i.e. the actual geotag) and may be substantially processed.

2.1.3. Panoramio

Panoramio has a fundamentally spatial focus, since it specifically aims to showcase images attached to specific locations. The acquisition of Panoramio by Google in 2007 enhanced this function by embedding Panoramio pictures directly into Google Earth and Google Maps. Photo volumes in Panoramio are more modest; an estimated 93 million photos have been uploaded to the Panorank repository, but images are frequently deleted as users replace them with better versions, so the current number is probably lower. Daily estimates are not regularly monitored, but are estimated at between 20 and 40 thousand images per day; however, the frequency of geotagging is far higher than with Flickr photos. Tags, unsurprisingly, also frequently include concepts relevant to landscape features, such as ‘mountain(s)’, ‘nature’, ‘forest’, ‘river’, and ‘urban’.

2.1.4. Geograph

The Geograph Project is an initiative designed to collect representative images across a number of sampling regions, namely, Britain and Ireland, Germany and the Channel Islands. The goal is for participants to collect at least one image for every square kilometer of these regions. Photographs tend to include architectural features or characteristic landscapes, and location and view direction are associated with the picture, since it is obligatory to report the position of the viewer and of the subject.

2.1.5. Instagram

As a social media platform, which is increasingly used for viral marketing by companies and celebrities, Instagram’s trending tags are (at the time of writing at least) dominated by terms, such as ‘cute’, ‘selfie’, ‘fashion’ and ‘best friends’. While various APIs are available for consuming georeferenced Instagram feeds (for example, Esri’s GeoEvent connector) it was felt that for the purposes of this work, the streams of photos would require too much filtering, and so we confined the analysis to Flickr, Geograph and Panoramio. This also allowed us to evaluate our results against the work of Antoniou et al. (Citation2016) by re-evaluating a set of data previously analyzed in their work.

2.2. Deep learning

Efficient filtering and classification of photographs to extract information on land use or land cover require a computer program to understand abstract concepts related to the interpretation of scene content. By using neural networks (Schmidhuber Citation2015) it is possible for software to learn these rules from a training set, without having to handcraft features (i.e. to characterize the elements of a scene in mathematical form) and provide them as inputs to a model. Neural networks are used in contexts where it is necessary to derive a relationship between variables, and where there are some observations of that relationship which can be used to train the network. A neural network is trained by propagating input data through layers of nodes to produce output values, which are then compared to the ‘truth’ to assess the goodness-of-fit. The desired output values may be continuous or discrete. Neural networks are most commonly used to assign categorical labels on the basis of continuous multivariate input data − for example, to recognize a text character from a variety of metrics derived from a pixelated image. Weightings in intermediate nodes are used to transform the input values to output, and these are iteratively adjusted to optimize the fit of the model. This process, known as ‘back-propagation’, allows the characteristics of a specific data set to be learnt. Unless the training data is extensive, with good representation of all the types of features to be distinguished, it runs a high risk of ‘over-fitting’, so that the model cannot reliably be applied to novel data. Over-represented classes in the training data can act as ‘attractors’ and significantly bias the accuracy of the final model. This is an important constraint in the context of VGI, which tends to be spatially patchy and biased towards particular themes.

The hidden layer is instantiated with values (most often random) which are later refined through back-propagation. The layer is called ‘hidden’, since it is difficult to provide an interpretation of the weights and what the network has learned.

Once there are three or more hidden layers in the neural network, we usually consider it deep—hence the term ‘deep learning’. Fully connecting all neurons of one layer with all neurons of the next layer can lead to very complex optimization problems. For example, if an image with resolution 1000 by 1000 pixels is submitted to the network with one pixel value on each input, it effectively means 10⁶ values on input. Connecting the input layer with a hidden layer of the same size generates 10¹² parameters to optimize. Adding more layers not only will lead gargantuan complexity, but will also cause severe overfitting: the neural network will do an excellent job in handling the cases it was trained on, but very poorly on new data sets (Schmidhuber Citation2015; Srivastava et al. Citation2014).

Convolutional Neural Networks (CNNs) address these problems and offer significant improvements over previous approaches (Krizhevsky, Sutskever, and Hinton Citation2012). Their architecture takes into account the spatial structure present in images, and introduces between the layers of the network an additional series of ‘convolutional layers’, each focusing on a particular region of the image. In order to further improve efficiency, parameters are shared across the network. Thus, the detection of a particular type of feature (to take a simple example, a vertical edge), once ‘learnt’ in one region, can be detected wherever it occurs in the image. As with the visual cortex of many animals, there is some overlap between the regions into which the image is divided. This region-based approach still makes local connectivity between neurons much easier to maintain, and allows the network to learn increasingly higher levels of abstractions. This makes the layers much easier to train, while having good grounds in computer vision and exploiting the phenomenon of spatial autocorrelation in imagery: distant regions within images are rarely semantically connected, but salient feature types, such as the aforementioned vertical edges, can occur anywhere in an unfamiliar image. A remarkable feature of this class of machine learning algorithms is the ability to generalize well and significantly outperform other approaches when it comes to dealing with abstract problems (Krizhevsky, Sutskever, and Hinton Citation2012; Schmidhuber Citation2015). Teaching the neural network to recognize a feature such as forest simply requires that the algorithm is shown sufficient number of photographs depicting forest, without having to explicitly define what ‘forest’ is. What constitutes ‘enough’ depends strongly on the complexity and variety of what the program tries to learn, but will require at least a few hundred labelled images per class. The requirement for a large number of training examples, as well as the computational power required for processing, can present a significant challenge, and for this reason, we took advantage of a ready-made model, which is further described below.

2.3. Identifying land cover from photos

In recent years a number of interesting initiatives have involved volunteers in identifying land cover and land use from images. In some cases, these are images taken from space or from the air, for example, the Geowiki cropland mapping initiative which asked volunteers to solve conflicts in widely used classified land cover maps (Fritz et al. Citation2015), the identification of invasive species in Hawaiian forests, the assessment of disturbance in and around protected areas (Bastin et al. Citation2013) or the recent ‘gamification’ of validation of the GlobeLand30 product (Brovelli et al. Citation2016). This ‘view-from-above’ has parallels with the classic remote sensing approach to landscape characterization, but instead of relying on spectral signatures or backscatter characteristics, land cover and land use types are identified by their characteristic shapes and patterns, easily picked out by the human eye.

Less frequently, photographs taken at ground level are used to record or verify land cover and land use maps, and in these cases many other factors come into play: for example, the focal length, orientation and viewpoint of a photograph, the accuracy of its locational information and its currency (many users of photo-sharing platforms upload scanned postcards or historic photos). Antoniou et al. (Citation2016) analyzed the types of metadata that may be available associated to geo-tagged photographs, and which are available for Flickr, Panoramio and Geograph. Among these are orientation, date of upload and acquisition, focal length, tags, descriptions, titles and information about the photographer. The metadata required (mandatory and volunteered) varies according to the initiative and therefore, the metadata available for the photographs varies with their origin.

Many analyses which using volunteered photos use information other than the image itself: for example, parsing and using the associated tags to identify features of interest and delineating the areas (sometimes fuzzily defined) which users see as belonging to a particular named location (Gao et al. Citation2014; Li and Goodchild Citation2012) or see as attractive (Hu et al. Citation2015). On occasion, information about the user’s identity is used to map trajectories (Jankowski et al. Citation2010) and or identify “localness” in shared photos and tweets (Johnson et al. Citation2016). Antoniou et al. (Citation2016) analyzed the availability of tags, descriptions and titles in a set of 1000 photographs from each of Flickr, Panoramio and Geograph in the London area (corresponding to a total of 3000 photographs). The content of the harvested resources was not analyzed in that study, but only their availability and the number of available tags and words (for the descriptions and titles). The results show that for Geograph, only 34% of the photographs had tags, and this number increased to 70 and 79%, respectively, for Flickr and Panoramio, though the mean number of tags was smaller for Panoramio than for Flickr. This shows that the use of tags to identify the content of the photographs may leave out of the analysis a large number of photographs which could be useful to extract information. Therefore, methods that allow the analysis of the photographs themselves, instead of just the associated metadata, are useful.

Visual feature matching may be performed to identify landmarks (Kisilevich et al. Citation2010) or group photographs (Kennedy et al. Citation2007), but identification of land cover or human disturbance from the photos themselves is, at the time of writing, less frequently researched. Deep learning approaches are ever more widely used to generate maps from images: for example, the Facebook initiative to map settlement configurations across 20 countries using 14.6 billion DigitalGlobe images at 50 cm resolution (350 TB data), combined with census data (Gros and Tiecke Citation2016) or the work by Castelluccio et al. (Citation2015) to delineate land use types from the characteristic features which may be seen in detailed imagery. Albert, Kaur, and Gonzalez (Citation2017) have also recently successfully classified typologies of city neighbourhoods using deep learning approaches combined with satellite imagery from Google Maps.

However, the above initiatives rely on the classic plan that delineates features from above, leaving scope for extra information to be gained from images acquired at the ground level. The closest work to what is addressed in this paper is that from Leung and Newsam (Citation2015) who derived a classification of developed vs. undeveloped land for the UK, using photographs harvested from Flickr and Geograph, and (Zhu and Newsam Citation2015) − a campus mapping exercise which derived eight land use types from volunteered photographs in combination with a shapefile representing the zones on site.

In the study by Leung and Newsam (Citation2015), the challenge of providing enough labelled images to train the machine learning algorithm was resolved by inferring the label through natural language processing from a description provided by the user. As the authors note, user-supplied text for an individual image is often not sufficient to assign it to a class, and so 1 × 1 km tiles were used to group photographs, in order to model topics efficiently. The authors use handcrafted features, namely colour histogram, edge histogram and gist descriptors, to train their model for scene recognition. Zhu and Newsam (Citation2015), on the other hand, took the same approach to classification by using an off-the-shelf model, AlexNet (Krizhevsky, Sutskever, and Hinton Citation2012).

3. Specific approach for this study

In this work, we aimed to assess how far an off-the-shelf model which had been trained on a variety of potentially useful features could be adapted for our needs. The goal is to derive useful labels for a land cover or land use context without the need for an extensive gathering of ‘ground truth’, development of significant amounts of code, or heavy computational training of the network. An equally important goal is to assess and evaluate the limitations of this ‘off-the-shelf’ approach, and to try to characterize those contexts and photograph types where it is less reliable. In this way, we aim to derive some guidelines for best practice in the use of pre-trained models for specific use cases in the exploitation of volunteered photographs.

The CNN used in this study, Places205-AlexNet (Zhou et al. Citation2014), was trained by its authors on almost 2.5 million photographs, this allowed it to achieve 50% accuracy on identifying 205 “scene categories” (this term is explained in detail below). It should be noted that the choice was made purely on the wide availability of the pre-trained models for this neural network architecture and could be replaced with more accurate models. Training of the algorithm is an iterative process that usually a requires very large number of passes over the complete data set, making it a computationally expensive process which necessitates a huge set of labelled samples for training, analogous to the ‘ground truth’ of remote sensing classifications. In addition to this significant investment of resources, the creation of such a model requires expertise in computer vision and machine learning. For that reason, a number of studies, including this one, focus on retrofitting existing models, rather than building models from scratch.

Automated classification of photographs is highly relevant for those applications which involve volunteers in games or campaigns to identify interesting features from photos, since many photos are irrelevant, and simply presenting all available material runs the risk of boring volunteers and causing them to disengage. Ideally, we would like to filter photos in order to:

(1)	Identify photos which are irrelevant or non-useful, and discard them;
(2)	Identify images from which land cover/land use can be quickly and reliably identified (for example some types of built environment), harvest the labels and discard the photos from further analysis;
(3)	Identify candidate land covers in the vicinity of remaining images for verification by volunteers
(4)	Identify challenging and interesting photos that a user may enjoy deconstructing to extract more information than a machine can do.

The developed methodologies were applied to two study areas; one is the region used in Antoniou et al. (Citation2016) that is situated in an urban area of London, UK and the other is located northwest of Paris, France, in a region which covers part of central Paris (as far South as Notre Dame) but also extends Northwards to a region with low-density urban areas and predominance of agriculture and forest. Since the London area was predominated by built environment and man-made features, the Paris region was selected in order to extend the classification challenge to a wider variety of land covers and land uses.

Within this study we aimed to assess how far we could achieve several distinct goals, as follows:

(1)	Automating the identification of photographs which are useful for land cover classification.
(2)	Extracting any information which can be immediately derived about land cover/land use.
(3)	Relating the neural network outputs to an existing land cover classification, to assess how far accepted classes can be identified from image features.

We drew on past research by Antoniou et al. (Citation2016) and specifically aimed to replicate their rule-based classification of photograph usefulness with an off-the-shelf combination of tools and a simple allocation of weights to tags which could be performed by a domain expert with no particular computational experience. The goal was to achieve a comparable stratification of images with much less investment of expert time, since the original classification of usefulness involved 7 experts each classifying 3 thousand photographs − a slow and tedious task.

3.1. Selection and setting up of algorithm and model

The Places205-AlexNet neural network (Zhou et al. Citation2014) has nine layers: an input layer, seven hidden layers and an output layer. The output layer consists of 205 scene categories, such as abbey, bedroom or mountain; a full list is provided in Table . The model outputs a value for each category, indicating the probability that a photo belongs to a certain class. This capacity is the result of training on MIT’s Places database (http://places.csail.mit.edu/), a set of 2.5 million images, each labelled with a scene category. A novel image classified with this pre-trained network will usually belong to many categories, with varying scores. The last hidden layer of the neural network also provides valuable information about the photo content: a set of 4096 values that, in combination, form a ‘signature’ of the image. They represent high-level features of the image and it is from these values that scene categories are derived. A user can use this penultimate layer to build their own classifiers, and the authors of the Places205 network did just this, creating a set of 102 “scene attributes” which we have also used in our study. The scene attributes consist of classes like ‘ice’, ‘working’ or ‘trees’, with a full list to be found in Table . All of the mentioned values i.e. scene categories, scene attributes and the output of the last hidden layer, are used for classification in this work.

(1)	It is simple to apply, requiring only some investment of time by one or more experts and some simple post-processing in a spreadsheet;
(2)	In theory, it should be proof against overfitting, since the weights are assigned independently of any image training set;
(3)	It should allow land covers which are rare in the training set to be adequately identified, since a user has independently flagged the tags which they consider to be indicative of those land covers.
(4)	It allows derivation of a score for all images in the set, unlike a training/validation approach which requires a portion of the data to be set aside.

•	bi = Built environment, indoors.
•	b = Built environment, may be indoors or outdoors.
•	hf = Human feature (e.g. a bridge, railway line, fountain, windmill) May be placed in a natural landscape.
•	hl = Human land use (e.g. agriculture, gardens, golf course). Landscape may be vegetated but human influence is expected to affect a substantial area of the scene.
•	n = Natural environment (and note that u = unknown)

(1)	Find classes with score above predefined threshold. The threshold is calculated as nth percentile of the highest scoring class. For our experiments, we used the 70th, 80th and 95th percentiles. The particular values were selected arbitrarily as means to tune the method.
(2)	If ‘bi’ class was found, ‘b’ was added, as ‘built environment, indoors’ is a subset of ‘built environment’.
(3)	If ‘b’ class was found, ‘hf’ was added, since buildings and other features characteristic of the urban environment are man-made features.

Repurposing a deep learning network to filter and classify volunteered photographs for land cover and land use characterization

Abstract

1. Introduction

2. Related work

2.1. Land cover and land use mapping—the context

2.1.1. Volunteered photos—the context

2.1.2. Flickr

2.1.3. Panoramio

2.1.4. Geograph

2.1.5. Instagram

2.2. Deep learning

2.3. Identifying land cover from photos

3. Specific approach for this study

3.1. Selection and setting up of algorithm and model

Table 1. Scene categories from the Places205 project (Zhou et al. Citation2014).

Table 2. Scene attributes from the Places205 project (Zhou et al. Citation2014).

Table 3. Rules used to assist in the classification of the photographs as useful, from Antoniou et al. (Citation2016).

Table 4. UA classes of levels 2 and 4 (European Union, 2011, https://cws-download.eea.europa.eu/local/ua2006/Urban_Atlas_2006_mapping_guide_v2_final.pdf).

Table 5. Frequency of UA level 2 classes in the data set at the specific location of each photo (point) and in the area around it (buffer).

Table 6. Frequency of UA level 4 classes in the data set at the specific location of each photo (point) and in the area around it (buffer).

3.2. Result analysis

3.2.1. Goal 1: identifying human impact in a landscape

Table 7. Confusion matrix, precision, recall and F1-score produced on a stratified test set, with number of photographs equal to 25% of all photographs in the Paris data set (the remainder were used for training the model) for the DT approach.

Table 8. Average precision, recall, and F1-score for the DT approach.

3.2.2. Goal 2: filtering photos by usefulness based on perceived land cover

Table 9. Confusion matrix, and the precision, recall, and F1-score for the “usefulness” prediction with the DT approach.

Table 10. Confusion matrix, and the precision, recall, and F1-score for the ‘usefulness’ prediction with the DT approach by tuning the weights such that we limit the number of false positives (increasing precision at the expense of recall).

Table 11. Confusion matrix, precision, recall and F1-score for the “usefulness” prediction with the UW approach.

3.2.3. Goal 3: identifying land cover as defined by UA

Table 12. Confusion matrix for UA level 2 and the class prediction based on the scene attributes.

Table 14. Average unweighted and weighted average of the precision, recall, and F1-score for the level 2 UA classes presented in Table 11.

3.2.4. Goal 4: identifying UA land cover classes in the surrounding area

Table 15. Prediction accuracy of land cover with the DT approach for levels 2 and 4 of UA classes within 20, 50, and 100 m buffers defined around the photographs’ locations.

3.3. Discussion

4. Conclusions

Funding

Notes on contributors

Acknowledgments

References

Related research

To cite this article:

Download citation

Your download is now in progress and you may close this window

Login or register to access this feature

Information for

Open access

Opportunities

Help and information

Keep up to date