Research Article

Application of feature selection methods and machine learning algorithms for saltmarsh biomass estimation using Worldview-2 imagery

Pages 1075-1099 | Received 21 Jan 2019, Accepted 20 May 2019, Published online: 11 Jun 2019
 

Abstract

Assessing large-scale plant productivity of coastal marshes is essential to understand the resilience of these systems to climate change. Two machine learning approaches, random forest (RF) and support vector machine (SVM) regression, were tested to estimate the biomass of a common saltmarsh species, salt couch grass (Sporobolus virginicus). Reflectance and vegetation indices derived from the 8 bands of Worldview-2 multispectral data were used in four experiments to develop the biomass model: Experiment 1, the 8 bands of the Worldview-2 image; Experiment 2, all possible band combinations of Worldview-2 for Normalized Difference Vegetation Index (NDVI)-type vegetation indices; Experiment 3, the combination of bands and vegetation indices; Experiment 4, variables selected from Experiment 3 using variable selection methods. The main objectives of this study are (i) to recommend an affordable, low-cost data source for predicting the biomass of a common saltmarsh species, (ii) to suggest a variable selection method suitable for multispectral data, and (iii) to assess the performance of RF and SVM for the biomass prediction model. Cross-validation of parameter optimization for SVM showed that the optimized parameters of ε-SVR failed to provide a reliable prediction; hence, ν-SVR was used for the SVM model. Among the different variable selection methods, recursive feature elimination (RFE) selected the minimum number of variables (only 4) with an RMSE of 0.211 kg/m2. Experiment 4 (selected variables only) provided the best results for both machine learning regression methods, RF (R2 = 0.72, RMSE = 0.166 kg/m2) and SVR (R2 = 0.66, RMSE = 0.200 kg/m2), in predicting biomass. When 10-fold cross-validation of the RF model was compared with 10-fold cross-validation of SVR, a significant difference (p < 0.0001) was observed for RMSE. One-to-one comparisons of actual to predicted biomass showed that RF underestimates the high biomass values, whereas SVR overestimates them; this suggests a need for further investigation and refinement.
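
The paper does not publish code, but the workflow described above can be sketched with scikit-learn analogues: RFE to retain a small set of predictors, then RF and ν-SVR compared by 10-fold cross-validated RMSE. The synthetic data below is only a placeholder for the Worldview-2 bands/indices and field biomass, and all parameter values are illustrative assumptions.

```python
# Hedged sketch of the abstract's workflow (not the authors' code): RFE selects
# 4 predictors, then RF and nu-SVR are compared by 10-fold cross-validated RMSE.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVR

# Placeholder predictors/response standing in for bands, indices and biomass.
X, y = make_regression(n_samples=120, n_features=36, n_informative=6,
                       noise=10.0, random_state=0)

# Experiment-4 style variable selection: keep only 4 predictors via RFE.
selector = RFE(RandomForestRegressor(n_estimators=200, random_state=0),
               n_features_to_select=4).fit(X, y)
X_sel = X[:, selector.support_]

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for name, model in [("RF", RandomForestRegressor(n_estimators=500, random_state=0)),
                    ("nu-SVR", make_pipeline(StandardScaler(), NuSVR(nu=0.5, C=10.0)))]:
    rmse = -cross_val_score(model, X_sel, y, cv=cv,
                            scoring="neg_root_mean_squared_error")
    print(f"{name}: mean 10-fold RMSE = {rmse.mean():.3f}")
```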

Acknowledgements

The author would like to thank the Department of Environment and Heritage, NSW, and Jeffrey Kellaway and his students of ENVS 270 for their help with part of the biomass sample collection.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

Sikdar M. M. Rasel was supported during this study by a grant [grant no. 43500870] from the International Macquarie Research Fund, Macquarie University, Australia.

Glossary

Cost (C): A standard SVM seeks to find a margin that separates all positive and negative examples. Sometimes this can lead to poorly fit models if any examples are mislabelled or extremely unusual. Cortes and Vapnik (1995) proposed the idea of a ‘soft margin’ SVM that allows some examples to be ‘ignored’ or placed on the wrong side of the margin. C is the parameter for the soft-margin cost function, which controls the influence of each individual support vector; this process trades error penalty for stability.
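
A small, hypothetical scikit-learn illustration (not the authors' code) of this trade-off in an RBF-kernel SVR: larger C penalizes training errors more heavily and typically fits the training data more tightly, at some cost in stability.

```python
# Hedged illustration: the soft-margin cost C in an RBF-kernel SVR.
# Larger C -> heavier penalty on training errors (tighter, less stable fit);
# smaller C -> more tolerated error in exchange for a smoother model.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 80)

for C in (0.1, 1.0, 100.0):
    model = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X, y)
    print(f"C={C}: training R^2 = {model.score(X, y):.3f}, "
          f"support vectors = {len(model.support_)}")
```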

Epsilon SVR (ε-SVR): In ε-SVR you have no control over how many data vectors from the dataset become support vectors; it could be a few or it could be many. You do, however, have full control over how much error you allow your model to have, and anything beyond the specified ε is penalized in proportion to C, the regularization parameter.
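
A minimal scikit-learn sketch of this point (illustrative only, not the authors' code): ε sets the width of the error tube, while the resulting number of support vectors is not under the user's direct control.

```python
# Hedged illustration of epsilon-SVR: epsilon fixes the error tube width
# (residuals inside it are not penalized); the support-vector count follows.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 6, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 100)

for eps in (0.01, 0.1, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: support vectors = {len(model.support_)} of {len(X)}")
```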

Gamma: The gamma parameter is the inverse of the standard deviation of the RBF (Radial Basis Function) kernel, which is used as a similarity measure between two points. It defines how far the influence of a single training example reaches, where low values mean ‘far’ and high values mean ‘close’. A small gamma value defines a Gaussian function with a large variance.
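
For concreteness, the RBF similarity can be written as k(x, x′) = exp(−γ‖x − x′‖²); the hedged sketch below (not the authors' code) shows how the same pair of points looks more or less similar as gamma changes.

```python
# Hedged illustration: gamma sets the width of the RBF similarity
# k(x, x') = exp(-gamma * ||x - x'||^2). Small gamma -> wide Gaussian,
# influence reaches "far"; large gamma -> influence is local ("close").
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[0.0, 0.0]])
x2 = np.array([[1.0, 1.0]])
for gamma in (0.1, 1.0, 10.0):
    manual = np.exp(-gamma * np.sum((x1 - x2) ** 2))
    print(f"gamma={gamma}: k(x1, x2) = {rbf_kernel(x1, x2, gamma=gamma)[0, 0]:.4f} "
          f"(manual: {manual:.4f})")
```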

Mtry: Mtry is the number of variables available for splitting at each tree node. For classification models, the default is the square root of the number of predictor variables. For regression models, this value needs to be tuned based on the training data set.
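
Mtry is the R randomForest name for this parameter; in scikit-learn the analogous knob is max_features. The hedged sketch below (illustrative values, synthetic data) tunes it by comparing out-of-bag scores.

```python
# Hedged sketch: tuning the mtry analogue (max_features) for a regression
# random forest by comparing OOB R^2 across candidate values.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=12, noise=5.0, random_state=0)
for max_feats in (2, 4, 6, 12):  # candidate mtry values
    rf = RandomForestRegressor(n_estimators=500, max_features=max_feats,
                               oob_score=True, random_state=0).fit(X, y)
    print(f"mtry={max_feats}: OOB R^2 = {rf.oob_score_:.3f}")
```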

Nu (ν) support vector regression (ν-SVR): In ν-SVR, the parameter ν determines the proportion of support vectors the user wishes to keep in the solution relative to the total number of samples in the dataset. The parameter ε is introduced into the optimization problem formulation and is estimated automatically (optimally) for the user.
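
A hedged scikit-learn sketch of this behaviour (not the authors' code): as ν increases, the fraction of training samples retained as support vectors increases with it, while ε is set internally.

```python
# Hedged illustration of nu-SVR: nu bounds the fraction of support vectors
# relative to the sample size; epsilon is determined automatically.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0, 6, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 100)

for nu in (0.1, 0.5, 0.9):
    model = NuSVR(kernel="rbf", nu=nu, C=1.0).fit(X, y)
    frac = len(model.support_) / len(X)
    print(f"nu={nu}: fraction of support vectors = {frac:.2f}")
```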

Out-of-bag (OOB) error: Bagging is the process of drawing bootstrap samples and then aggregating the models learned on each bootstrap sample. The OOB estimate of the generalization error is the error rate of the OOB predictions on the training set, where each observation is predicted only by the trees whose bootstrap samples did not include it. A lower OOB error indicates more accurate prediction.
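
The hedged sketch below (synthetic data, illustrative settings) shows how this estimate is obtained without a separate validation set; scikit-learn reports it as an OOB R² rather than an error rate, so higher values correspond to lower OOB error.

```python
# Hedged sketch: each tree is fit on a bootstrap sample, and the observations
# it never saw (the "out-of-bag" cases) estimate generalization performance.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
rf = RandomForestRegressor(n_estimators=500, oob_score=True,
                           bootstrap=True, random_state=0).fit(X, y)
print(f"OOB R^2 estimate: {rf.oob_score_:.3f}")  # higher R^2 => lower OOB error
```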

Recursive feature elimination (RFE): RFE is a feature selection method that repeatedly fits a model and removes the weakest feature (or features) until the specified number of features is reached.
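
A minimal scikit-learn sketch of RFE (not the authors' code); the target of 4 features echoes the count RFE retained in this study, while the data and estimator settings are placeholders.

```python
# Hedged sketch of recursive feature elimination: repeatedly fit an estimator,
# drop the weakest feature(s) by importance, and stop at the requested count.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

X, y = make_regression(n_samples=150, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
rfe = RFE(RandomForestRegressor(n_estimators=200, random_state=0),
          n_features_to_select=4, step=1).fit(X, y)
print("Selected feature indices:",
      [i for i, kept in enumerate(rfe.support_) if kept])
```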

Variable selection using random forest (VSURF): The R package VSURF returns two subsets of variables and handles both classification and regression problems. The first is a subset of important variables that may include some redundancy; the second is a smaller subset corresponding to a model that tries to avoid redundancy, focusing more closely on the prediction objective.
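
VSURF itself is an R package; the Python sketch below only mimics its two-stage idea (an "interpretation" set selected by an importance threshold, then a smaller "prediction" set built greedily), not the actual VSURF algorithm, and the threshold and tolerance values are arbitrary assumptions.

```python
# Hedged approximation of the VSURF idea (NOT the VSURF algorithm): keep
# variables whose RF importance passes a crude threshold, then greedily build a
# smaller prediction set, adding a variable only if it improves OOB R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=15, n_informative=5,
                       noise=5.0, random_state=0)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]
interp = [int(i) for i in order if rf.feature_importances_[i] > 0.02]

def oob_r2(cols):
    m = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
    return m.fit(X[:, cols], y).oob_score_

pred, best = [], -np.inf
for i in interp:  # add variables only if they clearly help prediction
    score = oob_r2(pred + [i])
    if score > best + 0.01:
        pred, best = pred + [i], score

print("Interpretation set:", interp)
print("Prediction set:", pred, f"(OOB R^2 = {best:.3f})")
```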

