Technical Note

Impact of reference datasets and autocorrelation on classification accuracy

Pages 5321-5330 | Received 31 Jan 2008, Accepted 28 Sep 2009, Published online: 12 Jul 2011
 

Abstract

Reference data and accuracy assessments via error matrices form the foundation for measuring the success of classifications. An error matrix is often based on the traditional holdout method, which uses only one training/test dataset. If that dataset does not fully represent the variability in a population, accuracy may be over- or underestimated. Furthermore, reference data may be flawed by spatial errors or autocorrelation, which may lead to overoptimistic results. For a forest study, we first corrected spatially erroneous ground data and then used aerial photography to sample additional reference data around the field-sampled plots (Mannel et al. 2006). These reference data were used to classify forest cover and subsequently to determine classification success. Cross-validation randomly separates datasets into several training/test sets and is well documented to provide a more precise accuracy measure than the traditional holdout method. However, random cross-validation of autocorrelated data may overestimate accuracy; in our case the overestimation was between 5% and 8% for a 90% confidence interval. In addition, we observed accuracies differing by up to 35% for individual land cover classes, depending on which training/test datasets were used. The observed discrepancies illustrate the need to pay attention to autocorrelation and to use more than one permanent training/test dataset, for example, through a k-fold holdout method.
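The difference between random cross-validation and a split that respects autocorrelation can be illustrated with a minimal sketch. It is not the procedure used in the study. It assumes that each reference sample carries the ID of the field plot it was collected around, and that samples from the same plot are the autocorrelated ones. The function names (`random_kfold`, `grouped_kfold`, `shared_plots`) and the layout of 12 plots with 5 samples each are hypothetical.

```python
import random
from collections import defaultdict

def random_kfold(n_samples, k, seed=0):
    """Random k-fold: shuffle sample indices and deal them into k test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]

def grouped_kfold(plot_ids, k, seed=0):
    """Plot-wise k-fold: all samples sharing a plot ID land in the same test fold."""
    by_plot = defaultdict(list)
    for i, pid in enumerate(plot_ids):
        by_plot[pid].append(i)
    plots = sorted(by_plot)
    random.Random(seed).shuffle(plots)
    folds = [[] for _ in range(k)]
    for j, pid in enumerate(plots):
        folds[j % k].extend(by_plot[pid])
    return [sorted(f) for f in folds]

def shared_plots(test_idx, plot_ids):
    """Plots that contribute samples to both the test fold and the training data."""
    test = set(test_idx)
    test_plots = {plot_ids[i] for i in test}
    train_plots = {plot_ids[i] for i in range(len(plot_ids)) if i not in test}
    return test_plots & train_plots

# Hypothetical layout: 12 field plots, each with 5 nearby photo-interpreted samples.
plot_ids = [p for p in range(12) for _ in range(5)]

random_folds = random_kfold(len(plot_ids), k=4)
plot_folds = grouped_kfold(plot_ids, k=4)

# Count test folds whose plots also appear in the corresponding training data.
leaky = sum(bool(shared_plots(f, plot_ids)) for f in random_folds)
clean = sum(bool(shared_plots(f, plot_ids)) for f in plot_folds)
print(f"random folds with train/test plot overlap: {leaky} of 4")
print(f"plot-wise folds with train/test plot overlap: {clean} of 4")
```

With random folds, neighbouring samples from one plot usually end up on both sides of the split, so the test set is not independent of the training set. Keeping each plot whole within a fold removes that overlap by construction, which is one way to build several fixed training/test datasets of the kind the abstract recommends.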

Now at: Cottey College, 6000 W. Austin, Nevada, MO 64772, USA.

Acknowledgements

The authors would like to acknowledge the efforts of Mark Rumble and the USDA Rocky Mountain Forest Service Research Station for providing vegetation data and support with the reference data collection, Bruce Wylie and Chengquan Huang for assistance with the decision tree analysis, and Doug Baldwin and Krystal Price for collecting and processing field data. Furthermore, we would like to thank Keith Weber for his review and suggestions and Teri Peterson for her help with the statistical analysis. This research was supported by the National Aeronautics and Space Administration (NASA) Food and Fiber Applications of Remote Sensing Program, grant NAG13-99021; the South Dakota School of Mines and Technology; National Science Foundation/Experimental Program to Stimulate Competitive Research (EPSCoR) grants EPS-0091948 and EPS-9720642; and the South Dakota Space Grant Consortium.

