Abstract
Many factors influence the quality and value of a classification accuracy assessment and evaluation programme. This paper focuses on the size of the testing set(s) used, with particular regard to the impacts on accuracy assessment and comparison. Testing set size is important because an inappropriately large or small sample can lead to limited, and sometimes erroneous, assessments of accuracy and of differences in accuracy. Here, some of the basic statistical principles of sample size determination are outlined, including a discussion of Type II errors and their control. The paper discusses basic issues of sample size determination for accuracy assessment, including factors linked to accuracy comparison. For the latter, the researcher should specify the effect size (the minimum meaningful difference in accuracy), the significance level and the power used in an analysis, and ideally also fit confidence limits to derived estimates. This will help in designing a study, aid the selection of appropriate sample sizes, and facilitate the interpretation of results. In particular, it will help avoid problems such as under-powered analyses and provide a richer information base for classification evaluation. The paper includes equations that could be used to determine sample sizes for common applications in remote sensing, using both independent and related samples.
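To illustrate the kind of calculation the paper addresses, the sketch below applies a standard textbook formula (normal approximation) for the per-group sample size needed to detect a specified difference between two independent proportions, given a significance level and power. The function name, defaults and the specific formula are illustrative assumptions, not the paper's own equations, which may differ in detail.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Per-group sample size to detect a difference between two
    independent proportions (e.g. two classification accuracies).

    Uses the common normal-approximation formula; this is a generic
    illustration, not necessarily the paper's exact equation.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_b = NormalDist().inv_cdf(power)          # power = 1 - beta (Type II error)
    p_bar = (p1 + p2) / 2                      # pooled proportion
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# e.g. testing-set size per classifier to distinguish accuracies of
# 0.85 and 0.90 at alpha = 0.05 with power = 0.8
n = sample_size_two_proportions(0.85, 0.90)
```

The example shows why effect size matters: halving the minimum meaningful difference roughly quadruples the required sample, so an unstated or careless choice of effect size can easily leave a comparison under-powered.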
Acknowledgements
This is an expanded version of a paper presented at the ‘Accuracy 2008’ conference in Shanghai, July 2008. I am grateful to the referees for their constructive comments, which helped to improve this article.