Abstract
The family of Kappa indices of agreement claims to compare a map's observed classification accuracy with the expected accuracy of baseline maps that can have two types of randomness: (1) random distribution of the quantity of each category and (2) random spatial allocation of the categories. Use of the Kappa indices has become part of the culture in remote sensing and other fields. This article examines five different Kappa indices, some of which were derived by the first author in 2000. We expose the indices' properties mathematically and illustrate their limitations graphically, with emphasis on Kappa's use of randomness as a baseline and on the often-ignored conversion from an observed sample matrix to the estimated population matrix. This article concludes that these Kappa indices are useless, misleading and/or flawed for the practical applications in remote sensing that we have seen. After more than a decade of working with these indices, we recommend that the profession abandon the use of Kappa indices for purposes of accuracy assessment and map comparison, and instead summarize the cross-tabulation matrix with two much simpler summary parameters: quantity disagreement and allocation disagreement. This article shows how to compute these two parameters using examples taken from peer-reviewed literature.
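As a rough illustration of the two summary parameters, the following Python sketch computes quantity disagreement and allocation disagreement from a square cross-tabulation matrix. The function name `disagreement_components` and the example matrix are hypothetical and are not taken from the article's examples. The sketch assumes that the matrix already represents the estimated population (rows are the map's categories, columns are the reference categories); it does not perform the conversion from an observed sample matrix to the estimated population matrix.

```python
def disagreement_components(matrix):
    """Return (quantity, allocation, total) disagreement as proportions.

    Assumption: `matrix` is a square cross-tabulation whose entries are
    already population estimates (counts or proportions), with rows for
    the comparison map and columns for the reference data.
    """
    n = len(matrix)
    grand_total = sum(sum(row) for row in matrix)

    # Normalize so that all entries sum to 1.
    p = [[cell / grand_total for cell in row] for row in matrix]

    row_sums = [sum(p[g]) for g in range(n)]                         # map proportions
    col_sums = [sum(p[i][g] for i in range(n)) for g in range(n)]    # reference proportions

    # Quantity disagreement: mismatch in category proportions,
    # halved because each mismatch is counted in two categories.
    quantity = sum(abs(row_sums[g] - col_sums[g]) for g in range(n)) / 2

    # Allocation disagreement: for each category, twice the smaller of
    # its commission and omission errors, halved over all categories.
    allocation = sum(
        2 * min(row_sums[g] - p[g][g], col_sums[g] - p[g][g]) for g in range(n)
    ) / 2

    # Total disagreement: one minus the proportion on the diagonal.
    total = 1 - sum(p[g][g] for g in range(n))
    return quantity, allocation, total


# Hypothetical two-category example (not from the article).
q, a, d = disagreement_components([[40, 10], [20, 30]])
print(round(q, 3), round(a, 3), round(d, 3))
```

For this hypothetical matrix, the quantity and allocation components sum to the total disagreement, which is the property that makes the pair a complete summary of the off-diagonal entries.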
Acknowledgements
The United States' National Science Foundation (NSF) supported this work through its Coupled Natural Human Systems program via grant BCS-0709685. NSF supplied additional funding through its Long Term Ecological Research network via grant OCE-0423565 and a supplemental grant DEB-0620579. Any opinions, findings, conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect those of the funders. Clark Labs produced the GIS software Idrisi, which computes the two components of disagreement that this article endorses. Anonymous reviewers supplied constructive feedback that helped to improve this article.