Abstract
During the past three decades a large body of research has investigated the problem of specifying class intervals for choropleth maps. This work, however, has focused almost exclusively on placing observations in quasi-continuous data distributions into ordinal bins along the number line. All enumeration units that fall into each bin are then assigned an areal symbol that is used to create the choropleth map. The geographical characteristics of the data are only indirectly considered by such approaches to classification. In this article, we design, implement, and evaluate a new approach to classification that places class-interval selection into a multicriteria framework. In this framework, we consider not only number-line relationships, but also the area covered by each class, the fragmentation of the resulting classifications, and the degree to which they are spatially autocorrelated. This task is accomplished through the use of a genetic algorithm that creates optimal classifications with respect to multiple criteria. These results can be evaluated and a selection of one or more classifications can be made based on the goals of the cartographer. An interactive software tool to support classification decisions is also designed and described.
Acknowledgments
We wish to thank Ronghai Sa, R. Rajagopal, and the reviewers for their comments on previous drafts of this article.
Notes
Note: Only one author discusses geographically informed approaches, while four of the five texts mention or describe the “optimal” approach.
(a) The previous edition of this title was published in 1984 and contains a more elaborate discussion of both the “optimal” approach and ideas related to the consideration of geographical characteristics; it discusses a geographical quantiles approach (equalization of area in each class).
(a) The objectives used for a subpopulation are indicated by a binary string, where a 1 in the i-th element means that the i-th objective is applied and a 0 means that it is not. For example, 0111 means this subpopulation is specialized to find solutions with respect to the first, second, and third objectives.
(b) A positive integer indicates the index of the destination subpopulation; –1 means that individuals migrate randomly to all other subpopulations.
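The binary-string encoding of objectives can be illustrated with a short Python sketch. It is not the authors' code: the function name `decode_objectives` is hypothetical, and the sketch assumes that the rightmost character of the string corresponds to the first objective, because that reading reproduces the 0111 example given in the note.

```python
def decode_objectives(mask: str) -> list[int]:
    """Return the 1-based indices of the objectives switched on in `mask`.

    Assumption (not stated in the source): the rightmost character is
    objective 1, so "0111" selects objectives 1, 2, and 3.
    """
    if set(mask) - {"0", "1"}:
        raise ValueError("mask must contain only the characters 0 and 1")
    # Walk the string from right to left, numbering positions from 1.
    return [i for i, bit in enumerate(reversed(mask), start=1) if bit == "1"]


if __name__ == "__main__":
    print(decode_objectives("0111"))
```

Under this assumption, a subpopulation configured with 1000 would be specialized to the fourth objective only.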
Note: In all cases considered, the GA found the best solution, obtained either by brute-force enumeration (IL59 and 5State90) or by Fisher's algorithm (USA90).
(a) For the IL59 dataset, the “best” TAI value in Jenks and Caspall (1971) is 0.73455, with classes of (15.57–41.20, 41.21–58.50, 58.51–75.51, 75.52–100.10, 100.10–155.30). In terms of EVF, this classification yields 1-GVF=0.705823. The intervals found by our GA are (15.57–41.20, 41.21–60.66, 60.67–77.29, 77.30–100.10, 100.10–155.30).
(b) Fisher's algorithm is used to obtain the optimal values for EVF (Fisher 1958). Many implementations of this algorithm are available (see Hartigan 1975; Lindberg 1990); we used a Fortran program provided by Hartigan (1975, 130–42). For the small (IL59) and medium (5State90) datasets, the results from Fisher's algorithm are identical to the results found by brute-force enumeration.
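The agreement between Fisher's algorithm and brute-force enumeration can be illustrated on a toy dataset. The Python sketch below is not the Fortran program from Hartigan (1975). It assumes the criterion being minimized is the total within-class sum of squared deviations, since the notes do not define EVF, and the function names and data are hypothetical.

```python
from itertools import combinations


def ssd(values):
    """Sum of squared deviations of `values` from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fisher_breaks(values, k):
    """Optimal k-class partition of the sorted values by dynamic programming.

    Returns (total within-class SSD, start index of each class).
    """
    x = sorted(values)
    n = len(x)
    # Prefix sums give the SSD of any slice x[i:j] in constant time.
    s1, s2 = [0.0], [0.0]
    for v in x:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimum SSD when x[:j] is split into c contiguous classes.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1][i] + cost(i, j)
                if cand < best[c][j]:
                    best[c][j] = cand
                    back[c][j] = i
    # Trace back the start index of each class, last class first.
    starts, j = [], n
    for c in range(k, 0, -1):
        j = back[c][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts


def brute_force_breaks(values, k):
    """Try every way of cutting the sorted values into k contiguous classes."""
    x = sorted(values)
    n = len(x)
    best_total, best_starts = float("inf"), None
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        total = sum(ssd(x[a:b]) for a, b in zip(bounds, bounds[1:]))
        if total < best_total:
            best_total, best_starts = total, list(bounds[:-1])
    return best_total, best_starts


if __name__ == "__main__":
    data = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # hypothetical toy values
    print(fisher_breaks(data, 3))
    print(brute_force_breaks(data, 3))
```

The dynamic program examines on the order of k·n² slices, whereas the enumeration grows combinatorially with n, which is consistent with brute force being reported only for the small and medium datasets.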
TGA=time in seconds used to compute GA with the configuration listed in .
TES=time in seconds used to complete an exhaustive search.
TMC=time in seconds used to compute 10,000 Monte Carlo solutions.
(a) Estimated value; see text for calculation.