Original Articles

Land cover classification of the Lake of the Woods/Rainy River Basin by object-based image analysis of Landsat and lidar data

ABSTRACT

Olmanson LG, Bauer ME. 2017. Land cover classification of the Lake of the Woods/Rainy River Basin by object-based image analysis of Landsat and lidar data. Lake Reserv Manage. 33:335–346.

The recent availability of lidar data throughout Minnesota, USA has opened up many opportunities for improved land cover classification and mapping. To integrate spectral and spatial information from Landsat imagery and lidar point cloud and topographic metrics, we utilized object-based image analysis (OBIA) with random forest classification. By classifying objects instead of pixels, we were able to use multispectral data along with spatial and contextual information of objects such as shape, size, texture, and lidar-derived metrics to distinguish different land cover types. These methods were used to create land cover maps and land cover change maps for the ∼1990 and ∼2010 time periods of the Lake of the Woods/Rainy River Basin for use as inputs to hydrologic models and analyses of land cover and land cover change. The overall accuracy for the general level 1 classification was over 95% and over 90% for the more detailed level 2 classification. The basin is dominated by forests, wetlands, and lakes that comprise 96.3% of the basin. Developed areas had a slight increase of 2650 ha (2.9%) from 1990 to 2010 at the basin level. The primary changes were due to forest disturbance from harvesting and fire and regeneration of the forest in disturbed areas. While areas where forests have been disturbed changed between the time periods, there was also an increase of forest disturbance to 6.5% of the basin in 2010 from 5.2% in 1990. There were no changes detected from 1990 to 2010 for 88% of the basin.

With a perceived increase in the frequency and intensity of cyanobacterial algal blooms in Lake of the Woods (LOW), there has been an increased effort to collect information about the nature of algal blooms, nutrient concentrations, and sources of nutrients to the LOW (DeSellas et al. 2009). As part of this effort, land cover maps of the Lake of the Woods/Rainy River Basin (Figure 1) that are consistent across the USA–Canada international border are needed as inputs to hydrologic models and analyses of land use and land use change.

Figure 1. The Lake of the Woods watershed shown using European Space Agency MERIS imagery from 25 Aug 2008.

Historically, remote sensing in the form of aerial photography has been an important source of land use–land cover information. However, aerial photography acquisition and interpretation and subsequent digitization of cover types is prohibitively expensive for large geographic areas. An alternative is to acquire the needed information from digital satellite imagery such as from the Landsat Thematic Mapper. This approach has several advantages: (1) the synoptic view of the sensor provides coverage of large geographic areas (e.g., an individual image covers 106 × 114 miles), (2) the digital form of the data lends itself to more efficient analysis, (3) the classified data are compatible with geographic information systems, and (4) land cover maps can be generated at considerably less cost than by other methods (albeit at 30 m spatial resolution).

This project used a combination of multitemporal Landsat imagery, lidar data (Minnesota), and object-based image analysis to cover the entire extent of the Lake of the Woods/Rainy River Basin using consistent methods for the ∼1990 and ∼2010 time periods so that land cover and changes over time can be quantified and used for hydrologic modeling.

Methods

Landsat data acquisition and processing

To complete the land cover classification for the 1990 and 2010 time periods for the entire 70,000 km² Lake of the Woods/Rainy River Basin, 44 Landsat images were used. These images included Landsat 5 Thematic Mapper images from paths 26, 27, 28, and 29, and rows 25, 26, and 27, and two Landsat 8 Operational Land Imager (OLI) images from path 29 (Figure 2). The images were selected from different seasons and vegetation development stages to distinguish different kinds of vegetation and other cover types. Further information about Landsat data is available at http://landsat.usgs.gov/.

Figure 2. Multitemporal Landsat paths of images (3-band false color composite) used for land cover classification.

A flow chart (Figure 3) illustrates the image processing and classification approach discussed below. With multiple dates of imagery per Landsat path (Figure 2), each with 7 to 10 spectral bands, it is useful to compress the images into principal components. Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components (Jensen 2005). Eight to ten independent principal components were derived for each path of imagery. The first 4 are shown as an example, along with a composite image of the first 3 that clearly shows separability of major cover type classes (Figure 4).
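The compression step described above can be illustrated with a minimal sketch. It assumes pixels are given as rows of per-band values and uses only the Python standard library, so it finds just the leading component by power iteration; the function names and sample values are hypothetical, and a real workflow would apply a linear algebra library to the full multitemporal band stack.

```python
import math

def covariance_matrix(samples):
    """Band-by-band covariance of pixels given as rows of band values."""
    n = len(samples)
    k = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in samples]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(k)]
            for a in range(k)]

def first_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of cov via power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    k = len(cov)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(k))
                     for a in range(k))
    return vec, eigenvalue

# Hypothetical pixels with two perfectly correlated bands (band 2 = 2 * band 1).
pixels = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cov = covariance_matrix(pixels)
vec, eigenvalue = first_component(cov)
# Share of total variance carried by the first principal component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Because the two hypothetical bands are perfectly correlated, the first component carries all of the variance, which is the redundancy that PCA exploits when many correlated image dates and bands are stacked.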

Figure 3. Flowchart of image processing and classification approach.

Figure 4. Principal components 1–4 and composite image of first 3 principal components of the International Falls, Minnesota, USA/Fort Frances, Ontario, Canada area.

An additional transformation of the Landsat data was the tasseled cap transformation (T-Cap; Jensen 2005). The first component is brightness, which is related to the amplitude of responses. The second, orthogonal to brightness, is known as greenness because of its sensitivity to the amount of green vegetation. The third, called wetness, is related to moisture content. Another useful and frequently used transformation is the normalized difference vegetation index (NDVI), which is the ratio of the difference to the sum of the near infrared and red spectral band responses (Jensen 2005).
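The NDVI definition above translates directly into code. The following is a minimal sketch for a single pixel; the function name and the reflectance values are illustrative assumptions, not values from this study.

```python
def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red); None when the sum is zero."""
    total = nir + red
    if total == 0:
        return None  # undefined for a pixel with no signal in either band
    return (nir - red) / total

# Hypothetical reflectances: dense green vegetation reflects strongly in the
# near infrared and weakly in the red, giving a high NDVI.
vegetated = ndvi(0.5, 0.1)
bare = ndvi(0.2, 0.2)
```

The index ranges from -1 to 1, with higher values indicating more green vegetation.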

Lidar data acquisition and processing

Lidar data acquired for the Minnesota portion of the basin provided additional information on height and elevation. Lidar LAS files were acquired from the Minnesota Department of Natural Resources (DNR) for the tiles within the areas covering the Minnesota portion of the Lake of the Woods/Rainy River Basin. The LAS tiles were used to generate mean and maximum vegetation height rasters at 20 m spatial resolution (Figure 5). The DNR-provided 1 m bare earth Digital Elevation Model (DEM) was also used to create additional lidar-derivative layers at 10 m spatial resolution, including the compound topographic index (CTI), slope, and dissection (Evans 1972). These DEM-derived variables were especially useful for wetland identification. More information about the lidar data is available at http://www.mngeo.state.mn.us/chouse/elevation/lidar.html.
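Generating mean and maximum height rasters from a point cloud amounts to binning points into grid cells. The sketch below assumes the points are already expressed as (x, y, height above ground) tuples in metres, that is, the bare earth elevation has been subtracted beforehand; the function name and sample points are hypothetical, and reading LAS files is outside its scope.

```python
from collections import defaultdict

def height_rasters(points, cell_size=20.0):
    """Bin (x, y, height) points into square cells.

    Returns two dicts keyed by (column, row): mean and maximum height.
    """
    cells = defaultdict(list)
    for x, y, h in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append(h)
    mean_h = {k: sum(v) / len(v) for k, v in cells.items()}
    max_h = {k: max(v) for k, v in cells.items()}
    return mean_h, max_h

# Hypothetical returns: two fall in the first 20 m cell, one in the next.
sample_points = [(5.0, 5.0, 10.0), (15.0, 3.0, 20.0), (25.0, 5.0, 4.0)]
mean_h, max_h = height_rasters(sample_points)
```

Cells that receive no points are simply absent from the dictionaries and would need a no-data value when written to a raster.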

Figure 5. Lidar derived variables.

Additional GIS maps and data

Additional GIS layers used for image segmentation included major roads, other roads, and railroads. These layers were acquired from the Minnesota Department of Transportation, Ontario Ministry of Natural Resources and Forestry, and Manitoba Infrastructure Department and were combined to cover the entire basin. Once the classification was finalized and accuracies were calculated, these layers were also overlaid on the maps so that roads narrower than the 30 m Landsat data could be included in the 2010 map. For the 1990 map, an edited version of the roads layer was used that excluded roads identified in the imagery as having been constructed since 1990.

Classification scheme and reference data

A critical element of any image classification project is the classification scheme, a systematic listing of the classes of interest. It should be exhaustive (there is a class for everything), mutually exclusive (each cover type is a member of only one class), and hierarchical (so that more detailed classes, e.g., level 2, can be collapsed into more general classes, e.g., level 1). The classes are similar to those used for previous classifications in Minnesota, although modified to reflect the unique characteristics of the Lake of the Woods/Rainy River Basin (Table 1).

Table 1. Classification scheme with level 1 and 2 classes.

A second critical aspect of successfully classifying remote sensing data is the availability of accurate reference data that can be used to associate land cover/land use classes with the spectral–radiometric–temporal classes from the imagery and for accuracy assessment. Reference data used for classifier training and accuracy assessment were created by identifying objects of representative land cover types using the Landsat imagery and derivatives, lidar derivatives, and high-resolution aerial imagery. High-resolution aerial imagery was available through Google Earth, Bing Maps, and, for the Minnesota portion, the MnGeo Geospatial Image Service (http://www.mngeo.state.mn.us/chouse/wms/geo_image_server.html), which distributes ortho-rectified aerial digital imagery, particularly from the USDA National Agriculture Imagery Program (NAIP): 1 m, 3-band natural color and 4-band (including near-infrared) summer imagery and some spring leaf-off imagery. An example of NAIP imagery with examples of different level 1 cover type classes is presented (Figure 6). For classes such as agriculture and wetlands, ancillary datasets were also used to identify characteristic areas for training.

Figure 6. NAIP image with examples of cover type classes.

Image classification

To create land cover maps for 1990 and 2010 for change detection, we classified the two periods simultaneously. With this approach, areas that did not change are identical in both maps, which reduces change detection error because the errors of 2 separate classifications are not compounded. Independent classifications would have different errors in different locations, and these errors would be compounded in change detection because locations that did not actually change but were classified as different land cover types would be reported as change.
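The compounding of errors can be made concrete with a small worked example. It assumes, purely for illustration, two independently produced maps that are each 95% accurate and whose errors are statistically independent; these figures are not results from this study.

```python
def joint_accuracy(acc_a, acc_b):
    """Probability that two independent classifications are both correct."""
    return acc_a * acc_b

# Illustrative accuracies for two independently classified dates.
independent = joint_accuracy(0.95, 0.95)
# Approximate share of unchanged locations that would be flagged as change.
spurious_change = 1.0 - independent
print(f"{independent:.4f}")  # 0.9025
```

Under these assumptions, nearly 10% of unchanged locations would carry an error in at least one map and could appear as false change, which is the effect that classifying both periods together is meant to avoid.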

Previous Minnesota land cover classifications have used the maximum likelihood classifier based on the spectral responses of individual pixels (Yuan et al. 2005a, 2005b). Although multispectral and temporal information was integrated by including 2 or 3 image dates, spatial information is not included with pixel-level classifications. More recently, object-based image analysis (OBIA; Platt and Rapoza 2008, Blaschke 2010) has become the standard method for classification of high-resolution imagery, but it can also be effectively used for moderate resolution data such as Landsat (Bauer et al. 2013). Much more information is available when the spatial information in imagery is considered, and it generally increases the classification accuracy. Objects enable taking advantage of all the elements of image interpretation, particularly spatial information, including shape, size, pattern, texture, and context. Context is especially useful. Humans intuitively integrate “pixels” into objects and use contextual relationships to interpret images and draw intelligent inferences from them.

The OBIA approach using eCognition (http://www.ecognition.com/), the leading OBIA software system, includes 3 main steps: (1) segmentation of the image into objects, (2) extraction of the object features, and (3) classification of the objects.

Segmentation

The imagery is first segmented into objects with similar pixels based on the spatial, as well as the spectral–radiometric (color), attributes. Segmentation primarily uses spectral information about individual pixels in the imagery to combine them into larger image objects or segments. As an example, individual pixels that comprise a crop field with similar spectral response values are combined to form an image object that represents the field. Other scaling information can be specified to regulate the size range of the desired objects. The goal of segmentation is to minimize within-object heterogeneity and maximize the variance among objects, subject to user-defined parameters. A scale parameter is specified to control the size of objects, and there can be a nested hierarchy of objects with bigger objects containing smaller objects. Examples of segmentation results are presented (Figures 7 and 8).

Figure 7. Segmentation objects of uplands and wetlands using principal components and major roads.

Figure 8. Objects over 2008 NAIP false color imagery for the same area as in Figure 7.

Because we wanted to use the same segmentations for 1990 and 2010, we determined that the best segmentations were produced from the concentrated spectral data in the PCA image, created using Landsat data from both time periods for each Landsat path, along with the roads and railway layer. These segmentations (Figures 7 and 8) were suitable for both time periods and could be used to classify change between 1990 and 2010.

Extraction of object features

Once image objects were created, a large number (>200) of features could be derived and potentially used for classification. The primary features included spectral data from the multitemporal imagery (including means, modes, quantiles, and standard deviations of individual bands and several transformations), spatial and geometric features (including asymmetry, compactness, density, rectangular fit, roundness, and shape index), and texture (including homogeneity and dissimilarity).
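Spectral object features of the kind listed above are summary statistics of the pixels inside each object. The following minimal sketch assumes a label raster of object ids and one band raster of the same size, both as nested lists; the function name and values are hypothetical and do not represent the eCognition implementation.

```python
from collections import defaultdict
from statistics import mean, pstdev

def object_spectral_features(labels, band):
    """Per-object mean and standard deviation of one band.

    labels and band are equally sized 2-D lists; labels holds object ids.
    """
    values = defaultdict(list)
    for label_row, band_row in zip(labels, band):
        for obj_id, value in zip(label_row, band_row):
            values[obj_id].append(value)
    return {obj_id: {"mean": mean(v), "std": pstdev(v)}
            for obj_id, v in values.items()}

# Hypothetical 2 x 2 scene with two objects.
labels = [[1, 1], [2, 2]]
band = [[10, 20], [30, 30]]
features = object_spectral_features(labels, band)
```

Repeating this for every band, date, and transformation, and adding geometric and texture measures, is how the feature count per object can grow past 200.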

Classification

To decrease the error between the 1990 and 2010 classifications, we classified the time periods together and added classes to identify areas of change. These included areas that were developed, changes in agriculture, barrier and shore land areas that eroded away or were created, and forested areas that were harvested/burned or regenerated. Each of the 4 Landsat paths of imagery had to be classified separately because images from different dates have different vegetation phenologies, which would increase classification error. Also, to take advantage of the lidar data that were available in Minnesota but not in Ontario or Manitoba, the Minnesota sections of each Landsat path were classified separately. To cover the entire basin using the best available data sources, 8 separate classifications were completed and mosaicked to create the final land cover maps and change map.

Random forest, a state-of-the-art approach that can handle and take advantage of a large number of features, was used for the classification of the objects. It is an ensemble learning method for classification that operates by constructing multiple decision trees. Each tree is grown from a different random subsample of the training data, and each split is selected from a random subsample of the available features. The method allows for the use of a large number of features or variables and identifies the important predictors (Liaw and Wiener 2002).
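The two sources of randomness and the final vote can be sketched as follows. This is not a full random forest: tree growing is omitted, the function names are hypothetical, and the square-root rule for the number of candidate features is a common default for classification rather than a setting reported for this study.

```python
import math
import random

def bootstrap_sample(n_rows, rng):
    """Row indices drawn with replacement, one draw per training row."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def candidate_features(n_features, rng):
    """Random feature subset considered at one split (square-root rule)."""
    mtry = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), mtry))

def majority_vote(tree_predictions):
    """Forest prediction: the class chosen by the most trees (ties arbitrary)."""
    return max(set(tree_predictions), key=tree_predictions.count)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = bootstrap_sample(10, rng)
split_features = candidate_features(200, rng)
label = majority_vote(["forest", "wetland", "forest"])
```

With roughly 200 object features, the square-root rule would examine 14 features at each split, so no single strong predictor can dominate every tree.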

The Gini index, a measure of node impurity (Gini 1997), was used to compute variable importance, which reflects how often a particular variable was used and how “early” it was used in the trees in the forest. A higher Gini score indicates a more influential variable. The contributions of the most important variables to the classification of wetlands, forest, developed, and agriculture with and without lidar data indicate that the utilization of the lidar data substantially improved classification of wetlands and forest. This is potentially due to the improved separation of forested wetlands from upland forest, which is important because forested wetlands and forest cover a large percentage of the Lake of the Woods/Rainy River Basin. Having lidar data also somewhat improved the classification of agricultural areas, potentially due to artificial drainage and the farming of areas that were historically wetlands, and to improved separation of wetlands from grassland areas. Lidar data did not appear to improve the separation of developed areas from other classes. Building footprints created from the lidar point cloud could improve separation and were utilized in Bauer et al. (2013); however, they were deemed to be of inconsistent quality over this region and thus were not used.
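Gini-based importance is commonly accumulated from the impurity decrease achieved at each split that uses a variable. The sketch below shows that per-split quantity under this assumption; the function names and class labels are illustrative, and summing over all splits and trees is left out.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 for a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity decrease from splitting parent into left and right."""
    n = len(parent)
    weighted = ((len(left) / n) * gini_impurity(left)
                + (len(right) / n) * gini_impurity(right))
    return gini_impurity(parent) - weighted

# Hypothetical node with two classes that a single split separates perfectly.
parent = ["forest", "forest", "wetland", "wetland"]
decrease = gini_decrease(parent, ["forest", "forest"], ["wetland", "wetland"])
```

A variable that repeatedly produces large decreases near the top of the trees, where nodes hold many samples, accumulates a high score.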

Accuracy assessment

A key part of the project is accuracy assessment. We evaluated classification accuracy by comparing the classification results to an independent stratified (by class) random reference sample of 6610 objects (20% of the reference data that were withheld from classifier training) and reporting the error matrix and statistics derived from it, including overall accuracy and user and producer accuracies (Congalton and Green 2009, Foody 2002).
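The statistics named above follow directly from the error matrix. This minimal sketch assumes rows hold the mapped class and columns the reference class; the function name, class names, and counts are hypothetical and are not taken from the tables in this article.

```python
def accuracy_stats(matrix, classes):
    """Overall, user's, and producer's accuracy from an error matrix.

    matrix[i][j] counts samples mapped as classes[i] whose reference
    class is classes[j].
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(classes)))
    users, producers = {}, {}
    for i, name in enumerate(classes):
        row_total = sum(matrix[i])                  # all samples mapped as this class
        col_total = sum(row[i] for row in matrix)   # all reference samples of this class
        users[name] = matrix[i][i] / row_total if row_total else None
        producers[name] = matrix[i][i] / col_total if col_total else None
    return correct / total, users, producers

# Hypothetical two-class matrix of 100 reference objects.
classes = ["forest", "wetland"]
matrix = [[40, 10],
          [0, 50]]
overall, users, producers = accuracy_stats(matrix, classes)
```

User's accuracy reflects commission error (how reliable a mapped class is), whereas producer's accuracy reflects omission error (how much of a reference class was captured).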

Results

A legend of the land cover maps was created for the Lake of the Woods/Rainy River Basin for 1990 and 2010 (Figure 9). The final 1990 and 2010 level 2 classifications are presented (Figures 10 and 11, respectively). Qualitatively, the Landsat/lidar land cover classifications show a high correspondence with the aerial imagery.

Figure 9. Lake of the Woods/Rainy River Basin level 2 land cover classification legend.

Figure 10. 1990 Lake of the Woods/Rainy River Basin level 2 land cover classification. The legend is shown in Figure 9.

Figure 11. 2010 Lake of the Woods/Rainy River Basin level 2 land cover classification. The legend is shown in Figure 9.

Quantitative assessment of the agreement between the validation data and the classification for levels 1 and 2 with and without lidar data by Landsat path is presented (Table 2). Example error matrices for level 1 with and without lidar data are shown for 1990 (Tables 3 and 4) and 2010 (Tables 5 and 6). On average, all classifications are more than 90% accurate, with higher accuracy (>95%) for the fewer, more general level 1 classes. The difference between levels, however, is smaller than in our previous experience with per-pixel maximum likelihood classifications (Yuan et al. 2005a, 2005b). We attribute this to the use of the object-based classification, random forest classifier, and the inclusion of lidar data features. The Minnesota classification using lidar data had 1.5% higher accuracy than the Canadian classification without lidar, but the increase was less than expected. Having classification accuracy that is uniform across the basin with and without lidar data is important for consistency as the data are used for hydrologic modeling and other purposes.

Table 2. Overall accuracies for level 1 and 2 classifications by Landsat path with and without lidar data.

Table 3. Classification error matrix and accuracies for 1990 level 1 with lidar data.

Table 4. Classification error matrix and accuracies for 1990 level 1 without lidar data.

Table 5. Classification error matrix and accuracies for 2010 level 1 with lidar data.

Table 6. Classification error matrix and accuracies for 2010 level 1 without lidar data.

With the finalized maps, we calculated percent of area by class at the sub-basin and total basin levels for 1990 and 2010. The Lake of the Woods/Rainy River Basin is dominated by forest, wetlands, and lakes, which comprise 96.3% of the basin. Developed areas increased by 2650 ha, an increase of 2.9% from 1990 to 2010, which is not significant at the basin level and is less than in more developed areas of Minnesota. The largest changes were due to forest harvesting, which is most apparent in the land cover maps (Figures 10 and 11) and the land change map (Figure 12). There were no changes from 1990 to 2010 for ∼88% of the basin. While the areas where forests and wetland forests have been harvested changed between the time periods, forest disturbance also increased to 6.5% of the basin in 2010 from 5.2% in 1990.

Figure 12. Land cover change from 1990 to 2010 for the Lake of the Woods/Rainy River Basin with sub-basin boundaries.

Discussion

For accurate land cover/use maps using remote sensing, it is important to have clear imagery that is not affected by haze and cloud cover, which can change the spectroradiometric response and reduce classification accuracy. For this project, we were able to find clear spring and late summer imagery within 1 yr of our 1990 target and within 4 yr of our 2010 target. Imagery from other seasons was more limited, with only one clear early summer and one fall image available for the area. This illustrates that, for areas where cloud cover is more persistent, the 16 d temporal resolution of the Landsat satellites is too low to reliably acquire clear imagery for any given year. Landsat 7 imagery was not utilized for this project because of the missing data lines caused by the malfunction of the scan line corrector (SLC), which limits its functionality for land cover mapping. The significant advancements of Landsat 8, the new Landsat 9, and the European Space Agency (ESA) Sentinel-2 satellites, which have improved spatial, spectral, radiometric, and temporal resolution (every 3–5 d), will greatly enhance the capabilities to map land cover in the near future.

As previously mentioned, the primary reason these maps were created was to have an accurate, consistent land cover/use map for the entire basin for hydrologic modeling and analyses of land cover and land cover change. Having accurate land cover data enables estimation of pollution loads where direct measurement of pollution is impractical. Direct measurement of nutrient loadings to waterbodies is difficult, expensive, and time-consuming, especially in complicated remote watersheds like the Lake of the Woods/Rainy River Basin. Consequently, estimates of nutrient loadings, especially for catchment areas with ungauged tributaries or with diffuse drainage (rather than well-defined streams), often are made indirectly by determining the watershed sub-areas containing various kinds of land cover conditions (e.g., regenerating forest, forest, row-crop agriculture, pasture, urban land) and multiplying these areas by a nutrient (P or N) export coefficient. The latter coefficients, obtained from the literature, are based on values from detailed measurements of flows and concentrations.
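The export coefficient calculation described above is a sum of area times coefficient over cover types. In this minimal sketch the function name, areas, and coefficients are placeholders chosen for easy arithmetic; they are not literature values and not results from this study.

```python
def nutrient_load(areas_ha, export_coefficients):
    """Annual load (kg/yr): sum over cover types of area (ha) times
    export coefficient (kg/ha/yr)."""
    return sum(area * export_coefficients[cover]
               for cover, area in areas_ha.items())

# Placeholder sub-basin areas (ha) and phosphorus coefficients (kg/ha/yr).
areas = {"forest": 1000.0, "row_crop": 200.0}
coefficients = {"forest": 0.1, "row_crop": 1.0}
load = nutrient_load(areas, coefficients)
```

Because the land cover areas enter the estimate linearly, any misclassified area shifts the load in proportion to the difference between the coefficients of the confused classes.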

Conclusions

A combination of multitemporal Landsat data, which provided synoptic views of the entire area, and lidar data for the Minnesota portion, along with object-based image analysis and the random forest classifier, enabled accurate level 1 and 2 land cover classifications for the Lake of the Woods/Rainy River Basin for 1990 and 2010. The digital format of the classifications makes it possible to easily include them with other digital maps and data in a GIS for further analysis and hydrologic modeling. The classification results are being used for hydrologic modeling of the basin and are available in a web-based format that allows users to zoom into areas of interest and compare the classifications to recent high resolution imagery. The data can also be downloaded in a raster format at http://portal.rs.umn.edu.

Acknowledgments

We thank Nolan Baratono, Jesse Anderson, and Cary Hernandez for project assistance and acknowledge support from the Clean Water Fund administered by the Minnesota Pollution Control Agency, the U-Spatial: Spatial Science and Systems Infrastructure project at the University of Minnesota, and the Minnesota Agricultural Experiment Station.

References

  • Bauer M, Knight J, Olmanson L, Voth M, Dunsmoor J. 2013. Twin Cities metropolitan area land cover and impervious surface area classification by remote sensing: 2011 update. Final report to the Metropolitan Council. St. Paul (MN).
  • Blaschke T. 2010. Object based image analysis for remote sensing. ISPRS J Photogramm. 65:2–16.
  • Congalton R, Green K. 2009. Assessing the accuracy of remotely sensed data: principles and practices. 2nd ed. Boca Raton (FL): CRC/Taylor & Francis.
  • DeSellas AM, Paterson AM, Clark BJ, Baratono NG, Sellers TJ. 2009. State of the Basin report: for the Lake of the Woods and Rainy River Basin. Prepared in cooperation with Lake of the Woods Water Sustainability Foundation, Ontario Ministry of the Environment, Environment Canada, and Minnesota Pollution Control Agency.
  • Evans IS. 1972. General geomorphometry, derivatives of altitude, and descriptive statistics. In: Chorley RJ, editor. Spatial analysis in geomorphology. New York (NY): Harper & Row. p. 17–90.
  • Foody G. 2002. Status of land cover classification accuracy assessment. Remote Sens Environ. 80:185–201.
  • Gini C. 1997. Concentration and dependency ratios. Riv Polit Econ. 87:769–789.
  • Jensen JR. 2005. Introductory digital image processing: a remote sensing perspective. 3rd ed. Upper Saddle River (NJ): Prentice-Hall.
  • Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2/3:18–22. Available from http://CRAN.R-project.org/doc/Rnews/.
  • Platt RV, Rapoza L. 2008. An evaluation of an object-oriented paradigm for land use/land cover classification. Prof Geogr. 60:87–100.
  • Yuan F, Bauer ME, Heinert NJ, Holden GR. 2005a. Multi‐level land cover mapping of the Twin Cities (Minnesota) metropolitan area with multi‐seasonal Landsat TM/ETM+ data. Geocarto Int. 20:5–13.
  • Yuan F, Sawaya KE, Loeffelholz B, Bauer ME. 2005b. Land cover classification and change analysis of the Twin Cities (Minnesota) metropolitan area by multitemporal Landsat remote sensing. Remote Sens Environ. 98:317–328.
