
Geo-spatial Information Science, Volume 19, 2016, Issue 2: Big Data in Geo-spatial Information Science


Open access


Spatially simplified scatterplots for large raster datasets

Bin Li (corresponding author), Department of Geography, College of Science and Engineering, Central Michigan University, Mount Pleasant, MI, USA

Daniel A. Griffith, School of Economic, Political and Policy Sciences, University of Texas at Dallas, Richardson, TX, USA

Brian Becker, Department of Geography, College of Science and Engineering, Central Michigan University, Mount Pleasant, MI, USA

Pages 81-93 | Received 10 Feb 2016, Accepted 13 Apr 2016, Published online: 24 May 2016

https://doi.org/10.1080/10095020.2016.1179441


Figures & data

Figure 1. Scatterplot and a GAM fitted line for Normalized Difference Vegetation Index (NDVI) (Y) and ETM+ Band 4 (X). Data are standardized. The R package mgcv was used to fit the GAM (Wood 2006; Hastie and Tibshirani 1990).

Figure 2. Alpha blending scatterplot of a subset (1000 × 1000) of the same data in Figure 1, with the alpha value set to 0.05. The alpha value specifies the opacity of the data points, ranging from 1 for fully opaque to 0 for fully transparent. Even at such a low alpha level, excessive overplotting remains a problem.
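The persistence of overplotting at a low alpha value can be checked with a little arithmetic. The sketch below assumes the standard "over" compositing rule, under which n identical marks drawn on the same pixel reach a combined opacity of 1 − (1 − alpha)^n. The function names and the 0.99 saturation threshold are illustrative choices, not taken from the article.

```python
import math


def stacked_opacity(alpha: float, n_points: int) -> float:
    """Combined opacity of n_points identical marks drawn on one pixel.

    Assumes standard 'over' compositing: each mark lets a fraction
    (1 - alpha) of what lies beneath it show through.
    """
    return 1.0 - (1.0 - alpha) ** n_points


def points_to_saturate(alpha: float, threshold: float = 0.99) -> int:
    """Smallest number of overlapping marks whose opacity reaches threshold."""
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - alpha))


# With alpha = 0.05, how many coincident points make a pixel look solid?
n_needed = points_to_saturate(0.05)
print(n_needed, round(stacked_opacity(0.05, n_needed), 4))
```

Under these assumptions, a pixel becomes visually solid once a few dozen points fall on it, which a 1000 × 1000 raster (one million points) can easily exceed in the dense core of the point cloud.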

Figure 3. Two-dimensional binned kernel smoothing scatterplot.

Notes: Black dots are bins with a single data point. Data are the same as Figure .

Figure 4. Scatterplot with contours. Because there was insufficient memory to render the plot with the large data-set in Figure 2, 10,000 data points were randomly generated from a standard normal distribution instead.

Figure 5. Binned scatterplot with the same data in Figure 2.

Notes: A 30-by-30 grid was used to bin the original data (1000 × 1000). Brightness symbolizes the counts of data points in each bin. Changing the bin size can achieve different levels of generalization. The shape of the bins can be either rectangular or hexagonal.
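The rectangular binning described in these notes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bin_counts` and the random stand-in data are assumptions, and only rectangular bins are covered (hexagonal bins need a different cell-assignment rule).

```python
import random


def bin_counts(xs, ys, n_bins=30):
    """Count points falling in each cell of an n_bins-by-n_bins grid.

    Returns a list of rows; counts[row][col] holds the number of points
    whose y falls in bin `row` and whose x falls in bin `col`.
    """
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range so the division below is always defined.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    counts = [[0] * n_bins for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        # The maximum value would land in bin n_bins, so clamp it back.
        col = min(int((x - x_min) / x_span * n_bins), n_bins - 1)
        row = min(int((y - y_min) / y_span * n_bins), n_bins - 1)
        counts[row][col] += 1
    return counts


# Stand-in data: 1000 random points (the article bins 1000 x 1000 raster cells).
rng = random.Random(0)
demo_x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
demo_y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
demo_counts = bin_counts(demo_x, demo_y, n_bins=30)
```

Raising or lowering `n_bins` corresponds to the change in the level of generalization mentioned in the notes; the resulting counts would then be mapped to brightness when drawing.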

Figure 6. Nested lattice hexagon binning with the same data in Figure 2.

Notes: For the hexagons, the size of the inner point represents counts, while hue represents the hierarchical categories and serves as borders between the hexagons. The graph was generated with the R package hexbin.

Table 1. Summary statistics of the simulated data-set.


Figure 7. Simulated data-set X (left) and Y (right).

Notes: Y = 0.5X + X² + X³. The size of the image is 100 × 100 cells. See Table for the statistical summaries.
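The functional relationship in these notes can be reproduced on a 100 × 100 grid with a few lines. This sketch makes one strong simplifying assumption: X is drawn as independent standard normal values, whereas the article's X surface is spatially autocorrelated. The names `cubic_response` and `simulate_grid` are hypothetical.

```python
import random


def cubic_response(x: float) -> float:
    """Y = 0.5X + X^2 + X^3, the relationship stated in the figure notes."""
    return 0.5 * x + x ** 2 + x ** 3


def simulate_grid(n_rows: int = 100, n_cols: int = 100, seed: int = 0):
    """Build X and Y grids of n_rows by n_cols cells.

    Assumption: X cells are independent N(0, 1) draws. This ignores the
    spatial autocorrelation present in the article's simulated X.
    """
    rng = random.Random(seed)
    x_grid = [[rng.gauss(0.0, 1.0) for _ in range(n_cols)]
              for _ in range(n_rows)]
    y_grid = [[cubic_response(x) for x in row] for row in x_grid]
    return x_grid, y_grid


x_grid, y_grid = simulate_grid()
```

Because the cubic term dominates for large |X|, a straight-line fit to such data misses the curvature, which is why a local regression fit is a natural companion to the scatterplot.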

Figure 8. Scatterplot with a local regression fit for the simulated data-set X and Y.

Figure 9. Remote sensing data-set used in the experiment.

Notes: Landsat ETM Band 4 (left) and derived NDVI (right). Image dimension is 3000 rows by 3000 columns, with a spatial resolution of 30 m. Data were obtained on 6 July 2002, near Palisades, Idaho. Figures are various scatterplots for ETM Band 4 and NDVI.

Table 2. Summary statistics of the remote sensing data-set.


Figure 10. Illustrations of sample sites for three sampling schemes: random (left), regular (center), and hexagon stratified random (right). The background image is for the variable X, with 100 rows by 100 columns. Variable Y is not shown here. Effective sample size (n^*) was calculated based on ρ_x = 0.9658, ρ_y = 0.5136, ρ_xy = 0.4688. Due to geometric restrictions, actual sample sizes are 2809 for regular sampling and 2438 for hexagon stratified random sampling.
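Two of the three schemes are simple enough to sketch on a 100 × 100 grid. The code below is illustrative only and makes a labeled assumption about the "geometric restriction" for regular sampling: that a square lattice needs a perfect-square number of sites, so a target of 2785 is rounded up to 53 × 53 = 2809. The article does not spell out its procedure, the function names are hypothetical, and hexagon stratified random sampling is not attempted here.

```python
import math
import random


def random_sample(n_rows, n_cols, n_sites, seed=0):
    """Pick n_sites distinct cells uniformly at random; returns (row, col) pairs."""
    rng = random.Random(seed)
    flat = rng.sample(range(n_rows * n_cols), n_sites)
    return [divmod(index, n_cols) for index in flat]


def regular_sample(n_rows, n_cols, target):
    """Lay out a square lattice with at least `target` sites.

    Assumption: the lattice has k sites per side, with k the smallest
    integer such that k * k >= target, centred within equal-width strips.
    """
    k = math.ceil(math.sqrt(target))
    rows = [int((i + 0.5) * n_rows / k) for i in range(k)]
    cols = [int((j + 0.5) * n_cols / k) for j in range(k)]
    return [(r, c) for r in rows for c in cols]


random_sites = random_sample(100, 100, 2785)
regular_sites = regular_sample(100, 100, 2785)
print(len(random_sites), len(regular_sites))
```

The random scheme can hit the target size exactly, while the lattice overshoots slightly; the same kind of constraint plausibly explains why the hexagon scheme ends at a different count.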

Table 3. Summary statistics for simulated variables from the resampled sets.


Figure 11. Spatially simplified scatterplot consisting of the point cloud and local/global fitting lines, overlaid on the same graphic features from the original data.

Notes: Grey dots are the original data (n = 10,000 data points). Black dots are data points from the resampled sets (left: random sampling, n^* = 2785; center: regular sampling, n^* = 2809; right: hexagon stratified random sampling, n^* = 2438). The red curve (with a yellow error band) and the red dashed line are from the sampled set. The white curve (with a blue error band that appears grey due to color mixing with the yellow band) and the green line are from the original data.

Figure 12. Spatially simplified scatterplots (point clouds with local/global fits) from three resampling schemes (left: random sampling; center: regular sampling; right: hexagon stratified sampling).

Figure 13. A random sample (n = 1326, right) of the hexagon stratified random sample set (n = 2438, left).

Note: Sample sites are shown as yellow dots.

Figure 14. Spatially simplified scatterplot of the simulated data using a two-stage resampling scheme.

Notes: In the left graph, the fitted curves for the original data are colored in white (with a blue error band) and green. The red curve (with a yellow error band) and the red dashed line are from the sampled set. The original data points are colored in light grey. In the right graph, the LOESS line is red with a blue error band. The dashed line is the OLS fit.

Table 4. Summary of resampling statistics for the remote sensing data-set.


Figure 15. An overlay of scatterplots from the original data and the resampled set.

Notes: Light grey point cloud: original data (3000 × 3000 points). Darker grey point cloud: hexagon stratified random sampled data (174,212 points). Black wavy curve: GAM fitted line from the original data. Violet wavy curve: GAM fitted line from sample data. Blue curve: a cubic polynomial fit to the sample data. Red curve: a quartic polynomial fit to the sample data. Green curve: a quartic polynomial fit to the original data. Black dash: linear fit to the original data. Violet dash: linear fit to the sample data.

Figure 16. Scatterplot of data from the stage-two regular resampling.

Notes: Light grey point cloud: stage-one hexagon random sampled data (174,212 data points). Darker grey point cloud: stage-two regular sampled data (42,025 data points). Black wavy curve: GAM fitted line from stage-one data. Violet wavy curve: GAM fitted line from stage-two resampled data. Blue curve: a cubic polynomial fit to the stage-two resampled data. Red curve: a quartic polynomial fit to the stage-two resampled data. Green curve: a quartic polynomial fit to the stage-one data. Black dash: linear fit to the stage-one data. Violet dash: linear fit to the stage-two resampled data. X axis = normalized Band 4 radiance; Y axis = normalized NDVI, where NDVI = (Band 4 − Band 3)/(Band 4 + Band 3). NDVI is a simple algebraic combination of remotely sensed spectral information (e.g., Bands 4 and 3 from the Landsat Enhanced Thematic Mapper) that provides meaningful information about vegetative structure (leaf/cellular structure) and condition (chlorophyll content). Generally, high NDVI values indicate that photosynthetically active plant biomass is present.
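The NDVI formula in these notes translates directly into code. The sketch below computes NDVI per pixel and then standardizes a list of values to z-scores. Two points are assumptions rather than statements from the article: that "normalized" means z-score standardization, and that a pixel with zero total radiance should yield NaN. The band values in the example are made up.

```python
import math
import statistics


def ndvi(band4: float, band3: float) -> float:
    """NDVI = (Band 4 - Band 3) / (Band 4 + Band 3) for a single pixel.

    Assumption: return NaN when both bands are zero, since the ratio
    is undefined there.
    """
    total = band4 + band3
    if total == 0:
        return math.nan
    return (band4 - band3) / total


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mean) / sd for v in values]


# Made-up (Band 4, Band 3) pairs, for illustration only.
pixels = [(0.50, 0.10), (0.30, 0.30), (0.45, 0.05), (0.20, 0.25)]
ndvi_values = [ndvi(b4, b3) for b4, b3 in pixels]
ndvi_z = standardize(ndvi_values)
```

Since Band 4 appears in both the numerator and the denominator, NDVI is bounded between −1 and 1 for non-negative inputs and responds nonlinearly to Band 4, which is consistent with the curved fits in the scatterplots.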

Wood, S. 2006. Generalized Additive Models: An Introduction with R. London: CRC Press.


Hastie, T. J., and R. J. Tibshirani. 1990. Generalized Additive Models. vol. 43. London: CRC Press.

