
Deriving nutrient criteria to minimize false positive and false negative water use impairment determinations


ABSTRACT

Smeltzer E, Kamman NC, Fiske S. 2016. Deriving nutrient criteria to minimize false positive and false negative water use impairment determinations. Lake Reserv Manage. 32:182–193.

Nutrient water quality criteria for lakes and wadeable streams in Vermont were derived using receiver operating characteristic (ROC) analysis to evaluate false positive and false negative error rates in making water use impairment determinations. Numeric nutrient criteria for total phosphorus (TP), chlorophyll a (Chl-a), and Secchi depth were derived to protect aesthetic uses in lakes in a manner that balanced and minimized these 2 types of errors. TP and total nitrogen (TN) criteria were derived to protect aquatic life uses in wadeable streams. The resulting criteria for lakes were TP = 17−18 µg/L, Chl-a = 3.8−5.2 µg/L, and Secchi depth = 2.6−3.2 m, depending on the applicable tiered water use objective defined in the state's water quality standards. Wadeable stream criteria were TP = 9−27 µg/L and TN = 0.25−1.3 mg/L, depending on the designated water use and the stream macroinvertebrate community type. These criteria, with some modifications and with the exception of TN, were incorporated into the Vermont Water Quality Standards as part of an integrated nutrient criteria framework.

Pollution of lakes, rivers, estuaries, and wetlands by excess levels of nutrients such as phosphorus and nitrogen is a serious water quality problem globally. Adoption of nutrient criteria is one tool that environmental management agencies can use to control nutrient concentrations and protect the designated uses of those waters from the harmful effects of excessive eutrophication. The US Environmental Protection Agency (USEPA 1998) issued a National Strategy for the Development of Regional Nutrient Criteria intended to set states and tribes on course to adopt numeric nutrient criteria in their water quality standards by 2003. However, only 7 states (Hawaii, Minnesota, New Jersey, Rhode Island, Florida, West Virginia, and Wisconsin) have succeeded in adopting phosphorus or nitrogen criteria that completely cover one or more waterbody types within their jurisdictions (USEPA 2015).

Progress has been slow because nutrient criteria present significant scientific and policy challenges. Criteria derived from various percentiles of the distributions of nutrient variables within an ecoregion have been proposed (e.g., USEPA 2000a, 2000b), but this approach yields inconsistent levels of protection when applied to different ecoregions (Suplee et al. 2007), produces criteria that differ from values derived from other methods (Herlihy et al. 2013), and may inappropriately presume impairment of a certain fraction of lakes within an ecoregion by cultural eutrophication (Bachmann et al. 2013). The percentile method also lacks the desirable feature of providing a direct predictor of water use attainment (Reckhow et al. 2005).

Relationships between nutrient concentrations and eutrophication response variables are often weak because of large temporal variability (Knowlton and Jones 2006) and multiple environmental factors affecting these relationships (Dodds and Welch 2000). Efforts to derive nutrient criteria have dealt with this variability by quantifying stressor–response relationships using multiple variables and statistical methods and using a weight-of-evidence approach to select criteria values (Heiskary and Wilson 2008, Stevenson et al. 2008, Smith and Tran 2010, Suplee and Watson 2013, Heiskary and Bouchard 2015).

Even when statistically valid nutrient stressor–response relationships are developed and applied narrowly to specific waterbody types and individual ecoregions, any single nutrient concentration value still runs the inherent risk of being either under-protective or over-protective for a proportion of the lakes or streams subject to the criterion. Medical researchers developing diagnostic tests to identify the presence of disease face similar concerns about false positive and false negative errors and have applied methods known as receiver operating characteristic (ROC) analysis for evaluating these tests (Obuchowski 2003, Zou et al. 2007). Medical applications of ROC analysis include comparison of candidate tests (e.g., blood markers for disease) with respect to their diagnostic accuracy and selection of test criteria values that optimally balance sensitivity (the probability that the test correctly identifies the presence of disease) and specificity (the probability that the test correctly indicates the absence of disease). Related methods were applied in a nutrient criteria context by McLaughlin (2012) to analyze use attainment decision error rates in Florida lakes and by Shine et al. (2003) to compare sediment quality guidelines for toxic metals. In this paper, we present a method using aspects of ROC analysis for quantifying and minimizing false positive and false negative use impairment determinations from nutrient criteria and describe the application of this method to derive nutrient criteria to protect aesthetic uses in lakes and aquatic life uses in wadeable streams in the State of Vermont.

Methods

Lake nutrient data

Lake data for 1987–2013 were obtained from the database developed by the Vermont Lay Monitoring Program (Sargent 2014), a citizen volunteer effort supported by the Vermont Department of Environmental Conservation (DEC) according to a USEPA-approved quality assurance project plan. Sampling and laboratory analytical methods were unchanged during this time period. Weekly summer (Jun–Aug) sampling was conducted for total phosphorus (TP), chlorophyll a (Chl-a), and Secchi disk transparency on 87 Vermont inland lakes, although not all variables were measured on each lake every year. Data from 2 large transboundary lakes, Champlain and Memphremagog, were excluded from the analysis because state water quality criteria for phosphorus already exist for those lakes. TP and Chl-a samples were obtained as vertically integrated composites of the photic zone, defined as twice the Secchi depth on the day of sampling. Total nitrogen (TN) was not measured as part of this program.

Lake user survey

A lake user survey (Smeltzer and Heiskary 1990) was conducted in conjunction with the Vermont Lay Monitoring Program during 1987–1991 and again during 2006–2013. A survey form was used to elicit opinions from the trained citizen monitors on the suitability of the lake water for recreation and aesthetic enjoyment on the day of sampling. Five alternative survey response choices were used to determine which of 3 different tiers of aesthetic use support defined in the Vermont Water Quality Standards (Vermont DEC 2014a) existed at the time of observation (Table 1). The survey results included 5073 user responses paired with simultaneous measurements of one or more nutrient criteria variables (TP, Chl-a, and/or Secchi depth) from 87 Vermont inland lakes.

Table 1. Three tiers of aesthetic use support defined in the Vermont Water Quality Standards (emphasis added in bold) and the corresponding Vermont lake user survey response choices matched to each tier.

Stream nutrient and biological data

Biological and nutrient data from wadeable streams were obtained by the Vermont Ambient Biomonitoring Program (Vermont DEC 2004a) during 2002–2011. Macroinvertebrate community assessments were conducted during late summer to early fall when periphyton accrual and associated responses by the macroinvertebrate community would be evident. Samples were collected under base flow conditions when the stream was not influenced by a concurrent precipitation event or by a rain event exceeding 1 inch during the previous 48 hours. Single grab samples for TP and TN analysis were collected concurrently with the macroinvertebrate community sampling. Sites with known impacts from toxins or other non-nutrient pollutants were excluded from the nutrient criteria analysis. These included sites on the state's Section 303d list of waters impaired by non-nutrient pollutants and sites with extremely low macroinvertebrate density or taxa richness indicative of toxic effects. Sites with >80% forest canopy were also excluded from the analysis in an effort to minimize the potentially confounding effects of light limitation of periphyton growth. Nutrient concentration and biological assessment results obtained at the same site on multiple dates were averaged to produce a single value for each site.

The following 8 metrics of macroinvertebrate community structure and function were used to assess the biological condition of wadeable streams (Resh and Jackson 1993, Vermont DEC 2004b):

1. Density
2. Richness
3. Ephemeroptera-Plecoptera-Trichoptera (EPT) index
4. Percent Model Affinity of Orders (PMA-O) (Novak and Bode 1992)
5. Hilsenhoff Biotic Index (Hilsenhoff 1987)
6. Percent Oligochaeta
7. EPT/(EPT+Chironomidae) ratio
8. Pinkham–Pearson Coefficient of Similarity – Functional Groups (Pinkham and Pearson 1976, Merritt et al. 2008)

The biological condition assessment method (Vermont DEC 2004b) was applied at each stream site to classify the site by attainment level along a tiered aquatic life use gradient defined in the Vermont Water Quality Standards (Vermont DEC 2014a). The 4 tiers of aquatic life use support were differentiated in the standards by the extent to which the stream macroinvertebrate community departed from the reference condition (Table 2). In Vermont, the macroinvertebrate biometrics of invertebrate density and biotic index respond to phosphorus-induced periphyton growth in a predictable fashion, which can result in loss of attainment of aquatic life use standards (Hilsenhoff 1987, Bryce and Hughes 2003, Vermont DEC 2004b); however, responses of the macroinvertebrate community to co-occurring pollutants such as sediment cannot be completely distinguished by these metrics.

Table 2. Four tiers of aquatic life use support defined in the Vermont Water Quality Standards based on the extent of change from the reference biological condition (emphasis added in bold).

These metric thresholds were derived separately for 3 macroinvertebrate wadeable stream types differentiated by natural (i.e., reference) macroinvertebrate community taxa compositions, including small, high-gradient (SHG) streams; medium, high-gradient (MHG) streams; and warm-water, moderate-gradient (WWMG) streams (Vermont DEC 2004a). The geophysical factors used to separate the stream types included alkalinity, slope, elevation, percent canopy, and drainage area. Nutrient and macroinvertebrate community assessment data were available from 385 sites, including 130 SHG sites, 158 MHG sites, and 97 WWMG sites.

Receiver-operating characteristic analysis

ROC analysis (Obuchowski 2003, Zou et al. 2007) involved the creation of a 2 × 2 contingency table of site counts (Table 3) for lake or stream nutrient criterion candidate values (e.g., TP, TN, Chl-a, or Secchi depth) at increments over the range of the data. Each site was classified as either supporting or not supporting the applicable use, depending on whether the site nutrient indicator was below or above the candidate value. These nutrient-based use support classifications were compared in the contingency table with the determinations of use support provided by the lake user survey or the stream macroinvertebrate community assessments, which were direct measures of the level of attainment of the designated uses (i.e., the “gold standard” determination of disease status in the medical analogy).

Table 3. Contingency table of site counts used to calculate false positive and false negative error rates for a specific nutrient criterion candidate value.

The false positive error rate (FPR: the probability that the nutrient candidate value incorrectly classified a site as having a use impairment) and the false negative error rate (FNR: the probability that the nutrient candidate value incorrectly classified a site as attaining the use) were calculated for each nutrient candidate value from equations 1 and 2. The related characteristics of sensitivity (the probability that the nutrient candidate value correctly identified a site as having a use impairment) and specificity (the probability that the nutrient candidate value correctly classified a site as attaining the use) were calculated from equations 3 and 4. These nonparametric tabulations were automated by statistical software tools (Systat Software Inc. 2015), and the results were analyzed graphically.

$$\mathrm{FPR} = \frac{N_{FP}}{N_{FP} + N_{TN}} \tag{1}$$

$$\mathrm{FNR} = \frac{N_{FN}}{N_{FN} + N_{TP}} \tag{2}$$

$$\mathrm{Sensitivity} = \frac{N_{TP}}{N_{TP} + N_{FN}} \tag{3}$$

$$\mathrm{Specificity} = \frac{N_{TN}}{N_{TN} + N_{FP}} \tag{4}$$

where $N_{FP}$ = number of false positives, $N_{TN}$ = number of true negatives, $N_{FN}$ = number of false negatives, and $N_{TP}$ = number of true positives.
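
The tabulation described above can be illustrated with a minimal Python sketch. It is not the Systat procedure used in the study; it assumes the standard definitions FPR = N_FP/(N_FP + N_TN) and FNR = N_FN/(N_FN + N_TP), treats a value exactly equal to the candidate as attaining the use (an assumption), and uses made-up TP values and impairment flags. The function name `error_rates` and the `higher_is_worse` switch (set to False for Secchi depth, where lower values indicate impairment) are hypothetical.

```python
def error_rates(values, impaired, candidate, higher_is_worse=True):
    """Tabulate the 2 x 2 contingency counts for one candidate criterion
    value and return the resulting error rates."""
    n_fp = n_tn = n_fn = n_tp = 0
    for x, truth in zip(values, impaired):
        # A site "fails" the nutrient test when it lies on the impaired side
        # of the candidate value (strict inequality is an assumption).
        flagged = (x > candidate) if higher_is_worse else (x < candidate)
        if flagged and truth:
            n_tp += 1
        elif flagged:
            n_fp += 1
        elif truth:
            n_fn += 1
        else:
            n_tn += 1
    # Guard against an empty class so the sketch never divides by zero.
    fpr = n_fp / (n_fp + n_tn) if (n_fp + n_tn) else float("nan")
    fnr = n_fn / (n_fn + n_tp) if (n_fn + n_tp) else float("nan")
    return {"FPR": fpr, "FNR": fnr,
            "sensitivity": 1.0 - fnr, "specificity": 1.0 - fpr}


# Illustrative (invented) lake mean TP values in ug/L and "gold standard"
# impairment determinations (1 = impaired, 0 = not impaired).
tp_values = [8, 10, 12, 14, 16, 18, 20, 25, 30, 40]
impaired = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

rates_at_17 = error_rates(tp_values, impaired, candidate=17)
print(rates_at_17)
```

Repeating the call over a grid of candidate values yields the FPR and FNR curves that are then examined graphically.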

Areas under the ROC curves (plots of sensitivity vs. 1–specificity) were calculated (Systat Software Inc. 2015) and used to compare the overall predictive accuracy of the various nutrient variables across the range of potential criteria values (Obuchowski 2003). ROC curve areas normally vary between 1.0 representing perfect diagnosis and 0.5 representing that obtained by random guessing.
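
As a rough illustration of the ROC curve area, the sketch below computes it nonparametrically as the fraction of (impaired, unimpaired) site pairs in which the impaired site has the higher nutrient value, counting ties as one half. This pairwise form equals the trapezoidal area under the empirical ROC curve, but it is offered only as an assumption-laden example with invented data, not as the algorithm implemented in the software cited above. The function name `roc_auc` is hypothetical.

```python
def roc_auc(values, impaired):
    """Area under the empirical ROC curve via pairwise comparison.

    Assumes higher values indicate a greater suspicion of impairment;
    for Secchi depth the values could be negated first.
    """
    pos = [x for x, t in zip(values, impaired) if t]
    neg = [x for x, t in zip(values, impaired) if not t]
    if not pos or not neg:
        raise ValueError("both impaired and unimpaired sites are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # impaired site ranked above unimpaired site
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Invented example data (ug/L TP; 1 = impaired, 0 = not impaired).
tp_values = [8, 10, 12, 14, 16, 18, 20, 25, 30, 40]
impaired = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

auc = roc_auc(tp_values, impaired)
print(round(auc, 2))  # 23 of 25 pairs are correctly ordered -> 0.92
```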

Correction for sampling bias

Representativeness of the sample with respect to the population is a concern when evaluating diagnostic tests (Swets 1988). Estimates of false positive and false negative error rates calculated according to equations 1 and 2 are potentially subject to bias when the distribution of nutrient criteria variables in the sampled lakes or streams differs from the true population distribution. To assess the potential for this type of bias in the lake data, the distributions of mean TP, Chl-a, and Secchi depth from the Vermont Lay Monitoring Program lakes were compared (Fig. 1) with the population distributions for these variables in all Vermont lakes >20 ac (8 ha) in surface area, as determined from the National Lakes Assessment (NLA) in Vermont (USEPA 2009, Vermont DEC 2013). The NLA used a probability-based sampling design to produce distributions of the water quality variables representative of the true population distribution for the region.

Figure 1. Cumulative frequency distributions of lake and stream nutrient criteria variables. Population distributions derived from probability-based sampling for the National Lakes Assessment (NLA) in Vermont compared with sampled distributions of summer lake means from the Vermont Lay Monitoring Program (LMP). Cumulative frequency distributions of low-flow TP and TN derived from probability-based sampling for the National Wadeable Streams Assessment (NWSA) in Vermont compared with TP and TN distributions from small, high-gradient (SHG); medium, high-gradient (MHG); and warm-water, moderate-gradient (WWMG) streams sampled for Vermont nutrient criteria development.


Lake mean TP and Chl-a in the Vermont Lay Monitoring Program dataset were significantly higher than in the entire Vermont lake population (2-tailed t-test for independent means on log-transformed data, α = 0.05), and the difference in mean Secchi depth was marginally significant (P = 0.06). The unexpected positive Secchi depth bias relative to the NLA population distribution can be explained by the different sets of lakes sampled for TP, Chl-a, and Secchi depth over the course of the Lay Monitoring Program.

A similar evaluation for sampling bias was not possible for wadeable streams because the probability-based sampling conducted for the National Wadeable Streams Assessment (NWSA) in Vermont (USEPA 2006, Vermont DEC 2008) did not distinguish between stream types. A comparison of the sampled TP and TN distributions for SHG, MHG, and WWMG streams with the population distributions for all wadeable streams (Fig. 1) suggested that differences may have existed for some stream types, but the extent of the bias is unknown.

To provide a basis for calculating false positive and false negative impairment determination rates from representative population distributions of the lake data, the design weights assigned to each lake included in the NLA within Vermont were used to synthesize a larger set of TP, Chl-a, and Secchi depth values (n > 7000) having the same representative population distributions. These design weights were proportional to each lake's representation in the overall lake population stratified on lake size and geographic region (USEPA 2009, Vermont DEC 2013).

Each value in these synthetic distributions was randomly assigned a use impairment status (0 = no impairment; 1 = impairment) at frequencies corresponding to the probability of impairment determined by logistic regression (Helsel and Hirsch 2002). Because regression provides a way to establish the relationship between nutrient criteria variables and the probability of impairment in a manner that is robust to site selection bias, this procedure was used to minimize the effect of such bias on the ultimate nutrient criteria that were proposed.

The long-term individual lake mean TP, Chl-a, and Secchi depth values for the period of record from the Vermont Lay Monitoring Program dataset and the individual user survey responses from each date were used for the logistic regression analysis. Impairment status was treated as the binary dependent variable in the logistic regression model with the following form:

$$\ln\left(\frac{P}{1 - P}\right) = b_0 + b_1 X \tag{5}$$

$$P = \frac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}} \tag{6}$$

where $P$ = probability of impairment; $X$ = lake mean TP, Chl-a, or Secchi depth; and $b_0$, $b_1$ are the regression coefficients.
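
For readers unfamiliar with logistic regression, the following sketch fits a single-predictor model of the standard form ln(P/(1 − P)) = b0 + b1·X by Newton–Raphson iteration using only the Python standard library. It is an illustration under stated assumptions: the data are invented, the fitting routine is a generic textbook method rather than the software used in the study, and the names `fit_logistic` and `impairment_probability` are hypothetical. The example data are deliberately not perfectly separable, since the maximum likelihood estimates would otherwise not be finite.

```python
import math


def impairment_probability(x, b0, b1):
    """Probability of impairment P for a nutrient value x (logistic form)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(x, y, iterations=25):
    """Maximum likelihood fit of b0 and b1 by Newton-Raphson iteration."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = impairment_probability(xi, b0, b1)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to b0
            g1 += (yi - p) * xi     # gradient with respect to b1
            h00 += w                # elements of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented example data (ug/L TP; 1 = impaired, 0 = not impaired).
tp_values = [8, 10, 12, 14, 16, 18, 20, 25, 30, 40]
impaired = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(tp_values, impaired)
print(b0, b1, impairment_probability(17, b0, b1))
```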

Regressions were conducted for all 3 lake water quality variables at the 2 aesthetic use attainment tiers indicated in Table 1 (excellent and good). All 6 regression equations had coefficients for the independent variable significantly different from zero (Wald statistic, α = 0.05).

Impairment status was randomly assigned at the frequencies (P) indicated by the logistic regression results to each lake mean TP, Chl-a, and Secchi depth value in the synthetic population distributions for each of the 2 aesthetic use attainment tiers. For example, if the weighting procedure produced a particular TP concentration having 100 occurrences in the synthetic distribution, and if the impairment probability (P) predicted for that TP value from the logistic regression was 0.40, then ∼40 of these observations would be assigned a positive impairment status and ∼60 assigned a not-impaired status. It was then possible using ROC procedures to calculate false positive and false negative error rates from distributions of summer mean TP, Chl-a, and Secchi depth representative of all Vermont lakes >8 ha in area.
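
The random assignment step can be sketched as follows. How the synthetic distributions were built from the design weights is not specified in detail here, so replicating each value in proportion to its weight is an assumption, as are the function names `synthesize_population` and `assign_impairment`, the fixed random seed, and the constant probability used in the example.

```python
import random


def synthesize_population(lake_values, design_weights, scale=1.0):
    """Repeat each sampled value in proportion to its design weight
    (assumed construction of the synthetic population distribution)."""
    population = []
    for value, weight in zip(lake_values, design_weights):
        population.extend([value] * round(weight * scale))
    return population


def assign_impairment(population, prob_impaired, rng):
    """Assign 1 (impaired) with probability prob_impaired(x), else 0."""
    return [1 if rng.random() < prob_impaired(x) else 0 for x in population]


# Mirror of the example in the text: one TP value occurring 100 times with
# a predicted impairment probability of 0.40 (the TP value is invented).
rng = random.Random(42)  # seeded only to make the sketch repeatable
population = synthesize_population([20.0], [100])
status = assign_impairment(population, lambda x: 0.40, rng)
print(sum(status), "of", len(status), "observations assigned impaired status")
```

Because the assignment is random, the impaired count fluctuates around 40 rather than equaling it exactly, which is consistent with the approximate counts described in the text.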

Results

The distributions of individual lake TP, Chl-a, and Secchi depth observations associated with each lake user response category (Table 1) varied significantly in the expected direction with the level of use support (Fig. 2, Kruskal–Wallis one-way analysis of variance on ranks, P < 0.001). The stream TP and TN distributions also varied significantly among the different tiers of aquatic life use support indicated by the macroinvertebrate community assessment procedure (Table 2, Fig. 3).


Figure 2. Distributions of individual TP, Chl-a, and Secchi depth observations associated with each lake user survey response choice from Table 1 (1 = excellent, 2 = good, 3 = slightly impaired, 4 = enjoyment substantially reduced, 5 = enjoyment nearly impossible). Box plots show the 25th, 50th, and 75th percentiles; 5th and 95th percentiles are shown as whiskers. Overall significance values (P) were based on a Kruskal–Wallis one-way analysis of variance on ranks. Medians without letters in common were significantly different, based on individual pairwise comparisons (Dunn's method, α = 0.05).


Figure 3. Distributions of low-flow TP and TN concentrations associated with each tier of aquatic life use support (Table 2) as determined from the stream macroinvertebrate community assessments. Stream types are small, high-gradient (SHG); medium, high-gradient (MHG); and warm-water, moderate-gradient (WWMG). Box plots show the 25th, 50th, and 75th percentiles; 5th and 95th percentiles are shown as whiskers. Overall significance values (P) were based on a Kruskal–Wallis one-way analysis of variance on ranks. Medians without letters in common were significantly different based on individual pairwise comparisons (Dunn's method, α = 0.05).

The statistical significance of these overall relationships indicated it was appropriate to use these variables to develop nutrient criteria for aesthetic use in lakes and aquatic life uses in wadeable streams in Vermont; however, not all individual pairwise comparisons indicated statistically significant differences between median stream TP and TN values associated with different tiers of aquatic life use support (Dunn's method, α = 0.05). Collapsing the analysis into fewer tiers might have been warranted but would have departed from the structure of the Vermont Water Quality Standards that define 4 tiers of aquatic life use support (Table 2), so all tiers were retained in the analysis.

FPR and FNR error rates across the range of potential nutrient criteria candidate values were plotted for the applicable use attainment tiers for lakes (Fig. 4) and wadeable streams (Fig. 5). These plots make explicit the trade-offs involved in selecting nutrient criteria values. More stringent criteria reduce the risk of failing to identify impaired waters (false negatives) but increase the risk of declaring impairments that do not actually exist (false positives).


Figure 4. False positive (solid lines) and false negative (dotted lines) impairment determination error rates for 2 tiers of aesthetic use attainment (Table 1: “excellent” and “good”) as a function of potential criteria values for lake mean summer TP, Chl-a, and Secchi depth.


Figure 5. False positive (solid lines) and false negative (dotted lines) impairment determination error rates for 3 tiers of aquatic life use attainment (Table 2: “natural condition,” “minor change” from reference condition, and “moderate change” from reference condition) as a function of potential criteria values for low-flow TP and TN in small, high-gradient (SHG); medium, high-gradient (MHG); and warm-water, moderate-gradient (WWMG) streams.

Areas under the ROC curves were higher for the lake variables than for the stream variables (Fig. 6b). Areas under ROC curves can be interpreted as the probability that a randomly selected site with a use impairment has a nutrient test result that indicates a greater suspicion of impairment than a randomly chosen unimpaired site (Obuchowski 2003). Choice of a minimum acceptable ROC curve area is a judgment call, but values >0.80 are generally sought by medical researchers. TP provided the highest prediction accuracy of impairment among the variables used to assess aesthetic uses in lakes, with a median ROC curve area of 0.89. The median ROC curve area for stream TP was 0.66, indicating only fair diagnostic accuracy. ROC curve areas for stream biological impairment were similar for TN (median = 0.68), suggesting that TN might be as effective as TP as a water quality criterion for Vermont streams.


Figure 6. (a) Example of an ROC curve (solid line) for the diagnostic accuracy of lake TP in determining attainment of the “good” aesthetic use support tier. The area under the ROC curve (AUC) is 0.93 in this example. The dotted line illustrates an AUC of 0.5, representing the expected outcome from random guessing. (b) Areas under the ROC curves, showing medians (horizontal mid-lines) and ranges (vertical bars) across all use attainment tiers (Table 4) for each lake and stream variable.

Discussion

The FPR and FNR curves (Figs. 4 and 5) provided a basis for deriving nutrient criteria with explicit knowledge of the risk of false positive and false negative errors when making use impairment determinations from the criteria. Both types of error are serious. False negative errors could result in ongoing use impairment without focusing management attention on the pollution sources; false positive errors could lead to inappropriate or excessive management interventions that would be better directed elsewhere.

One method for minimizing and balancing these 2 types of errors is to select a nutrient criterion value corresponding to the point where the 2 error rates are equal (i.e., the point where the FPR and FNR curves cross). This approach assumes that each type of error is equally undesirable. Alternate methods could weight one type of error more heavily than the other, or choose a point where the sum of the 2 error rates is minimized (Zou et al. 2007). The latter approach, however, would not provide management control over the balance between the 2 types of errors.
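
The two selection rules discussed above can be compared in a short sketch. It scans a grid of candidate values and returns (a) the candidate at which the FPR and FNR are closest to equal and (b) the candidate minimizing their sum. The data, the integer candidate grid, the tie-breaking rule (first candidate found), and the function names are all illustrative assumptions; the criteria in this paper were read from plotted curves rather than produced by this code.

```python
def rates(values, impaired, c):
    """Return (FPR, FNR) for candidate c, flagging sites with value > c."""
    fp = sum(1 for x, t in zip(values, impaired) if x > c and not t)
    tn = sum(1 for x, t in zip(values, impaired) if x <= c and not t)
    fn = sum(1 for x, t in zip(values, impaired) if x <= c and t)
    tp = sum(1 for x, t in zip(values, impaired) if x > c and t)
    return fp / (fp + tn), fn / (fn + tp)


def balanced_criterion(values, impaired, candidates):
    """Candidate where the FPR and FNR curves cross (smallest |FPR - FNR|)."""
    def imbalance(c):
        fpr, fnr = rates(values, impaired, c)
        return abs(fpr - fnr)
    return min(candidates, key=imbalance)


def min_total_error_criterion(values, impaired, candidates):
    """Alternative rule: candidate minimizing FPR + FNR."""
    return min(candidates, key=lambda c: sum(rates(values, impaired, c)))


# Invented example data (ug/L TP; 1 = impaired, 0 = not impaired).
tp_values = [8, 10, 12, 14, 16, 18, 20, 25, 30, 40]
impaired = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
grid = list(range(8, 41))

crossing = balanced_criterion(tp_values, impaired, grid)
min_sum = min_total_error_criterion(tp_values, impaired, grid)
print(crossing, rates(tp_values, impaired, crossing))  # 16 (0.2, 0.2)
print(min_sum, rates(tp_values, impaired, min_sum))    # 18 (0.0, 0.2)
```

In this toy data set the sum-minimizing rule selects a less stringent value that produces no false positives but leaves all of the remaining error as false negatives, which illustrates why that rule offers no control over the balance between the 2 error types.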

Criteria for TP, Chl-a, and Secchi depth were derived for Vermont lakes and streams using the intersections of the FPR and FNR curves in Figs. 4 and 5, with some modifications (Table 4). Although 10 states have been identified by USEPA (2015) as having some form of TN criteria for their lakes or streams, Vermont DEC (2014b) deferred adoption of TN criteria for wadeable streams in its state water quality standards because of uncertainty over the need for nitrogen reduction independent of phosphorus controls on point and nonpoint sources. The lake Chl-a criterion corresponding to “good” aesthetic value was adjusted upward from 5.2 to 7.0 µg/L to improve consistency of use impairment determinations between TP, Chl-a, and Secchi depth criteria. It was suspected that metalimnetic algal populations in some lakes were resulting in high Chl-a concentrations in the vertically integrated samples without producing aesthetic use impairments at the lake surface. Adoption of stream TP criteria representing “minor” biological change was deferred to a future revision of the Vermont Water Quality Standards (Vermont DEC 2014b).

Table 4. Nutrient criteria values derived from the ROC analysis. Values correspond to the intersections of the false positive and false negative error rate curves in Figs. 4 and 5, and were incorporated into the Vermont Water Quality Standards except where noted (Vermont DEC 2014a). Areas under the ROC curves (AUC) are provided as measures of diagnostic accuracy for each variable and use support tier.

The ROC analysis provided a way to minimize the rates of false positive and false negative use impairment determinations but did not eliminate the potential for such errors. The percentage of lake or stream sites that would be misclassified in one direction or the other by applying the criteria in Table 4 remained in the range of 20–40% in most cases. ROC curve areas (Fig. 6b, Table 4) indicated that the diagnostic accuracy of the stream TP and TN criteria was relatively low in some cases. These findings suggested that independent application of TP or TN concentration criteria alone would too often result in incorrect impairment determinations and inappropriate management responses, perhaps due to additional variables or stressors influencing these relationships.

The US EPA (2013) released guidance on an approach for implementing numeric nutrient criteria that integrates causal and response variables. Under this approach, compliance with nutrient criteria may be attained either by compliance with causal variables (e.g., TP or TN concentrations) or by compliance with a set of eutrophication response variables (e.g., Chl-a, Secchi depth, biological community response, pH, dissolved oxygen, turbidity). This US EPA guidance provided a means to resolve state concerns about independent applicability of nutrient criteria in light of high false positive and false negative error rates. Vermont subsequently adopted a set of integrated nutrient criteria for lakes and wadeable streams in its state water quality standards (Vermont DEC 2014a, 2014b).

The nutrient criteria derived for Vermont lakes and wadeable streams using ROC methods were comparable to criteria proposed or adopted for other northern US states (Table 5). The Vermont lake TP, Chl-a, and Secchi depth criteria were within the overall range of values for Maine, Minnesota, and Wisconsin but less stringent than criteria developed by the US EPA (2000c) to represent reference conditions for lakes and reservoirs in Nutrient Ecoregion VIII (Nutrient Poor Largely Glaciated Upper Midwest and Northeast). The Vermont wadeable stream TP and TN criteria tended to be at the lower end or below the range of criteria in Maine, Minnesota, Montana, and Wisconsin but within the range of reference condition values developed by the US EPA (2001) for rivers and streams in Nutrient Ecoregion VIII.

Table 5. Examples of nutrient criteria proposed or adopted in northern US states.

The nutrient criteria for other states were derived using a variety of methods including distribution percentiles, stressor–response relationships, and weight-of-evidence approaches. The comparability of the Vermont criteria to those in other northern US regions suggests that ROC analysis may provide a valid and potentially simpler method to develop nutrient criteria where nutrient concentration data and direct measures of the level of attainment of designated uses are available. Balancing false positive and false negative error rates using ROC methods, and considering both nutrient concentrations and response variables when making use support determinations, can lead to more effective water quality criteria (Shine et al. 2003, McLaughlin 2012).

Conclusions

ROC analysis had several benefits as a method for deriving nutrient criteria for Vermont waters. The analysis made it possible to explicitly consider the risk of false positive and false negative errors in making use impairment determinations from the criteria and to appropriately balance these error rates when choosing the criteria values. Documentation of the potential error rates when applying TP or TN concentration criteria independently provided convincing evidence for the need to adopt a water quality standards structure that integrated both causal and eutrophication response variables. The ROC analysis also facilitated the public rulemaking process for the adoption of water quality standards in Vermont. Stakeholder confidence in the fairness of the proposed nutrient criteria was enhanced by the ability to communicate the manner in which false positive and false negative error rates were balanced in the derivation of the criteria.

Acknowledgments

We thank L. Yuan from the US EPA Office of Science and Technology for pointing out the potential for sampling bias to affect the analysis of false positive and false negative error rates and for suggesting the solution involving logistic regression. Three anonymous reviewers provided constructive comments that improved the manuscript.

Funding

This work was supported by Vermont DEC and grants from the US EPA.

