Research Article

The effect of forest land use on the cost of drinking water supply: machine learning evidence from South African data

Pages 361-374 | Received 25 Sep 2021, Accepted 27 Dec 2021, Published online: 24 Jan 2022

ABSTRACT

Water quality amelioration is one of the key ecosystem services provided by forests in the catchment areas of water supply systems. In this study, we applied random-effects models and the least absolute shrinkage and selection operator (Lasso), a machine learning regression method, to South African panel data to estimate the causal effect of natural forest cover on municipalities' water treatment costs. We controlled for a range of confounding covariates, including other land cover variables such as wetlands, plantation forests, grassland, and woodland. The Lasso-based instrumental variable (IV) method allowed us to account simultaneously for model uncertainty surrounding variable selection and for endogeneity bias. We found significant and robust evidence that natural forestland cover reduces water treatment costs at the intensive margin. Estimates from our preferred models indicated that the marginal benefit of increasing forest cover is R310.63/ha/year. We also found that the elasticity of water treatment cost with respect to natural forest area is 0.02%. Our estimate of the marginal value of the water purification service is small compared to the producer's surplus from alternative land uses. However, protection of natural forest land use might be defended if other ecosystem goods and services provided by natural forests are taken into account.

JEL CLASSIFICATION:

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 For a detailed review of the identification problems in this previous literature, see Vincent et al. (2015).

2 For a deeper exposition of this framework, we refer our readers to Vincent et al. (2011) and Vincent et al. (2015).

3 A hydrologically sensitive area is a relatively small portion of the watershed which contributes actively to runoff (Qiu, 2009). It is a saturated area such that runoff is independent of rainfall intensity; this hydrological process is called variable-source area hydrology (Walter et al., 2000).

4 We lack theoretical guidance, or a useful representation of prevailing structural assumptions, for the inclusion of environmental variables.

5 The number of regressors increases if we account for non-linearity, interaction effects, parameter heterogeneity, and spatial and temporal effects.

6 Note that iterative testing procedures such as these typically induce pre-testing biases, and that hypothesis tests often lead to many false positives (Ahrens et al., 2019).

7 The strength of regularized regression as a prediction technique stems from the bias-variance trade-off (Ahrens et al., 2019).

8 For ease of exposition, Lasso, much like OLS, estimates regression coefficients by minimizing the usual least-squares objective function, i.e., the sum of squared residuals. However, it also imposes a penalty on model size, proportional to the sum of the absolute values of the coefficients, to achieve a sparse solution, i.e., one in which most coefficients are set to zero.

9 Note also that, unlike traditional IV methods, instrument selection procedures do not require the identity of these “important” variables to be known a priori, as the identity of these instruments will be estimated from the data.

10 Note that the Newey SE estimate is robust to heteroscedastic error terms and to autocorrelation of MA(q) form. Robustness of the SE is therefore not guaranteed if the autocorrelation of the error terms deviates from MA(q).

11 Both the Lasso and post-Lasso estimators selected lagged volume as the only optimal instrument, and the weak identification tests rejected the null hypothesis that the selected instrument is only weakly correlated with the endogenous regressors (optimal post-Lasso cluster-robust F statistic = 55). Moreover, the super-score test shows that the orthogonalized versions of the endogenous variable (treated water volume) and the exogenous variable are not correlated with the structural error term (p = 0.05).

12 Note that the Lasso algorithm does not penalize the endogenous variable (in our case, the volume of treated water). Moreover, we report that the Wu-Hausman test of water volume endogeneity rejected the hypothesis that water volume is exogenous (F = 4, 7, p = 0.08).

13 Note that a log-log or double-log model provides us with direct estimates of the elasticities of the independent variables, unlike the log-linear models presented in .

14 In terms of semi-elasticity, which facilitates comparison with other published evidence (Vincent et al., 2015), avoiding the conversion of natural forest land use into non-forest, non-grassland land use reduces water treatment cost by 1.5%.

15 The estimate on forestland use is interpreted as the intensive margin. Because our model excludes all other land uses except grassland, it estimates the average effect of increased forest relative to all of those excluded land uses rather than relative to a single land use.
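
The penalized objective described in note 8 can be illustrated with a minimal sketch. It is not the estimation procedure used in the article: the data are synthetic, the function names (`soft_threshold`, `lasso_coordinate_descent`), the penalty level `lam = 0.3`, and the 1/(2n) scaling of the squared-error term are all assumptions made for illustration, and the solver is a plain cyclic coordinate descent written with the Python standard library only.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything in [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * sum_i (y_i - x_i'b)^2 + lam * sum_j |b_j|
    by cyclic coordinate descent (illustrative, not optimised)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to rescale each coordinate update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance between regressor j and the residual that ignores regressor j.
            rho = 0.0
            for i in range(n):
                pred_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_others)
            rho /= n
            # The L1 penalty turns the least-squares update into a soft-thresholded one.
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta


# Synthetic data (hypothetical): six regressors, only two of which matter.
rng = random.Random(0)
beta_true = [2.0, 0.0, 0.0, -1.5, 0.0, 0.0]
n_obs = 200
X = [[rng.gauss(0.0, 1.0) for _ in beta_true] for _ in range(n_obs)]
y = [sum(x_ij * b for x_ij, b in zip(row, beta_true)) + rng.gauss(0.0, 0.5)
     for row in X]

beta_hat = lasso_coordinate_descent(X, y, lam=0.3)
print([round(b, 3) for b in beta_hat])
```

In this sketch the coefficients on the irrelevant regressors are set exactly to zero, while the two relevant coefficients are retained but shrunk toward zero. That shrinkage is one reason post-Lasso estimation, which refits OLS on the selected variables, is often reported alongside Lasso.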

Additional information

Funding

This work was supported by the Swedish Environmental Protection Agency.
