Abstract
Habitat models of species distributions provide useful information about the spatial patterns of species and biodiversity, which form the basis of many ecological applications and management decisions, such as the definition of conservation priorities and reserve selection. These models, however, are frequently based on existing datasets that were collected in an unbalanced (biased) manner. In this study we investigated the effects of data sampling bias on model performance, interpretation and, in particular, spatial predictions. We collected a large steppe bird dataset in southern Portugal following a carefully designed sampling scheme, and then sub-sampled this dataset, discarding roughly 80% to 90% of the observations, with varying degrees of geographical bias and random sampling. We characterised the data subsets in terms of data reduction and environmental bias. Multivariate adaptive regression splines (MARS) models were fitted to all datasets, and the subset models were compared with the baseline to assess the effect of the respective biases.
We found that environmental bias in the datasets strongly influenced the predicted spatial patterns of species occurrences. Special attention should therefore be paid to the quality of existing datasets used in habitat modelling, as well as to the sampling design for the collection of new data. Moreover, when modelling with biased datasets, the ecological interpretation of the resulting models should be made with caution and with explicit awareness of the existing bias.
Acknowledgements
This work benefited from a research grant by FCT (SFRH/BD/12569/2003) and support from Liga para a Protecção da Natureza (LPN), Sociedade Portuguesa para o Estudo das Aves (SPEA), Instituto de Estradas de Portugal (IEP) and Perímetro Florestal da Contenda. Supplementary data were provided by Pedro Rocha, Ana Delgado and Inês Henriques, and by research projects EDIA/PMo5.4, PRAXIS/C/AGR/11062/1998, PRAXIS/C/AGR/11063/1998 and LIFE02/NAT/P/8476. Additional support was provided by Prof. Patrick Hostert and Prof. Tobia Lakes at the Geomatics Department, Humboldt-Universität zu Berlin. The R scripts used for fitting the MARS models and cross-validations were kindly provided by Jane Elith. Maria João Santos provided helpful comments on an earlier version of this work, and comments from Carsten Doorman, Marc Kéry and an anonymous referee further improved the manuscript.