Reproducibility and Replicability Forum

Practical Reproducibility in Geography and Geosciences

Pages 1300-1310 | Received 15 Nov 2019, Accepted 08 Apr 2020, Published online: 13 Oct 2020
 

Abstract

Reproducible research is often perceived as a technological challenge, but it is rooted in the challenge to improve scholarly communication in an age of digitization. When computers become involved and researchers want to allow other scientists to inspect, understand, evaluate, and build on their work, they need to create a research compendium that includes the code, data, computing environment, and script-based workflows used. Here, we present the state of the art for approaches to reach this degree of computational reproducibility, addressing literate programming and containerization while paying attention to working with geospatial data (digital maps, geographic information systems). We argue that all researchers working with computers should understand these technologies to control their computing environment, and we present the benefits of reproducible workflows in practice. Example research compendia illustrate the presented concepts and are the basis for challenges specific to geography and geosciences. Based on existing surveys and best practices from different scientific domains, we conclude that researchers today can overcome many barriers and achieve a very high degree of reproducibility. If the geography and geosciences communities adopt reproducibility and the underlying technologies in practice and in policies, they can transform the way researchers conduct and communicate their work toward increased transparency, understandability, openness, trust, productivity, and innovation.


Acknowledgments

We thank Celeste R. Brennecka from the Scientific Editing Service, University of Münster, for her editorial review. Two anonymous reviewers provided valuable comments on an earlier version of this article. We thank the organizers of the workshop on Reproducibility and Replicability in Geospatial Research at Arizona State University and of the subsequent Forum for the opportunities to contribute.

Notes

1 Reproduction means that the authors’ materials are available for third parties to re-create identical results, whereas replication means different data and methods lead to the same findings. From a computational standpoint, identical is more complicated than it sounds; for example, floating point computations might result in small yet insignificant numerical differences, or image-rendering algorithms might introduce nondeterministic artifacts.

2 For example, CRAN (https://cran.r-project.org) and renv (https://cran.r-project.org/package=renv) for R, or PyPI (https://pypi.org/) and conda (https://conda.io) for Python, which even has tooling for separating full installations in virtual environments; for example, virtualenv (https://virtualenv.pypa.io).

3 For the simplicity of the argument, we use recipe instead of Dockerfile and containers as a catch-all term, whereas the experienced reader might expect a distinction between container and image.

4 Docker is the most common containerization solution today (see https://en.wikipedia.org/wiki/Docker_(software)). It is open source, and relevant parts are standardized (see https://www.opencontainers.org/).

5 Singularity is mostly used in scientific contexts and high-performance computing; see Kurtzer, Sochat, and Bauer (2017).

6 Proprietary software cannot be avoided in some areas, such as the system BIOS or device drivers.

7 The notebook might render directly into submission-ready manuscripts with R Markdown and the rticles package by Allaire et al. (2020), which supports a variety of journals, including the publisher of the Annals, Taylor & Francis, and other publishers close to the disciplines such as AGU or Copernicus Publications (EGU).

8 See https://research-compendium.science/ for a minimal definition, extensive literature, and examples. The R (R Core Team 2019) community is at the forefront of enabling reproducibility both in the available tools and in the mindset of the user community (e.g., Pebesma, Nüst, and Bivand 2012b; Marwick 2015).

9 For example, Sage (Estop 2019), De Gruyter (Code Ocean 2018), or Nature (“Easing the Burden of Code Review” 2018).

10 All articles in this special issue on software for spatial statistics in the Journal of Statistical Software are in principle reproducible, but these articles by software developers are probably not representative of the whole community using the software.

11 The largest study to date, it reproduced thirty-one research articles. See the full list at https://osf.io/sfqjg/.

13 A summary of the issues, changes, suggestions, and subsequent communication with the authors is available at https://github.com/geoss/acs_demographic_clusters/issues/2.

14 The Carpentries (https://carpentries.org/) is an excellent resource to learn data science skills outside of topical studies.
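
As an illustration of Notes 1 and 2, the following minimal Python sketch records the software versions of the current environment and compares a reproduced number against an original within a tolerance rather than bit for bit. The helper names (environment_snapshot, results_match) and the tolerance values are assumptions made for this example; they are not taken from the article or from any of the tools it cites.

```python
import math
import sys
from importlib import metadata


def environment_snapshot():
    """Return the Python version and 'name==version' pins of installed packages.

    Hypothetical helper for illustration; dedicated tools such as conda or
    virtualenv manage environments far more completely.
    """
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return {
        "python": sys.version.split()[0],
        "packages": sorted(pins, key=str.lower),
    }


def results_match(original, reproduced, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two sequences of numbers element-wise within a tolerance.

    The tolerances are assumed values; what counts as an insignificant
    difference depends on the analysis at hand.
    """
    if len(original) != len(reproduced):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(original, reproduced)
    )


# The same three numbers, summed in a different order, differ in the last bits.
original_sum = (0.1 + 0.2) + 0.3
reproduced_sum = 0.1 + (0.2 + 0.3)

bitwise_identical = original_sum == reproduced_sum
within_tolerance = results_match([original_sum], [reproduced_sum])

if __name__ == "__main__":
    snapshot = environment_snapshot()
    print("Python", snapshot["python"], "with", len(snapshot["packages"]), "packages")
    print("bitwise identical:", bitwise_identical)
    print("within tolerance:", within_tolerance)
```

Storing such a snapshot next to the results lets a later reproduction attempt distinguish differences caused by changed software versions from differences caused by floating point arithmetic.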

Additional information

Funding

This work is supported by the project Opening Reproducible Research II (https://www.uni-muenster.de/forschungaz/project/12343) funded by the German Research Foundation (DFG) under project number PE 1632/17-1.

Notes on contributors

Daniel Nüst

DANIEL NÜST is Researcher at the Institute for Geoinformatics, University of Münster, 48149 Münster, Germany. He develops tools for creation and execution of research compendia in geography and geosciences in the project Opening Reproducible Research (o2r, https://o2r.info).

Edzer Pebesma

EDZER PEBESMA is Professor of Geoinformatics at University of Münster, 48149 Münster, Germany. He is developer and maintainer of several popular R packages for handling and analyzing spatial and spatiotemporal data (sp, spacetime, gstat, sf).
