Research Article

Exploring variability during data preparation: a way to connect data, chance, and context when working with complex public datasets

Michelle Hoda Wilkerson, Kathryn Lanouette, & Rebecca L. Shareff
Pages 312-330 | Published online: 24 May 2021
 

ABSTRACT

Data preparation (also called “wrangling” or “cleaning”) – the evaluation and manipulation of data prior to formal analysis – is often dismissed as a precursor to meaningful engagement with a dataset. Here, we re-envision data preparation in light of calls to prepare students for a data-rich world. Traditionally, curricular statistics explorations involve data that are derived from observations that students record themselves or that reflect familiar, relatively closed systems. In contrast, pre-constructed public datasets are much larger in scope and involve temporal, geographic, and other dimensions that complicate inference and blur boundaries between “signal” and “noise.” As a result, students have fewer opportunities to consider sources of variability in such datasets. Due to these constraints, we argue that data preparation becomes an important site for students to reason about variability with public data. Through analyses of repeated task-based interviews with five pairs of adolescent participants, we find that specific actions during data preparation, such as filtering data or calculating new measures, presented opportunities to engage learners with variability as they prepared and analyzed several public socioscientific datasets. More broadly, our study highlights some changes to theory and curriculum in statistics education that are necessitated by a focus on “big data literacy.”

Acknowledgments

We are grateful for the sustained and enthusiastic contributions from our youth participants. We also thank Susanne Schnell, Jennifer Noll, our anonymous reviewers, and the 2019 Statistical Reasoning, Thinking, and Literacy cohort for their insightful commentary and feedback on this research at various stages of formulation. This material is based upon work supported by the National Science Foundation under Grant No. IIS-1530578. This research was part of a larger project conducted in collaboration with partners at the Concord Consortium.

Notes

1. White, Collins, & Frederiksen (2011) call these “model types” and “modeling strategies”. We use the original terms (a) to emphasize knowledge construction as a result of data moves, and (b) to avoid confusion with other definitions of modeling used in the special issue.

2. All participant names used in this paper are pseudonyms.

Additional information

Notes on contributors

Michelle Hoda Wilkerson

Michelle Hoda Wilkerson is an Assistant Professor in the Graduate School of Education at the University of California, Berkeley. Her research explores middle and high school students' learning of computational practices (e.g., simulation, visualization) in mathematics and science. Most recently, this has led her to examine students' development of data literacy and statistical reasoning in technology-rich, “big data” contexts.

Kathryn Lanouette

Kathryn Lanouette is an Assistant Professor at the School of Education at William & Mary. Her research examines the ways in which children's science learning develops across multiple contexts, focusing on the interplay of data, place and digital spatial technologies to support ambitious and equitable learning opportunities. 

Rebecca L. Shareff

Rebecca L. Shareff is a recent PhD graduate from the University of California, Berkeley Graduate School of Education. Her thesis work examined the development of students' ecological knowledge and computational modeling skills through garden-based science lessons. She is currently a User Experience Researcher at Google focusing on information security.
