Abstract
The world is full of text data, yet text analytics has not traditionally played a large part in statistics education. We consider four different ways to provide students with opportunities to explore whether email messages are unwanted correspondence (spam). Text from subject lines is used to identify features that can be used in classification. The approaches include use of a Model Eliciting Activity, exploration with CODAP, modeling with a specially designed Shiny app, and coding more sophisticated analyses using R. The approaches vary in their use of technology and code, but all share the common goal of using data to make better decisions and assessing the accuracy of those decisions.
Additional information
Notes on contributors
Nicholas J. Horton
Nicholas J. Horton is Beitzel Professor of Technology and Society (statistics and data science) at Amherst College. He earned his doctorate in biostatistics from the Harvard School of Public Health in 1999 and has co-authored a series of books about data science and statistical computing. He is a member of the ASA Board of Directors and co-chair of the National Academies Committee on Applied and Theoretical Statistics. This work is part of a larger project with this team that stemmed from Horton’s work as a Tinker Fellow with the Concord Consortium.
Jie Chao
Jie Chao is a learning scientist at the Concord Consortium. She earned her PhD in instructional technology and STEM education from the University of Virginia in 2012. Chao is the principal investigator of multiple NSF-funded projects on innovative approaches to STEM teaching and learning. Her research focuses on designing learning environments that help students develop computational thinking skills, mathematical modeling competencies, and an understanding of artificial intelligence.
William Finzer
William Finzer is a senior scientist at the Concord Consortium, where he leads the development of CODAP. He serves as co-principal investigator on the NSF-funded StoryQ, M2Studio, and Boosting Data Fluency projects. Finzer’s work centers on bringing data science into the K–12 curriculum and integrating it across subject areas through the creation of data exploration software designed to be accessible and usable in the classroom.
Phebe Palmer
Phebe Palmer is a recent graduate of Amherst College, where she earned a BA in statistics in 2021. Her research centers largely on STEM education; she has assisted with projects focused on approaches to statistics pedagogy and on equitable access to STEM curriculum. She works as a research assistant at SageFox Consulting Group, based in Amherst, Massachusetts.