Taking a Chance in the Classroom

Spam Four Ways: Making Sense of Text Data

 

Abstract

The world is full of text data, yet text analytics has not traditionally played a large part in statistics education. We consider four different ways to provide students with opportunities to explore whether email messages are unwanted correspondence (spam). Text from subject lines is used to identify features that can be used in classification. The approaches include use of a Model Eliciting Activity, exploration with CODAP, modeling with a specially designed Shiny app, and coding more sophisticated analyses using R. The approaches vary in their use of technology and code, but all share the common goal of using data to make better decisions and assessing the accuracy of those decisions.

Additional information

Notes on contributors

Nicholas J. Horton

Nicholas J. Horton is Beitzel Professor of Technology and Society (statistics and data science) at Amherst College. He earned his doctorate in biostatistics from the Harvard School of Public Health in 1999 and has co-authored a series of books about data science and statistical computing. He is a member of the ASA Board of Directors and co-chair of the National Academies Committee on Applied and Theoretical Statistics. This work is part of a larger project with this team that stemmed from Horton’s work as a Tinker Fellow with the Concord Consortium.

Jie Chao

Jie Chao is a learning scientist at the Concord Consortium. She earned her PhD in instructional technology and STEM education from the University of Virginia in 2012. Chao is the principal investigator of multiple NSF-funded projects on innovative approaches to STEM teaching and learning. Her research focuses on designing learning environments that help students develop computational thinking skills, mathematical modeling competencies, and an understanding of artificial intelligence.

William Finzer

William Finzer is a senior scientist at the Concord Consortium, where he leads the development of CODAP. He serves as co-principal investigator on the NSF-funded StoryQ, M2Studio, and Boosting Data Fluency projects. Finzer’s work centers on bringing data science into the K–12 curriculum and integrating it across subject areas through the creation of data exploration software designed to be accessible and usable in the classroom.

Phebe Palmer

Phebe Palmer is a recent graduate of Amherst College, where she earned a BA in statistics in 2021. Her research centers largely on STEM education; she has assisted with projects focused on approaches to statistics pedagogy and on equitable access to STEM curriculum. She works as a research assistant at SageFox Consulting Group, based in Amherst, Massachusetts.
