Teacher's Corner

Integrating Ethics into the Guidelines for Assessment and Instruction in Statistics Education (GAISE)

Pages 323-330 | Received 22 Aug 2022, Accepted 04 Dec 2022, Published online: 04 Jan 2023

Abstract

Statistics education at all levels includes data collected on human subjects. Thus, statistics educators have a responsibility to educate their students about the ethical aspects related to the collection of those data. The changing statistics education landscape has seen instruction moving from being formula-based to being focused on statistical reasoning. The widely implemented Guidelines for Assessment and Instruction in Statistics Education (GAISE) Report has paved the way for instructors to present introductory statistics to students in a way that is both approachable and engaging. However, with technological advancement and the increase in availability of real-world datasets, it is necessary that instruction also integrate the ethical aspects around data sources, such as privacy, how the data were obtained and whether participants consent to the use of their data. In this article, we propose incorporating ethics into established curricula and integrating ethics into undergraduate-level introductory statistics courses based on recommendations in the GAISE Report. We provide a few examples of how to prompt students to constructively think about their ethical responsibilities when working with data.

1 Introduction

Introductory statistics courses are taken by hundreds of thousands of undergraduate students every year. Such a course is typically, though not always, a single-semester introduction to statistics geared toward students majoring in nonquantitative disciplines. In 2016, a committee of the American Statistical Association (ASA) constructed a list of college majors that require a course in statistics or quantitative methods, and although it is only a partial list, it includes more than 140 majors (American Statistical Association 2016). Most students who take these courses will never take another statistics course, so statistics educators have had extensive discussions over the years about the ideal use of this limited opportunity to educate students about statistics.

Before the early 1990s, the traditional introductory statistics course focused on how to carry out statistical analyses, with emphasis on formulas and procedures. Throughout the 1990s and early 2000s a major shift occurred, partly instigated by forward thinkers such as George Cobb, Robert Hogg, David Moore, and others. Garfield et al. provide an overview of some of the workshops, grants, and publications that led to this shift (Garfield et al. 2002). They note, “In recent years many statisticians have become involved in the reform movement in statistics education… A principle (sic) aspect of the reform movement is the focus on concepts, reasoning and thinking” [p. 2]. In 2005, the ASA published the first version of the college level “Guidelines for Assessment and Instruction in Statistics Education,” more commonly known as the College GAISE Report (Aliaga et al. 2005). An updated version was published in 2016 (GAISE College Report ASA Revision Committee 2016) and will be referenced throughout this article as the “GAISE Report.” The GAISE Reports were built around the statistical problem-solving process consisting of four components: formulate questions, collect/consider data, analyze data, and interpret results.

It did not take long for statistics educators to realize that it is much more challenging to teach a course focused on concepts and reasoning than to teach one focused on formulas and procedures. At the 2005 United States Conference on Teaching Statistics, Roxy Peck outlined some of these difficulties in an entertaining presentation titled “How Did Teaching Introductory Statistics Get to Be So Complicated?!?” (Peck 2005). She summarized the old method as “Lecturing at the Bored” and then gave examples of how to engage students in more meaningful ways. Since that time, many textbooks, journal articles, and other resources have emerged as tools to make it easier for statistics instructors to focus on concepts and reasoning (Bradstreet 1996; Garfield et al. 2008; Parke 2008).

The reform movement of the 1990s and early 2000s emphasized using real data for examples and analyses. However, we are concerned that the quest to use interesting real-world examples and datasets has missed the significance of ethics. There is now also greater reliance on secondary datasets, gathered for other purposes, to answer newly posed investigative questions. Thus, it is important to consider the ethical issues involved in the use of data. Additional ethical issues such as informed consent, data privacy, and the ease of posting fake data have arisen since the early 2000s because of the growth of the Internet and social media. Therefore, incorporating ethics into statistics education should be more of a priority now than it was 20 years ago. There are many recent examples of work regarding ethical issues in this big data era (Dwork 2011; Barocas and Nissenbaum 2014; De Veaux et al. 2017; O’Neil 2017; Hand 2018; Tractenberg 2019; Mittelstadt and Kwakkel 2020; Nissenbaum 2020; Washington and Kuo 2020; Gebru et al. 2021).

The ASA, International Statistical Institute, and other professional associations have established various ethical guidelines for statisticians (American Statistical Association 1999, 2018; International Statistical Institute 2018), each of which echoes a common set of principles and shared professional values. They discuss the relevance of ethical norms for statisticians and provide an applied practice framework. However, the guidelines do not mention how ethics relates to the statistics education process, such as in an introductory statistics classroom. At the graduate level, one survey found that as of 2015, 35% of the universities offering graduate (bio)statistics programs “required an ethics course for at least some students to obtain the degree” (Lee, McCarty, and Zhang 2015). Commonly, graduate students’ exposure to ethical principles and training materials starts with programs for responsible conduct of research. However, restricting ethical responsibility to general research integrity and introducing it later in graduate school is problematic for at least three reasons. First, it denies many students who may go on to be practitioners the opportunities to reflect on ethical issues in statistics. Second, it risks being too late for students to optimally master the skills involved in understanding and incorporating ethics into statistics practice. Third, it ignores broader ethical questions. Hence, we propose making ethics a key component of discussions in introductory-level undergraduate statistics classrooms.

Although imparting statistical ideas and procedures is the primary goal of most introductory undergraduate statistics courses, students who do not understand the data-generating pipeline are at risk of not developing the essential tools to engage with ethical predicaments that they may encounter. Ethics is an important component in introductory statistics education, particularly since many students are not statistics majors and may be completing a statistics course for the first and last time. More broadly, educated citizens need an understanding of elementary statistics (Utts 2003). A few papers have offered suggestions and examples for incorporating ethics into the statistics education curriculum as a whole (Moore 1997; Utts 2021; Baumer et al. 2022).

Statistics educators in introductory classrooms are in a unique position to prepare students, who may eventually be consumers of data, producers of data, or both (Hotelling 1940). (In reality, there are more than two categories of students, though for simplicity, we use the consumer/producer dichotomy.) The two groups operate and interact with data in different ways, and the learning competencies and goals for a consumer of statistics differ from those for a producer. For instance, a student who eventually becomes a medical researcher will need to understand more of the technical aspects of statistics than one who eventually becomes an artist. They nevertheless often face challenges from a family of related ethical responsibilities, and both need to understand ethical issues related to the statistical problem-solving process.

Our work was motivated by growing concerns regarding unethical uses of data (Narayanan and Shmatikov 2008; De Montjoye et al. 2013; Kramer, Guillory, and Hancock 2014; Xiao and Ma 2021). Undergraduate data science programs have recognized this and have included ethics as one of the core topics in their curriculum guidelines (De Veaux et al. 2017; De Veaux et al. 2022). Another paper (Tractenberg 2016b) discussed integrating ethics into undergraduate and graduate nonmajor statistics courses using the components of ASA’s Ethical Guidelines for Professional Practice (American Statistical Association 2018). Since the GAISE Report has been widely implemented by introductory statistics educators, we will focus on using the GAISE recommendations to weave ethical themes into introductory courses. To our knowledge this has not been done before, and we believe that integrating ethics using the GAISE Report provides a great opportunity for instructors to engage students in discussions on ethics in an effective way. The GAISE Report is structured with instructor recommendations and corresponding student goals. There is no explicit mention of ethics in the recommendations for instructors in the GAISE Report, although one of the student goals states “Students should demonstrate an awareness of ethical issues associated with sound statistical practice” (GAISE College Report ASA Revision Committee 2016). The GAISE Report implicitly addresses ethical concerns with recommendations about respecting subjects’ privacy, seeking appropriate permissions for using datasets, and committing to open scientific inquiry. We aim to make explicit some of the ethical considerations underlying these and other GAISE instructional goals. We discuss some examples and offer some recommendations for ethically sensitive statistics pedagogy. We offer suggestions for incorporating statistical ethics into the curriculum and integrating it into the classroom based on the six GAISE recommendations. We then discuss how instructors can consult with an applied ethicist for curricular design.

We acknowledge there are ethical considerations bearing on many (perhaps all) areas of statistics. An entire paper could be written on the ethics of each step of the statistical problem-solving process, separately for consumers, producers, and others. We chose to focus on the data collection process in this article because it is the area that is likely to be of the most concern to both consumers and producers due to growing concerns regarding unethical uses of data. The data collection process is also easier for students in an introductory statistics course to engage in regardless of their statistics knowledge.

2 Statistical Ethics and the Classroom

Multiple studies have suggested it is important to establish a set of principles and discuss ethical conduct in the classroom to help students understand their responsibilities and potential impacts of what they do (Shulman 2002; Vardeman and Morris 2003; Lesser and Nordenhaug 2004; Tractenberg et al. 2015; Tractenberg 2016a; Elliott, Stokes, and Cao 2018). The American Statistical Association’s Ethical Guidelines state that the ethical statistical practitioner “[s]erves as an ambassador for statistical practice by promoting thoughtful choices about data acquisition, analytic procedures, and data structures among nonpractitioners and students” (American Statistical Association 2018). Presumably, this role of ambassador applies to statistics educators as well as to practitioners. Statistics education at all levels includes studies of examples and applications that involve human subjects. Hence, statistics educators have a responsibility to educate their students about the ethical aspects related to the collection of those data.

Ethical principles can be incorporated in the classroom either through standalone courses or through integration into the existing curriculum. But given how little room there is for major curricular change in most institutions, we believe that there are opportunities for significant improvement with an important but incremental change: incorporating ethics into established curricula. This would minimize the burdens of curricular change while reaping the benefits of increased attention to ethical concerns.

2.1 GAISE Report

The GAISE Report is structured with six core recommendations for instructors and nine goals for students in introductory statistics courses. The six core recommendations for teaching statistics are as follows:

  1. Teach statistical thinking.

  2. Focus on conceptual understanding.

  3. Integrate real data with a context and purpose.

  4. Foster active learning.

  5. Use technology to explore concepts and analyze data.

  6. Use assessments to improve and evaluate student learning.

The GAISE Report provides a platform for integrating statistics teaching with ethics principles by actively engaging students using case studies of ethics violations (Baggerly and Coombes 2009) and discussion of real-world ethical issues (Stein 2015). In the following sections, we tie together each of the six recommendations with a few ethical considerations, particularly pertaining to data collection. We also suggest some activities and exercises. We do not provide a comprehensive list of ethical principles that need to be considered. Our goal is to provide a few examples of how to engage and prompt students to think about their ethical responsibilities regarding data collection.

2.1.1 Recommendation 1: Teach Statistical Thinking

The first GAISE recommendation, “Teach statistical thinking,” emphasizes that students should consider statistics to be a rational problem-solving and decision-making process. We understand this recommendation to suggest that educators teach students to develop critical reasoning and thinking skills. These skills are importantly different from what it takes to plug numbers into formulas. In a similar vein, students need to be educated about basic ethical principles and how to incorporate ethics when critically evaluating the data they work with or statistics they consume.

Instructors can begin students’ exposure to ethical principles by drawing on readily available platforms for research ethics such as programs for responsible conduct of research (Institute of Medicine and National Research Council 1989) and training modules developed and offered by the Collaborative Institutional Training Initiative (CITI) program (Collaborative Institutional Training Initiative 2017). This process is likely familiar to statisticians collaborating on any human subjects research. There are several landmark reports that are commonly used in introductory ethics training in STEM fields. One example is the Belmont report (Ryan et al. 1979), which provides ethical guidelines for the treatment of human subjects. It sets out some key ethical principles to shape any human subjects research: respect for persons, beneficence, and justice (Paxton 2020). Educators can then facilitate studies of how those principles guide researchers and how regulatory bodies such as Institutional Review Boards (IRBs) uphold such principles. Case studies of ethical violations can also be useful in illustrating the kinds of challenges that can arise.

Typical introductory undergraduate statistics courses use classic datasets such as the Titanic dataset (https://www.kaggle.com/c/titanic) and the credit card approval dataset (https://www.kaggle.com/datasets/rikdifos/credit-card-approval-prediction). Although IRBs vet studies in which researchers actively collect data from participants, ethical concerns appear even for those who conduct secondary data analysis or obtain their datasets through a textbook. With the availability of topic-centric data repositories that are instantaneously and publicly accessible (AwesomeData – GitHub Repository 2022), it is natural for students to jump right into the analytics and not think about ethical issues of data provenance. Gardenier notes, “An issue that may be overlooked is how do we know what the data are? They do not simply appear out of nowhere, neatly organized and arrayed for analysis” (Gardenier 2011). Educators can promote discussion among students using existing frameworks or prompts about the ethical aspects of using these datasets. Possible discussion prompts are:

  • Do we have permission to use these datasets?

  • Did the participants consent to their data being used?

  • Who collected the data and made it publicly available?

  • Do the data have identifiable information?

  • Were the variables collected in an accurate manner?

  • Should we analyze data if we do not know how the data were collected?

  • May researchers prioritize certain principles from the Belmont report over others?

When using classic datasets, instructors could also prompt discussions that revolve around ethics. The Titanic dataset contains information on identifiable people from more than a century ago and could be used to prompt discussions on consent. When using the credit card data, possible discussion points are:

  • Are the data sufficiently anonymized or old to be free of ethical concerns?

  • How were these data used to discriminate in the issuance of consumer credit?

  • Should race be used as a variable?

  • Are researchers who use certain variables complicit in or perpetuating some undue stigma or discrimination against minority subgroups?

Another useful resource may be the Data Ethics Canvas provided by the Open Data Institute (Open Data Institute 2021), which can also be used as a framework to develop ethical guidance.

2.1.2 Recommendation 2: Focus on Conceptual Understanding

The revised GAISE’s second recommendation encourages educators to help students go deeper than surface level understandings of concepts. Many research contexts call for nuanced understanding of ethical complexities. Few such ethical challenges have a unique or a straightforward solution. Thinking in terms of some key ethical principles might help students to reason constructively about what they have learnt and to be more adept at applying their learning to other and more complex situations they might encounter.

The three Belmont principles mentioned earlier, respect for persons, beneficence, and justice, draw on conceptual frameworks that are implemented, respectively, as informed consent, assessment of risks and benefits, and fairness in outcomes and selection of subjects. Soliciting informed consent is one application of the ethical obligation to treat each research subject as someone who is in charge of their lives. Researchers show subjects the respect they are due by honoring their contribution toward the advancement of knowledge. Researchers offer prospective subjects appropriate information about what their participation in a study would involve. They solicit subjects’ agreement to be part of the study. This request conveys respect for their ability to comprehend the risks and benefits and to decide whether to consent to participate (Clayton 2005).

When researchers construct studies, they must also assess the risks for human subjects. Researchers must consider whether there is an appropriate balance of risks and benefits for subjects in the study. Among the common potential risks for subjects are loss of privacy and damage to personal or professional reputation. There are further possible risks to researchers in particular and for research in general such as might come from poorly designed studies, unanticipated events, or adverse events. There might be possible benefits for subjects from some clinical or educational gains or contributing to the growth of scientific knowledge in general. Attending to the balance of risks and benefits is one expression of a commitment to avoiding harm to subjects and promoting their welfare.

Instructors might draw on the Belmont framework and institutional review processes for classroom activities that could help to deepen students’ understanding of the ethical concepts. Students could submit a mock IRB protocol or participate in a mock IRB meeting. In each case, students would attend to the ethical principles and how they bear on complex and challenging case studies such as the Havasupai lawsuit (Van Assche, Gutwirth, and Sterckx 2013). This lawsuit was brought by the Havasupai tribe. Members had donated their blood samples for diabetes research but learnt that researchers then used the specimens to investigate other diseases and genetic markers. This case challenged the meaning and scope of informed consent, particularly from members of vulnerable populations. Considering this and similar case studies could allow students to think about the meaning and significance of ethical principles, apply them to important research contexts, and practice giving and receiving clear, depersonalized feedback in a public setting (Emery, Harvey, and Andersen 2006; Ritchie 2021).

These would not be idle exercises for students. Some of them will go on to be part of human subjects research. Others might be called on for their expertise in regulatory review of other protocols. Indeed, many IRBs allow students with an academic interest in research ethics and the functioning of IRBs in general to observe convened meetings. These experiences can be invaluable since they provide insights into review procedures and how institutionalized ethical principles function for regulatory bodies. It is common and desirable for an IRB to include one or more statisticians. Statisticians often also participate as a member or advisor on other boards such as scientific review panels at the National Institutes of Health (NIH). Guidelines for funding from the NIH or other federal agencies require that researchers conducting a clinical trial assemble a data safety monitoring board that includes one or more statisticians to ensure the safety of human subjects in a clinical trial. Hence, for students who go on to practice statistics, observing IRB meetings gets them thinking about being a statistician and their involvement in professional ethics bodies. For students who never take another statistics course, understanding the purpose and practices of IRBs would provide an appreciation of some of the ethical concerns in research on humans and animals.

2.1.3 Recommendation 3: Integrate Real Data with a Context and Purpose

Educators mindful of GAISE’s third recommendation can provide students opportunities to apply what they learned in the classroom. We are in an era of big data. Corporations and governments collect vast quantities of information from people discreetly through websites we visit or devices that we use. Researchers can now more easily integrate real data while ignoring the context under which the data were collected. Including case studies involving real data is one powerful way to help students spot ethical dilemmas, brainstorm alternative strategies to resolve them, and develop an ethical mindset. Below are a few real data examples that can be used to illuminate key ethical challenges:

  • Patient consent: Students can review the Facebook emotional contagion experiment (Kramer, Guillory, and Hancock 2014). Educators can guide them to reflect on themes such as data privacy, consent, and reidentification of data.

  • Privacy when using publicly available data: Educators can lead students in a review of the OK Cupid data release (Xiao and Ma 2021). This can be used as an example of how quantitative researchers need to revisit existing ethical guidelines and assumptions when working with publicly available data.

  • Poor anonymization of data: The release of the taxi dataset by the New York City Taxi & Limousine Commission, covering 173 million individual cab rides with pickup and drop-off times, locations, and fare and tip amounts, provides a practical exercise in understanding how poorly anonymized datasets can be used to correctly identify a person if sufficient attributes are provided (De Montjoye et al. 2013; Metcalf and Crawford 2016; Rocher, Hendrickx, and De Montjoye 2019).

  • Reidentification when datasets are linked: Netflix released a dataset with anonymous movie ratings by half a million Netflix subscribers. They offered a prize for the best algorithm to predict user ratings for movies. Researchers were later able to identify individual users by linking this dataset with film ratings on the Internet Movie Database. This linkage exposed political preferences and other potentially sensitive information (Narayanan and Shmatikov 2008).

2.1.4 Recommendation 4: Foster Active Learning

The GAISE Report unpacks “active learning” in terms of what encourages students to be critically engaged in statistical thinking and practice. Active learning strategies such as peer-to-peer discussions allow students to listen to other points of view. Students can then better appreciate that many ethical matters elicit reasonable disagreement. Students can nevertheless improve their skills at communicating and evaluating the competing ethical considerations at stake in many disputes.

Below are some discussion prompts educators might use to facilitate active discussions in the classroom. Each would engage students and encourage their reflection on the ethics of the work they do and the statistics they consume:

  • Was this research conducted using data obtained ethically?

  • Were the participants aware of the ways their data would be used for research?

  • If the data were publicly mined, is there information that could potentially be used to identify a particular person?

  • Can participants provide informed consent to the use of their data in applications or algorithms that do not yet exist, and especially when these applications are not yet even foreseen?

For a classroom activity, students could provide data for an investigation and discuss possible ethical concerns. This could be done through traditional means such as a survey or through nontraditional approaches such as using student-generated photographs as data (Arnold, Perez, and Johnson 2021). Discussion questions could include:

  • What are the possible ethical implications of using such data?

  • Would the students be willing to share the data with others outside of the classroom for analysis in different contexts?

  • Would their willingness to share data depend on the mode of data collection or questions being evaluated?

2.1.5 Recommendation 5: Use Technology to Explore Concepts and Analyze Data

Authors of the GAISE Report explain that “it is important to view the use of technology not just as a way to generate statistical output but as a way to explore conceptual ideas and enhance student learning.” Hands-on exercises using technology can help instructors demonstrate in real time how technology has in many ways complicated the ethical issues around data, such as privacy. For example, regulations associated with data protection permit the sharing of de-identified data. In human subjects research, de-identification requires that the 18 unique identifiers provided in the Privacy Rule of the U.S. Health Insurance Portability and Accountability Act of 1996 (HIPAA) be completely removed or that there be a very low likelihood that participants can be reidentified. Researchers study health disparities by using administrative datasets and linking them via geocoding to publicly available neighborhood-level socioeconomic information such as that from the US Census. Though two linked datasets may independently maintain standards required for de-identification, a growing concern has been the risk of reidentification of records by linking data with similar attributes in another dataset or potential cross-classification of multiple variables (Malin 2006; Lubarsky 2010; Smith 2016; Simon et al. 2019). This serves as a good example to demonstrate how complex de-identification is and why it is valuable to evaluate not just the individual datasets one works with but also the ethical impacts of linking two or more datasets (Bowen 2021a, 2021b; Garfinkel and Bowen 2022).
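
As a hands-on illustration of the linkage risk described above, instructors might show students a small sketch like the following. It is not taken from any of the cited studies: the records, names, field names, and the `link_records` helper are all hypothetical and synthetic, and exact matching on three quasi-identifiers is only one simple way such a linkage could be attempted.

```python
# Toy linkage sketch. Every record, name, and field below is synthetic and
# hypothetical; nothing here comes from a real dataset.

QUASI_IDENTIFIERS = ("area", "birth_year", "sex")

# A "de-identified" file: names removed, quasi-identifiers retained.
health_records = [
    {"area": "A1", "birth_year": 1960, "sex": "F", "diagnosis": "asthma"},
    {"area": "A1", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"area": "A2", "birth_year": 1985, "sex": "M", "diagnosis": "hypertension"},
    {"area": "A2", "birth_year": 1985, "sex": "M", "diagnosis": "migraine"},
]

# A public file that lists names next to the same quasi-identifiers.
public_roster = [
    {"name": "Person A", "area": "A1", "birth_year": 1960, "sex": "F"},
    {"name": "Person B", "area": "A1", "birth_year": 1985, "sex": "M"},
    {"name": "Person C", "area": "A2", "birth_year": 1985, "sex": "M"},
]


def link_records(deidentified, named, keys=QUASI_IDENTIFIERS):
    """Return {name: diagnosis} for every named person whose combination of
    quasi-identifiers matches exactly one de-identified record."""
    index = {}
    for row in deidentified:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    linked = {}
    for person in named:
        matches = index.get(tuple(person[k] for k in keys), [])
        if len(matches) == 1:  # a unique match means the record is exposed
            linked[person["name"]] = matches[0]["diagnosis"]
    return linked


reidentified = link_records(health_records, public_roster)
print(reidentified)  # {'Person A': 'asthma', 'Person B': 'diabetes'}
```

In this toy data, Person C is not reidentified because two health records share the same quasi-identifiers. Students can discuss why neither file looks risky on its own while the combination exposes two of the three people.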

Here is an example of a class activity that can dramatize the ethical risks from data collection. A class might collect data from students and then practice de-identifying it. Educators might structure the activity in stages, with variable amounts of data protection, all to show the impacts on students’ privacy. (Obviously, the exercise would need to be mindful of privacy breaches when making a point about the risks of privacy breaches.) Students can evaluate the tradeoffs between data quality and identifiability by assessing alternative methods of retaining, for example, dates of an event, duration from a specific time point, or other potential identifiers. A subsequent exercise would involve debriefing on whether they could guess the identity of their classmates from the de-identified data. This encourages active learning on the nonzero risk of reidentification and on issues to consider when assessing the level of identifiability of data. An interesting case study alongside this discussion to illustrate the importance of de-identification would be Sweeney’s translation of de-identified health records into identifiable data by linking them to a publicly available voter registration list (Sweeney 2002).
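
One way to make the tradeoff between data quality and identifiability concrete is to count how many records remain unique as a date is coarsened. The sketch below is only an illustration under stated assumptions: the roster is synthetic, the field names and the `coarsen` and `unique_count` helpers are hypothetical, and counting unique combinations is a simple proxy for identifiability, not a formal privacy guarantee.

```python
from collections import Counter

# Synthetic class roster; all values are invented for illustration.
records = [
    {"birth_date": "2003-01-15", "home_state": "CA"},
    {"birth_date": "2003-06-02", "home_state": "CA"},
    {"birth_date": "2003-11-20", "home_state": "CA"},
    {"birth_date": "2004-03-09", "home_state": "NY"},
    {"birth_date": "2004-03-30", "home_state": "NY"},
    {"birth_date": "2004-08-12", "home_state": "NY"},
]

# Number of leading characters of an ISO date (YYYY-MM-DD) kept per level.
DATE_PREFIX_LENGTH = {"day": 10, "month": 7, "year": 4}


def coarsen(record, level):
    """Return the (coarsened date, state) pair used to group records."""
    kept = record["birth_date"][: DATE_PREFIX_LENGTH[level]]
    return (kept, record["home_state"])


def unique_count(rows, level):
    """Count records whose coarsened combination appears exactly once."""
    counts = Counter(coarsen(row, level) for row in rows)
    return sum(1 for size in counts.values() if size == 1)


summary = {level: unique_count(records, level) for level in DATE_PREFIX_LENGTH}
print(summary)  # {'day': 6, 'month': 4, 'year': 0}
```

With full dates every student in this toy roster is unique, and keeping only the year removes all unique combinations at the cost of less precise data. Students can debate which level of detail a given analysis actually needs.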

2.1.6 Recommendation 6: Use Assessments to Improve and Evaluate Student Learning

The GAISE Report’s authors note that recommendation six centers on formative and summative assessments and the importance of providing students with ongoing feedback about their learning. Summative assessments in an introductory statistics course typically involve an analysis project that evaluates understanding of concepts, data analysis, and interpretation skills. An extension to this could involve a brief write-up of ethical considerations of their data sources. Formative assessments could include a writing assignment on case studies where data ethics have been questioned (Kramer, Guillory, and Hancock 2014; Xiao and Ma 2021) and where students propose ethical safeguards that could be put into place.

3 Consulting with an Applied Ethicist for Curricular Design

Consultation with ethicists would help to frame curricular choices in light of the challenges statisticians face. Ethicists can best help not by proclaiming abstract moral principles nor by suggesting how to elicit students’ compliance. Doing so would risk turning ethics into a chore. Instead, ethicists can help statistics educators frame the ethical challenges shaping the study and practice of statistics. Statistics educators, at their best, help to empower students with the tools for navigating such challenges. They know from experience that such challenges are impediments to progress in their work and studies. Statistics educators appreciate curricular tradeoffs but are mindful of the growing significance of ethics. Ethics matters not merely because students increasingly care about it. It helps statisticians and their students better understand what they can, may, and must do when studying and engaging in statistics.

Ethicists—especially those familiar with interdisciplinary environments—would help to clarify the stakes of the dilemmas and offer some guidance for navigating the difficulties, including in the classroom. Ethicists can be allies for statisticians and their students in diagnosing the challenges they face and considering how best to move forward while mindful of the ethical risks and opportunities. This leaves the hardest ethical work where it belongs: in the hands of statisticians who have firsthand experience with ethical problems in statistics.

Ethics is hard. It is hard because it refers to a domain of reasons that present themselves in demanding ways. The reasons of ethics are typically especially weighty. They are usually more significant or compelling than other reasons. A reason not to murder a statistician is typically (we believe, always) weightier than a reason to prevail over one’s professional rivals. Ethical reasons sometimes obstruct what we otherwise want to do. (Sometimes we are strongly inclined to strangle our rivals, but we forbear doing so because that would be wrong.) This might make many ethical reasons somewhat inconvenient, but we often have an overriding concern with living well, which includes living according to our best ethical understandings. When it comes to more quotidian cases for statisticians, such as how to use data in ethically sensitive ways, producers and consumers of data typically want to avoid ethical lapses and do what is right. What it means to live well and do right are targets of much continuing discussion among ethicists, but we need not resolve age-old moral dilemmas to make progress on the challenges statisticians face. That is because statisticians know from professional experience what sort of challenges and opportunities there are to living well and succeeding as statisticians. Consulting an ethicist can then help to frame key concepts and ethical considerations in statistics pedagogy.

4 Discussion

As a result of the statistics education reform movement of the late 1990s and early 2000s, students who take introductory statistics courses now are much more prepared to be intelligent consumers of statistical information than were students who did so 30 years ago. But it is time to take another step forward and incorporate discussions about ethical issues into these courses. Societal changes over the past 20 years have made it imperative that students learn to think about ethical issues related to data collection, use, and interpretation, rather than simply about practical and informational aspects of collecting and using data.

Some of the changes that have occurred in the past 20 years include more publicly available data, including data for sale, scraping the web for data without informed consent, and the ease of using “black box” algorithms without considering the quality of the data or understanding what goes into those algorithms. These changes make it easy to violate ethical considerations such as data privacy, sometimes without even being aware that one is doing so. These changes also make it easier to use poor quality data but give the appearance of having solid results because the results were generated by sophisticated software. Often, users of “black box” algorithms have little understanding of how the algorithm actually works, and even less understanding of the quality of the data used to generate the algorithm.

In this article we have provided ideas about how to use the GAISE Report as a framework for incorporating ethics into introductory statistics education. We have offered some exercises and suggestions that educators can use to do so. Our focus has been on the data collection phase of the statistical problem-solving process, but similar ideas could be used for discussing ethical issues related to all stages. Since components of the problem-solving process cannot be truly separated, instructors should encourage students to evaluate ethics in each component and tie them together. For example, students should evaluate if the proposed question potentially has elements that will not be ethical (formulate questions) and if the data are reasonable and appropriate for answering the question (collect/consider data). This is particularly relevant when secondary datasets are used. We hope that the examples provided in the manuscript are useful for instructors in incorporating activities and discussions on ethics in their introductory statistics courses.

Disclosure Statement

The authors have no financial or nonfinancial interests to disclose that are relevant to the content of this article.

Additional information

Funding

This was unfunded work.

References

  • Aliaga, M., Cobb, G., Cuff, C., Garfield, J., Gould, R., Lock, R., Moore, T., Rossman, A., Stephenson, B., Utts, J., Velleman, P., and Witmer, J. (2005), “Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report,” American Statistical Association [online]. Available at https://www.amstat.org/asa/files/pdfs/GAISE/2005GaiseCollege_Full.pdf.
  • American Statistical Association (1999), “Ethical Guidelines for Statistical Practice,” [online]. Available at https://www.amstat.org/asa/files/pdfs/EthicalGuidelines.pdf.
  • American Statistical Association (2016), “College Majors Requiring Statistics,” [online]. Available at https://www.amstat.org/asa/files/pdfs/EDU-CollegeMajorsFlyer.pdf.
  • American Statistical Association (2018), “Ethical Guidelines for Statistical Practice,” [online]. Available at https://www.amstat.org/asa/files/pdfs/EthicalGuidelines.pdf.
  • Arnold, P., Perez, L., and Johnson, S. (2021), “Using Photographs as Data Sources to Tell Stories,” Harvard Data Science Review, 3. DOI: 10.1162/99608f92.f0a7df71.
  • AwesomeData - GitHub Repository (2022), [online]. Available at https://github.com/awesomedata/awesome-public-datasets.
  • Baggerly, K. A., and Coombes, K. R. (2009), “Deriving Chemosensitivity from Cell Lines: Forensic Bioinformatics and Reproducible Research in High-Throughput Biology,” The Annals of Applied Statistics, 3, 1309–1334. DOI: 10.1214/09-aoas291.
  • Barocas, S., and Nissenbaum, H. (2014), “Big Data’s End Run around Anonymity and Consent,” in Privacy, Big Data, and the Public Good: Frameworks for Engagement, eds. J. Lane, V. Stodden, S. Bender and H. Nissenbaum, pp. 44–75, Cambridge: Cambridge University Press.
  • Baumer, B. S., Garcia, R. L., Kim, A. Y., Kinnaird, K. M., and Ott, M. Q. (2022), “Integrating Data Science Ethics into an Undergraduate Major: A Case Study,” Journal of Statistics and Data Science Education, 30, 15–28. DOI: 10.1080/26939169.2022.2038041.
  • Bowen, C. (2021a), “Personal Privacy and the Public Good: Balancing Data Privacy and Data Utility,” [online]. Available at https://policycommons.net/artifacts/1808780/personal-privacy-and-the-public-good/2543692/.
  • Bowen, C. M. (2021b), Protecting Your Privacy in a Data-Driven World, Boca Raton, FL: Chapman and Hall/CRC.
  • Bradstreet, T. E. (1996), “Teaching Introductory Statistics Courses so that Nonstatisticians Experience Statistical Reasoning,” The American Statistician, 50, 69–78. DOI: 10.2307/2685047.
  • Clayton, E. W. (2005), “Informed Consent and Biobanks,” The Journal of Law, Medicine & Ethics, 33, 15–21. DOI: 10.1111/j.1748-720x.2005.tb00206.x.
  • Collaborative Institutional Training Initiative (2017), “The CITI Program,” [online]. Available at https://about.citiprogram.org/en/series/human-subjects-research-hsr/.
  • De Montjoye, Y. A., Hidalgo, C. A., Verleysen, M., and Blondel, V. D. (2013), “Unique in the Crowd: The Privacy Bounds of Human Mobility,” Scientific Reports, 3, 1–5. DOI: 10.1038/srep01376.
  • De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., Bryant, L., Cheng, L. Z., Francis, A., Gould, R., Kim, A. Y., Kretchmar, M., Lu, Q., Moskol, A., Nolan, D., Pelayo, R., Raleigh, S., Sethi, R. J., Sondjaja, M., Tiruviluamala, N., Uhlig, P. X., Washington, T. M., Wesley, C. L., White, D., and Ye, P. (2017), “Curriculum Guidelines for Undergraduate Programs in Data Science,” Annual Review of Statistics and Its Application, 4, 15–30. DOI: 10.1146/annurev-statistics-060116-053930.
  • De Veaux, R., Hoerl, R., Snee, R., and Velleman, P. (2022), “Toward Holistic Data Science Education,” Statistics Education Research Journal, 21, 2. DOI: 10.52041/serj.v21i2.40.
  • Dwork, C. (2011), “A Firm Foundation for Private Data Analysis,” Communications of the ACM, 54, 86–95. DOI: 10.1145/1866739.1866758.
  • Elliott, A. C., Stokes, S. L., and Cao, J. (2018), “Teaching Ethics in a Statistics Curriculum with a Cross-Cultural Emphasis,” The American Statistician, 72, 359–367. DOI: 10.1080/00031305.2017.1307140.
  • Emery, L. J., Harvey, C., and Andersen, C. M. (2006), “Formative Evaluation using Checklists to Improve Research Proposals,” Perspectives in Health Information Management, 3, 2.
  • GAISE College Report ASA Revision Committee (2016), “Guidelines for Assessment and Instruction in Statistics Education College Report 2016,” [online]. Available at https://www.amstat.org/docs/default-source/amstat-documents/gaisecollege_full.pdf.
  • Gardenier, J. S. (2011), “Ethics in Quantitative Professional Practice,” in Handbook of Ethics in Quantitative Methodology, eds. A. T. Panter and S. K. Sterba, pp. 15–36, New York: Routledge.
  • Garfield, J. B., Ben-Zvi, D., Chance, B., Medina, E., Roseth, C., and Zieffler, A. (2008), Developing Students’ Statistical Reasoning: Connecting Research and Teaching Practice, Dordrecht: Springer.
  • Garfield, J., Hogg, B., Schau, C., and Whittinghill, D. (2002), “First Courses in Statistical Science: The Status of Educational Reform Efforts,” Journal of Statistics Education, 10. DOI: 10.1080/10691898.2002.11910665.
  • Garfinkel, S. L., and Bowen, C. M. (2022), “Preserving Privacy While Sharing Data,” MIT Sloan Management Review, 63, 1–4.
  • Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., and Crawford, K. (2021), “Datasheets for Datasets,” Communications of the ACM, 64, 86–92. DOI: 10.1145/3458723.
  • Hand, D. J. (2018), “Aspects of Data Ethics in a Changing World: Where are We Now?,” Big Data, 6, 176–190. DOI: 10.1089/big.2018.0083.
  • Hotelling, H. (1940), “The Teaching of Statistics,” The Annals of Mathematical Statistics, 11, 457–470, 414. DOI: 10.1214/aoms/1177731833.
  • Institute of Medicine and National Research Council (1989), The Responsible Conduct of Research in the Health Sciences, Washington, DC: The National Academies Press. DOI: 10.17226/1388.
  • International Statistical Institute (2018), “Declaration on Professional Ethics,” [online]. Available at https://www.isi-web.org/images/about/Declaration-EN2010.pdf.
  • Kramer, A. D., Guillory, J. E., and Hancock, J. T. (2014), “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks,” Proceedings of the National Academy of Sciences of the United States of America, 111, 8788–8790. DOI: 10.1073/pnas.1320040111.
  • Lee, L. M., McCarty, F. A., and Zhang, T. R. (2015), “Ethical Numbers: Ethics Training in U.S. Graduate Statistics Programs, 2013–2014,” The American Statistician, 69, 11–16. DOI: 10.1080/00031305.2014.997891.
  • Lesser, L. M., and Nordenhaug, E. (2004), “Ethical Statistics and Statistical Ethics: Making an Interdisciplinary Module,” Journal of Statistics Education, 12. DOI: 10.1080/10691898.2004.11910630.
  • Lubarsky, B. (2010), “Re-identification of ‘Anonymized’ Data,” Georgetown Law Technology Review. Available at https://www.georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017 (accessed on 10 September 2021).
  • Malin, B. (2006), “Re-identification of Familial Database Records,” in AMIA Annual Symposium Proceedings, American Medical Informatics Association.
  • Metcalf, J., and Crawford, K. (2016), “Where are Human Subjects in Big Data Research? The Emerging Ethics Divide,” Big Data & Society, 3, 205395171665021. DOI: 10.1177/2053951716650211.
  • Mittelstadt, B., and Kwakkel, J. (2020), “Assessing Provenance and Bias in Big Data,” in The Routledge Handbook of the Philosophy of Engineering, eds. D. P. Michelfelder and N. Doorn, pp. 191–205, New York: Routledge.
  • Moore, D. S. (1997), “New Pedagogy and New Content: The Case of Statistics,” International Statistical Review/Revue Internationale de Statistique, 65, 123–137. DOI: 10.2307/1403333.
  • Narayanan, A., and Shmatikov, V. (2008), “Robust De-anonymization of Large Sparse Datasets,” in 2008 IEEE Symposium on Security and Privacy (sp 2008). DOI: 10.1109/SP.2008.33.
  • Nissenbaum, H. (2020), “Protecting Privacy in an Information Age: The Problem of Privacy in Public,” in The Ethics of Information Technologies, eds. K. W. Miller and M. Taddeo, pp. 141–178, London: Routledge.
  • O’Neil, C. (2017), Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, New York: Crown.
  • Open Data Institute (2021), “Data Ethics Canvas,” [online]. Available at https://theodi.org/wp-content/uploads/2021/07/Data-Ethics-Canvas-English-Colour.pdf.
  • Parke, C. S. (2008), “Reasoning and Communicating in the Language of Statistics,” Journal of Statistics Education, 16. DOI: 10.1080/10691898.2008.11889555.
  • Paxton, A. (2020), “The Belmont Report in the Age of Big Data: Ethics at the Intersection of Psychological Science and Data Science,” in Big Data in Psychological Research, eds. S. E. Woo, L. Tay and R. W. Proctor, pp. 347–372, Washington, DC: American Psychological Association.
  • Peck, R. (2005), “How Did Teaching Introductory Statistics Get To Be So Complicated?!?,” [online]. Available at www.statlit.org/pdf/2005PeckUSCOTS1up.pdf.
  • Ritchie, K. (2021), “Using IRB Protocols to Teach Ethical Principles for Research and Everyday Life: A High-Impact Practice,” Journal of the Scholarship of Teaching and Learning, 21, 120–130. DOI: 10.14434/josotl.v21i1.30554.
  • Rocher, L., Hendrickx, J. M., and De Montjoye, Y. A. (2019), “Estimating the Success of Re-identifications in Incomplete Datasets using Generative Models,” Nature Communications, 10, 1–9. DOI: 10.1038/s41467-019-10933-3.
  • Ryan, K. J., Brady, J. V., Cooke, R. E., Height, D. I., Jonsen, A. R., King, P., Lebacqz, K., Louisell, D. W., Seldin, D. W., Stellar, E., and Turtle, R. H. (1979), “Ethical Principles and Guidelines for the Protection of Human Subjects of Research (The Belmont Report),” US Department of Health & Human Services [online]. Available at https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf.
  • Shulman, B. (2002), “Is There Enough Poison Gas to Kill the City?: The Teaching of Ethics in Mathematics Classes,” The College Mathematics Journal, 33, 118–125. DOI: 10.1080/07468342.2002.11921929.
  • Simon, G. E., Shortreed, S. M., Coley, R. Y., Penfold, R. B., Rossom, R. C., Waitzfelder, B. E., Sanchez, K., and Lynch, F. L. (2019), “Assessing and Minimizing Re-identification Risk in Research Data Derived from Health Care Records,” eGEMs, 7, 6. DOI: 10.5334/egems.270.
  • Smith, D. (2016), “Re-identification in the absence of common matching variables,” [online]. Available at http://hummedia.manchester.ac.uk/institutes/cmist/archive-publications/working-papers/2016/2016-02.pdf.
  • Stein, R. (2015), “A Controversial Rewrite For Rules To Protect Humans In Experiments,” National Public Radio: Morning Edition, November 25 [online]. Available at https://www.npr.org/sections/health-shots/2015/11/25/456496612/a-controversial-rewrite-for-rules-to-protect-humans-in-experiments (accessed on 20 June 2020).
  • Sweeney, L. (2002), “k-anonymity: A Model for Protecting Privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10, 557–570. DOI: 10.1142/S0218488502001648.
  • Tractenberg, R. E. (2016a), “Creating a Culture of Ethics in Biomedical Big Data: Adapting ‘Guidelines for Professional Practice’ to Promote Ethical Use and Research Practice,” in The Ethics of Biomedical Big Data, eds. B. D. Mittelstadt and L. Floridi, pp. 367–393, Cham: Springer. DOI: 10.1007/978-3-319-33525-4_16.
  • Tractenberg, R. E. (2016b), “Institutionalizing Ethical Reasoning: Integrating the ASA’s Ethical Guidelines for Professional Practice into Course, Program, and Curriculum,” in Ethical Reasoning in Big Data, eds. J. Collmann and S. A. Matei, pp. 115–139, Cham: Springer.
  • Tractenberg, R. E. (2019), “Teaching and Learning about Ethical Practice: The Case Analysis,” SocArXiv. DOI: 10.31235/osf.io/58umw.
  • Tractenberg, R. E., Russell, A. J., Morgan, G. J., FitzGerald, K. T., Collmann, J., Vinsel, L., Steinmann, M., and Dolling, L. M. (2015), “Using Ethical Reasoning to Amplify the Reach and Resonance of Professional Codes of Conduct in Training Big Data Scientists,” Science and Engineering Ethics, 21, 1485–1507. DOI: 10.1007/s11948-014-9613-1.
  • Utts, J. (2003), “What Educated Citizens Should Know about Statistics and Probability,” The American Statistician, 57, 74–79. DOI: 10.1198/0003130031630.
  • Utts, J. (2021), “Enhancing Data Science Ethics through Statistical Education and Practice,” International Statistical Review, 89, 1–17. DOI: 10.1111/insr.12446.
  • Van Assche, K., Gutwirth, S., and Sterckx, S. (2013), “Protecting Dignitary Interests of Biobank Research Participants: Lessons from Havasupai Tribe v Arizona Board of Regents,” Law, Innovation and Technology, 5, 54–84. DOI: 10.5235/17579961.5.1.54.
  • Vardeman, S. B., and Morris, M. D. (2003), “Statistics and Ethics: Some Advice for Young Statisticians,” The American Statistician, 57, 21–26. DOI: 10.1198/0003130031072.
  • Washington, A. L., and Kuo, R. (2020), “Whose Side are Ethics Codes On? Power, Responsibility and the Social Good,” in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 230–240.
  • Xiao, T., and Ma, Y. (2021), “A Letter to the Journal of Statistics and Data Science Education — A Call for Review of ‘OkCupid Data for Introductory Statistics and Data Science Courses’ by Albert Y. Kim and Adriana Escobedo-Land,” Journal of Statistics and Data Science Education, 29, 214–215. DOI: 10.1080/26939169.2021.1930812.