Rapid Communication

A New Era of Learning: Considerations for ChatGPT as a Tool to Enhance Statistics and Data Science Education

Pages 128-133 | Published online: 10 Jul 2023

Abstract

ChatGPT is one of many generative artificial intelligence (AI) tools that have emerged recently, creating controversy in the education community with concerns about its potential to be used for plagiarism and to undermine students’ ability to think independently. Recent publications have criticized the use of ChatGPT and other generative AI tools in the classroom, with little focus on the potential benefits. This article focuses on the potential of ChatGPT as an educational tool for statistics and data science. It encourages readers to consider the history of trepidation surrounding the introduction of new technology in the classroom, such as the calculator. We explore the possibility of leveraging ChatGPT’s capabilities in statistics and data science education, providing examples of how ChatGPT can aid in developing course materials and suggestions for how educators can prompt students to interact with ChatGPT responsibly. As educators, we can guide the use of generative AI tools in statistics and data science classrooms so that students and educators can leverage the benefits of this technology.

1 Introduction

The introduction of the calculator in the mid-20th century had a significant impact on mathematics education (Ellington 2003; Watters 2015). Initial reactions to this technology were mixed, with some educators and mathematicians expressing concern about the potential negative effects of calculators on students’ ability to perform basic arithmetic operations and understand mathematical concepts (Savage 1986; Ellington 2003; Watters 2015). Others, however, saw the calculator as a valuable tool that could enhance students’ understanding of mathematics and enable them to solve more complex problems (Savage 1986; Watters 2015). Despite the controversy surrounding the use of calculators in mathematics education, the technology has become ubiquitous in classrooms around the world and is now considered an essential tool for students at all levels of mathematical learning (Ellington 2003).

In 2015, Watters referred to calculators as “arguably one of the most controversial pieces of education technology to enter the classroom” (Watters 2015). The end of 2022 brought a new contender for the most controversial technology in the classroom, ChatGPT. ChatGPT is one of many generative artificial intelligence (AI) tools, working like a chatbot to generate text responses to user-provided prompts. Like the calculator, the emergence of ChatGPT has quickly created controversy in the educational landscape. Recent articles boast titles such as “ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?” (Rudolph, Tan, and Tan 2023), “ChatGPT: The end of online exam integrity?” (Susnjak 2022), “Chatting and cheating: Ensuring academic integrity in the era of ChatGPT” (Cotton, Cotton, and Shipway 2023), and “ChatGPT user experience: Implications for education” (Zhai 2022). The message is clear: educators need to take note of ChatGPT, but what note should they take?

The initial response to ChatGPT from the education community has been largely cautionary; some educators and universities are taking steps to mitigate its use in academic settings. One of the primary concerns has been the potential for students to use ChatGPT to plagiarize or generate work that is not their own, threatening academic integrity (Rudolph, Tan, and Tan 2023; Susnjak 2022; Zhai 2022; Cotton, Cotton, and Shipway 2023). To address these concerns, educators are implementing various tactics, including using tools to detect the use of ChatGPT or implementing alternative assessment plans (Susnjak 2022; Cotton, Cotton, and Shipway 2023; Rudolph, Tan, and Tan 2023). Another concern about ChatGPT in the education landscape is that its use will lead to students being unable to think independently or detect if and when AI-generated text is incorrect (Baron 2023). Because ChatGPT is an artificial intelligence tool, broader ethical concerns also apply to the material it generates, such as the possibility of propagating biases present in source material into its responses (Dwivedi et al. 2023). Many others have discussed the potential negative implications of ChatGPT in education (Baron 2023; Cotton, Cotton, and Shipway 2023; Dwivedi et al. 2023; Gilbard 2023; Hirsh-Pasek and Blinkoff 2023; Rudolph, Tan, and Tan 2023; Weissman 2023); as such, that is not the focus of this article.

Rather than focusing on the cautionary tales of generative AI tools, this article focuses on the potential of ChatGPT as an educational tool for statistics and data science. We encourage readers to consider the history of trepidation amongst educators that has surrounded the emergence of new technology, such as the calculator, WolframAlpha, and Wikipedia, all of which caused concern when first introduced but are now commonly used as learning tools (Young 2009; Watters 2015; D’Agostino 2022). While generative AI tools are different in many ways from these technologies, they share the commonality that they have been a controversial addition to the set of tools available to both students and educators. In this article, we first describe ChatGPT and how to use it, then provide suggestions and examples for leveraging ChatGPT in statistics and data science courses. We then challenge the reader to think about the potential benefits of ChatGPT and how educators can incorporate ChatGPT in the classroom and train our students to use it responsibly.

2 What Is Generative AI and ChatGPT?

Generative AI is a type of artificial intelligence that is designed to create new content, such as images, music, and text. These systems use deep learning techniques to analyze patterns and relationships in large datasets, allowing them to produce novel and creative output. One type of generative AI that has gained significant attention in recent years is large language models. These models are trained on vast amounts of text data and are specifically designed to process and understand human language (Wolfram 2023).

ChatGPT (“GPT” stands for “Generative Pre-trained Transformer”) is a large language model generative AI tool that was introduced by OpenAI in November of 2022. It works like a chatbot, generating text responses to user-provided prompts. Users can request simple text responses or responses that are complex such as programming code, sonnets, entire essays, and mathematical theorems. The generative AI landscape is changing rapidly, with new tools regularly becoming available. The focus of this article is on ChatGPT due to its widespread popularity, but the general principles in this paper apply to other generative AI tools as well.

3 Capabilities of ChatGPT

ChatGPT has a wide range of capabilities in generating text output, and educators should be aware that ChatGPT can generate output that sounds remarkably human-like. The conversational nature of ChatGPT is modeled in “A Conversation on Artificial Intelligence, Chatbots, and Plagiarism in Higher Education” (King and ChatGPT 2023). Others have also investigated the mathematical capabilities of ChatGPT, including its ability to solve complex mathematical problems and write proofs (Frieder et al. 2023).

To demonstrate the capabilities of ChatGPT for answering questions about introductory statistics content, we provided three prompts asking for the definition of a confidence interval and three prompts asking for the definition of a p-value. Both are concepts that learners of statistics find challenging. The responses provided by ChatGPT to these prompts about confidence intervals and p-values are provided in Tables 1 and 2, respectively. The phrasing of each prompt is slightly different to illustrate the variety of responses that ChatGPT can generate to similar prompts. Repeatedly inputting the same prompt can also lead to different generated text; the responses provided in Tables 1 and 2 were the first responses generated by ChatGPT at the time of inquiry.

Table 1 Example prompts provided to ChatGPT and resulting responses generated regarding the definition of a confidence interval.

Table 2 Example prompts provided to ChatGPT and resulting responses generated regarding the definition of a p-value.

While each of the examples presented in Tables 1 and 2 asks ChatGPT to explain the same concept, the generated responses differ in length, style, and approach to the explanation. Because ChatGPT is trained on a large corpus of existing text data such as books, articles, and websites, inaccuracies written in these sources can be passed to ChatGPT responses. For example, a common misinterpretation of a frequentist confidence interval is to treat it as a probability statement about the true value of the population parameter, incorrectly treating the population parameter as a random variable rather than a fixed but unknown value. One sentence in ChatGPT’s response to the second prompt (“The level of confidence associated with a confidence interval represents the likelihood that the true value of the population statistic is contained within the interval”) somewhat alludes to this misinterpretation (Table 1). However, it does follow up with a correct interpretation in the following sentences. When defining a p-value, a common misinterpretation is that the p-value represents the probability of your research finding occurring by chance, a statement that ChatGPT provides in response to the second prompt (Table 2). This definition could be true if it were understood under the assumption of the null hypothesis being true, but ChatGPT does not provide this qualifier. Although there are inaccuracies in ChatGPT-generated responses, the responses in Tables 1 and 2 show that ChatGPT is capable of successfully defining these statistical concepts using a range of language styles (both technical and non-technical). It is important that statistics and data science educators teach students to recognize that the statements generated by ChatGPT may contain both accuracies and inaccuracies, and a fruitful classroom exercise may consist of guiding students in how to critique ChatGPT’s responses.
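
One way to make the repeated-sampling interpretation of a confidence interval concrete for students is a short simulation. The Python sketch below is illustrative only: the function name `coverage_rate`, the parameter values, and the simplifying assumption of a known population standard deviation (so that a z-interval applies) are choices made for this example. It treats the population mean as fixed and counts how often intervals built from fresh samples contain it.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(true_mean=10.0, sigma=2.0, n=30, level=0.95,
                  n_sims=2000, seed=1):
    """Share of simulated z-intervals that contain the fixed true mean.

    Assumes normally distributed data with a known standard deviation;
    all default values are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    # Critical value for a two-sided interval at the requested level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(n_sims):
        # The parameter never changes; only the sample (and interval) does.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / n_sims


rate = coverage_rate()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to the nominal level. The point for students is that the 95% describes the long-run behavior of the interval-building procedure, not the probability that the parameter lies in any single computed interval.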

Despite its wide-ranging capabilities, there are some types of tasks that ChatGPT may not perform well. For example, asking ChatGPT to generate references may lead to a reference list that is completely fabricated (King and ChatGPT 2023). ChatGPT also cannot accommodate non-text prompts such as images. Occasionally, ChatGPT responds very slowly, and sometimes it cannot be used at all, instead displaying a message such as, “We’re experiencing exceptionally high demand. Please hang tight as we work on scaling our systems.” ChatGPT lists its own limitations as: “may occasionally generate incorrect information”, “may occasionally produce harmful instructions or biased content”, and “limited knowledge of world and events after 2021” (OpenAI 2023). ChatGPT does not have common sense knowledge or the ability to understand context beyond what it has learned from its training data. Therefore, it may not always be able to provide accurate or appropriate responses to prompts that require a deep understanding of the context beyond the scope of its training data. It also may be unable to tailor instruction to an individual student as effectively as a human teacher. As with any AI model, it is essential to be cautious and critical of its outputs and use it as a tool rather than a definitive source of information.

4 Leveraging ChatGPT for Statistics and Data Science Education

There are several ways that educators can consider leveraging ChatGPT in statistics and data science education. One of these methods is utilizing ChatGPT to generate course content, which could include supplements to educator-derived material, such as prompting ChatGPT to create a summary of lecture notes (Roose 2023; Staake 2023). It could also generate new material, such as practice quizzes or exam questions (Roose 2023; Staake 2023). For example, when providing ChatGPT the following prompt, “Write three free-response exam questions about sample size and power for a college-level introductory statistics class,” it generated the following:

  1. Explain the concept of statistical power and why it is important in hypothesis testing. What factors can impact the power of a hypothesis test, and how does this affect the choice of sample size?

  2. In a hypothesis test, what is the relationship between the sample size, the level of significance, and the power of the test? Give an example to illustrate this relationship, and explain why it is important to consider all three factors when conducting a hypothesis test.

  3. Describe the consequences of having a sample size that is too small for a hypothesis test. How does this impact the ability to detect an effect, if one exists, and how does it affect the risk of making a Type II error? How can increasing the sample size help to mitigate these consequences?

Note that some ChatGPT responses are more appropriate than others, and we do not condone blindly using ChatGPT-derived materials without critically assessing whether these materials meet assessment goals for student learning. However, even in cases where ChatGPT responses are not meeting the educator’s goals, an iterative process can be used to refine ChatGPT responses. For example, a prompt may be revised to ask for something that is shorter or longer, written in a specific style, or written without some part of the response that was undesirable. In our experience, ChatGPT performs well when provided with a specific prompt that is then refined by the user as needed, as opposed to a traditional search engine that may not perform well with an overly-specific or verbose input.

Educators can also manually revise ChatGPT responses to meet their needs or leverage ChatGPT to generate general ideas that are then used for educator-developed materials. For example, when providing ChatGPT the following prompt, “Give me an example of a biomedical variable that follows a Poisson distribution,” it generated the following: “A Poisson distribution is a probability distribution that can be used to model count data, such as the number of occurrences of an event in a given time or space. In a biomedical context, an example of a variable that can follow a Poisson distribution is the number of infections acquired by patients in a hospital. The number of infections can be modeled as a Poisson process, where the mean rate of infections depends on various factors such as the type of infection, the patient population, and the quality of healthcare. By modeling the number of infections as a Poisson variable, healthcare professionals can estimate the likelihood of different infection scenarios and implement measures to reduce the risk of infections in the hospital.” This response provides much more information than was requested. Still, if the educator’s goal was to get an idea of a health-related count variable to use in an example about a Poisson distribution, the goal was achieved—they could consider using “number of infections” as the example. As previously mentioned, critically thinking about ChatGPT responses and manually revising as necessary is an important part of the process. For example, the educator may recognize (as ChatGPT failed to mention) that counts of hospital infections may be overdispersed and more appropriately modeled using a negative binomial distribution than a Poisson distribution.
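
An educator who wants to show students why overdispersion matters could pair the hospital-infection example with a small simulation. The Python sketch below is a hypothetical illustration, not a model of real infection data: the helper names, the mean of 3, and the shape value of 1.5 are all assumptions made for this example. It draws counts from a Poisson distribution and from a negative binomial distribution (generated as a gamma mixture of Poisson rates) with the same mean, then compares the variance-to-mean ratio, which is about 1 for Poisson counts and larger when counts are overdispersed.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(rng, lam):
    """One Poisson(lam) draw via Knuth's multiplication method.

    Adequate for the small rates used here; not intended for large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def negative_binomial_draw(rng, mu, shape):
    """One negative binomial draw with mean mu, as a gamma-Poisson mixture."""
    # gammavariate(alpha, beta) has mean alpha * beta, so this rate has mean mu.
    rate = rng.gammavariate(shape, mu / shape)
    return poisson_draw(rng, rate)


def dispersion_index(counts):
    """Sample variance divided by sample mean."""
    return variance(counts) / mean(counts)


rng = random.Random(2023)
mu, shape, n = 3.0, 1.5, 5000  # hypothetical values for illustration
poisson_counts = [poisson_draw(rng, mu) for _ in range(n)]
negbin_counts = [negative_binomial_draw(rng, mu, shape) for _ in range(n)]

print(f"Poisson variance/mean:           {dispersion_index(poisson_counts):.2f}")
print(f"Negative binomial variance/mean: {dispersion_index(negbin_counts):.2f}")
```

With these settings the theoretical negative binomial variance is mu + mu^2/shape = 9, three times the mean, so a Poisson model fitted to such counts would understate their variability.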

Statistics and data science educators may also find ChatGPT helpful for developing materials for courses focused on statistical programming or as a learning tool for students in programming courses. For example, an educator can ask ChatGPT to write pseudocode for a given problem, which then can be used as a teaching example to help students who are new to statistical programming. An educator could also provide an excerpt of code to ChatGPT and ask it to write the code in a different language; for example, to convert it from SAS to R or from R to Python. Statistics and data science educators may also wish to guide students in how to use ChatGPT to debug their code, a topic that has been covered by others (Jain 2022; Rollbar Editorial Team 2023). Of note, several other AI tools also exist to assist in code writing and debugging, and this area is rapidly evolving as new tools become available regularly (Phillips 2023).

Educators may also wish to test their assignments or exam questions on ChatGPT to be aware of what students could derive if they utilized ChatGPT to answer the questions themselves. Educators may find that ChatGPT is surprisingly adept at providing short answers to free-response questions that ask students to explain statistical concepts, such as, “Briefly explain one disadvantage of utilizing complete case analysis to analyze a study with missing data.” As previously mentioned, there are other types of tasks that ChatGPT is less capable (or sometimes incapable) of performing well, such as interpreting statistical output that is provided as an image.

Tools exist to assess student learning while ensuring students are not using ChatGPT or other generative AI tools, such as programs blocking access to specific websites or the entire internet. However, we echo the sentiments of many others suggesting that educators consider an alternative approach to address how students can leverage generative AI to enhance learning directly (Roose 2023; Abramson 2023; Duckworth and Ungar 2023; Staake 2023). Doing so mimics the environment that students will be in outside of the classroom so that they can learn to responsibly use the tools that they will have available to them. For a free-response task, this could include a three-part question that asks students to: (a) answer a prompt in their own words, (b) feed the prompt into ChatGPT and copy the response, and (c) analyze how their response differs from ChatGPT’s response and how they can assess whether the differences are “correct.” For a statistical programming class, an example could include asking students to do the following: (a) prompt ChatGPT to write some R code to perform a specific task, (b) run the ChatGPT-generated code in R, and (c) assess whether the code worked as intended and explain how they determined whether or not it worked, or alternatively, explain line by line what the generated code is doing. Incorporating exercises that require students to critique ChatGPT provides instructors the opportunity to help students navigate responses generated by ChatGPT. Such exercises can be leveraged to create classroom discussions where students share their responses, and the instructor can facilitate conversations around responsibly using AI tools and how to critique the generated responses.

5 Reflection

We recognize that educators have valid concerns regarding the implementation and integration of AI tools in the classroom, and a full discussion of these issues can be found elsewhere (Baron 2023; Cotton, Cotton, and Shipway 2023; Gilbard 2023; Hirsh-Pasek and Blinkoff 2023; Rudolph, Tan, and Tan 2023; Weissman 2023). In addition to the concerns related to the classroom use of AI, legal and ethical concerns have arisen with the introduction of ChatGPT (Karim 2023; Woodie 2023; Zhou et al. 2023). Despite these concerns, the popularity of AI tools is increasing. Many universities are establishing AI-related committees or utilizing existing ethics committees to provide guidance on issues of academic integrity as AI tools continue to evolve and advance (Grove 2022; Young 2023). We encourage readers to consider other technologies, such as the calculator, WolframAlpha, and Wikipedia, all of which were met with initial wariness but are now commonly used as learning tools. As statistics and data science educators, we can actively shape and guide the incorporation of AI tools within our classrooms. This article highlights several avenues for responsibly leveraging ChatGPT in statistics and data science education.

Finally, we invite the reader to consider that two paragraphs of this manuscript were generated by ChatGPT. Can you tell which ones? To some, this exercise may highlight how dangerous ChatGPT can be to academic integrity. On the other hand, this exercise demonstrates how ChatGPT can be utilized effectively by feeding it a specific prompt, thinking critically about whether the generated response meets the user’s goals, and revising the prompt if necessary. (For those looking for the answer, we used ChatGPT to generate the first paragraph of the Introduction section and the first paragraph of the “What is Generative AI and ChatGPT?” section using the following two prompts: (a) “Write a short paragraph in the style of academic writing that describes the initial reactions to the impact of the calculator on mathematics education,” and (b) “In the style of academic writing, briefly explain to a layperson what generative AI and large language models are.”) The responses from ChatGPT were minimally edited for flow, and references were added by the authors.

Data Availability Statement

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

References