Data Science

Undergraduate Learning Outcomes for Achieving Data Acumen


Abstract

It is imperative to foster data acumen in our university student population in order to respond to increased attention to statistics in society and in the workforce, as well as to improve career preparation for students. This article discusses 13 learning outcomes that represent achievement of undergraduate data acumen for university-level students across different disciplines.

1 Introduction

Recently the value of an undergraduate degree has been challenged (Estes 2011), and demands for greater accountability in higher education have emanated from prospective students, their parents, business leaders, and politicians. The economic climate and employment considerations are central to many of these concerns. The percentage of students who report that their decision to go to college was strongly shaped by a desire “to get a better job” has increased in recent years; in 2012, 88% of first-year students reported this factor as very important (Eagan et al. 2014). Colleges and universities have repeatedly been called upon to do a better job of preparing students for careers.

One area that needs to be strengthened in response to the career climate is student preparation in statistics and data science. The Chronicle of Higher Education recently listed the growth of data science programs as a key trend in higher education. However, it also noted that data science programs are being added without careful attention to what a data science curriculum should look like. Moreover, because data and statistics play an important role in all disciplines, undergraduate curricula in statistics and data science may be embedded within different disciplinary contexts. As such, there is a need for a set of comprehensive learning outcomes to help guide data learning across the disciplines. Such learning outcomes will help departments, administrators, and individual faculty across institutions better understand how statistics and data courses in different departments fit together to provide a coherent curriculum. Data education entails ensuring that students not only have sound computing, data analysis, and statistical skills, but also have good communication skills and the ability to work as part of a team (Zorn et al. 2014; Holdren and Lander 2012; Davenport and Patil 2012). As noted by Horton and Hardin (2015), “the idea that an undergraduate statistics [major] develops general problem solving skills to use data to make sense of the world is powerful.” This is what statistics offerings in colleges and universities should strive to produce: nimble computing data problem solvers (Nolan and Temple Lang 2010, 2015).

Some data science education recommendations and guidelines already exist in the literature. For example, in 2018 the Two-Year College Data Science Summit published a report outlining recommendations for data science programs at community colleges (Gould et al. 2018). Recommended program outcomes are organized into four categories: computational, statistical, data management and curation, and mathematical. The program outcomes are further partitioned into foundations, thinking, and modeling outcomes (p. 16). Overall, the guidelines give community colleges that house data science programs a set of explicit learning outcomes around which to organize their programs. Also in 2018, the National Academies Press published Envisioning the Data Science Discipline: The Undergraduate Perspective: Interim Report (National Academies of Sciences et al. 2018b). This report defined the term “data acumen” as the ability to make good judgments and decisions with data, and it notes that data acumen is “not a final state to be reached but rather a skill that data scientists develop and refine over time” (p. 12). According to the report, developing data acumen requires mathematical foundations, computational thinking, statistical thinking, data management, data description and curation, data modeling, ethical problem solving, communication and reproducibility, and domain-specific considerations (p. 33).

The learning outcomes presented in this article differ from those in these reports in a few ways. The main goal here is not to provide statistics and data science guidelines for a program dedicated to data science, but to present outcomes for working toward data acumen across university courses and across disciplines. The National Academies have called for participation in data science from all disciplines, with the understanding that the degree to which different disciplines develop the components of data acumen varies (National Academies of Sciences et al. 2018a). In response to that call, this article presents a cross-disciplinary study undertaken to develop baseline learning outcomes for statistical and data learning at a university. The second National Academies report, Data Science for Undergraduates, notes the difficulty of furthering data science education across disciplines through upper-division courses, given that introductory courses in different disciplines cover varying topics. It also notes a need for coordination and collaboration across a wide spectrum of disciplines (p. 39). While the Two-Year College report and the first National Academies report discuss learning outcomes for specific data science programs, this article outlines a series of common learning outcomes, valid across disciplines, for working toward data acumen on a university campus.

Following the recommendations of the American Statistical Association put forth in the Curriculum Guidelines for Undergraduate Programs in Statistics (ASA, Undergraduate Guidelines Workgroup 2014), this article discusses how statistics and data education bridge many disciplines and how the different disciplinary approaches can be integrated into one set of coherent learning outcomes for undergraduate education in statistics and data education. Overall, to fulfill the growing needs of the workforce, students graduating from college need to be prepared to tackle problems using technology, work with real data, and communicate their ideas.

2 Statistics and Data Science Education across Disciplines at Universities

Universities across the U.S. typically have many different statistics course offerings across campus. Because it is very common to have statistics courses housed in different disciplines (e.g., mathematics, computer science, psychology, economics), the ASA and Mathematical Association of America (MAA) offer guidelines for teaching introductory statistics targeted at non-statistics departments (ASA/MAA Joint Committee on Undergraduate Statistics 2014). Oftentimes these courses overlap and yet their prerequisite structures do not allow a student to move from a statistics course offered in one department to a more advanced course offered by another department. Departments often rightfully argue that the type of statistical techniques needed are discipline specific and thus necessitate the offering of a course within a specific discipline.

Although specific techniques do vary from discipline to discipline, certain basic themes of working with data should be present in all courses. Three important, fundamental, and particularly timely themes are that students need to (1) employ technology, (2) explore real datasets, and (3) practice communicating statistical ideas and results.

Scholarly articles and recommendations of professional organizations concerning undergraduate preparation in various disciplines, in addition to the ASA-sponsored documents already discussed, align with these themes. For example, skills in statistics and the ability to work with data and technology are increasingly recognized as core components of an education in sociology (Wilder 2010). A 2010 report published by the American Psychological Association recommends that psychology students complete coursework in statistics and research methods as early as possible and that the knowledge and skills gained from these courses be reinforced throughout the curriculum. A national study of undergraduate business education conducted by The Carnegie Foundation for the Advancement of Teaching concluded with recommendations that programs provide a stronger linkage between business, arts, mathematics, and science curricula and that programs promote courses that incorporate complex and ambiguous real-world issues and three essential modes of thinking: Analytical Thinking, Multiple Framing, and Reflective Exploration of Meaning (Colby et al. 2011). Statistics courses that incorporate the statistical thinking process of formulating a question, collecting appropriate data, choosing an appropriate analysis technique, and interpreting results (Franklin et al. 2007) promote these modes of thinking. Teaching statistics as an investigative process is also stressed in both the GAISE college report (Everson et al. 2016) and the ASA Undergraduate Guidelines Workgroup (2014).

Several important reports have stated the need for students to work with real data. The Committee on the Undergraduate Program in Mathematics (CUPM) Curriculum Guide 2015 (Mathematical Association of America 2015) states: “Working mathematicians often face quantitative problems to which analytic methods do not apply. Solutions often require data analysis, complex mathematical models, simulation, and tools from computational science.” This report recommends that all mathematical sciences major programs include concepts and methods from data analysis and computing. The Guidelines for Assessment and Instruction in Statistics Education (GAISE) college guidelines also include working with real data as one of six recommendations for structuring an introductory statistics course (Everson et al. 2016). In addition, the ASA's recommendations for undergraduate programs in statistical science include Real Applications and Problem Solving among their Background and Guiding Principles. They state that programs should “emphasize concepts and approaches for working with complex data and provide experiences in designing studies and analyzing real data (defined as data that have been collected to solve an authentic and relevant problem)” (ASA, Undergraduate Guidelines Workgroup 2014).

As data science has been described as an intersection of statistics with computer science, when considering undergraduate preparation, one must consider how the use of software interplays with statistics. Regardless of the discipline, technological fluency has become a must for success in the workforce. Therefore, university statistics and data science courses must incorporate heavy use of technology and computing.

The material commonly taught in introductory statistics courses often focuses merely on techniques. However, such methods are often “necessary but not sufficient” for modern data science (Hardin et al. 2015; Ridgeway 2016). An undergraduate education should therefore focus on the unifying themes of working with technology, working with real data, and communicating results in all course offerings across campus. Moreover, if a model existed for explicit learning outcome goals for undergraduate statistics and data-related courses, the door might be open to creating a coherent curriculum for students seeking statistics and data education beyond what their own departments offer.

3 Undergraduate Data Pathways (UDaP) Study

The National Science Foundation (NSF)-funded project (NSF Grant No. 1712296), Undergraduate Data Pathways (UDaP), focused on understanding differences and similarities among statistics and data-related course offerings across different disciplines. The project carried out a rigorous study to develop a set of learning outcomes for statistics and data-related courses at the undergraduate level that integrated the data-related goals put forth by several different disciplines. This paper reports on that study and presents a set of learning outcomes (LOs) that work toward data acumen for university-level students across different disciplines. A student who meets all of the LOs will have achieved an introductory level of data acumen appropriate to the undergraduate level. The LOs reflect not only cross-disciplinary goals but also societal needs for data analysis. The study took place at Loyola Marymount University (LMU), a mid-sized comprehensive university in Los Angeles, California. Faculty from eight departments across campus carried out the study.

4 Methods

Five steps were undertaken to better understand the differences and commonalities of statistics and data education across disciplines and subsequently develop a unifying set of learning outcomes for undergraduate statistics and data education.

As a first step, a faculty working group consisting of LMU faculty from mathematics, economics, biology, psychology, sociology, business, and education was formed. While LMU does not have a department dedicated to statistics or data science, the Department of Mathematics, Department of Biology, Department of Engineering, Department of Economics, Department of Political Science, Department of Psychology, Department of Sociology, the School of Business, and the School of Education offer courses related to statistics and data analysis.

The formation of a working group of invested change agents was no easy task. The Associate Dean for Undergraduate Studies urged faculty across departments that had investment in statistics and data analysis to join the group. In addition, members of the research team personally reached out to faculty in other departments to encourage them to join the working group. A total of 10 faculty members were selected for the working group. The working group was centered around understanding the processes and support needed to implement the themes of communication, technology, and real data in statistics courses across the disciplines. Four meetings per semester were conducted over the course of two academic years. The purpose of the working group discussions was to gather qualitative data on how different disciplines articulated the importance of statistics and data analysis and to determine what all of the disciplines had in common.

A second step in the process was to develop and administer a 36-question survey to the working group. The survey asked about software platforms, data sources, types of class assignments offered (e.g., statistics investigations in the form of projects, problem sets), and the types of activities used in the classroom (e.g., students using computers in a lab setting, group work). The survey included questions from the Statistics Teaching Inventory (STI) developed by Zieffler et al. (2012), focusing on teaching practice, assessment practice, technology use, and teaching and assessment beliefs. The goal of the survey was to gather in-depth information about the statistics teaching habits of the working group faculty across disciplines. The entire survey is included in Appendix A.

Using the discussions and internal survey results, an initial set of learning outcomes was developed. Each disciplinary representative researched and brought forward any guiding documents from their discipline related to data education. In addition, each disciplinary representative gathered syllabi and University bulletin course descriptions for all of the relevant courses taught within their discipline. In a blinded exercise, the working group sorted the course descriptions by similarity: all course names were removed from the descriptions, and working group members worked in pairs to organize the descriptions into groups according to the topics covered within the courses. The names and departments of the courses were then revealed. This exercise was a catalyst for the working group to summarize common themes present across courses at the institution, and these themes were outlined and recorded. Syllabi were then reviewed to determine how many courses highlighted these themes and whether other themes were present that were not touched upon in the course descriptions. The working group members were asked to review the syllabi and then discuss the recurring themes present within and across disciplines. Courses were sorted into basic courses, introductory courses, application courses, and beyond courses. The discussions were guided by two of the PIs of the project (Bargagliotti and Larson). They guided the group through readings of papers in the literature discussing data acumen and of guidelines and reports from the different disciplines. They also presented enrollment data for specific courses at LMU (see Bargagliotti et al. 2020 for a presentation of enrollment results) to provide student context for the discussions. In addition, discussion questions focused on technology use and needs were posed to the working group at each meeting.
This process led to the formulation by the group of explicit learning outcomes. The process was completed over the course of one year through monthly meetings and email and phone conversations in between the in-person meeting times.

To validate these learning outcomes, the third step in the development process was to administer a larger-scale survey to the broader community, both academic and nonacademic, to gather views on the learning outcomes necessary for data education at the university level. This survey gathered data on whether respondents agreed, were neutral, or disagreed that each learning outcome was important for achieving data acumen at the undergraduate level. More specifically, the survey asked: For each statement below, please mark whether you agree, disagree, or are neutral that the statement describes a data analysis skill that you believe a college graduate in today’s society should have. The full survey is included in Appendix C.

A fourth step carried out by the research team was to review position statements, policy documents, and curriculum guidelines put forth by professional organizations (e.g., ASA, APA, AEA) regarding data education proficiency to understand whether there was common ground between the disciplines.

Based on the information gathered in these four steps, the culminating step of the work was to develop a final set of learning outcomes to represent appropriate data acumen at the undergraduate level across disciplines.

5 Findings

5.1 Working Group Survey Findings and Working Group Discussion

Nine members of the working group completed the internal survey, which was administered at the first group meeting before any discussion took place. The purpose of the survey was to gauge satisfaction with how the teaching and learning of statistics and data-related topics was approached at the University, as well as to gather baseline data on the typical statistical and data analysis processes used across disciplines. These data would then serve as a starting point for conversations aimed at developing a set of learning outcomes for data education that would bridge the disciplines.

Of the nine respondents, only one reported being happy with the course offerings and curriculum related to data; four responded that they were somewhat happy, three were unable to judge, and one said they were not happy. Two main issues were identified as keeping faculty from being satisfied or from being able to change the course offerings and curriculum to their liking: a general feeling that the institution did not support current statistical needs, specifically by providing access to the technology or materials needed to teach statistics and data analysis properly, and a sense that the institution did not provide enough faculty lines to cover the growing needs.

Respondents were asked what they would like to change about the course offerings and curriculum related to statistics if they had all of the resources needed. The responses were:

  • Our students can take basic stats and they can take more advanced Biostats, though it isn’t taught very frequently (once every 2 years). More gradations might be good, as well as more frequency. Also the class is co-taught with Bio and Math faculty, which is great.

  • I am most familiar with the statistics requirement in the psychology dept and less aware of the offerings elsewhere.

    I know that in the psych dept in the past we didn’t have enough people to cover stats and often relied on visiting professors or adjunct professors. This is changing now though. I will also say that it is difficult to get access to lab classrooms with computers for all of the stats sections that we offer.

  • math department has some solid courses, but it would be nice to have at least one advanced data science class.

  • More computational statistics required [of students in order to graduate]

  • I believe in the social sciences all students should be required to take an introductory statistics course, an empirical research methods course, and a qualitative methods course.

  • Students do not have the ability to further their statistical knowledge past their own department offerings.

  • Overlapping topics; no interactions between various departments

Several themes were present in the responses, and these themes continued to emerge throughout subsequent discussions: a generally siloed approach to data education curriculum across departments, frustration over a lack of advanced courses, and a lack of understanding of what is happening in other departments.

Although our group of nine faculty had been identified as the primary professors teaching data-related courses in their disciplines, there was repeated evidence that we had difficulty thinking about statistics beyond our own departments and across the university as a whole. For example, in multiple instances, discussions revolved around single departments and single courses, with many statements such as “in my department, in my course, we…” While this reaction was to be expected, extensive effort was made to keep the cross-disciplinary goal in mind as the development of learning outcomes progressed. Maintaining this cross-disciplinary focus was thus identified as a main takeaway for the working group as we strove to make adjustments over the course of the next year.

Subsequent working group meetings consisted of discussions and group exercises that took a closer look at the University course descriptions of all courses related to statistics and data analysis. The working group identified a total of 29 courses at LMU spanning 11 different departments (see Table 1 for the list of courses and Appendix B for brief descriptions of each course), so these offerings have a wide reach across the University. As shown in Table 1, the College of Liberal Arts offers 12 data-related courses, the College of Science and Engineering offers 12, the College of Business Administration offers 4, and the School of Education offers 1. Of the 29 courses, 9 are lower-division courses (shown in light gray) and 20 are upper-division courses (shown in dark gray). Of these upper-division courses, five were special reading courses offered in small settings.

Table 1 List of statistics and data-related courses at LMU.

Table 3 Proposed Undergraduate Data Pathways (UDaP) learning outcomes.

Using these courses, the working group participated in an exercise in which the descriptions of all 29 courses were placed on 3 × 5 cards without course names or titles. Each group member paired up with a member from a different department, and each pair sorted the cards by similarity of course content as well as by difficulty. All pairs agreed that several basic and introductory statistics courses with similar content were being taught across the University. In addition, two other course types were identified: a research methods course (in which statistical methods were applied) and an advanced-level course. The introductory courses could be further distinguished by whether or not they covered regression and/or ANOVA. Table 2 shows the established descriptions for the different types of courses found.

Table 2 Course-level types.

MATH 104, a non-calculus-based introductory statistics course aimed at fulfilling general education requirements, is an example of a Basic Introductory course. In contrast, MATH 204 was considered a Basic Introductory Statistics + Regression/ANOVA course, as were SOCL 2100 and PSYC 2001. Beyond Basic courses were defined by the inclusion of topics past basic inference and linear regression. For example, Biological Databases and PSYC 2002 go beyond the introductory courses by including, respectively, biology-specific data and database structures, and research methods in psychology. Special Topics courses included Machine Learning and Deep Learning as well as courses such as Econometrics.

The classification exercise led the working group to attempt to create sets of interchangeable courses: interchangeable introductory statistics courses, interchangeable research methods courses, and interchangeable advanced courses. If two courses within a group were deemed interchangeable, then those courses would fulfill the same requirements. The idea of creating an interchangeability course map was grounded in the belief that students might then have more ways to reach advanced content. It was also at the third meeting that the working group determined that a set of learning outcomes for a complete data pathway needed to be explicitly defined. All subsequent meetings focused solely on determining these learning outcomes.

To guide the creation of the learning outcomes, further analyses of the survey discussed above were carried out to identify the types of statistical techniques frequently used in each discipline. The results were categorized into five groups:

  • Descriptive statistics

  • Visualization

  • Inferential

  • Predictive

  • Application

Based on these results, the working group defined a set of 12 learning outcomes, with the idea that certain learning outcomes might be developed within a category of interchangeable courses. Table 3 presents the initial 12 learning outcomes put forth by the working group.

The learning outcomes highlighted in yellow correspond to the Descriptive category, blue to Visualization, purple to Inferential, and green to Predictive. Several outcomes, highlighted in orange, focus on Application. The remaining outcomes characterize data processes. Using these 12 learning outcomes as a guide, an external community survey was administered.

5.2 Community Survey

To validate the 12 learning outcomes, a community survey was administered online. The goal of this survey was to assess whether peers at other universities and in industry would also view this set of outcomes as adequately representing the skills that a university graduate should have today. Members of the working group sent the online survey to peers and to listservs for several disciplines (it was sent out specifically by the American Statistical Association and CAUSEWeb). It was also posted on several listserv forums (e.g., isostat). A total of 367 people opened the survey, and 287 completed it within the allotted one-week time frame. Table 4 shows the distribution of backgrounds of the people who completed the survey. College and university faculty dominated the respondents, making up 82% of the total. Industry scientists, researchers, or consultants made up the next largest category at approximately 6% of respondents.

Table 4 Distribution of background of survey respondents.
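As a quick check on the response figures, the share of people who opened the survey and went on to complete it follows directly from the two counts reported above:

```latex
\text{completion rate} = \frac{287}{367} \approx 0.782 = 78.2\%
```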

A total of 14 disciplinary backgrounds were represented among the respondents, as shown in Table 5. Statisticians were the largest group of respondents, with mathematicians and psychologists the second and third largest.

Table 5 Distribution of disciplines of survey respondents.

Because the American Statistical Association (ASA) helped distribute the survey, statisticians were expected to be heavily represented. All working group members sent the survey to their personal contacts; however, for reasons of feasibility, the ASA was the only large organization to actively post and distribute it. Despite this imbalance in the disciplinary representation of respondents, the responses were still varied.

Table 6 shows the percentage of survey respondents who agreed, were neutral, or disagreed with the statement that each learning outcome was an important skill that a university student must acquire.

Table 6 Percentages of agree, neutral, and disagreements.
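Table 6 reports, for each learning outcome, the share of respondents choosing each of the three response options. A minimal sketch of that tabulation follows, assuming each outcome's answers are stored as a list of the strings "agree", "neutral", and "disagree". The function name and data layout are hypothetical, and because the article does not describe how skipped items were handled, this sketch simply divides by the number of answers recorded for each outcome.

```python
from collections import Counter

# The three response options offered for each statement in the community survey.
CHOICES = ("agree", "neutral", "disagree")


def response_percentages(responses):
    """Return the percentage of answers falling in each response option.

    `responses` maps a learning-outcome label (hypothetical, e.g. "LO1")
    to the list of answers recorded for that statement.
    """
    table = {}
    for lo, answers in responses.items():
        counts = Counter(answers)
        # Reject any answer that is not one of the three survey options.
        unknown = set(counts) - set(CHOICES)
        if unknown:
            raise ValueError(f"{lo}: unexpected answers {sorted(unknown)}")
        total = len(answers)
        # Percentages are relative to the answers recorded for this LO only.
        table[lo] = {
            choice: (100.0 * counts[choice] / total if total else 0.0)
            for choice in CHOICES
        }
    return table
```

For example, four answers of agree, agree, neutral, and disagree for one outcome yield 50.0% agree, 25.0% neutral, and 25.0% disagree.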

The responses sorted the LOs into roughly three categories. Four of the 12 learning outcomes seemed especially important, as 90% of respondents agreed that they are important skills that a university student should acquire. These included univariate statistics, descriptive statistics, graphs and visualizations, and communicating in context. For five other learning outcomes, a large majority of respondents stated that they agreed or were neutral. This category included inferential statistics, predictive statistics, discussion of limitations, multivariate statistics, and use of software. Only three learning outcomes drew substantial disagreement: having a large project, writing a program to analyze data from scratch, and studying advanced statistical methods.
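The three-way grouping above can be expressed as a simple rule over the Table 6 percentages. The sketch below is illustrative only: the 90% agreement cutoff follows the text, but the article does not say what counted as a "large" disagreement, so `DISAGREEMENT_CUTOFF` is a hypothetical value, and the category labels are illustrative rather than the study's own. Outcomes meeting neither condition fall into the middle (agree or neutral) group.

```python
# Cutoffs for grouping learning outcomes by community-survey response.
# HIGH_AGREEMENT follows the 90% figure in the text; DISAGREEMENT_CUTOFF is
# a hypothetical value chosen for illustration only.
HIGH_AGREEMENT = 90.0
DISAGREEMENT_CUTOFF = 20.0


def categorize(percentages):
    """Assign each LO to one of three illustrative categories.

    `percentages` maps an LO label to a dict with "agree", "neutral",
    and "disagree" percentages.
    """
    groups = {"especially important": [], "agree or neutral": [], "contested": []}
    for lo, pct in percentages.items():
        if pct["agree"] >= HIGH_AGREEMENT:
            groups["especially important"].append(lo)
        elif pct["disagree"] >= DISAGREEMENT_CUTOFF:
            groups["contested"].append(lo)
        else:
            # Neither near-unanimous agreement nor substantial disagreement.
            groups["agree or neutral"].append(lo)
    return groups
```

Checking high agreement first means an outcome with near-unanimous support is never flagged as contested, regardless of the hypothetical disagreement cutoff.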

Based on these data, the working group agreed that a student meeting all 12 learning outcomes would be deemed to have undergraduate data acumen. Due to the disagreements on three learning outcomes, subsequent levels of data acumen were then defined (see Bargagliotti et al. 2020 for a description of the categorizations of levels of acumen).

5.3 Policy Documents

To further validate the learning outcomes, nine curriculum guidelines from various professional organizations were reviewed by the working group. These guidelines specifically discussed students’ necessary data acumen skills for a given discipline. The working group identified different disciplines that had position statements or curriculum guidelines that mentioned statistics or data education explicitly. Those disciplines represented in the policy documents were: mathematics, statistics, psychology, economics, sociology, science, engineering and medicine. The policy documents reviewed were:

  • American Statistical Association, Curriculum Guidelines for Undergraduate Programs in Statistical Science

  • Mathematical Association of America, Committee on the Undergraduate Program in Mathematics (CUPM) Curriculum Guide

  • American Statistical Association, Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report 2016.

  • American Psychological Association, APA Guidelines for the Undergraduate Psychology Major

  • American Economic Association, Recommended Mathematical Training to Prepare for Graduate School in Economics.

  • American Sociological Association, The Sociology Major in the Changing Landscape of Higher Education: Curriculum, Careers, and Online Learning, A Report of the ASA Task Force on Liberal Learning and the Sociology Major

  • National Academies of Sciences et al. (2018a), Data Science for Undergraduates: Opportunities and Options.

  • National Academies of Sciences et al. (2018b), Envisioning the Data Science Discipline: The Undergraduate Perspective: Interim Report.

  • American Statistical Association, The Two-Year College Data Science Summit

Table 7 illustrates that seven of the 12 learning outcomes were discussed in all of the policy documents. The remaining learning outcomes were supported by most of the documents.

Table 7 Policy document alignment with LOs.

Interestingly, all of the policy documents mentioned a learning outcome that was not among the 12 hypothesized outcomes:

Students should become critical consumers of statistically-based results reported in popular media, recognizing whether reported results reasonably follow from the study and analysis conducted.

Because this outcome appeared in all of the policy documents reviewed, across the various disciplines, the project team opted to add it to the 12 developed learning outcomes as an explicit LO. It aligns with a growing societal need: even simply taking in the news and participating in the information age requires this skill. A total of 13 LOs were then proposed.

5.4 Final Learning Outcomes

Table 8 presents the final 13 Undergraduate Data Pathways (UDaP) learning outcomes that were established as important for university students to meet today. Three edits were made to the original learning outcomes:

Table 8 Final UDaP learning outcomes.

  • LO6 now emphasizes that the project must count for a large portion of the final grade, without specifying an arbitrary percentage

  • LO11 specifies that software be used to manipulate data, extract information from it, and carry out statistical analyses

  • LO12 was rewritten to better reflect the data tasks a student would undertake using a software program

These adjustments were made based on the open comments received in the community survey, feedback from reviewers of this article during the revision process, feedback from audiences when the paper was presented in three different settings, discussions among the PIs on how to incorporate the comments, and approval from the working group members in writing the final LOs. Students meeting these 13 learning outcomes are deemed to have undergraduate data acumen.

The UDaP learning outcomes span both content and process. The important themes of using real data, communicating with data, and using technology are also well represented. These outcomes are meant to be broad and cross-disciplinary so they can serve as benchmarks for all disciplines offering statistics and data education courses on a university campus. They stemmed from two years of discussions within the working group, the review of the policy documents, and the community survey.

6 Discussion and Future Research

While there has been a large increase in data science and statistics majors and minors across the US over the past several years (Pierson 2018), explicit learning outcomes to govern such programs are relatively new (see Gould et al. 2018; National Academies of Sciences et al. 2018a, 2018b). Furthermore, while there is consensus that data education reaches across disciplines, the very breadth and importance of data across disciplines make it difficult to coordinate efforts for student learning. No set of coordinated learning outcomes exists to bridge data education across departments and disciplines. It is important to acknowledge that data education is not taught solely in statistics or computer science departments or within a single data science program; instead, working with data is present in most disciplines and is often intertwined with disciplinary content. Therefore, although guidelines exist that recommend how to teach statistics courses (GAISE 2016) and that guide data science programs specifically (Gould et al. 2018; National Academies of Sciences et al. 2018a, 2018b), these guidelines are not designed to bridge the interdisciplinary context.

Several challenges emerge as data education is conceptualized across disciplines. A first step in advancing this conceptualization may be agreement on some basic content and process outcomes that students should acquire. Implementing such outcomes requires agreement among departments and a concerted effort to create opportunities for students to advance their data acumen despite potentially limited departmental offerings.

To develop goals for data education at the undergraduate level, the UDaP project explicitly considered the cross-disciplinary nature of coursework related to data as well as the overall learning goals for students driven by current workforce and societal needs. As society becomes more data-driven, it is important to understand and characterize how education should respond. Moreover, cross-disciplinary demands are increasingly emerging in society, with data embedded in policy and discussions across all subjects. As such, how we conceptualize data acumen at the undergraduate level must be flexible enough to bridge many contexts and to serve students with diverse academic backgrounds. This differs from the way the literature has conceptualized data science as a three-circle Venn diagram of computer science, statistics, and context; undergraduate data acumen instead aims to be flexible and broad enough to span disciplines. In other words, a sociology major must have the same opportunity to gain data acumen as a computer science major.

The UDaP learning outcomes presented in this paper can be used by colleges and universities that plan to assess their capacity across disciplines to produce undergraduates with data acumen, by matching existing course offerings to the learning outcomes. This could provide insight into the accessibility, quantity, and difficulty of existing pathways to data acumen and guide the resource-efficient development of new pathways through cross-disciplinary badges, concentrations, minors, or majors. The UDaP learning outcomes can form a basis for ongoing assessment of data-related concentrations, minors, or majors. They could also form a basis for assessing how co-curricular learning through internships, campus jobs, and similar experiences contributes toward students earning badges in data acumen. Universities with statistics departments or data science programs can lead such efforts by ensuring that their offerings meet the learning outcomes without imposing heavy prerequisite costs on students. General data education courses required of all students at a university (much like writing requirements) could also fill this need.
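One way to carry out the course-matching step described above is as a simple coverage check: record which LOs each course addresses, take the union over the courses in a candidate pathway, and list whatever is left uncovered. The Python sketch below is illustrative only; the course labels (`COURSE_A` and so on) and their LO assignments are hypothetical and are not data from the UDaP study.

```python
# Minimal sketch of matching course offerings to the 13 UDaP LOs.
# ASSUMPTION: course labels and LO assignments below are hypothetical.

ALL_LOS = [f"LO{i}" for i in range(1, 14)]  # LO1 ... LO13


def uncovered_outcomes(pathway, course_los):
    """Return, in LO order, the outcomes that no course in `pathway` addresses."""
    covered = set()
    for course in pathway:
        # Courses missing from the mapping contribute no outcomes.
        covered |= course_los.get(course, set())
    return [lo for lo in ALL_LOS if lo not in covered]


# Hypothetical mapping from courses to the LOs they address.
course_los = {
    "COURSE_A": {"LO1", "LO2", "LO3", "LO4"},
    "COURSE_B": {"LO5", "LO7", "LO8", "LO11"},
    "COURSE_C": {"LO6", "LO9", "LO10", "LO12"},
}

if __name__ == "__main__":
    pathway = ["COURSE_A", "COURSE_B", "COURSE_C"]
    print(uncovered_outcomes(pathway, course_los))  # ['LO13']
```

In this invented example the three courses together leave LO13 uncovered, which is the kind of gap a campus might then address with a new course, a cross-listed offering, or a co-curricular activity. A real audit would replace the hypothetical mapping with faculty judgments about which outcomes each course actually meets.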

Through a rigorous process, UDaP developed a set of 13 learning outcomes for undergraduate data acumen at the university level. The learning outcomes focus on three important themes: working with real data, communicating data-driven results, and working with technology. Of the 13 learning outcomes, five focus on process and communication, while eight focus specifically on content. This breakdown reflects the changing needs of statistics education today.

This paper offers an important initial step toward finding common ground across disciplines. Creating a working group of “change agents” from different disciplines on a university campus who are invested in furthering students' data acumen has been an invaluable asset to the project. Next steps for research could include designing assessments and curricula that bridge disciplinary contexts, as well as developing projects that foster collaboration among students and embody the learning outcomes (e.g., https://ww2.amstat.org/education/datafest/, https://www.causeweb.org/usproc/).

The authors hope that this study will persuade readers to consider doing something similar on their own campuses. The manuscript provides an example of how to hold cross-disciplinary discussions, which can be invaluable in creating opportunities for students to achieve data acumen.

Additional information

Funding

This work is supported by NSF Grant No. 1712296

References

Appendix A

A.1 Internal Working Group Faculty Survey

Welcome to Project Undergraduate Data Pathways (UDaP). As an initial step in our research, we would like to gather some feedback about your opinions and thoughts about data analysis and statistics at LMU and beyond. We greatly appreciate you taking the time to answer the questions below. This survey should take approximately 10 minutes to complete.

A.2 Background Information

1. Please enter your name: __________________________________________________________________

2. What is your home department?

Business

Educational Psychology/Educational Statistics

Mathematics

Mathematics Education

Psychology

Economics

Biology

Sociology

Other, please specify: ______________________________________________________________

  3. Please classify your position:

    1. Adjunct Faculty/Instructor (part time)

    2. Adjunct Faculty/Instructor (full time)

    3. Faculty (tenure track)

    4. Faculty (tenured)

    5. Other, please specify: _________________________________________________________________

  4. How many years have you been teaching statistics or data analysis courses at LMU or somewhere else? Please specify how many years at LMU and how many years at other places.

  5. What courses do you teach related to statistics/data analysis at LMU? List the name of the course and the course number.

  6. Please rate the frequency with which you analyze data outside of your coursework in statistics (e.g., research, consulting, etc.)

    1. On a daily basis

    2. On a weekly basis

    3. On a monthly basis

    4. Once per semester

    5. Once per year

    6. Less than once per year but occasionally

    7. Never

  7. Select all the types of data analyses you perform outside of your coursework (e.g., research, consulting)

    1. Descriptive

    2. Confidence intervals

    3. t-tests

    4. ANOVA

    5. Non-parametric tests

    6. Regression

    7. Linear models

    8. Visualization

    9. Parameter estimation

    10. Factor analysis

    11. SVD

    12. PCA

    13. Non-linear regression

    14. Generalized linear models

    15. Logistic regression

    16. Classification

    17. Clustering

    18. Supervised learning

    19. Unsupervised learning

    20. Longitudinal models

    21. Time series

    22. Other, please specify: _________________________________________________________________

  8. What type of statistical procedures do you do a lot of? For example, do you do factor analysis over and over again or do you do a variety of analyses depending on the type of data you have in front of you?

  9. How often do you use the same set of statistical procedures?

    1. 0-25% of the time

    2. 26-50% of the time

    3. 51-75% of the time

    4. 76-100% of the time

  10. Circle all the types of data you use outside of your coursework (e.g., research, consulting)

    1. Observational data with no time or spatial component

    2. Longitudinal/Repeated Measures/Panel

    3. Time series

    4. Big data (e.g., image, voice, spatial), please specify: _________________________

    5. Experimental

    6. Other, please specify: _________________________________________________________________

  11. Has anything changed with the way your discipline makes use of statistics or gives importance to statistics over the past 5, 10, 20 years? Please explain.

  12. In your discipline, do you deal with Big Data and if so, how is Big Data defined?

A.3 Teaching Information

Write in the table the percent of class time that you spend for each of the items below for each of your classes. In the table, write the course number for each item. For example, I teach math 104, 204, and 360 and on the first item, I would fill in the table in the following way:

Fill in the table below.

For questions #22–#25, if you teach several classes, write in the course number for the class if your answer varies by course. For example, I teach math 104, 204, and 360 and I would answer question 20 in the following way:

Indicate the type of data that you believe helps students learn statistics best.

  1. All constructed data

  2. Mostly constructed data

  3. Equal amounts of constructed and real data (360)

  4. Mostly real data (204)

  5. All real data (104)

22. Indicate the type of data that you believe helps students learn statistics best.

  1. All constructed data

  2. Mostly constructed data

  3. Equal amounts of constructed and real data

  4. Mostly real data

  5. All real data

23. Indicate the method of computing numerical solutions to problems that you believe helps students learn statistics best.

  1. All solutions computed by hand

  2. Most solutions computed by hand

  3. Equal amounts of computing solutions by hand and using technology tools

  4. Most solutions computed using technology tools

  5. All solutions computed using technology tools

24. What computing resources do you have available in the classrooms you teach in?

  1. I teach in a computer lab

  2. Students are required to bring their laptops

  3. Students can bring their laptops but it is not required

  4. Students have calculators

  5. Students have no technology in the classroom

25. What software do students use in your class?

  1. SPSS

  2. STATA

  3. SAS

  4. MATLAB

  5. R

  6. Python

  7. Point and click statistical software (e.g., Minitab, Tinkerplots, Fathom, JMP, StatCrunch)

  8. Excel

  9. TI Calculators

  10. None

  11. Other, please specify: _________________________________________________________________

A.4 Overarching Information

26. Are you happy with the way statistics is currently taught at LMU?

  1. Yes

  2. No

  3. Somewhat

27. If not, how do you believe the teaching of statistics at LMU needs to change?

28. Identify any institutional constraints that keep you from being satisfied with the manner in which statistics is currently taught at LMU (Check all that apply).

  1. Hiring issues (not enough, not the correct people, etc.)

  2. Institution does not understand current statistical needs and what statistics is

  3. Institution does not provide access to the required technology, materials, etc. to teach statistics well

  4. Student body does not have an interest in statistics or does not have the intellectual ability for statistics

  5. Other, please specify: _________________________________________________________________

29. Identify any constraints that keep you from making any changes that you would like to implement to improve your statistics courses (Check all that apply).

  1. Your personal time constraints

  2. Institutional constraints (e.g., choice of textbook, class size, mandated curriculum, etc.) If you circle this, please specify: _____________________________

  3. Technology constraints (e.g., lack of computer lab, cost of software)

  4. Characteristics of students (ability, interest, etc.)

  5. Other, please specify: _________________________________________________________________

30. Does your disciplinary organization (e.g., Mathematical Association of America, APA) have guidelines or other resources for statistics teaching and learning?

  1. Yes

  2. No

  3. I don’t know

31. If yes, do you follow those? How closely? What is your involvement with them? How do you use the guidelines directly in your classroom?

What is your primary professional role? If other, please specify.

  • College/University Faculty

  • College/University Administrator

  • Industry Scientist/Researcher/Consultant

  • Government Scientist/Researcher/Consultant

  • Other

What field of study do you feel closest to in your work? This could be the field of your highest degree or the field that relates most closely to your current work. If other, please specify.

  • Statistics

  • Data Science

  • Mathematics

  • Computer Science

  • Sociology

  • Psychology

  • Business

  • Education

  • Biology

  • Chemistry

  • Economics

  • Political Science

  • Other

Appendix B

Course Descriptions for Statistics and Data-Related Courses at LMU

Some courses listed in the table have multiple course numbers due to cross-listings or changing course numbers during the study period.

Appendix C

Community Survey. The following questions were asked in the external community survey.