Data Science

Students’ Experience and Perspective of a Data Science Program in a Two-Year College

Pages 248-257 | Published online: 20 Jun 2023

Abstract

Two-year colleges provide the opportunity for students of all ages to try new subjects, change careers, upskill, or begin exploring higher education, at affordable rates. Many might begin their exploration by taking a course at a local two-year college. Currently, not many of these institutions in the U.S. offer data science courses. This article introduces the perspective lens of students who have gone through the Montgomery College Data Science Certificate Program. We found that, contrary to many other educational fields at the College, data science students tend to come from diverse backgrounds and career paths. A common theme emerged that all students learned valuable skills and applications such as coding in various programming languages and approaches to machine learning. Other meaningful themes included an appreciation of course accessibility, especially catered toward busy professionals who might only be able to take evening courses. Students appreciated learning that data science and ethics are intertwined. Finally, it was evident that going through the data science program positively impacted the lives and careers of these students. The implications of the themes of these student experiences are discussed as they relate to data science education. Supplementary materials for this article are available online.

1 Introduction

Community colleges form the backbone of higher education in the United States, in part because they are agile and adept at changing to meet local, state, and federal workforce needs. “Perhaps because of their unique design as American institutions, community colleges have often been bellwether institutions for change, leading the way into new and unexplored territory” (O’Banion 1997). According to the American Association of Community Colleges (AACC), in Fall 2020, 39% of all undergraduates in the United States were students at community colleges. Additionally, 36% of first-time, first-year students attended community colleges. Two-year schools make higher education affordable and accessible for diverse populations of students and are essential institutions to help students achieve success in higher education.

Community colleges spin the web that connects many stakeholders—students, full- and part-time faculty, business and industry partners, 4-year transfer institutions, and funding organizations. Two-year colleges are tasked to not only work with all of these entities, but to create courses, programs, and degrees that are agreeable and beneficial to these stakeholders. Additionally, they must evolve rapidly as any components of the connected web change. It is this innate collaborative requirement that community colleges have grown to embody.

Data science is a growing and evolving discipline that initially focused primarily on providing credentials at the graduate level. As the need for more data scientists and analysts continues to increase, institutions at all levels will need to amplify their efforts to provide courses, certificates, degrees, and other stackable credentials to fill these positions. The U.S. Bureau of Labor Statistics (BLS) sees strong growth in the data science field and predicts the number of jobs will increase by about 28% through 2026, which is about 11.5 million new jobs in the field (Rieley 2018). Parker et al. issued a call to action: “every school is facing the reality that to truly prepare their student body for the expectations of 21st century employers, they must find a way to incorporate core critical thinking and data-intensive skills into nearly every discipline” (Parker, Burgess, and Bourne 2021).

Community colleges must be poised to take a strong role in filling this particular need for data scientists and analysts. In fact, in the highlights of NASEM’s (2020) Roundtable on Data Science Postsecondary Education Meeting #11, Nicholas Horton “emphasized the important role that two-year colleges play in the education system and in the development of a diverse and inclusive workforce.” In addition, according to NASEM’s (2018) Data Science for Undergraduates: Opportunities and Options report, data science programs should be built to provide “relevant foundational, translational, and professional skills for data scientists in various roles; the use of high-impact educational practices in the delivery of data science education; and strategies for broad participation in data science education that rely on formal modes of evaluation and assessment” (National Academies of Sciences, Engineering, and Medicine 2018).

Due to their ability to engage students from a broad range of backgrounds, data science programs have the potential to attract students from diverse groups and promote equity and inclusion by: increasing the recruitment and retention of underrepresented students, providing a safe forum for students to discuss ethical implications in data science, and providing a curriculum designed to promote student success, retention, completion, and access to career opportunities. Data science education at all levels is responsible for providing students, especially from minority and underrepresented populations, access to paths into the digital workforce.

Montgomery College (MC) is a two-year college located outside of Washington, D.C. It was ranked by the Chronicle of Higher Education (2018) as the Most Diverse Two-Year College in the continental United States and has a current enrollment of approximately 43,000 total students across three campuses. In 2015, faculty and administrators began working to create a data science certificate program for the College. Those faculty and administrators collaborated with experts in academia, industry, and government with open-ended learning outcomes to build this new program, knowing that the program would evolve over time. Fortunately, invaluable resources now exist to assess the data science program in relation to national expectations. The Dana Center at the University of Texas produced the Data Science Course Framework, which provides guidelines for data science pedagogy and curricula. Fundamental data science course principles include: active learning, growth mindset, problem solving, authenticity, context and interdisciplinary connections, communication, technology, and assessment (Charles A. Dana Center at The University of Texas at Austin 2021). Other resources include the meeting of the NASEM Roundtable on Data Science (2019), the NASEM Data Science for Undergraduates: Opportunities and Options (2018), and the NSF-funded Two-Year College Data Science Summit Report (Gould et al. 2018). In particular, MC’s data science program is working to meet recommendations from the NASEM (2018) report, with a focus on recommendations 2.2, 2.4, 2.5, and 4.1. Highlights from these recommendations include:

Recommendation 2.2: Academic institutions should provide and evolve a range of educational pathways to prepare students for an array of data science roles in the workplace.

Recommendation 2.4: Ethics is a topic that, given the nature of data science, students should learn and practice throughout their education. Academic institutions should ensure that ethics is woven into the data science curriculum from the beginning and throughout.

Recommendation 2.5: The data science community should adopt a code of ethics; such a code should be affirmed by members of professional societies, included in professional development programs and curricula, and conveyed through educational programs.

Recommendation 4.1: As data science programs develop, they should focus on attracting students with varied backgrounds and degrees of preparation and preparing them for success in a variety of careers.

The learning outcomes for the data science programs at Montgomery College align with the above NASEM recommendations, and are as follows.

  • Assess different analysis and data management techniques and justify the selection of a particular model or technique for a given task.

  • Execute analyses of large and disparate datasets and construct models necessary for these analyses.

  • Demonstrate competency with programming languages and environments for data analysis.

  • Summarize and communicate findings of complex analyses in a concise way for a target audience using both graphics and statistical measures.

  • Understand, evaluate, and apply ethical principles and practices in the data lifecycle.

The first cohort to join the college’s Data Science Certificate program in 2017 included 30 students. To date, over 200 students are either currently enrolled in or have previously taken data science classes. As the part-time and full-time faculty who have taught these students gain better insight into what stakeholders expect and need from the data science program, the department adjusts the content of the courses over time. One of the program’s greatest assets is the sheer diversity of students’ backgrounds: some students enter without any prior degree, while others enter possessing undergraduate or, in some cases, graduate degrees. A total of 59 students took at least one data science course during the spring 2022 semester. There was slightly higher representation of males than females (58% vs. 41%), and ages ranged from teens to retired professionals (min = 18, max = 71), with a mean age of 32.5 and a median age of 27 years. Black and Hispanic students comprised 43% of the current student body. Black students represented 24% (n = 14), Hispanic students represented 19% (n = 11), Asian students represented 29% (n = 17), Multi-race students represented 3% (n = 2), and White students represented 25% (n = 15) of the students who took data science courses.

Students must complete five courses, or 16 credits, to earn the certificate: one introductory statistics class of their choice; Introduction to Data Science, which focuses primarily on data ingestion, cleaning, wrangling, exploratory data analysis (EDA), and ethics; Data Visualization and Communication, which includes coding in R and RStudio, exploring the Tidyverse, principles of visualization, web scraping, ethics, Tableau, GIS, and reproducible research; and Statistical Methods in Data Science, which includes coding in Python while exploring advanced statistical methods, machine learning, clustering, and regression. Most of the data examples used in the two introductory courses are already fairly clean, meant to provide accessible examples for learning data wrangling and cleaning techniques, though as students move through the sequence of courses, they learn increasingly complex methods for wrangling messy datasets. Throughout the curriculum, students are taught techniques for combining datasets, tidying, and transforming data. Dealing with outliers and missing data, as well as considering ethical questions around imputing missing data, is also covered. The data science program curriculum culminates in the Capstone Experience in Data Science. In this course, students, using the coding and analytical tools they prefer, work with an industry or governmental partner. The capstone project entails a semester-long project to develop a question, formulate a strategy to use real data to answer that question, identify appropriate data sources, wrangle that data into a usable format, and analyze and communicate the results of the research to interested stakeholders.
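The wrangling steps named in the curriculum description (combining datasets, handling missing values, and flagging outliers) can be illustrated with a minimal sketch. This is not material from the MC courses: the records, the field names, the median imputation, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration. Each imputed value is marked so that the choice stays visible to anyone who later uses the data.

```python
from statistics import median, quantiles

# Hypothetical survey rows and a second table of self-reported weekly hours.
# None marks a missing response. All values are invented for illustration.
respondents = [{"id": i, "group": "A" if i % 2 else "B"} for i in range(1, 9)]
hours = {1: 6.0, 2: None, 3: 5.0, 4: 40.0, 5: 7.0, 6: 6.0, 7: 8.0, 8: 7.0}


def combine(rows, lookup, field):
    """Left join: attach lookup[id] to each row under `field` (None if absent)."""
    return [{**row, field: lookup.get(row["id"])} for row in rows]


def impute_median(rows, field):
    """Fill missing values with the median of observed values and flag them."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [
        {
            **r,
            field: fill if r[field] is None else r[field],
            "imputed": r[field] is None,  # keep a record of what was filled in
        }
        for r in rows
    ]


def flag_outliers(rows, field):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles([r[field] for r in rows], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [{**r, "outlier": not (low <= r[field] <= high)} for r in rows]


combined = combine(respondents, hours, "weekly_hours")
tidy = flag_outliers(impute_median(combined, "weekly_hours"), "weekly_hours")

for row in tidy:
    print(row)
```

Flagging rather than silently dropping or overwriting values is one way to keep the ethical questions around imputation open for discussion, since the analyst can still report results with and without the affected rows.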

It is in this final course that students display their accomplishments and the program gains some of its greatest visibility. Students acquire real-world experience, and county officials and other industry partners receive useful, sometimes eye-opening, information. As an example, a few years ago, a student merged our county’s open datasets on police traffic stops with reported car crashes. He created a visualization that showed that, over time, there was an inverse relationship between the number of police stops and the number of crashes. He presented his findings to members of the county executive office and the police chief, who appreciated the visualization and intended to share it with his officers to encourage more traffic stops to prevent future car crashes.
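The kind of analysis described in this example, joining two datasets on a shared time key and measuring how the two series move together, might be sketched as follows. The monthly counts below are invented for illustration and are not the county’s data, and the function names are hypothetical; the student’s actual method and tools are not documented here.

```python
from math import sqrt

# Invented monthly counts, for illustration only (not county data).
stops = {"2019-01": 900, "2019-02": 850, "2019-03": 700, "2019-04": 600}
crashes = {
    "2019-01": 110,
    "2019-02": 120,
    "2019-03": 140,
    "2019-04": 155,
    "2019-05": 150,  # no matching stops record, so the join drops it
}


def merge_on_key(left, right):
    """Inner join: keep only keys present in both datasets, in sorted order."""
    keys = sorted(left.keys() & right.keys())
    return [(k, left[k], right[k]) for k in keys]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


merged = merge_on_key(stops, crashes)
r = pearson_r([s for _, s, _ in merged], [c for _, _, c in merged])
print(f"months matched: {len(merged)}, correlation: {r:.3f}")
```

A coefficient near -1 indicates a strong inverse linear association between the two series over the matched months.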

As the number of students who have graduated from the certificate program has grown, we thought this was a perfect time to convene a group of our data science students to have them reflect deeply on their experience in this two-year college data science program. It was interesting to learn how our goals and expectations of the program did and did not match those of our students. Through the following student reflections, we hope faculty and administrators from both two-year and four-year institutions will gain new insights about their own current or future programs.

2 Methodology

Students at community colleges have different characteristics from those of traditional undergraduate students at four-year institutions. They tend to have more outside responsibilities on top of the classes they take, including working at part-time and full-time jobs and caring for children and extended family members. They tend to be older, have lower incomes, and come from underrepresented minority (URM) populations. Students may complete the sequence of classes for the data science certificate at MC in as few as two or three semesters, so there is a short turnaround for students to participate in long-term projects such as collaborating to write articles. In addition, in spring of 2022, when student contributions for this article were solicited, 75% of all students taking MC’s data science classes were working professionals who held either undergraduate or graduate degrees. For these reasons, a convenience sample was used to select students to collaborate to write this article. Initially, 10 students who had either completed the certificate or were taking the final class to complete the certificate were invited to join the project. Initial Zoom meetings to explore ideas about writing the article were held in the evenings in consideration of students’ class and job schedules. Five students agreed to work on the project to share perspectives about their experiences.

All five student authors of this article no longer attend MC. One of the authors transferred to an undergraduate four-year institution to study mathematics while continuing to work professionally as a data scientist at a local organization, two of the authors are currently attending graduate schools, one of the authors continued to work in her government job while taking additional data science courses, and one of the authors, a retired professional, is still exploring data science options.

3 Student Backgrounds

The diversity of our students’ backgrounds is impressive and contributes significantly to both the instructional experience and to what our students get out of this program. Several discovered data science as they were pursuing other studies.

Camilo is an excellent example of this:

I came into data science by mistake, led by my never-ending curiosity. I wanted to understand why when making tea, 10 minutes into brewing, my scale would magically go up 8 grams. I asked one of my professors at Montgomery College about this, and she recommended I take Biostatistics. This Biostatistics class ended up being 90% statistics and 5% biology (the missing 5% will become important later).

In this experience with the programming language R and statistics, I discovered that my experience with abstract thinking - thank you, linear algebra - became very useful when trying to trick my computer to behave in the way I wanted. This new experience of using math and statistics to write programs gave me a freedom I had never felt. I could start answering questions based on actual data. I became skeptical about where these data were coming from, and I came up with questions I would have never thought of. I found that abstract thinking had many real and visible applications. I decided to take more Data Science classes, and that missing 5% of the Biostatistics class started becoming more and more important. That 5% is everything I had not expected to learn and that is not directly related to statistics. During my first project at NIST, I worked on designing software to improve the storage and access of materials data. It was during this experience that I realized that data science is not only statistics. There is so much more! This is what fuels my passion for becoming a data scientist. And as my mentor at the National Institutes of Health (NIH) in my Data Science Traineeship put it: being a data scientist is becoming a jack of all trades and a master of one.

Marilyn had a similar experience:

I retired in the fall of 2020. I’d planned to go back to school in retirement because I’ve always enjoyed learning. And I enjoy the academic environment, the interaction with faculty and other students. Interacting with people, even remotely, who have varying backgrounds, experience and viewpoints enhances and increases my learning. I’m learning Data Science and statistics for fun and to keep my brain sharp and oiled for as long as I can. And if I happen to build this into a fun second, or third, career that would be a bonus.

I graduated with a B.S. in Mathematics and my first job out of college was developing software for the Space Shuttle backup flight systems database. I’ve been a programmer, a database designer, a database manager, a network manager, a systems analyst, a business analyst, and a business architect. And through it all what I really enjoyed was understanding data, designing with and for data, analyzing and communicating with data.

I had started looking at the MC class catalog to take a few language courses to brush up my French and German. I’d taken a few courses at MC before and I like the educational atmosphere of the community college. It’s a good place to explore a new subject without large expense but with full academic accreditation. When I saw the Data Science program description in the catalog it resonated so well with my career experience that I decided to give it a try. I’m very glad I did, I’ve thoroughly enjoyed the classes.

Mary came to the field with no background in mathematics at all:

I have a thoroughly liberal arts background, with multiple degrees in history and area studies. I also went to work in a field that requires no quantitative analysis whatsoever—diplomacy. In the course of my work, however, I began to come across people who used computer and data science. I served in Estonia, which is famous for its e-government. There I spent a great deal of time learning about and introducing visitors to this concept, which it turns out is based on understanding and using Big Data. Later I took a position as a Russian political analyst. In that role, I learned about the possibility of using open-source data to understand Russian political trends and behavior. At the same time, I discovered the American Historical Association was offering short classes for historians on how to use data science to analyze historical developments. Thus, it was becoming increasingly clear that data science had a lot to offer in fields I was interested in. My curiosity piqued, I began to take free online courses like Coursera and EdX. Those were fun, but I didn’t feel they gave me a comprehensive enough understanding of the subject.

But not all our students stumbled across the field. Some, like Juan, came with intention:

My professional career has always included some aspect of data analysis and statistics. During and after completing my master of arts, I tutored graduate students and professional analysts in the use of SPSS and Stata for data analytics and statistics. I learned a lot from this experience, including how important the use of data is for so many different fields. I use data every day now and I really enjoy it.

I was formally introduced to data analysis when I took Statistics 101 while completing the requirements of my B.A. in Sociology. From then on, I really enjoyed statistics and data analysis so when I decided to pursue an M.A. in Sociology, I took as many statistics courses as I could. After completing my M.A., I was proficient in several programming languages (e.g., SPSS, Stata) but I still wanted to learn Python, R, and SQL specifically. I took several pre-recorded online courses that used these languages to stay motivated, but the courses were not as useful as I had hoped. Eventually, I came across the Data Science Certificate at MC and decided to enroll. The program worked well with my schedule since all the courses were offered in the evenings. I learned so much from the first course that I decided to complete the program. I graduated with the first cohort to finish the program.

Finally, Jennifer came to the program because she realized the skills would benefit her existing career:

My professional background has followed a nontraditional, interdisciplinary path. My career began as a human rights lawyer and public defender. Research and statistics were useful in that role because of how they support persuasive arguments. Data visualizations often were an even more effective influential tool. But at the time, statistics were used in a fairly cursory way. As lawyers we did not engage in data collection on our own, nor did we engage in extensive evaluation of the mathematical analysis behind the figures (as many researchers can attest to when they read judicial decisions and legislative framings that are based on frustrating misinterpretations, misuses, and skewed decontextualization of scientific data). That is now beginning to change, and legal offices are increasingly maintaining data on winning trial strategies, successful mediation approaches, and gaining a more specialized, comprehensive understanding of the science underlying legal issues at hand (Walters 2019). Particularly within the criminal legal system, artificial intelligence is being increasingly centralized in some judicial decision making, such as in bail determinations and sentencing (Rankin 2022). Moreover, legislative efforts intended to create regulations for technology industries require incorporation of people with an enhanced knowledge of computer and data science working as and amongst current policy makers (Rodrigues 2020).

Since my time as an attorney, I have worked in various disciplines as a teacher, designer, community educator, and project manager. All of these roles increasingly incorporated programming, data science, and data visualizations into their central operations. As a teacher, data was used to determine curriculum development, programmatic impacts, student needs, and outcome measurements for grant applications and reports. As a designer, data is increasingly used both as a tool and a subject of creative endeavors. Many artists are using mediums of data visualizations in artistic messages, including with the increase of social justice and community-based artistic practices. Artists are also increasingly relying on technology as an artistic tool and learning programming so they can put their personal touch upon technological visual canvases. As designers increasingly utilize software for 3D drawing, animation, and rendering, they also increasingly desire to know how to control and manipulate these tools on a level that they can personally customize and create with. As a project manager, at an architecture and engineering firm, data science was utilized for project assessments and comparisons, budgetary reviews, future development projections, and an array of actual design applications and preparations for client presentations.

I discovered MC’s data science certificate program when I was looking to brush up my statistical programming knowledge in preparation for a graduate counseling psychology program with an extensive quantitative research component, since I last took a statistics course in university 20 years ago when we still used only a hand-held calculator for quantitative assistance. After taking the first course in the certificate, which had joint statistical and programming curricular components, I knew that these skills would be extremely useful in nearly every professional trajectory I imagined. Thus, I decided to complete the entire certificate program. I believe completing the MC data science certificate program was one of the wisest decisions I have made.

4 What We Got from the Program

Coming from a variety of backgrounds and seeking different experiences, we unsurprisingly each got something a little different from the MC Data Science program. A common thread, however, was that we all learned some coding in Python and R and how to use Tableau.

Mary highlighted the coding experience and the broader methodology she acquired:

The certificate program taught me a good deal about how to use tools for data science. We learned both R and Python programming for basic statistical analysis, data cleaning and data wrangling, and data visualization. We also learned some basic machine learning techniques. The capstone project was especially useful, as it taught us how to frame a problem/question from the beginning and work our way through to a completed project.

Juan noted that the accessibility of the education and what it taught him were key to his professional achievement:

The courses in the MC Data Science Program were very accessible compared to the courses I took online. I found that talking to the faculty and the other students made the experience of learning more enjoyable. Since then, I have stayed in touch with several people from the program. Soon after I finished the program, I began working in a local research firm and I use SPSS, Stata, R, and Tableau often in my current role. I am certain that having gone through the MC Data Science Certificate Program has made me more successful in my career.

Jennifer had similar observations, but also stressed the diversity of the student body and the practical experience of the faculty as important:

Regarding learning the actual coding skills themselves, I come from a place of having absolutely no knowledge of programming prior to this certificate. The MC program struck a balance wherein I felt the material was accessible and possible to learn while also being challenging. To use Lev Vygotsky’s term, the program successfully created a “zone of proximal development” (Anderson and Gegg-Harrison 2013). The combined class materials, which provided a hands-on, step-by-step guide through the basics, followed by independent practice assignments and online supportive programs, such as DataCamp exercises, allowed me as a student to feel I had enough foundational material to work off of to experiment and expand my knowledge through independent and group practice. Each course involved a major final project, and the certificate program as a whole culminated with an in-depth capstone project which required us to independently apply our skills to create a comprehensive and socially meaningful report. These final projects were mostly based on autonomous student application and integration of skills with readily accessible support from knowledgeable teachers in moments when we became stuck, despite numerous individual problem-solving attempts aided by Professor Google. Many of our professors were simultaneously working in the field, so they were providing us with practical guidance to prepare us for the environment and contextual realities of our future workplace requirements as well.

Finally, one aspect of the MC program that greatly enriched my experience was the vast diversity of students the program attracted. In part this is due to the rich diversity found within community college populations, and in the D.C. metro geographic area, but it is also a testament to the incredible skill of the professors who made the learning experience a positive and inclusive one for students coming from a range of abilities and backgrounds. My classmates ranged from high school students taking advanced college credits to people with advanced degrees, including numerous Ph.D.s, and other professionals with decades of experience. The program curriculum was thoughtfully designed and the professors were universally supportive, inspiring, responsive, and respectful to the various group needs. I cannot speak highly enough about my experience in the data science certificate program, and I hope to maintain contact with the inspiring community of students, professors, and professionals that I have been introduced to through this program. Collectively we have created a network of engaged thinkers across a vast array of endeavors and fields we are each applying our data science skills to. We have the potential to support genuinely innovative ventures and cross-disciplinary perspectival applications through our collective engagements and future collaborations. I am excited and hopeful about the future impact that the MC data science community will have.

Marilyn appreciated the cross-fertilization of skills in different courses:

I found it beneficial to take statistics and data science courses together. The data science tools were used in the statistics course, and understanding the mathematics informed the mechanics of the data science course. I had fun pushing just a bit beyond the boundaries of class assignments, researching code packages, functions, and examples, to find alternative solutions or to give my work a bit of personal style. Knowing that there are alternative solutions is something that’s emphasized in the courses, particularly in the Data Visualization class.

Finally, for Camilo the coding experience was also key, but especially important in the way it enabled him to learn on his own as well:

With the projects and the guidance of professors, I gained the ability to become an autodidact for specific coding needs that arose in my journey. Professors were great at teaching the general learning objectives, but each data science project is so unique, that the skill of learning how to solve specific problems and issues with each project by doing research is of great importance.

I also gained a deep appreciation for the work that data scientists do, especially all of the parts of each project where the focus is on organizing and cleaning the data. This transformed into a deep desire to make meaningful contributions to this part of the field in order to optimize these processes.

5 The Importance of Data Science Ethics Education

Ethics is an important part of data science, and it is key that the concepts are taught as students learn the basic data science skills. Each of us noted the importance of that education at MC.

Mary:

Ethics was an important part of the curriculum in the Data Science program. Through readings and discussions, we covered topics such as the honest use of data (and dangers of things like p-hacking) and ethical issues arising from the unquestioned use of data science and machine learning. This included fascinating videos on racial discrimination embedded into machine learning development, and other types of discrimination inherent in the use of data for decision making.

Juan:

The cornerstone of my research has always been social justice. I was happy that the program allowed me to use public data that I chose and to learn to use tools like GitHub so that I could disseminate my findings publicly for all to see. I think that data transparency is extremely important so using free tools such as R and Python along with public data is very powerful.

Marilyn:

There’s both art and discipline in the field, ethics is an integral part of every course in the program, and I appreciate the sense of personal and community responsibility this encourages.

Jennifer had some suggestions on what can be expanded in data science education writ large as well:

The component of data science education that I view as the most crucial is a genuine inclusion of discussions surrounding ethics, self-reflection, and empathy. The people who control knowledge and access to data manipulation and who are fluent in coding languages are increasingly sculpting the parameters of our society (Lee, Resnick, and Barton 2019). Social media has profoundly impacted our social and democratic reality (Freiling et al. 2021; Goyanes, Borah, and Zuniga 2021; Kubin and von Sikorski 2021). In a time when economic inequities and access to resources are at one of the most extreme divisions in the last century, the opportunities for harmful impacts upon those who are most vulnerable in our society are great, and increasingly the tools for exploitation, harm, or alternatively, equalizing accessibility and institutional equity fall to a great extent within the hands of coders (Walsh 2020). With great power comes great responsibility. Given my experience witnessing the tremendous impact that data analysis, coding, and algorithms have upon individuals, particular communities, society at large and our global community, a core component of every data science program should be the inclusion of exercises that allow students to understand the great impact that their work will have upon others, particularly others who have backgrounds different from their own. As with most science and math-based disciplines, data and computer science fields are still dominated by a population that is mostly White, cisgender, heterosexual, able-bodied, male, and usually of middle-class background or higher (Paxton 2020). It cannot be overstated how important it is that students engage in exercises within data science curricula that make palpable the understanding of the great impact their work can have upon those who do not live within the realm of such intersectional privilege, especially as they enter and advance in a field that is still perpetuating inequitable social dynamics (Baumer et al. 2022).

Camilo:

I really liked that ethics was an important part of every course in the Data Science Certificate. It was not something taught only after we had learned how to perform a chi-squared test. Instead, it was expected that we constantly analyze and discuss ethical implications of multiple aspects of data science. I enjoyed that professors encouraged students to explore topics and perform scientific analysis without expecting any specific results from these analyses. Students were not trying to have the data tell a specific narrative. We were taught to let the data speak for itself, and that a statistical result different than expected, is still a result.

I do, however, wish that the discussion about things in data science that are not statistics, such as data management and organization, had been a little different. Many times it felt as if steps in a data science project that were not statistics were not very relevant and were just a necessary - yet annoying - step to then be able to perform statistical analysis. I wish that we valued more the science behind organizing, linking, and storing data. For example, for an important project in the Certificate, I worked on complex systems for organizing and linking crime data. However, at times it felt like my project was not considered too relevant because it did not include any actual statistics. I feel that having data that is better structured and more organized could be a very good first step towards producing research that is easily reproducible and that is more ethically robust.

6 Data Science at MC Was Enjoyable

Given the variety of backgrounds and motivations among students in MC’s data science program, one reason for MC’s success is likely that students genuinely enjoy their classes.

Mary enjoyed the applicability of her classwork to her day job:

In the course of my studies, I was able to work on several projects that I really enjoyed and that taught me a lot. For a data visualization class, I did a paper on the impact of economic sanctions on Russia. The purpose was to learn how to tell the story through the use of effective plots, which was a very useful skill to develop. At the same time, it taught me how to integrate text research (the kind of research I’ve always done as a historian and political analyst) with data research and analysis. In addition, we did a web-scraping project that taught me how to access, wrangle, and analyze data from Twitter. That will be tremendously helpful for my future work.

Likewise, Juan found he could do projects that validated his work and study:

A memorable experience for me from the program is when I used machine learning to validate some of the findings from my master’s thesis because it reinforced my belief that my hard work can make a difference. I also really enjoyed the capstone project where I found evidence that supports the claim that at-risk communities in Montgomery County experience structural disadvantages. I dedicate my academic and professional career to social justice so this showed me that there is still a lot of work to be done in that area. I used a lot of what I learned in all the courses I took in the program to complete this project. The results of all of my analyses are available on GitHub if anyone is interested.

Marilyn enjoyed the ability to just explore something fun:

My first Data Science project was in Data Visualization and Communication, the assignments up to that point had been about “serious” topics - crime, social inequities, etc. While this was interesting and important, I wanted to explore something lighter, so I found a dataset on Kaggle on a fun subject, Roller Coasters! Working in R and using RStudio I enhanced the dataset with additional information from the Roller Coaster Database, Wikipedia, Coasterpedia - Roller Coaster wiki and Ultimate Coaster. I also imputed some missing data in roller coaster characteristics based on statistics for the population data. Then I developed a simple algorithm to compute a “Fun Factor” for each roller coaster from its characteristics. I created some charts comparing the fun factors for roller coasters and which U.S. state has the highest number of high fun factor roller coasters (it’s California).
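The workflow Marilyn describes (imputing missing characteristics, scoring each coaster, and counting high scorers by state) can be sketched in a few lines. This is a minimal illustration, not her actual method: she worked in R, and the records, feature names, median imputation, min-max scaling, and 0.5 threshold below are all hypothetical choices made for this example.

```python
from collections import Counter
from statistics import median

# Hypothetical records; None marks a missing characteristic.
coasters = [
    {"name": "A", "state": "CA", "speed_mph": 70.0, "height_ft": 200.0, "inversions": 4},
    {"name": "B", "state": "CA", "speed_mph": None, "height_ft": 150.0, "inversions": 2},
    {"name": "C", "state": "OH", "speed_mph": 90.0, "height_ft": None, "inversions": 0},
    {"name": "D", "state": "FL", "speed_mph": 50.0, "height_ft": 100.0, "inversions": None},
]
FEATURES = ("speed_mph", "height_ft", "inversions")


def impute_with_median(rows, features):
    """Return copies of rows with missing values replaced by the column median."""
    filled = [dict(r) for r in rows]
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        fill = median(observed)
        for r in filled:
            if r[f] is None:
                r[f] = fill
    return filled


def add_fun_factor(rows, features):
    """Add a 'fun_factor' in [0, 1]: the mean of the min-max scaled features."""
    bounds = {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}
    for r in rows:
        scaled = []
        for f in features:
            lo, hi = bounds[f]
            scaled.append((r[f] - lo) / (hi - lo) if hi > lo else 0.0)
        r["fun_factor"] = sum(scaled) / len(scaled)
    return rows


def high_fun_by_state(rows, threshold=0.5):
    """Count coasters per state whose fun factor meets the threshold."""
    return Counter(r["state"] for r in rows if r["fun_factor"] >= threshold)


scored = add_fun_factor(impute_with_median(coasters, FEATURES), FEATURES)
counts = high_fun_by_state(scored)
print(counts.most_common())
```

Scaling each characteristic to the same range before averaging keeps one large-valued feature, such as height in feet, from dominating the score. Any real "Fun Factor" would need weights that reflect what riders actually enjoy.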

Jennifer enjoyed being able to use data science to really illuminate her work as a lawyer:

The project that I completed at MC that had the greatest impact upon me was the Capstone Project. In that project I examined the ethical implications of using AI algorithms for pre-trial bail determinations and evaluations of police officer misconduct complaints. I was able to analyze three different datasets available from Montgomery County Maryland’s open source data and open source data from the state of Connecticut to examine trends in bail amounts and changes in trends of pretrial bail percentages over time as AI algorithms have been introduced and actively put in play in judicial determinations (Ryberg and Roberts 2022). The report examined the data and included numerous interactive data visualizations framed within the context of literature on potential implicit and explicit biases that may be perpetuated by AI algorithms in this context. It examined arguments for and against the use of AI algorithms and introduced some ethical considerations for data scientists and employees within the legal system as our institutions become increasingly influenced by technological mechanisms of determination and prediction, particularly as applied to anticipated patterns of human behavior. The project was completed with Python. It was truly a culmination of skills I had learned throughout the program including web scraping, mapping, creating interactive data visualizations, and completing advanced statistical and data analysis as well as surveying extant literature on the emerging topic of the ethics of AI as applied to social and governmental operations.

Camilo enjoyed the challenge of solving complex problems:

When learning web scraping techniques, there was one part of the code that we were using that could not automatically perform a desired task. Students would have to solve this by manually inspecting objects and then manually updating values. Knowing how annoyed I get at this kind of task, Professor Saidi said in front of the class when introducing the project “I am pretty sure Camilo will find out how to solve this, but we can keep doing it manually.”

When I started working on the code, one thing was clear: I had no idea how I could solve that annoying issue. However, Google and Stack Overflow came to the rescue and after three days of a lot of red letters on my screen, I was able to automate the process. I learned that the most important skill is to define a specific set of actions that need to be achieved. The mechanics on how to achieve them - the coding part - are much more simplistic to think about once you have a clear understanding of your objectives and your plan. I had a lot of fun and it became very clear that: first learn what to code, you will figure out how later.

7 Where Are We Now?

Not surprisingly for such a diverse group, we all are using our data science education in different ways.

Mary is not yet using her skills professionally:

Having completed the certificate, I am not yet using data science in my professional life. I need to develop more skills in order to conduct effective sentiment analysis and am exploring other opportunities to acquire them. I am, however, using what I’ve learned to help organizations I work with conduct basic analysis of their data. It has also enabled me to better understand and critique analyses others present to me. I am more data literate, and this is evident in my professional interactions and performance.

Marilyn views her data science coursework in the context of her pursuit of lifelong learning:

I have advantages from my years of experience, I already know how to code programs, the R and Python development environments and language syntaxes were in familiar formats. It was easy to become proficient in using them. Taking courses remotely worked well for me, I’d been working remotely successfully for several years, and I was comfortable with the format and the tools.

I also have the advantage of having free time in retirement. I don’t have other classes I need to take for a degree or for transfer, so I can concentrate on one or two classes a semester. I don’t need to finish the program in a particular time frame, I still have one class left to complete the certificate. This low-pressure scenario facilitates the enjoyment I have in learning. My thoughts about my next learning adventures include investigating graduate programs in Data Science, working on a graduate degree with this low-pressure-enjoy-learning scenario is appealing.

Jennifer is using her data science education to further her other educational goals:

Currently, I am a student in a Counseling Psychology program on the research track and am in the process of completing a Master’s thesis. In the mental health field, data is increasingly used to gauge the effectiveness of therapeutic treatments including gathering data at drug and alcohol addiction centers to gain a greater understanding of individual addiction behavioral patterns as well as a more general understanding of addiction patterns throughout a population that has higher rates of addictive behaviors en masse than at any previous point in recent history.

In my current program, we use SPSS, and given my experience with R and Python in the certificate program I have been able to fix SPSS coding for a professor who has decades of experience with SPSS but did not have knowledge of the background language and programming syntax required to produce more elegant and tailored output. The skills learned in the Montgomery County program have enabled me to be less confined and controlled by the SPSS software interface; I have more autonomy to shape the data program to provide customized outputs based on my idiosyncratic research needs. I can also utilize multiple different software depending on the need at hand, e.g. producing interactive data visualizations versus completing extensive mathematical analysis.

Juan uses his data science skills at work, and would like to build a community of fellow data science students:

As a current Data Analyst, I’m very thankful I had the opportunity to complete the data science program. Moving forward, I’m hoping to continue to be part of the program’s alumni network and to remain in communication with the faculty. I’m also hoping to be able to continue to help and advise anyone that can benefit from my knowledge and experience. In sum, data science and statistics have given me direction and provided valuable opportunities to be able to give back and to move forward in my academic and professional career.

Camilo is using his skills professionally and academically:

I am studying Applied Mathematics at the University of Maryland College Park. At the same time, I have a part time position as a Data Scientist at Axle Informatics, and I have a part time internship at NIST. The skills that I gained in the Data Science Certificate go much beyond learning how to write code in R - in fact, Julia and Python have stolen the show in my last years of coding. Learning how to learn was the most important skill I gained.

I think that my curiosity for understanding how systems around me work has helped me in becoming a better programmer and data scientist every day. My professors constantly motivated me to challenge myself, and now this is something I try to do with every coding project and with my classes in my undergraduate studies. I learned in my data science classes that data scientists have made mistakes and that we have put our own biases into platforms and algorithms. This is something I think of every time when designing new software, in an effort to prevent myself from repeating the same mistakes.

8 Closing Thoughts

We are grateful to these students for being willing to share their insights and reflections on their experience in our data science program. They wrote candidly about which of their expectations were met and which we could focus on improving in the future. It is our hope that, by sharing the two-year college data science student perspective in this article, we help individuals in academia, business, and government become aware of the partnerships they could form with programs like ours.

A common theme in these student reflections was that most of them stumbled upon data science classes rather than seeking them out. For some, taking the biostatistics course was a gateway to discovering data science classes. For others, it was simply a curiosity about what other courses were offered by the math, statistics, and data science department at the college. Because 75% of our current data science students already possess an undergraduate or a graduate degree, learning data science and attaining the certificate has been a choice: a way to upskill, to improve their resume, to look for new job opportunities, or to explore continuing their educational path. We expect that the data science student population will change as we add a new associate of science in data science degree and as we work to create partnerships with local high schools to develop an early college data science program. At the same time, the experiences above show that data science courses can also be viewed as an adjunct that complements other majors and fields of work. Colleges and universities should consider this as they establish and develop their programs. The founders of our data science program had the foresight to create open-ended learning objectives, knowing that the program would evolve over time. There is great growth in career opportunities in data science and data science related fields, and we as a discipline must remain focused on meeting the needs and expectations of our stakeholders: our students and our community.

Students described their appreciation for inclusion of data ethics and ethical considerations in the context of data science. It is nearly impossible for anyone anywhere to go about their daily activities without encountering the impact of data use and data science. As data science, machine learning, and artificial intelligence permeate every aspect of our daily lives, data science programs must address the uses, misuses and potential harms. “Questions on the role of the law, ethics, and technology in governing AI systems are thus more relevant than ever before” (Cath 2018). The data science program at Montgomery College intentionally scaffolds ethical considerations within each data course in addition to requiring degree students to take a semester-long course on the introduction to ethics, devoted specifically to topics in computer science, data science, and engineering. We must educate students, who might be the creators of future algorithms that “make decisions that alter our lives in direct, and potentially detrimental, ways” (Borenstein and Howard 2021).

On a local, national, and global level, there is great demand and interest in the field of data science. One of the most valued talent pools will include those with data science skills, and their abilities will become more vital for organizations of all types. Meeting this demand to provide a new generation of students with data literacy and data acumen is imperative. The teaching of mathematics and statistics using traditional pedagogical approaches is no longer sufficient. Faculty and administrators must make a shift to create experiences for students to acquire quantitative literacy and more specifically, data science acumen. We hope this article is a meaningful tool for others to evaluate whether to establish their own data science program, how to recruit and retain students, what topics and tools to include, how to establish pathways to other institutions, and how to create internship and career opportunities for students once they complete the certificate or degree.

Data science faculty at MC acknowledge that informal interviews with a convenience sample of students are not a scientific method for assessing the program. To that point, in order to maintain accreditation with the Maryland Higher Education Commission (MHEC), all programs, including the data science program, must submit program-level assessments at specified yearly intervals. Chance and Peck recommend mapping where in the curriculum students are learning, practicing, and demonstrating mastery of the skills necessary to achieve the outcomes (Chance and Peck 2015). Because the program is in its infancy, it has not yet collected program-level assessment data. In the near future, however, we will collect data to assess whether program outcomes have been met. At that point, we will qualitatively and quantitatively evaluate whether students are attaining mastery of our program outcomes, and we will present a formal analysis of the findings.

Supplementary Materials

Information regarding the demographics of those taking data science classes is available as supplementary material.


Acknowledgments

Authors are grateful to faculty and administrators for developing this data science program at Montgomery College. In particular, we would like to thank Brian Kotz, Kathryn Linehan, and John Hamman for their work to build the program beginning in 2015. We would also like to thank Ben Nicholson for providing data on student enrollment in the data classes.

Disclosure Statement

The authors report there are no competing interests to declare.

References

  • Anderson, N., and Gegg-Harrison, T. (2013), “Learning Computer Science in the ‘Comfort Zone of Proximal Development’,” in Proceeding of the 44th ACM Technical Symposium on Computer Science Education (SIGCSE ’13), New York, NY, USA, pp. 495–500, New York: Association for Computing Machinery. DOI: 10.1145/2445196.2445344.
  • Baumer, B. S., Garcia, R. L., Kim, A. Y., Kinnaird, K. M., and Ott, M. Q. (2022), “Integrating Data Science Ethics into an Undergraduate Major: A Case Study,” Journal of Statistics and Data Science Education, 30, 15–28. DOI: 10.1080/26939169.2022.2038041.
  • Borenstein, J., and Howard, A. (2021), “Emerging Challenges in AI and the Need for AI Ethics Education,” AI and Ethics, 1, 61–65. DOI: 10.1007/s43681-020-00002-7.
  • Cath, C. (2018), “Governing Artificial Intelligence: Ethical, Legal and Technical Opportunities and Challenges,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 376, 20180080. DOI: 10.1098/rsta.2018.0080.
  • Chance, B., and Peck, R. (2015), “From Curriculum Guidelines to Learning Outcomes: Assessment at the Program Level,” The American Statistician, 69, 409–416. http://www.jstor.org/stable/24592142 DOI: 10.1080/00031305.2015.1077730.
  • Charles A. Dana Center at The University of Texas at Austin. (2021), Data Science Course Framework, Austin, TX: Author.
  • Freiling, I., Krause, N. M., Scheufele, D. A., and Brossard, D. (2021), “Believing and Sharing Misinformation, Fact-Checks, and Accurate Information on Social Media: The Role of Anxiety during COVID-19,” New Media & Society, 25, 141–162. DOI: 10.1177/14614448211011451.
  • Gould, R., Peck, R., and Amstat.org. (2018), “The Two-Year College Data Science Summit,” available at https://www.amstat.org/education/two-year-college-datascience-summit
  • Goyanes, M., Borah, P., and Zuniga, H. G. (2021), “Social Media Filtering and Democracy: Effects of Social Media News Use and Uncivil Political Discussions on Social Media Unfriending,” Computers in Human Behavior, 120, 106759. DOI: 10.1016/j.chb.2021.106759.
  • Kubin, E., and von Sikorski, C. (2021), “The Role of (Social) Media in Political Polarization: A Systematic Review,” Annals of the International Communication Association, 45, 188–206. DOI: 10.1080/23808985.2021.1976070.
  • Lee, N. T., Resnick, P., and Barton, G. (2019), “Report: Algorithmic Bias Detection and Mitigation: Best Practices and Policies to Reduce Consumer Harms,” The Brookings Institution. Available at https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/
  • National Academies of Sciences, Engineering, and Medicine. (2018), Data Science for Undergraduates: Opportunities and Options, Washington, DC: The National Academies Press.
  • National Academies of Sciences, Engineering, and Medicine. (2020), Roundtable on Data Science Postsecondary Education: A Compilation of Meeting Highlights, Washington, DC: The National Academies Press. DOI: 10.17226/25804.
  • O’Banion, T. (1997), “The Learning Revolution: A Guide for Community College Trustees,” (Special Issue) Trustee Quarterly, 1.
  • Parker, M. S., Burgess, A. E., and Bourne, P. E. (2021), “Ten Simple Rules for Starting (and Sustaining) an Academic Data Science Initiative,” PLoS Computational Biology, 17, e1008628. DOI: 10.1371/journal.pcbi.1008628.
  • Paxton, L. (2020), “Crunching the Numbers on Diversity in Data Science: Events & Resources to Foster Inclusion.” Medium: Stem and Culture Chronicle. https://medium.com/stem-and-culture-chronicle/crunching-the-numbers-on-diversity-in-data-science-events-resources-to-foster-inclusion-5dc81d2ab52
  • Rankin, S. M. G. (2022), “Technological Tethered: Potential Impact of Untrustworthy Artificial Intelligence in Criminal Justice Risk Assessment Instruments,” Washington & Lee Law Review, 78, 647.
  • Rieley, M. (2018), “Big Data Adds up to Opportunities in Math Careers,” Beyond the Numbers: Employment & Unemployment, 7, no. 8 (U.S. Bureau of Labor Statistics, June 2018), available at https://www.bls.gov/opub/btn/volume-7/big-data-adds-up.htm
  • Rodrigues, R. (2020), “Legal and Human Rights Issues of AI: Gaps, Challenges and Vulnerabilities,” Journal of Responsible Technology, 4, 100005. DOI: 10.1016/j.jrt.2020.100005.
  • Ryberg, J., and Roberts, J. V., eds. (2022), Sentencing and Artificial Intelligence, Oxford: Oxford University Press.
  • Walsh, M. (2020), “Algorithms are Making Economic Inequality Worse,” Business and Society, Harvard Business Review. Available at https://hbr.org/2020/10/algorithms-are-making-economic-inequality-worse
  • Walters, E. (2019), Data Driven Law: Data Analytics and the New Legal Services, Boca Raton, FL: Taylor & Francis Group.