Research Article

Diagnosing Statistical Education Needs of Health Science Learners

Abstract

Many types of health science learners, including clinical and translational scientists, students, researchers, and clinicians, seek to increase their knowledge of biostatistics. These learners are heterogeneous in their field, career stage and career focus. Based on the collective experience of an expert panel with over 115 years teaching statistics to health science learners, we propose a framework for considering the needs of health science learners motivated by their career goals. This framework defines four types of health science learners seeking statistical training: (a) consumers, (b) “milestone makers”, (c) biomedical researchers with statistical support, and (d) biomedical researchers without statistical support. Each type of learner has different levels at which they need to understand statistical topics for their careers, such as when to use a particular statistical method or why a given method works; these differing levels of understanding are detailed in our proposed framework. Further, this framework identifies the expectations that each of these types of learners should have for gaining statistical knowledge in a single seminar, multiple seminars, a seminar series, an accredited course, or a certificate/degree program. Advantages and disadvantages of widely used educational formats for these learners are also described. From this work, health science learners seeking biostatistical training or those who are planning a training program for others can gain insight into identifying appropriate statistical training goals for the type of learner with which they identify. Statistical educators may also use these guidelines to help health science learners align expectations for various types of training.

1 Introduction

Health science learners have many reasons for wanting to improve their statistical knowledge, ranging from needing to comprehend the literature in their field to being the principal investigator of a research study. Our collective experience as statistical educators in the health sciences indicates that a mismatch of the type of learner with the type of statistical training can lead to feeling insufficiently prepared (too little training) or feeling overwhelmed with content (too much training), both considered failures from an educational perspective. The appropriate statistical training for these health science learners depends heavily on their goals.

For purposes of this article, health science learners are defined as individuals who are training or trained in a health science field and who possess a specific skill set for providing or advancing healthcare. Health science learners include degree-seeking graduate students in biomedical fields as well as all areas of public health; professional school students, residents, trainees, and fellows in all areas of the health sciences including medicine, nursing, pharmacy, veterinary science and dentistry; practicing health science professionals, such as physicians, physician assistants, nurses, dentists, pharmacists, veterinarians, and public health officers; and health science researchers at all levels, from junior faculty just beginning their research careers, to principal investigators and senior faculty, to research staff. These learners are heterogeneous, not only in current or intended field (e.g., medicine, dentistry, nursing, public health), but also in career stage and in career focus (e.g., clinical work, research, government, industry, nonprofit organizations). Students solely training to become quantitative methodologists (e.g., (bio)statisticians, epidemiologists, bioinformaticians, data scientists) are not considered health science learners and are therefore not the focus of this work.

Several authors have discussed real challenges that health science professionals who receive clinical training have with learning traditional statistical coursework (Novack et al. 2006; Windish and Diener-West 2006; Windish, Huot, and Green 2007; Rao 2008). A recent survey of medical graduates revealed the need (among medical learners) for the ability to practice statistics as well as the understanding of various statistical topics (MacDougall, Cameron, and Maxwell 2020), a finding that contrasts with previous views that medical students should not focus on the ability to analyze data (Marks 1989; Astin, Jenkins, and Moore 2002). Such claims state that “conducting clinical research requires a fundamental understanding of the terminology and concepts of biostatistics” and that “few clinical scientists need to know the mathematical and computing technicalities as covered in traditional academic courses” (O’Brien et al. 1995). We feel that this disagreement regarding what to teach health science learners is rooted in a lack of recognition of the uniqueness of these learners, particularly in contrast with undergraduates, and the heterogeneity of their goals.

Health science learners have different learning needs than the traditional undergraduates who are often the focus of statistical education research. First, health science learners are training or working in the health sciences where they focus on developing core competencies such as medical knowledge, patient care and procedural skills, interpersonal communication, practice-based learning and improvement, professionalism, and system-based practice (NEJM Knowledge + Team 2016; Joshi et al. 2021). Statistical training must complement these other priority competencies of practice. Second, health science learners tend to be older and have already chosen their career path, and many are practicing in their field. Consequently, these learners often desire examples in teaching material to be closely relevant to their careers and have little interest in what they perceive as irrelevant or ‘juvenile’ classroom activities. Third, health science learners are more likely to have concurrent or previous research experience or are about to become involved in research (through MS or MPH thesis work, programmatic research requirements, etc.). This contact with research often gives these learners an appreciation of the value of understanding statistical concepts for their career, but also for many a desire to conduct basic statistical analyses. Fourth, health science learners tend to have very different time constraints. They are typically working or training full-time, and often have family responsibilities and other competing interests. Consequently, not all health science learners have the interest, time, or opportunity for formal credit-bearing coursework. These learners may receive their biostatistics training (required or optional) via a seminar, a seminar series, a journal club, a course, or a degree program. Depending on the learners’ goals, some of these routes may be more effective than others.

Because of these differences between health science learners and traditional undergraduates, the range of topics that need to be covered for health science learners differs from that in an undergraduate introductory statistics class. Health science learners do not intend to become statisticians or data scientists, but they need to understand statistical results in the research literature in their field, conduct limited basic statistical analyses, and/or collaborate with biostatisticians or other quantitative methodologists (e.g., epidemiologists, bioinformaticians). Health science learners need exposure at some level to topics which, in our collective experience as statistical educators in the health sciences, are rarely covered in undergraduate introductory statistics classes. In alignment with the GAISE College Report recommendations (ASA GAISE College Report Revision Committee 2016), we feel that topics such as study designs and adjustment for confounding should be included. However, exposure to additional topics such as relative risks and odds ratios, sensitivity and specificity, multiple testing, survival analysis, and sample size and power is also needed. Previous work on health science learners led to the development of 24 statistical competencies for the health sciences (Enders 2011; Oster et al. 2015; Enders et al. 2017; Oster and Enders 2018). These competencies defined fundamental and specialized statistical topics that should be taught to health science learners who intend to become principal investigators, co-investigators, and informed readers of the literature (Oster et al. 2015) and include assessing sources of bias and variation in published studies, identifying the strengths and limitations of study designs, and evaluating the size of an effect with a measure of precision. A complete list of the 24 competencies and details of how they were assessed can be found in Enders et al. (2017).

It was noted that these competencies should not be taught using a “one size fits all” approach (Oster et al. 2015, 2020), and that institutions should account for the multifaceted needs of different types of health science learners. There is not one teaching approach that will work for all health science learners, because their backgrounds, needs, and goals vary so widely. Many health science learners will have had at least some exposure to statistics in high school, college, or earlier in their professional training, but for many of them, the exposure was so long ago that they may not remember much of it, and as such, may need to relearn the fundamentals of statistics before spending time learning about specialized topics related to their focus area (Ambrosius and Manatunga 2002). Furthermore, some will have had no mathematics courses since “college algebra” while others may have had several semesters of calculus or even beyond. This diversity of math and statistics backgrounds can make it challenging to teach this population effectively (Novack et al. 2006; Windish and Diener-West 2006; Windish, Huot, and Green 2007; Rao 2008). Additional heterogeneity arises because the types of learners and the types of training experiences or courses vary greatly by institution, particularly between academic institutions with large research programs, clinical training (e.g., medical school, dental school) and clinical practice (e.g., affiliated hospitals) components compared to large health care systems with a substantial training and research focus (e.g., Cleveland Clinic, Mayo Clinic).

Our current work extends this earlier work. Efforts to educate health science learners on statistical concepts vary widely from institution to institution. The authors of this manuscript are all involved in such statistical education efforts and represent seven unique and diverse institutions. We began work on this article by each writing a description of the types of statistical learners we encounter and how statistical education is offered at our home institutions. The first author identified common themes and ideas in the seven institution descriptions and condensed them into initial versions of the tables. All details of the tables were discussed iteratively amongst the authors and revisions made until consensus was achieved.

First, we propose a new framework for classifying types of health science statistics learners based on their intended career goals and describe what statistical competencies are ideally needed for each learner type. This is a reflection of the types of health science learners we encounter in actual practice, but this is not an endorsement of all four types. Second, we describe various biostatistics training formats we have encountered at our institutions, including formal and less formal approaches, along with sample topics typically covered, the time commitment required, and the advantages and disadvantages of each. We aim to provide a realistic idea of how much a learner can reasonably expect to be able to do after a specific type and amount (time) of training. Finally, to set realistic expectations, we discuss which formats and types of training experiences or courses are likely to be most effective for each type of learner, and we make recommendations regarding which kinds of biostatistics training and educational offerings are most appropriate for each type of learner. Our aim is to raise awareness among health science learners, their program directors and administrators, mentors, and biostatistics educators and collaborators, of the need to appropriately match the trainings or courses offered to their learners’ career goals.

This article is intended for those planning statistical education opportunities for themselves or their program, or who are teaching or mentoring these learners, and aids in setting realistic expectations for what is possible with the time allotment available. This article focuses on formats for teaching and learning topics considered to be foundational to statistics, but these topics may be taught to health science learners by instructors in a variety of departments (e.g., Statistics, Biostatistics, Data Science). Additionally, this article serves as a shareable resource for biostatisticians receiving requests for training or providing counsel to those seeking statistical training to prompt discussion of learning objectives and avoidance of overpromising on deliverables.

2 Types of Health Science Learners

Educational offerings are often discussed as if all learners are the same. Failure to differentiate the type of learner often results in a mismatch of the learning objectives and course content, resulting in a suboptimal learning experience for both the instructor and learners (Oster et al. 2015; Oster and Enders 2018). Among health science learners, we have identified four primary types, distinguished by their goal for undertaking statistical training. The types of learners we describe are characterizations along a continuum of health science learner needs and wants. We are assuming that these health science learners have little or no background in statistics, the most common scenario. We define “consumers” as health science learners whose goal is to improve their statistical literacy to gain an understanding of the biomedical literature. The health sciences literature is the primary channel for disseminating topic-specific information, and the ability to read and follow this literature is critical for keeping current the knowledge of those involved in patient care or public health. We refer to “milestone makers” as the health science learners whose goal is to complete a research training milestone, which often entails completion of a required single research project culminating in a poster or presentation at a local research showcase. Frequently, training programs expect trainees to independently collect and/or analyze data to complete this research project. Such projects are usually small in scope due to limited timeframes (often 3–6 months) and limited available research resources. These trainees may or may not be interested in research, only a subset receive any sort of research training, statistical support is limited, and for many this will be the only research project they carry out. Being a “milestone maker” is difficult as they are often ill-equipped for the activities required of them with little to no option for statistical support.
Finally, we refer to “biomedical researchers” as health science learners whose goal is to conduct research, either as an expectation of their chosen position or as a side interest. We further classify the latter learners according to the status of statistical support. “Biomedical researchers with statistical support” are typically researchers embedded within an institution that has a consulting center which provides biostatistical support for faculty of all ranks. Alternatively, these researchers are later career researchers with established research lines and grant support who are responsible for overseeing the research project, but who can fund methodological experts, such as biostatisticians, who take responsibility for statistical aspects of the research. In contrast, “Biomedical researchers without statistical support” are often employed by an institution without a consulting center. They are often early career researchers who are responsible for almost all aspects of the research project. Again, this type of learner is at a disadvantage because methodological experts typically require funding and obtaining funding often requires an established research line (or promise thereof). Thus, learners of this unenviable type find themselves in this situation often not by choice but rather by circumstance. Both types of biomedical researchers aim to contribute to the literature, and publication is often a benchmark metric of success.

Table 1 identifies the statistical competencies recommended for each type of learner. Of importance is that these are minimal statistical competencies we feel are desirable for that type of learner, and not necessarily statistical competencies possessed by the learner. A competency may be a fully desired skill (indicated with a double checkmark) or a partially desired skill (indicated with a single checkmark), or a competency may not apply to a specific learner based on their career goal (indicated with a “—”). Note that it is not necessarily better to obtain more competency skills; instead, the number of necessary competency skills depends on the learner’s goals. Below we describe each of the six competencies and our rationale for learner-specific recommendations.

Table 1 Types of health science learners and the six statistical learner competencies.

2.1 The WHAT Competency

For many, the statistical analysis paragraph of the methods section of a health sciences publication is complex and not easily understood. This is problematic because having a basic understanding of the concepts behind the statistical methodology used in a study is directly related to understanding the results of the study. The WHAT competency addresses the ability to recognize and properly interpret the results of statistical analyses. This competency encompasses many aspects: recognizing the statistical methods used by others, understanding the results obtained from the application of such statistical methods, and being able to translate these results into clinical knowledge. Only in understanding how to properly interpret research results can one then assess and translate them to impact their clinical practice or policy decisions. This is a key skill toward achieving statistical literacy and is required of all types of learners, whether they intend to assess or present someone else’s analyses (consumer or biomedical researcher with statistical support) or conduct their own analyses (“milestone maker” or biomedical researcher without statistical support).

2.2 The WHEN Competency

The sheer number of available statistical measures, tests and models can be overwhelming. This complicates the decision of when to use a particular method, which we define as the WHEN competency. Although some statistical methodologies can easily be eliminated as inappropriate, it becomes more complicated when there are multiple appropriate statistical methodologies for a given scenario. Ultimately, the decision on the specific statistical method to use to answer a research question is based on several factors related to study design including: the type of outcome variable (continuous, binary, multinomial, time-to-event, etc.), the type of primary independent variable, the sample size, and the structure of the data such as whether observations are independent (as in a cross-sectional study) or correlated due to a repeated measures or longitudinal design. Understanding both the goal of the research study and the aforementioned factors related to study design allows one to identify the data scenario and select appropriate statistical methodology options. For example, if a learner is interested in comparing side effect rates among subjects choosing to receive therapy A and those choosing to receive standard of care, they will dissect this scenario as an association study comparing two groups with respect to a binary outcome. Therefore, a Chi-square test, Fisher’s exact test or logistic regression may all be suitable, depending on whether or not covariate information is available. The ability to dissect research scenarios is a skill that requires stripping away the context of the problem and focusing on the goals of the research, the study design, and the data structure.
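The reasoning in the example above (two groups, binary outcome) can be sketched in code. The following is a minimal illustration only, not a procedure proposed in this article: the function names are hypothetical, and the "expected count below 5" threshold is a common rule of thumb that we assume here for illustration.

```python
from typing import List, Sequence


def expected_counts(table: Sequence[Sequence[int]]) -> List[List[float]]:
    """Expected cell counts under independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def candidate_methods(table: Sequence[Sequence[int]], has_covariates: bool) -> List[str]:
    """Suggest candidate methods for comparing two groups on a binary outcome.

    Hypothetical helper: it mirrors the text's example and is not a
    substitute for judgment about study design and assumptions.
    """
    if has_covariates:
        # Covariate information is available, so a regression model can adjust for it.
        return ["logistic regression"]
    smallest = min(min(row) for row in expected_counts(table))
    # Assumed rule of thumb (not from the article): small expected counts
    # make the chi-square approximation questionable.
    if smallest < 5:
        return ["Fisher's exact test"]
    return ["chi-square test", "Fisher's exact test"]


# Rows: therapy A, standard of care. Columns: side effect, no side effect.
side_effects = [[12, 38], [5, 45]]
print(candidate_methods(side_effects, has_covariates=False))
```

A helper like this covers only the mechanical part of the WHEN competency; identifying the research goal, the study design, and the data structure still has to come first.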

If a learner is conducting their own statistical analysis, other important aspects of the WHEN competency are exploratory data analysis and assumption checking. Making basic plots of the data can identify outliers or extreme situations that must first be investigated. Checking statistical assumptions is essential to assessing validity of the methods and thus the results. Flowcharts exist to help learners identify when particular statistical methodologies are appropriate, but it is also necessary to incorporate understanding of the research goals and verification of assumptions for each statistical test in order to succeed in the WHEN competency. This necessitates experiential integration of education, training, and a lot of hands-on practice, yet it is critical for selecting appropriate techniques for those who plan to conduct their own analysis (“milestone maker” or biomedical researcher without statistical support) or assess the work of others (consumer).

2.3 The WHERE Competency

After clarifying one’s research question, a researcher should always consider whether the required data already exists or whether it needs to be collected. Understanding where to obtain appropriate data is the WHERE competency. Data are time-consuming and expensive to collect, and with the growing number of national databases and registries, a dataset may already exist that could be used to answer the research question of interest. However, when someone else has collected the data, it is imperative to learn where the data originated, how it was collected, and how it was formatted (variable definitions, variable codes, survey weights, etc.), all of which impact analysis, interpretation, and assessment of bias. Alternatively, the data needed may exist, but is not yet in an extracted format and may require an electronic medical record review (manual or automated). Those with limited time frames to complete their research (“milestone maker”) typically use available databases or conduct automated electronic medical record reviews that limit their involvement in the data collection process. In this setting, it is especially easy to fail to appreciate exactly what the data represents due to missing or extraneous documentation. If, however, an appropriate dataset does not currently exist, a study may be planned for data collection. The tool used for data collection has implications for data organization and storage. Tools such as REDCap (Harris et al. 2009, 2019) or Qualtrics (Provo, UT) may require substantial time and effort upfront, but downstream save time by improving the data quality. From our experiences, investigators who are responsible for (or participate in) the data collection process often more fully appreciate what the data represent in terms of the study goals or outcomes and its structure.
Investigators responsible for data collection should also obtain additional training concerning human subject research ethics (The CITI Program 2021). For example, Enders et al. (2017) proposed a competency guideline for medical learners to evaluate the impact of statistics on ethical research, and the impact of ethics on statistical practice.

If an individual simply desires to consume study findings, then we argue that it may not be necessary to meet this competency simply to know what a number in a table represents (e.g., higher risk for patients receiving treatment B compared to treatment A). A consumer can still glean valuable biomedical information without fully understanding the data or its collection process, albeit at a very superficial level and with great trust in the peer-review process. While some consumers may find it useful to understand where the data came from (WHERE competency), we feel it is not necessary for all consumers and thus have decided not to add this as a partially desired skill for consumers.

Anyone planning to conduct their own data analysis (“milestone maker” and biomedical researcher without statistical support) must know exactly how their data originated. It is essential to understand the date range in which the data was collected, the inclusion/exclusion criteria, and the data structure (e.g., independent observations, repeated measures, matched pairs, or crossover trial), all of which impact analysis, interpretation, and assessment of bias. It is also important to understand why data may be missing, particularly for observational data. In addition, anyone presenting results of their research (biomedical researcher with or without statistical support), even if they did not analyze the data themselves, should be sufficiently familiar with how the data were collected to adequately explain the research and address audience questions.

2.4 The WHY Competency

Understanding the principles that underlie why common statistical methodologies work is defined as the WHY competency. This competency is important for those conducting their own analyses and desiring a sustained career in health science research. We intentionally exclude milestone makers, although they conduct their own analyses, as their work is typically a one-time investigation and learning statistical intuition is not a high priority for their intended career. Acknowledging that these health science researchers are not statisticians, we define the WHY competency to mean having an intuition for how the methods work and not necessarily a complete mathematical understanding of the theoretical underpinnings. Understanding how basic statistical methods work, not focusing on details but rather the big picture, will allow the researcher to more readily identify when issues arise. If statistical methods are used without even a general understanding of how the methods work, confusion and misunderstanding often ensue and the resulting conclusions may be entirely inappropriate. In this scenario, it becomes very difficult to recognize methodological problems. We give several examples. Without a basic understanding of sampling distributions, it is difficult to grasp what a standard error represents and the impact of sample size. Without understanding and checking model assumptions, it is impossible to know the extent to which the proposed model is or is not appropriate. Inadequate or incorrect adjustment for confounding or effect modification, or for correlated observations, may lead to biased estimates and incorrect conclusions. This issue is amplified with the availability of open-source software where one can download and use a package without ever seeing or understanding the code and for which documentation may be limited or too technical. Statistical methods often work well in a large number of cases, which can fool people into thinking that they work in all cases. 
Without the WHY competency, one is unable to identify when statistical analyses have not worked as intended.
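The point about sampling distributions and standard errors can be made concrete with a small simulation. This is an illustrative sketch under our own assumptions (a standard normal population, 2000 replicates, and a hypothetical function name), not an analysis from this article. It draws many samples of size n, computes each sample mean, and compares the spread of those means with the textbook value sigma / sqrt(n).

```python
import math
import random
import statistics


def empirical_standard_error(n: int, replicates: int = 2000, seed: int = 0) -> float:
    """Standard deviation of sample means across repeated samples of size n.

    Samples are drawn from a standard normal population (mean 0, SD 1),
    an assumption made purely for illustration.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(replicates):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


for n in (25, 100):
    theoretical = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1
    print(n, round(empirical_standard_error(n), 3), round(theoretical, 3))
```

Quadrupling the sample size roughly halves the standard error, which is the kind of big-picture intuition the WHY competency aims for.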

While biostatisticians often believe that you cannot interpret results if you do not know the theory, in practice many health science learners are being taught without theory and focus on applying the methods. We do agree that it would be better if consumers, milestone makers and biomedical researchers with statistical support all achieved this competency; however, Table 1 represents the minimum threshold. In our experience, many health science learners do not know the WHY and still function at the minimum threshold.

2.5 The HOW Competency

An important distinction must be made between those who need to interpret basic statistical analyses and those who need to perform basic statistical analyses or supervise those who are doing so. The ability to conduct basic statistical analyses is defined as the HOW competency. Therefore, this actionable skill set is required for the “milestone maker” and the biomedical researcher without statistical support. Ideally, a study involves the collaboration between content experts (physicians, nurses, dentists, pharmacists, bench researchers, etc.) and methodological experts (biostatisticians, epidemiologists, data scientists, etc.). However, we recognize that the number of content experts far exceeds the number of methodological experts. This unfavorable balance often requires that some content experts learn to conduct basic statistical analyses. With all the available open-source statistical software, the steps to conduct routine statistical analyses are not particularly difficult. It is rather that obtaining this competency requires time.

2.6 The WHO Competency

Often it is not until someone is exposed to some training on a subject that they truly begin to appreciate how much they know, but equally important, they discover how much they do not know. A common learning objective of statistical education sessions is to gain the ability to recognize when professional biostatistical help is needed and who can help (Enders et al. 2017). Learners who need additional biostatistical help, especially for challenging statistical topics that do not receive in-depth coverage in the classroom, could seek consultation training with a biostatistician on their research team (Deutsch et al. 2007; Welty et al. 2013). This is the WHO competency: knowing what you do and do not know and who can help with the latter. Real-world research questions and data collection processes can quickly and easily move a project from simple to complex in terms of appropriate analytic methodologies. The untrained researcher will often not notice this shift. For example, when subjects contribute multiple observations to a dataset, the resulting dataset has a dependent data structure that requires more advanced analytic methodologies, for which the assistance of a trained biostatistician is recommended. When possible, it is important that learners are provided with contact information for specific individuals or groups available to give statistical support. Frustrations arise when researchers recognize that they need professional statistical support but do not know where to access such help. The WHO competency is critical for biomedical researchers. Those with statistical support already know who can help since they have built a collaborating relationship with a biostatistician, whereas those without statistical support must be able to recognize when a project is beyond their skill set. “Milestone makers” tend to focus on what they know, and thus, achieve partial aptitude in this competency.
However, they tend to not recognize when professional statistical help is needed. This is often dangerous and can result in inaccurate or inappropriate presentation and interpretation of research results.
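One shift from simple to complex mentioned above, subjects contributing multiple observations, can be flagged with a very simple check. The sketch below is illustrative only; the function name and the subject identifiers are hypothetical, and a real dataset would need its own definition of what identifies a subject.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def repeated_subjects(subject_ids: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Return subjects that appear more than once, with their observation counts."""
    counts = Counter(subject_ids)
    return {sid: k for sid, k in counts.items() if k > 1}


# Hypothetical identifiers, one per row of a dataset.
ids = ["p01", "p02", "p01", "p03", "p02", "p01"]
repeats = repeated_subjects(ids)
if repeats:
    # Non-empty result: observations are not independent, which is a cue
    # to seek help from a biostatistician before analysis.
    print("Subjects with multiple observations:", repeats)
```

A non-empty result does not say which method to use; it only signals that the independence assumption behind many basic methods does not hold.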

2.7 The Value of Collaboration with a Biostatistician

These six competencies are focused on the health science learner and address minimum skills that are appropriate for someone assessing, conducting, or presenting the results of basic statistical analyses. So why do we recommend collaborating with a biostatistician when possible, instead of performing one’s own data analyses? What does a trained biostatistician provide beyond these skills? In designing a study, a biostatistician can suggest an experimental design to avoid biases of which the researcher may be unaware. With some background knowledge in the subject area, a biostatistician can help the health science learner frame the research question in terms of the data that has been or will be collected. The methodologist will also be attuned to potential problems with rote application of standard methods to the data collected, will be able to recognize when such problems occur, and can suggest alternatives. Collaborating with a professionally trained biostatistician can allow the utilization of more advanced statistical methods, the development of new methods, and keeping up to date with the most current methodologies. Another merit of collaboration with a biostatistician is that it provides a competitive advantage for grant proposals. The latter is one reason why leading institutions have created consulting centers and provision of biostatistical support for junior faculty. A methods expert possesses extensive experience and thus can accelerate a project’s timeline. The analysis will be done in a reproducible manner that ensures the same results if the analysis is rerun. In terms of the necessary competencies, it is much easier to collaborate with a biostatistician than to independently obtain all six competencies. However, familiarity with the terminology and possession of a basic working skill set will enhance any collaborative experience with a biostatistician.

2.8 Identifying Learner Goals

For many learners, the goal is to understand and interpret someone else’s statistical analyses. This level of understanding is often referred to as statistical literacy. Statistical literacy is essential for science students as a tool in their professional lives as well as an essential competency for their citizenship in the contemporary world (Gordon and Nicholas 2010). Statistical literacy is of growing importance for all those working in the health sciences where the professional literature aims to keep such community members informed of the most up-to-date guidelines, recommendations, treatments, and discoveries. The ability to read and critically assess such articles is important for delivering the best possible care to patients and communities. Understanding the literature requires the ability to know what statistical results mean and to critically appraise to some degree whether statistical methodologies were appropriate for the application. Here the focus is on the WHAT and WHEN competencies and aligns with the needs of a consumer.

Statistical literacy also allows a researcher working with a biostatistician to comprehend and interpret statistical reports. This is a critical skill as the researcher is often the one disseminating the research results at conferences or in manuscripts. Serving as the face of the research, this individual must be able to describe the research in sufficient detail. This requires the ability to describe the data collection process and interpret the statistical results. These individuals can rely on their statistical collaborators to select appropriate methodologies. Thus, the focus is on the WHAT and WHERE competencies and aligns with the needs of a biomedical researcher with statistical support.

For others, the goal is to conduct their own basic statistical analysis. As this goal is performance based, it requires a more detailed understanding of the statistical methodologies. The health science researcher must be able to recognize whether their skill set is sufficient (or whether an expert is required), how to interpret statistical results, and whether it is appropriate to use these statistical methods, which includes understanding and checking assumptions. It is also important to understand the data collection process and how to use statistical software to perform the desired analysis. Here the focus is on the WHAT, WHEN, WHERE, HOW, and WHO competencies and aligns with the needs of a “milestone maker.” Extending the required skills to understanding how to perform many types of analyses and the intuition behind why basic statistical methods work adds the WHY competency, aligning with the needs of a biomedical researcher without statistical support.

3 Models for Statistical Education of Health Science Learners

Institutions vary in their approach to educating health science learners on statistical concepts. Themes and key points from a comparison of the educational models across our seven institutions are presented in summary below and in Table 2.

Table 2 Common types of statistical training for health science learners.

Most notable was the difference in educational models used by authors employed by an academic institution versus a health care system. Statistical educators employed by an academic institution primarily described educating students enrolled in degree programs. Health science learners enrolled in formal degree programs (e.g., medical school, nursing school, dentistry school) are usually not yet practicing in their field of training but are working toward achieving the necessary credentials. Statistical educators employed by a health care system primarily described educational offerings provided as continuing education. Health science learners receiving the more informal continuing education (e.g., residents, fellows, doctors, nurses, dentists) already possess the required credentials and are currently working, typically full-time, in their field. Identified as a bridge between these two somewhat distinct types of learners is a Clinical and Translational Science Award (2022) program, where working healthcare professionals receive protected time from their job responsibilities to receive training in research methodology. In this setting, more formal statistical education courses, often taught by statistical educators from academic institutions, are offered to working professionals at affiliated hospitals.

The time commitments shown in Table 2 are estimates based on the authors’ experiences teaching many types of courses across many types of institutions. Based on the time commitment, we have identified five commonly encountered types of training opportunities (Table 2). A seminar (single or multiple) is the most common form of statistical continuing education for health science learners. Seminars have limited contact hours and minimal (if any) out-of-session work (preparatory reading/homework). Assessment is not feasible: participants in these seminars are very busy working professionals, and the training opportunities are in addition to their full-time jobs. Attendance is typically required but, with little oversight or consequence, is difficult to enforce.

A seminar series meets regularly and thus has increased contact hours. Typical formats for health science learners include lecture series and journal clubs. Seminar series may be required by the program director or may be optional training opportunities for interested learners. This type of training allows for some limited out-of-session work, but completion and participation remain poor due to competing responsibilities and little personal investment. As with seminars, there is almost no assessment or evaluation of understanding. Seminars and seminar series can range from topic-specific to broad statistical thinking, and the level of detail that can be covered is directly related to the amount of exposure. These busy professionals are often less engaged than students would be in a formal course because there is no grade at stake and no tuition invested; thus, they have little accountability. Motivators that often improve participant engagement include offering food, continuing education credit or certificates, personal involvement in a research activity, or an approaching credential exam.

In contrast, an accredited statistics course is often part of the formal curriculum of degree programs, with academic credit earned for completion of the course. Accredited courses have a required number of contact hours, an expectation of substantial out-of-classroom homework, and assessment, all of which are critical components. The course focus can range from statistical literacy to statistical application, where the level of detail is again directly related to the amount of exposure and level of the learners. Attendance is typically required, tracked by the instructor, and incorporated into the earned grade.

Lastly, in a certificate or degree option, trainees accomplish a complete academic program often requiring between 4 and 15 research or quantitative skills-oriented courses. Common examples include a Masters in Public Health or a Masters in Clinical Research and less frequently a Masters in Biostatistics or Bioinformatics. This option shares the same benefits as an accredited course, but with expanded coursework trainees gain exposure to a wider variety of methodologies and topics and more opportunities to practice and master necessary skills.

3.1 Matching Training Opportunities to Learner Goals

Frequently, requests are made for education on the topic of “statistics.” This, however, is extremely broad and akin to a request for education on the topic of “medicine.” Requests should include information on the goals of the attendees so that formats and topics can be properly aligned. Table 2 provides some example topics (not intended as an exhaustive list) appropriate for each type of training.

For learners with very little statistical knowledge and available time limited to a single seminar, the appropriate content of the single seminar is a general introduction to statistics such as discussing populations and samples, variation, estimation, and hypothesis testing frameworks in general terms. Alternatively, an advanced specialized topic may fit this format if learners already have had substantial exposure to basic statistical concepts. The multiple seminar format allows one to expand beyond general frameworks and learn about estimating specific statistics or specific hypothesis tests comparing groups, for example. In general, seminars (single or multiple) are insufficient to learn enough to conduct independent statistical analyses. Unfortunately, this amount of training is common, and the expectations placed on the instructor and learners are unrealistic. A seminar series provides enough contact hours to explore additional important concepts such as nonparametric methods, sample size and power, or regression modeling. However, without time for application and assessment, even this modality is challenging for learners to translate into practice. We emphasize that the skills these learners need cannot be easily acquired in a short period of time. An accredited course can include all topics mentioned but also has the added benefit of providing opportunities to assess learner understanding and receive feedback from course directors. With learners invested in their expanding skill set, this is a better modality to achieve the ability to conduct independent basic statistical analyses. Finally, a certificate/minor or degree is the ideal mechanism for those looking to learn how to conduct independent statistical analyses, particularly with more advanced methodologies. However, because these are health science learners, their focus is not on becoming a methods expert, making such a high-commitment approach less commonly pursued.

For learners with the goal of statistical literacy, the aim is to gain a better understanding of the literature and the ability to interpret others’ statistical analyses. The probability of achieving this aim is directly proportional to the amount of exposure. Thus, it is quite doubtful that this can be achieved with a single seminar, possible with multiple seminars, probable with a seminar series, likely with a formal course (depending on the learning outcomes and type of educational model used), and definite with a degree program. Another reason that learners desire statistical literacy is in preparation for a professional accreditation exam. The probability of improving one’s performance on an accreditation exam is also directly proportional to the amount of exposure. It is unrealistic to expect an exam score improvement after a single seminar; thus, multiple seminars, a seminar series, or a formal course is recommended. A certificate or degree program is not recommended if the goal is accreditation exam preparation; that modality is instead ideal preparation for a career in team science.

For learners with the goal of conducting their own basic statistical analyses, the expectations shift from general understanding to doing, and statistical software must be introduced. It is unrealistic to expect to learn how to analyze data after a single seminar. Learning how to use statistical applets, which perform a specific task that runs within the scope of a dedicated widget engine or a larger program, is a realistic expectation following multiple seminars. Applets typically present a user-friendly interface with no visible code. They are typically intuitive and easy to navigate but limited to one specific topic or function. Multiple seminars may be adequate to learn how to conduct independent group comparisons (e.g., t-tests, Chi-square tests). If training exposure increases to the level of a seminar series, it is a reasonable expectation that the learner can be taught to conduct some limited independent statistical analyses with a cleaned dataset using menu-driven statistical software. Common point-and-click menu-driven statistical software packages include JMP, SPSS, and R Commander. These software packages offer more functionality than an applet, but again shield the user from the underlying code. Menu-driven software packages are typically easy to navigate but limited to topics or functions included within the menus. If training exposure increases to the level of a course, it is a reasonable expectation that the learner is introduced to code-driven statistical software. Common code-driven software packages include SAS and R. These software packages offer more graphical and analytical options and allow the users to customize and create code within the program to suit their needs. A trainee will become an adept user of such code-driven software if enrolled in a certificate or degree program.
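To make the idea of a code-driven group comparison concrete, the following is a minimal sketch, not a recommended analysis. It is written in Python using only the standard library (an assumption of this illustration; the packages named above are SAS and R), the data are made up, and the function names `welch_t` and `chi_square_2x2` are hypothetical. It computes a Welch two-sample t statistic and a Pearson Chi-square test for a 2x2 table.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) and p-value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square distribution with 1 df
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Made-up illustrative data, not from any study
treatment = [5.1, 4.9, 6.2, 5.8, 6.0]
control = [4.2, 4.8, 5.0, 4.4, 4.6]
t_stat, df = welch_t(treatment, control)
chi2, p_value = chi_square_2x2([[12, 8], [5, 15]])

print(f"Welch t = {t_stat:.2f} on about {df:.1f} df")
print(f"Chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

The t statistic is reported without a p-value because the standard library has no t distribution; in practice a statistical package would supply it. The sketch also omits the assumption checks (the WHEN competency) that a real analysis requires.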

There is a substantial learning curve associated with the programming languages underlying these types of software. The ability to create and run code-driven software allows for a reproducible workflow, which ensures results will be identical when the same code is applied to the same data a second time. A reproducible workflow including code sharing is critical for ensuring that the results can be trusted (Peng and Hicks 2021), and should be an essential component of statistics and data science education (Horton et al. 2022). Use of a computational notebook system such as R Markdown, knitr, Quarto, Jupyter Book, or Observable allows code display, results, and explanation to be incorporated into a single document (Gandrud 2015; Perkel 2022). Although menu-driven software should give identical results when an analysis is replicated, without a record of what options were selected and in what order, the possibility exists that the set of actions was not taken in the intended sequence. Importantly, the results may be incorrect due to an incorrect sequence of steps, but it is impossible to discern this from the results alone. Concerns about lack of reproducibility and replicability in science have reached Congress, prompting its directive to the National Academies of Sciences, Engineering, and Medicine to assess these concerns; their recommendations were recently published (National Academies of Sciences, Engineering, and Medicine 2019).
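The property described above, identical results when the same code is applied to the same data, can be illustrated with a minimal sketch. It uses Python's standard library only, and everything in it is an assumption of this illustration: the seed value, the simulated measurements, and the function names `simulate_data`, `analyze`, and `fingerprint` are hypothetical. The script records every step in code, fixes the random seed, and fingerprints the input data so that a rerun can be checked against the first run.

```python
import hashlib
import json
import random
import statistics

SEED = 2023  # arbitrary fixed seed, so the simulated data can be regenerated


def simulate_data(seed, n=50):
    """Generate made-up measurements from a seeded random number generator."""
    rng = random.Random(seed)
    return [round(rng.gauss(120, 15), 1) for _ in range(n)]


def analyze(data):
    """Every analysis step lives in code, so the order of steps is on record."""
    return {
        "n": len(data),
        "mean": round(statistics.mean(data), 3),
        "sd": round(statistics.stdev(data), 3),
    }


def fingerprint(obj):
    """SHA-256 digest of the JSON form, used to confirm inputs are unchanged."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


data = simulate_data(SEED)
first_run = analyze(data)
second_run = analyze(simulate_data(SEED))  # rerun from scratch

print("data fingerprint:", fingerprint(data)[:12])
print("results:", first_run)
print("identical on rerun:", first_run == second_run)
```

A menu-driven session leaves no comparable record: the same summary statistics could be produced, but the sequence of choices that led to them would have to be reconstructed from memory.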

Thus far, we have identified the goals for health science learners and matched an appropriate educational delivery methodology using Table 2. However, the contents are bidirectional: if the organizer or learner knows their time allotment, Table 2 can be used to identify corresponding educational delivery methods and realistic learning objectives. Similarly, if you know that you have a particular amount of time to devote to statistical training, then you can identify the type of learner that you are. Table 2 is helpful in planning and prompting discussion to properly align expectations around statistical education.

All five types of training can be offered (with varying degrees of success) via an interactive instructor-led format, whether in-person, remote, or online, or via a static self-paced resource format (e.g., tutorial, video, slideshare, or open-access course). However, there are distinct advantages and disadvantages to each option (Table 3). Static self-paced educational resources are growing in number, but they need to be vetted. Websites are beginning to appear that offer educational resources and opportunities for statistical training; however, these sites too must be managed and updated. Mills and Raju (2011) recommend that actionable skills are best taught in the classroom, while more knowledge-focused concepts can be self-taught during pre-class preparation (reading/videos). Based on the collective experience of the authors, we concur that self-paced resources may be adequate for learning statistical concepts, but they are much less suited for learning actionable skills. The learners have no contact person for questions, and if their data do not match the scenario presented in the resource, or if they run into a software issue (which is probable), there is frustration and little hope for resolution. Interactive instructor-led options have the innate advantage of making a human connection and having an identified contact person for questions and assistance. However, convenience is a major draw of static self-paced resources.

Table 3 Advantages and disadvantages of educational formats.

3.2 Recommendations

This work includes a classification of health science learners of statistics into four learner types, depending on their goals. We have further identified the types of skills that are minimally necessary or desired for each learner type (Table 1). Based on statistical goals and learner types, in Table 2 we conveyed the type of training that can be accomplished according to the learner’s time commitment. An understanding of the learner goals and the corresponding desired skills, as well as the time commitment necessary to obtain these skills, is needed to make informed decisions about learner expectations and planning of statistical training opportunities.

It is essential to understand that the goals and objectives of statistical training for health science learners vary according to learner type. Although the ability to properly interpret research results (the WHAT competency) is important for all learners, the necessary statistical skills depend on whether the learner will be conducting their own research, and whether this will be done in conjunction with statistical support.

The information provided in this article may be useful to several audiences: health science learners, program directors who may arrange for statistical training at their institutions, mentors who may be in a position to make recommendations about educational opportunities for these learners, and statisticians who teach seminars or courses for health science learners.

Recommendations for health science learners:

  1. Based on learner goals and whether research will be carried out with or without statistical support, we recommend consulting Table 1 and the ensuing explanations of the competencies to identify your learner type and its associated skills. Learner type can also be identified based on time available for statistical learning (Table 2).

  2. Based on learner goals and learner type, we recommend using Table 2 to identify the type of skills that can be gained from the time available for statistical learning, and to set expectations accordingly. Recognize that only limited statistical information and skills can be imparted in one or a few seminars.

  3. If there is a mismatch between your goals and the time available for statistical education, we recommend aligning these by either allocating more time toward obtaining statistical skills or changing your goals and therefore your learner type. As you obtain the needed statistical training according to your learner type and goals, evaluate the composition of your research team. Consider if your research team would benefit from including a biostatistician.

  4. For those making use of applets and point-and-click menu-driven software, be mindful of their limitations. These limitations may include a small set of available functions (applets), difficulty in checking underlying assumptions (applets), and results that cannot be externally validated due to lack of a program trail (applets and point-and-click software).

  5. For learners carrying out their own analyses, be aware that the validity of statistical methods generally rests on model assumptions (the WHEN competency). Also be aware of the importance of data visualization, and that making basic plots of the data can be an important first step to assessing model assumptions.

  6. If conducting secondary data analysis on existing data, learn as much as possible about the design and data collection process, as well as the meaning and coding of all variables (the WHERE competency).

Recommendations for program directors and administrators:

  1. Identify the program needs and ascertain whether a formal or informal training approach is needed, as well as the time required for such training.

  2. Directors will benefit from identifying the statistical goals of the health science learners who might engage in statistical training, and the accompanying competencies required to meet these goals (Table 1). This will also inform the suggested duration, format, and content that can be covered (Table 2).

  3. Be realistic about expectations for statistical training. It is important to recognize that only limited statistical information and skills can be imparted to health science learners in one or a few seminars.

  4. If the goal for those you oversee is statistical literacy, consider providing statistical support to assist trainees who are completing programmatic research requirements, to avoid the challenges faced by “milestone makers,” or supporting the development or utilization of an institutional consulting center, to avoid the challenges faced by biomedical researchers without statistical support.

Recommendations for mentors:

  1. For mentors who are providing guidance about statistical training to health science learners, make use of Table 1 to help the learner identify their learner type and needed skills to meet their goals. In addition, Table 2 may be useful for seeking appropriate local statistical training opportunities that match the learner’s goals and expectations.

  2. Be realistic and set expectations accordingly about what can be achieved with the available time commitment. Focus should be given to ensuring the mentee can identify when it is necessary to work with a statistician rather than attempt analyses on their own (the WHO competency).

Recommendations for biostatisticians, including educators and collaborators:

  1. For biostatisticians who are teaching a seminar or seminar series, when outlining the objectives of the seminar(s) we recommend referring to Table 2, if appropriate, to clarify the types of statistical skills that can and cannot be achieved in the time available, and when appropriate, pointing to further local opportunities for statistical training.

  2. For biostatisticians who are teaching a seminar series, we recommend conveying the types of situations in which a trained biostatistician should be consulted. We also recommend explaining how an interested health science learner can find individuals who may be available to provide statistical support, including specific names and contact information.

In future work, it would be interesting to assess the usefulness of the types of health science learners, the six statistical learner competencies (Table 1), and the common types of statistical training for health science learners (Table 2) to educators, directors, administrators, mentors, and students. This process would allow for validation of our proposed characterizations.

4 Conclusion

We have proposed a framework for aligning the statistical training needs of health science learners with their career goals and their available time. This work may be used by a variety of readers to help inform the appropriate statistical training for health science learners. A first step for both learners and those in the position to suggest statistical training is to identify the type of learner. This may be done based on goals of the statistical training (Table 1) or based on time availability (Table 2). Each of the four learner types will have different needs for statistical training, as described by the WHAT, WHEN, WHERE, WHY, HOW, and WHO competencies. We have also outlined the type and depth of statistical skills that can be taught based on duration of training. By understanding the alignment of the learner goals, desired statistical skills, and what can be taught with a given time commitment, our hope is that this work will enable health science learners to better match their training needs to their goals, thereby reducing frustration and enabling informed decisions about appropriate statistical training.

Acknowledgments

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Disclosure Statement

The authors have no conflicts of interest to disclose.

Funding

This work was supported in part by the following grants from the National Institutes of Health: UL1 TR002548 (A.S.N.), UL1 TR002494 (A.M.B.), UL1 TR003096 (R.A.O.), UL1 TR001998 (E.S.), UL1 TR002001 (S.W.T.).

References

  • Ambrosius, W. T., and Manatunga, A. K. (2002), “Intensive Short Courses in Biostatistics for Fellows and Physicians,” Statistics in Medicine, 21, 2739–2756. DOI: 10.1002/sim.1212.
  • Astin, J., Jenkins, T., and Moore, L. (2002), “Medical Students’ Perspective on the Teaching of Medical Statistics in the Undergraduate Medical Curriculum,” Statistics in Medicine, 21, 1003–1006; discussion 1007. DOI: 10.1002/sim.1132.
  • Clinical and Translational Science Award Program (CTSA) Center for Leading Innovation & Collaboration (CLIC). (2022), “CTSA Program Directory” [Internet] [cited Jan 29, 2022], available at https://clic-ctsa.org/ctsa-program-hub-directory.
  • Deutsch, R., Hurwitz, S., Janosky, J., and Oster, R. (2007), “The Role of Education in Biostatistical Consulting,” Statistics in Medicine, 26, 709–720. DOI: 10.1002/sim.2571.
  • Enders, F. T. (2011), “Evaluating Mastery of Biostatistics for Medical Researchers: Need for a New Assessment Tool,” Clinical and Translational Science, 4, 448–454. DOI: 10.1111/j.1752-8062.2011.00323.x.
  • Enders, F. T., Lindsell, C. J., Welty, L. J., Benn, E. K. T., Perkins, S. M., Mayo, M. S., Rahbar, M. H., Kidwell, K. M., Thurston, S. W., Spratt, H., Grambow, S. C., Larson, J., Carter, R. E., Pollock, B. H., and Oster, R. A. (2017), “Statistical Competencies for Medical Research Learners: What is Fundamental?” Journal of Clinical and Translational Science, 1, 146–152. DOI: 10.1017/cts.2016.31.
  • GAISE College Report ASA Revision Committee. (2016), “Guidelines for Assessment and Instruction in Statistics Education College Report 2016,” [cited Jan 29, 2022], available at http://www.amstat.org/education/gaise.
  • Gandrud, C. (2015), Reproducible Research with R and RStudio (2nd ed.), p. 5, Boca Raton, FL: CRC Press.
  • Gordon, S., and Nicholas, J. (2010), “Teaching with Examples and Statistical Literacy: Views from Teachers in Statistics Service Courses,” International Journal of Innovation in Science and Mathematics Education, 18, 14–25.
  • Harris, P. A., Taylor, R., Minor, B. L., Elliott, V., Fernandez, M., O’Neal, L., McLeod, L., Delacqua, F., Kirby, J., Duda, S. N., and REDCap Consortium (2019), “The REDCap Consortium: Building an International Community of Software Partners,” Journal of Biomedical Informatics, 95, 103208. DOI: 10.1016/j.jbi.2019.103208.
  • Harris, P. A., Taylor, R., Thielke, R., Payne, J., Gonzalez, N., and Conde, J. G. (2009), “Research Electronic Data Capture (REDCap) – a Metadata-Driven Methodology and Workflow Process for Providing Translational Research Informatics Support,” Journal of Biomedical Informatics, 42, 377–381. DOI: 10.1016/j.jbi.2008.08.010.
  • Horton, N. J., Alexander, R., Parker, M., Piekut, A., and Rundel, C. (2022), “The Growing Importance of Reproducibility and Responsible Workflow in the Data Science and Statistics Curriculum,” Journal of Statistics and Data Science Education, 30, 207–208. DOI: 10.1080/26939169.2022.2141001.
  • Joshi, T., Budhathoki, P., Adhikari, A., Poudel, A., Raut, S., and Shrestha, D. B. (2021), “Improving Medical Education: A Narrative Review,” Cureus, 13, e18773. DOI: 10.7759/cureus.18773.
  • MacDougall, M., Cameron, H. S., and Maxwell, S. R. J. (2020), “Medical Graduate Views on Statistical Learning Needs for Clinical Practice: A Comprehensive Survey,” BMC Medical Education, 20, 1. DOI: 10.1186/s12909-019-1842-1.
  • Marks, R. (1989), “Emphasizing Concepts instead of Computations in Teaching Biostatistics,” in Proceedings of the Annual Meeting of the American Statistical Association: Section on Statistical Education, American Statistical Association.
  • Mills, J. D., and Raju, D. (2011), “Teaching Statistics Online: A Decade’s Review of the Literature about What Works,” Journal of Statistics Education, 19. DOI: 10.1080/10691898.2011.11889613.
  • National Academies of Sciences, Engineering, and Medicine. (2019), Reproducibility and Replicability in Science, Washington, DC: The National Academies Press. DOI: 10.17226/25303.
  • NEJM Knowledge + Team. (2016), “Exploring the ACGME Core Competencies (Part 1 of 7),” available at https://knowledgeplus.nejm.org/blog/exploring-acgme-core-competencies/.
  • Novack, L., Jotkowitz, A., Knyazer, B., and Novack, V. (2006), “Evidence-Based Medicine: Assessment of Knowledge of Fundamental Epidemiological and Research Methods among Medical Doctors,” Postgraduate Medical Journal, 82, 817–822. DOI: 10.1136/pgmj.2006.049262.
  • O’Brien, R. G., Bowling, D. W., Medendorp, S. V., and Piedmonte, P. R. (1995), “A Seminar in Clinical Biostatistics for Established Physicians,” in Proceedings of the Annual Meeting of the American Statistical Association: Section on Statistical Education, American Statistical Association.
  • Oster, R. A., Devick, K. L., Thurston, S. W., Larson, J. J., Welty, L. J., Nietert, P. J., Pollock, B. H., Pomann, G.-M., Spratt, H., Lindsell, C. J., and Enders, F. T. (2020), “Learning Gaps among Statistical Competencies for Clinical and Translational Science Learners,” Journal of Clinical and Translational Science, 5, e12. DOI: 10.1017/cts.2020.498.
  • Oster, R. A., and Enders, F. T. (2018), “The Importance of Statistical Competencies for Medical Research Learners,” Journal of Statistics Education: An International Journal on the Teaching and Learning of Statistics, 26, 137–142. DOI: 10.1080/10691898.2018.1484674.
  • Oster, R. A., Lindsell, C. J., Welty, L. J., Mazumdar, M., Thurston, S. W., Rahbar, M. H., Carter, R. E., Pollock, B. H., Cucchiara, A. J., Kopras, E. J., Jovanovic, B. D., and Enders, F. T. (2015), “Assessing Statistical Competencies in Clinical and Translational Science Education: One Size Does Not Fit All,” Clinical and Translational Science, 8, 32–42. DOI: 10.1111/cts.12204.
  • Peng, R. D., and Hicks, S. C. (2021), “Reproducible Research: A Retrospective,” Annual Review of Public Health, 42, 79–93. DOI: 10.1146/annurev-publhealth-012420-105110.
  • Perkel, J. M. (2022), “An End to Copy-and-Paste Errors,” Nature, 603, 191–192. DOI: 10.1038/d41586-022-00563-z.
  • Rao, G. (2008), “Physician Numeracy: Essential Skills for Practicing Evidence-Based Medicine,” Family Medicine, 40, 354–358.
  • The CITI Program. (2021), Accessed December 15, 2022. Available at https://www.citiprogram.org/index.cfm?pageID=260.
  • Welty, L. J., Carter, R. E., Finkelstein, D. M., Harrell, F. E., Lindsell, C. J., Macaluso, M., Mazumdar, M., Nietert, P. J., Oster, R. A., Pollock, B. H., Roberson, P. K., and Ware, J. H., on behalf of the Biostatistics, Epidemiology, and Research Design Key Function Committee of the Clinical and Translational Science Award Consortium. (2013), “Strategies for Developing Biostatistics Resources in an Academic Medical Center,” Academic Medicine: Journal of the Association of American Medical Colleges, 88, 454–460. DOI: 10.1097/ACM.0b013e31828578ed.
  • Windish, D. M., and Diener-West, M. (2006), “A Clinician-Educator’s Roadmap to Choosing and Interpreting Statistical Tests,” Journal of General Internal Medicine, 21, 656–660. DOI: 10.1111/j.1525-1497.2006.00390.x.
  • Windish, D. M., Huot, S. J., and Green, M. L. (2007), “Medicine Residents’ Understanding of the Biostatistics and Results in the Medical Literature,” JAMA, 298, 1010–1022. DOI: 10.1001/jama.298.9.1010.