Ottawa Consensus Statement

Data sharing and big data in health professions education: Ottawa consensus statement and recommendations for scholarship

Pages 471-485 | Received 14 Dec 2023, Accepted 20 Dec 2023, Published online: 02 Feb 2024

Abstract

Changes in digital technology, increasing volume of data collection, and advances in methods have the potential to unleash the value of big data generated through the education of health professionals. Coupled with this potential are legitimate concerns about how data can be used or misused in ways that limit autonomy or equity, or that harm stakeholders. This consensus statement is intended to address these issues by foregrounding the ethical imperatives for engaging with big data as well as the potential risks and challenges. Recognizing the wide and ever-evolving scope of big data scholarship, we focus on foundational issues for framing and engaging in research. We ground our recommendations in the context of big data created through data sharing across and within the stages of the continuum of the education and training of health professionals. Ultimately, the goal of this statement is to support a culture of trust and quality for big data research to deliver on its promises for health professions education (HPE) and the health of society. Based on expert consensus and review of the literature, we report 19 recommendations on (1) framing scholarship and research, (2) considering unique ethical practices, (3) governance of data sharing collaborations that engage stakeholders, (4) best practices for data sharing processes, (5) the importance of knowledge translation, and (6) advancing the quality of scholarship through multidisciplinary collaboration. The recommendations were modified and refined based on feedback from the 2022 Ottawa Conference attendees and subsequent public engagement. Adoption of these recommendations can help HPE scholars share data ethically and engage in high-impact big data scholarship, which in turn can help the field meet its ultimate goal: high-quality education that leads to high-quality healthcare.

Introduction

Vast data about learners are generated throughout the medical and health professions education (HPE) enterprise and, with the rise of digital information technology systems, these data are growing exponentially. HPE scholars are increasingly calling attention to the role education ‘big data’ can play in advancing the science and practice of health professional training (Chahine et al. Citation2018). Big data refers to large and diverse datasets that are often orders of magnitude larger than traditional datasets. Big data created through the aggregation and analysis of increasing volumes of data can reveal insights about practices that are not visible at a smaller scale. If harnessed appropriately, the field can answer important questions about the downstream impact of education on patient and population health outcomes, informing more socially accountable education.

Big data in HPE

Notably, while many of these data are generated through learner contributions to the processes that support the education enterprise (e.g. application to education programs, attendance records) and clinical data (e.g. electronic health records), a major source of education data amenable to big data scholarship is student assessment (e.g. test and assignment scores, selection/recruitment data, workplace-based appraisals, licensing and high-stakes examinations, records pertaining to patient care and clinical outcomes, etc.). While HPE is not yet in the same big data stratosphere as the tech giants, there is mounting evidence that this approach to scholarship is advancing the field (Triola and Pusic Citation2012; Ellaway et al. Citation2014; Chan et al. Citation2018). There are now several notable examples of research that have linked assessment data across training institutions (Jerant et al. Citation2015, Citation2021), with future education outcomes (Eva et al. Citation2012; Grierson et al. Citation2017; Barber et al. Citation2018; Ellis, Brennan, Scrimgeour, et al. Citation2022), and even with practice and patient-level data (Tamblyn et al. Citation2007; Asch et al. Citation2014; Norcini et al. Citation2020; Han et al. Citation2023; Pusic et al. Citation2023; Smirnova et al. Citation2023) to develop novel theoretical and practice-relevant insights in various domains of HPE. Artificial intelligence-based approaches are also enabled by big data (Masters Citation2019). While such studies can serve as models for big data research, they remain the exception rather than the rule. Yet the continued automation and digitization of education and clinical activities portend a continued rise in HPE big data research.

The 5V framework for big data

Scholars typically define big data research with respect to the 5V framework, which contemplates the volume, velocity, variety, veracity, and value of the relevant dataset (Ahsaan Shafqat et al. Citation2019). The volume of assessment and other administrative HPE data continues to increase with the implementation of activities such as programmatic assessment and the rising use of virtual and online platforms. The velocity of assessment data is also increasing. The widespread use of app-based or online data collection tools for the frequent and daily assessments of competency-based education means these data are being processed faster than ever. The variety of relevant data is similarly increasing as clinical outputs, curricular experiences, and demographic characteristics are all becoming more integrated into enhanced models of assessment. Issues of veracity – accuracy and trustworthiness of data – are more contextual. However, continued focus on the validity of assessment data and the quality of data capture suggests that this element too is part of the emerging discourse of HPE big data (Downing and Haladyna Citation2004; Cook et al. Citation2016). The value proposition for big data research in HPE assessment is clear. These studies have considerable potential to inform improved education policy, advance social accountability missions, and stimulate innovations in education practice (Ellaway et al. Citation2014; Chahine et al. Citation2018).

Data throughout the continuum of education

The focus of this consensus statement is big data created through data sharing – defined as the voluntary provision of information from one individual, department, or institution to another for a scholarly purpose. While this does not preclude the recommendations applying to other applications of big data (e.g. standalone studies within a learning management system or tool), our goal is to understand how sharing data across the continuum of HPE can create impact and value across the field. While big data can be held within individual organizations, sharing creates the opportunity to connect data to important upstream and downstream factors. Big data is only as useful as its capacity to advance the goals of HPE research: enhancing the education of health professionals in a way that improves the health and wellness of patients, providers, and communities (Nundy et al. Citation2022). This requires a clear picture of the continuum of training along which professionals travel – from their early education to clinical practice. Each stage of this journey influences the subsequent one, with the ultimate cumulative impact of the education process being the effective and competent delivery of healthcare. Sharing and linking data across this continuum – or even amongst many institutions within one stage of the continuum – can yield many interesting and meaningful insights to improve HPE (Wenghofer et al. Citation2009; Gale et al. Citation2017; Wenghofer et al. Citation2017; Smith et al. Citation2021; Ellis, Brennan, Lee, et al. Citation2022; Thelen et al. Citation2023).

Social accountability as a rationale for data sharing

The social accountability imperative to engage in this work is more pressing than ever. Numerous technological, societal, and environmental challenges face healthcare. There is greater awareness of educational conditions that are not optimal for learners and that reflect or promote structural inequities within society (Ona et al. Citation2020; Fyfe et al. Citation2021). Practicing professionals are ever more concerned about wellness, burnout, and working conditions (West et al. Citation2016; Mihailescu and Neiterman Citation2019). Moreover, HPE is a resource-intensive endeavor. Many stakeholders have reasonable expectations that the impact of education can be understood and improved through scholarship (Cleland et al. Citation2022). Big data has the potential to unlock critical insights and play a significant role in meeting HPE’s social accountability obligations at a scale not seen before. Thus, it becomes part of the HPE accountability mandate to harness the potential of big data for the benefit of learners, patients, and society. This will require the cooperation of various stakeholders involved in the education data ecosystem.

Stakeholders in the data ecosystem

The HPE assessment ecosystem, however, is complex. It involves many individuals, institutions, and groups, all of whom have different agendas, which may overlap and/or compete with one another. At the micro-level are the generators of data – learners, faculty, and other individuals. As clinical data are increasingly connected to education data, patients, families, and other clinicians are also represented at this level. While these individuals may stand to benefit from big data research (e.g. personalized or precision education; Luan and Tsai Citation2021; Markus and Topi Citation2015), they also face the most consequential risks. At the meso-level are the programs, institutions, and organizations that function as data custodians, overseeing the collection, storage, and security of data, as well as setting policy and enacting policy changes. These organizations can vary from local programs to entire schools or authorities that use data to advance their missions. Often, these are the organizations that have control over sharing practices and provide data. At the macro-level are those institutions that operate across the broader HPE system. This includes national bodies and organizations, which serve as data custodians (e.g. licensing bodies, accreditation bodies), sources of data, and stakeholders in overall system improvement. The macro level may also engage regulatory organizations and even governmental bodies such as departments or ministries of health and training. Organizations such as advocacy groups, representative bodies (e.g. student government), communities, and other non-education-specific groups (e.g. Society for Improvement of Diagnosis in Medicine) will intersect with stakeholders and agents at each level. Notably, HPE big data research – especially that which utilizes assessment data – scales both the risk and benefit across each of these levels. As such, there is growing concern that the pursuit of HPE big data will expose individuals, institutions, and communities to potential harms.

Balancing risks and benefits

Risks associated with HPE big data research exist both within the procedures of data sharing and in the potential negative outcomes that big data research may have for individuals and communities. Procedural risks are those associated with the conduct of data sharing and analysis and include issues related to maintaining privacy, securing informed consent, and maintaining commitments to data sovereignty. Outcome risks relate to how big data results are used or interpreted and the ways these can harm various stakeholders. Of particular concern is the potential that education and assessment data may be used in ways that perpetuate inequality or inequity, especially for those who have been historically marginalized and/or who have had limited control over the use of their data. These risks have been articulated in other work (Grierson et al. Citation2023) and will differ for different stakeholders at the micro, meso, and macro levels (see Table 1).

Table 1. Consideration of process and outcome risks.

Understanding the ethical imperative of HPE big data research and the prevailing risks associated with each system level (micro, meso, and macro) is necessary for sustainable scholarship. This consensus statement addresses these tensions using each level as a departure point for recommendations. Big data can potentiate several different classes of disciplinary inquiry (social science, epidemiology, psychometric research, machine learning, etc.) as well as practical applications such as building assessment tools or conducting evaluation.

Ultimately, the ethical conduct of data sharing and big data research relies on a foundation of trust between researchers, institutions, and participants. As big data becomes more available and useful, a set of guidelines for the HPE community will provide researchers with key considerations as they move forward and help them avoid the challenges faced by other research areas in HPE (Whitehead et al. Citation2013; Tolsgaard et al. Citation2020; Masters et al. Citation2022). We believe that the following recommendations support the development of this trust, such that, by incorporating them into research practice, big data scholarship in HPE will be maximally beneficial to improved education systems while also minimizing the harm to which individuals, institutions, and communities are exposed through the processes and outcomes of inquiry.

The need for guidelines

This consensus statement identifies critical ideas that researchers must consider in creating and supporting education big data scholarship. We define the key ethical tensions surrounding HPE big data and offer recommendations related to data governance, ethics and equity, and the logistics and data sharing techniques that can help address them. We do not provide specific policies or prescriptions, except where strongly warranted by evidence or theoretical considerations. Instead, each section provides principles and frameworks that can guide scholars as they consider their own big data projects. Many consensus recommendations for HPE big data scholarship mirror those in other domains of big data scholarship, and in other domains of scholarship more generally. While we will not repeat these recommendations, we encourage all scholars to adhere to previously articulated best practices. Our focus here is on the areas of novelty or complexity inherent to the creation of big data through data sharing in HPE.

Methodology

The authorship team was assembled between January 2022 and March 2022 by identifying known experts in the field and through peer recommendations from those experts. The team met six times to define the scope of the consensus statement and to identify the areas of consensus and tension in big data practices. From May to August 2022, a draft consensus statement was prepared and presented at the Ottawa Conference 2022 meeting. The consensus results were presented at a workshop attended by international participants and then presented at a symposium for feedback.

This consensus document is the result of the collective expertise and experience of a group of scholars representing multiple perspectives on the question of data sharing, refined through participant feedback from workshops and symposia. We have been purposeful in creating a scholarly group that can identify important tensions in data sharing. We bring different prioritizations of values, cultural and regulatory experiences of medical education, and lived experiences as researchers, clinicians, and learners. The authorship group reflects the current predominance of individuals from the global north in medical education. We aimed for a diverse and knowledgeable expert group to gather multiple perspectives. Additionally, this work was presented to a wide audience at the Ottawa Conference 2022, which included many individuals from the global south. We acknowledge that concerns around data and data sharing vary considerably internationally, and our hope is that this document will encourage discussion and development internationally. We anticipate that this consensus will change and develop rapidly as new developments and practices emerge.

The recommendations are grouped according to theme and include: Recommendations for Framing Scholarship and Research; Recommendations for Addressing Ethical Issues in Sharing Data; Recommendations on Creating Big Data; Recommendations for Governance and Oversight; Recommendations for Knowledge Translation (KT) and Dissemination; and Recommendations for Advancing the Quality of Scholarship in HPE.

Ottawa consensus statement recommendations

Recommendations for framing scholarship and research

Identify the purpose, type of scholarship, and intended impact of your big data inquiry

HPE scholarship is diverse, with many different disciplines and underlying purposes. The direct practice of teaching and assessment, rigorous innovation and program evaluation, and discovery-oriented research can all be served by big data scholarship. For example, using big data to assist in competency decisions can serve the needs of decision-making on individual learners, improve program outcomes, or contribute to the larger education or health system by creating new knowledge, processes, and systems (Thoma et al. Citation2021). Notably, the impacts and risks of this scholarship will vary based on the level on which the analyses focus. At the micro-level, research will have direct implications for learners and teaching faculty; at the meso-level, it will impact programs and institutional initiatives. Moreover, these analyses will require different types of data, which will differ in their granularity (e.g. aggregated data vs. individual records), generalizability, and the degree to which their inclusion influences the complexity of the inquiry (see Table 2). The focus of analysis will also dictate the inquiry’s data sharing requirements, which should limit fishing expeditions for significant effects. An evaluation study of a program may stay at the descriptive level and not require individual-level data sharing, whereas a study intended to test predictive analytics will require individual-level data replete with viable personal identifiers (Duong et al. Citation2019). Accordingly, different purposes of scholarship are subject to different ethical and regulatory regimes (e.g. quality improvement or program evaluation utilizing big data from a single institution may not require ethical review). Clearly articulating the purposes of scholarship can assist stakeholders in evaluating the appropriateness of participating in data sharing. We recommend that scholars identify, a priori, the purpose, type of scholarship, and level of impact intended for their inquiry.

Table 2. Types of data and levels of scope.

Ground scholarship in theory or conceptual frameworks

HPE has long recognized the value of theory and conceptual frameworks in guiding scholarship (Hodges and Kuper Citation2012). It has been suggested that big data scholarship will replace theory-driven work in favor of purely empirical scholarship (Mazzocchi Citation2015). While empirical and exploratory approaches (e.g. data mining) can produce meaningful insights (Knight and Buckingham Citation2017; Bayazit et al. Citation2023), conceptual frameworks serve scholars by clarifying, in advance, key considerations for the project. This informs the collection and connection of relevant data and how findings will be translated for integration into practice (Wise and Shaffer Citation2015; Wise and Vytasek Citation2017). Theory can inform several different stages of inquiry (Chan et al. Citation2018). However, the best time to integrate theory is before the study begins. While some frameworks may de-emphasize theory in favor of articulating the intended outcome (e.g. the descriptive–diagnostic–predictive–prescriptive framework for learning analytics), a theoretical orientation can help researchers explain their methodological choices, address questions of validity, and enhance the generalizability of their findings. Data sharing can enhance the quality of theory-driven work by allowing for theory-driven variable selection or creation. Challenging issues – such as what data need to be included or excluded, how missing data must be handled, addressing confounders, and analysis decisions – are made more practical and focused by the incorporation of an explicit theory or conceptual framework (Dawson Citation2014; Wise and Shaffer Citation2015). There are several examples of studies either driven by theory (Yang et al. Citation2021) or contributing back to theory (Asch et al. Citation2014) in education and HPE, which can inspire newly engaged scholars.

Identify data types and sources a priori

While assessment is an obvious setting for big data, the specific data sources that can inform and enhance such HPE scholarship are varied. We cannot identify every possible variable that may fall within the umbrella of big data and have potential relevance to improving HPE assessment or general HPE. However, prominent data featured in the extant and emerging literature include personal identifiers, social demographics, assessment outcomes, curriculum information, and clinical outcomes including, but not limited to, those expressed at the level of the patient. There is also an increasing emphasis on the utility of qualitative data for assessment and program evaluation. Notably, advances in analysis techniques, such as natural language processing, make this type of data amenable to big data approaches (e.g. Ginsburg et al. Citation2022).

As the variety and subsequent heterogeneity of data in scholarship increase, it is important to remember the initial underlying purposes for which data were collected as well as their affordances. A data source collected for one purpose and utilized for another requires clear articulation of why the data are meaningful for the new purpose and reflection on their limits. Routine data, for example, could be used for a number of different purposes, but the onus is on the researcher to state why the data are fit for those purposes. Sometimes these purposes might be potential or require some empirical exploration. A framework for assisting researchers is provided by Ellaway et al. (Citation2019): (i) data collected purposefully for assessment purposes; (ii) data collected purposefully for non-assessment purposes; (iii) data from clinical contexts that can be linked to assessment and other education data; and (iv) other data collected for non-medical education purposes. This framework can inform tools, like data dictionaries, which can hold this information to guide researchers in addition to other relevant information such as the format of data elements (e.g. numerical, qualitative).
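To make this concrete, the sketch below shows what a minimal data dictionary entry might look like in Python. The field names, example values, and category labels are illustrative assumptions for this statement, not a published standard.

```python
# A minimal data dictionary entry capturing provenance, format, and the
# Ellaway et al. (2019) category of a data element. Illustrative only.
from dataclasses import dataclass

@dataclass
class DataDictionaryEntry:
    element: str             # name of the data element
    data_format: str         # e.g. "numeric", "free text", "categorical"
    original_purpose: str    # why the data were originally collected
    ellaway_category: str    # one of the four framework categories
    custodian: str           # organization responsible for the source data
    notes: str = ""          # affordances and limits for secondary use

entry = DataDictionaryEntry(
    element="ward_shift_attendance",
    data_format="numeric (count per rotation)",
    original_purpose="scheduling and payroll",
    ellaway_category="(ii) collected purposefully for non-assessment purposes",
    custodian="clinical placement office",
    notes="Reuse for assessment requires justifying fitness for that purpose.",
)
```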

Recommendations for addressing ethical issues in sharing data

Institutional review oversight and accountability in appraising risks and benefits of research is necessary but not always sufficient

Institutional ethics review boards have long held the primary role of overseeing ethical research and scholarly practice. Sharing data across institutions – including those which may not be academic entities and thus lack review boards (e.g. credentialing bodies, regulators, private vendors of education software) – creates new challenges for traditional ethics (Regan and Jesse Citation2019; Metcalf and Crawford Citation2016; Someh et al. Citation2019; Ferretti et al. Citation2021). While ethical approval should always be sought, it may not be sufficient to address contexts in which long-term data repositories and data sharing collaborations serve as the basis of scholarship. The risk to those whose data are held may change over time as their social and professional status changes. Thus, more sensitive and vigilant ethical monitoring may be required (Grierson et al. Citation2023). Nevertheless, the onus is on scholars – as in the conduct of any research or scholarship – to understand and declare the procedural risks, as well as the potential risks entailed in the interpretation and outcomes of big data, to key institutional stakeholders before they execute data linkages and engage in analyses. This should occur in consultation with the appropriate experts and stakeholders. Institutions that provide data are encouraged to proactively request the articulation of these risks so that they may be adequately communicated to those whose data are held within datasets.

As data elements change and as regulations and requirements evolve, it is up to researchers and institutions to maintain good oversight of ethical processes. This may be supported by research ethics experts, including but not limited to those who comprise institutional review boards (Grierson et al. Citation2023). Unique governance structures, such as a special governance committee or board, may be necessary for large collaborations (Kalkman et al. Citation2022). These structures will have a role in appropriate management practices that ensure quality assurance of the data, appropriate reporting to stakeholders, and adherence to protocols that contemplate how data are stored, returned to original custodians, and/or deleted or disaggregated.

We recommend that mechanisms for continual review of practices be part of data sharing collaborations and become part of the structure of data sharing agreements (DSAs) and governance. This recommendation extends to work that may be considered exempt from institutional ethical review (e.g. quality assurance or evaluation). While this may be perceived as onerous, some form of ethical oversight is essential to preserving trust, accountability, and confidence in the HPE big data enterprise. The greater the transparency, the more achievable this will be.

Manage transparency and informed consent issues

Consent is not just a formality but is the basis of protecting participants and building trust in the research enterprise. Sustainable research requires that participants understand how their data are being used (Braunack-Mayer et al. Citation2020). We recommend that scholars and institutions think proactively about the quality of consent from the stakeholders who are represented in datasets (Kalkman, Mostert, Gerlinger, et al. Citation2019; Wilhite et al. Citation2020). This may require more significant planning and major investment from institutions and researchers. This is not to say that respect for informed consent as a principle must always be enacted in one specific prescribed manner (Barocas and Nissenbaum Citation2014). It is not always possible to retroactively seek consent from every single individual represented in a dataset. In some jurisdictions, informed consent is not always required for analysis of routinely collected registry data as long as individual confidentiality and privacy are maintained (Cathaoir et al. Citation2022). Research that uses big data for quality improvement and evaluation may also not require a priori informed consent. If sufficient protections are in place, such as deidentification, then informed consent can be managed by strong oversight of the use and reporting of data. For prospectively collected data, individuals could be asked to give consent for a variety of scholarly purposes when the researcher or institution wants to preserve room for emerging and evolving research questions and ongoing activities. Similarly, participants could be presented with consent options in the form of ‘opting out’, rather than ‘opting in’ (Vellinga et al. Citation2011). Given the rapid pace of data growth and the emergence of new data elements and datasets from data sharing, participants could also be given the opportunity for time-limited consent. In this approach, broad consent is given but must be reconfirmed, with an updated list of data elements and analysis purposes, risks, and benefits, at a predetermined point in time. This approach also has the benefit of allowing participants to update data elements that may have changed about them. Regardless of the approach, we strongly encourage informed consent to be incorporated in data sharing collaborations as the foundation of building trust in big data research.

Make respect for autonomy and equity a guiding principle in the pursuit of societal benefit

A major challenge for HPE big data research in assessment is the threat that the results can be used ‘against’ learners, faculty, or other specific groups. Accordingly, we must acknowledge the potential that prosocial research benefits may be offset by unintended outcomes at all stages of the data-driven research continuum – from research question through to the interpretation of findings (Florea and Florea Citation2020). We must remember that many of the identified risks are amplified for individuals from groups that have been historically disadvantaged through data misuse and surveillance methods. Research conducted with the best intentions that centers or highlights the ‘deficiencies’ of a group can unintentionally reinforce these deficiencies conceptually. An example is offered by scholars of studies of race as a factor in health outcomes (Krieger Citation2020), which center racial differences in essentialist ways while ignoring larger societal structures that lead to health disparity. At the same time, failing to collect important identity data can perpetuate continuing inequalities by hiding them. Data collection, analysis, and interpretation are never truly ‘neutral’ or ‘objective’; a disassociation of data uses from the communities that provided the data can amplify oppression and marginalization (Crawford et al. Citation2014; Ben-Portah and Ben Shara Citation2017). Including identifiable socio-demographic variables can help illuminate many important questions and assist in the cause of more just and equitable education and healthcare. There are increasing calls for the purposeful collection and sharing of these data to evaluate and advance equity initiatives. Researchers and institutions must walk a fine line in balancing the collection and analysis of data while doing their utmost to prevent harm (Grierson et al. Citation2023). There are no universal prescriptions or technical solutions for managing this issue, but one recommendation is to engage stakeholders from especially vulnerable groups, including learners, to understand what data are to be collected and for what purpose.

As a principle, researchers and stakeholders must explicitly articulate the specific issues and tensions. We recommend that researchers be guided by the values of respect for autonomy and equity in data sharing collaborations (Kalkman, Mostert, Udo-Beauvisage, et al. Citation2019). This could include meaningful reflection on how data can be ‘weaponized’ against those impacted, engagement of knowledgeable members of the relevant communities early in the design of large research collaborations, and ultimate respect for the principle of data sovereignty (Someh et al. Citation2019). Where possible, data management, processing, and transformation techniques should focus on and/or accentuate the variance accounted for by education policies, procedures, and systems rather than that which pertains to individuals in the data (insofar as the research question dictates). Where socio-demographic variables are available, researchers should reflect on the value these add to the theory and conceptual basis of their scholarship relative to the risks their inclusion presents to stakeholders. This may involve engaging directly with the appropriate representatives of the groups or communities who are most impacted by analyses. This can help mitigate unintentional biases and ensure that questions center the appropriate variables (Krieger Citation2020; Grierson et al. Citation2023). Ultimately, researchers should be mindful of the consequences of their work and articulate how inappropriate interpretation could limit human autonomy (Chen and Liu Citation2015).

Recommendations for governance and oversight

Governance should explicitly reference and be based on guiding ethical values and principles

Effective governance is required to support ethical practice, and sound governance must itself be grounded in ethical principles (Lefaivre et al. Citation2019). Beyond the regulatory and procedural aspects of governance bodies, it is crucial that these not lose sight of the foundational values that support research (Grierson et al. Citation2023). Thus, while the specifics of governance structures may vary, we recommend researchers and institutions develop an explicit statement of values that includes preservation of privacy and anonymity, learner autonomy, non-discrimination, informed consent, appropriate data collection methods, and appropriate research purposes.

Governance should be representative of stakeholders

Given the complexity of governance issues and the potential ethical risks, researchers alone cannot determine governance structures. Governance bodies should reflect the perspectives of all relevant stakeholders including but not limited to learners, practicing physicians, data stewards, regulators, education institutions, researchers, and community members/patients, including knowledgeable representatives of marginalized, historically oppressed, and/or under-represented communities. We draw special attention to learner representation and to ensuring it is done adequately. Not only will this help promote engagement, it can also provide input on appropriate ways to receive consent, communicate risk and value, and ensure research is used in ways that promote prosocial aims. It may not always be possible to have specific representation of each stakeholder group within a governance structure, but it is necessary that the perspectives of all groups are included in the construction and execution of governance and the oversight of DSAs, data management practices, and other procedures. An example of this approach is the UKMED collaboration, which illustrates the principles of governance (Dowell et al. Citation2018) (see Table 3). The UKMED collaboration includes not just the major medical education bodies but also representatives of medical students and healthcare organizations.

Table 3. UKMED example.

Governance should be based on the principles of data sovereignty

Governance structures should be legitimately empowered to enact and oversee data sharing that aligns with the rules, policies, and laws of the sovereign jurisdictions in which the to-be-shared data have been collected. Standards and best practices will vary from nation to nation as a function of the prevailing data protection regulations. For example, the use of personal data in UKMED is not reliant on individual consent from data subjects, as consent is not a necessary condition for processing under the UK General Data Protection Regulation (UK GDPR). The UK GDPR allows personal data to be used without consent where it is necessary for statutory functions. This contrasts with many jurisdictions where data collection requires more formal consent. It is incumbent on researchers and institutions to understand the specific requirements of their jurisdictions. Several jurisdictions are developing data protection acts, which often target commercial data but have implications for research data. As commercial tools are often involved in collecting education data (e.g. learning management systems), researchers may also require an understanding of the complexities of connecting these data from commercial vendors to both other local data and data held by academic institutions.

An especially important case for respecting data sovereignty is the collection of data from Indigenous peoples in colonized jurisdictions. Several such Indigenous communities have articulated appropriate principles for research engagement. These principles emerge from the history of colonization, active marginalization, and instances in which research data were collected with little benefit to, or even active harm of, Indigenous communities (Walter et al. Citation2021). The CARE principles (i.e. Collective Benefit, Authority to control, Responsibility, Ethics; Carroll et al. Citation2020) provide guidance for how to engage Indigenous communities in research data collection; further frameworks are contextualized to specific communities, such as the OCAP principles (Ownership, Control, Access, Possession) in Canada (The First Nations Information Governance Centre Citation2014) and those articulated by the Māori Data Sovereignty Network in New Zealand (Te Māori Data Sovereignty Network Te Mana Raraunga Citation2022). Researchers are advised to carefully engage these and other frameworks and to engage with these communities prior to conceptualizing a big data project. Good practice would require researchers to seek partnership with, center, and co-create the scholarly work with the relevant Indigenous community.

Governance frameworks should be explicitly articulated and evolve to meet the context of data sharing

There are several approaches to governance of data sharing and big data (Elouazizi Citation2014). Frameworks outline the overall philosophy and dictate how specific policies and practices as well as existing resources for governance are enacted. For example, the non-invasive governance framework argues for leveraging existing resources and institutional policy as much as possible to facilitate access to data (Elouazizi Citation2014) instead of creating new governance structures. Such a framework might be appropriate for multi-institution collaborations, which share data of minimal risk or in facilitating data sharing across the various units of a large institution. More high-risk data sharing activities might require the creation of new roles and oversight, such as data or privacy officers. Additional considerations will include resourcing governance appropriately and evaluating the success of governance. Explicitly addressing these issues in governance and clearly identifying a framework that aligns with the contextual demands of data sharing will ensure appropriate governance, thereby minimizing uncertainty for all stakeholders. The specific practices and governance policies can and should evolve as the needs of data sharing collaborations change. New partners, data sources, and the evolution of scholarship supported by data sharing should be accompanied by review of the governance practices.

Recommendations on creating big data

Data sharing agreements are paramount to facilitate sharing

A DSA clearly defines the flow and maintenance of data throughout a study – whether a single project or a longitudinal initiative – and encourages a culture of data collaboration within the scholarship (Piwowar et al. Citation2008; Polanin and Terzian Citation2018). It is essential to facilitating good governance when shared data are the basis of the big data project. DSAs outline the permitted scope of inquiry, address issues of data management (e.g. security, storage, and access), and ensure an organizational agreement that provides the necessary stability for ongoing data maintenance. These agreements often require legal review, but always demand an underlying foundation of trust between those sharing the data and the scholars engaged in research. They ensure researchers consider the compatibility and jurisdictional variation in data structures, meaning, and management, and adhere to the critical legal and administrative nuances that govern data sharing. This is for their own protection as well as that of those potentially impacted by the sharing of data (i.e. learners, programs, and the larger system).

Data sharing agreements that support ongoing longitudinal research collaborations should have periodic reviews to ensure they remain fit for purpose (Kalkman, Mostert, Udo-Beauvisage, et al. Citation2019). Such a review should be conducted by individuals such as data privacy officers and relevant stakeholders. Continual quality control of data from each source organization is necessary to maintain the viability of the research objectives. Changes, modifications, and issues of ‘data drift’ (i.e. change in data element collection and meaning over time) should be documented and reported in research records. Overall, a robust DSA can help mitigate issues of risk and ethical breach, guide resolutions and governance, and help all parties understand the appropriate uses of the data. It is also important to note that compatibility issues and jurisdictional variation in approaches to data management do exist. Sharing data internationally brings a unique set of challenges, as variations in legislation and regulations will add complexity at all levels. For example, the age at which trainees are considered ‘adults’ capable of giving consent may vary between jurisdictions. A collaboration between Singapore, where students are considered minors until age 21, and the UK, where the age of majority is 18, will require different procedures for consent.

Assess data readiness early and in collaboration with data owners

Many of the education- and assessment-relevant variables that hold promise for education big data research are difficult to access. Learner-level data are often retained behind firewalls and contemplated within confidentiality agreements. Where data are publicly available, they are usually aggregated in summary tables, rendering them insufficiently granular to serve researchers. In this regard, we recommend institutions and data custodians pay deliberate and strategic attention to improving ‘data readiness’ for HPE big data research. This includes building data inventories, maintaining high-quality data capture and storage systems, and ensuring that data availability for scholarship is a priority. Institutions may also engage in proactive data readiness planning to identify the data elements and sources that are most amenable to different types of education scholarship. The available data may be more suited to questions posed at one level of analysis than at another. For example, anonymized assessment data may not assist in clarifying questions at the micro level (i.e. impacting individual learners or groups of learners) but can be aggregated to study programs or institutions at the meso level. Recognizing these relationships can assist stakeholders in determining data uses that have the potential to advance or detract from their institutional mission. Lawrence (Citation2017) provides a framework that institutions can use for analyzing their data readiness, which highlights key considerations for data trustworthiness and accessibility.

Data access may require permissions through data owners and/or custodians (i.e. learners, faculty, programs, and patients), researchers, third-party vendors/developers, and possibly others. Accordingly, it is important that the ways in which their data may be used for scholarship – and their protections and rights – are communicated to these stakeholders at the time of data collection (Kalkman et al. Citation2022).

Big data studies often seem to take on faith that size can overcome deficiencies in data quality. There are some situations where this may be the case, such as in training artificial intelligence models. However, data that are poorly recorded, stored with missing elements, and/or tagged inaccurately can sometimes drive research that lands on erroneous interpretations: the so-called phenomenon of ‘rubbish in, rubbish out’. Accordingly, HPE big data researchers must also confront underlying issues in the data themselves. Assessment data with poor or absent validity evidence (e.g. significant measurement error) may have limited utility and subsequent readiness for inquiry unless the underlying issues are addressed.

Create data management plans that support linking

Linking involves connecting two or more datasets through a common variable. Tremendous research benefit is gained by creating big data through linkages of smaller datasets held by various programs or organizations (Reiter et al. Citation2012; O'Mara et al. Citation2015; Grierson et al. Citation2017; Schumacher et al. Citation2020). For example, Grierson et al. (Citation2017) were able to link the postgraduate certification examination performance of physicians to admissions data from multiple training programs in Ontario. The research team was successful because of a strong data management plan that addressed the confidentiality concerns of data holders and identified a process of linking that would provide a reliable association at the individual physician level. This process relies on the enactment of good data management principles. Data mapping, the creation and maintenance of data dictionaries, and thorough documentation of data history are all crucial for successful linkages. In this regard, understanding the equivalencies – and differences – that exist within data elements, prior to linkage, is vital. This process of data harmonization is a critical step in creating shared datasets (Kush et al. Citation2020). Different organizations often have similarly named data elements, which may or may not be equivalent in meaning and content. Recognizing the exploratory nature of some big data research projects, where possible, education programs and data stewards across the continuum of training should commit to a shared set of data management processes that reflect common data standards that promote interoperability and, in turn, data sharing. This is an important aspect of data readiness that can mitigate many ethical risks and harms.
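The following Python sketch illustrates record-level linkage through a pseudonymous key, under the assumption that each custodian hashes a shared identifier with a project-specific salt before releasing data. The dataset names, columns, and salt handling are illustrative, not a prescription.

```python
# Sketch of record-level linkage with pseudonymous keys on toy data.
import hashlib
import pandas as pd

SALT = "project-specific-secret"  # held by the data custodian, never shared

def pseudonymize(identifier: str) -> str:
    """Derive a stable, non-reversible linkage key from a shared identifier."""
    return hashlib.sha256((SALT + identifier).encode()).hexdigest()

admissions = pd.DataFrame({
    "national_id": ["A100", "A101"],
    "interview_score": [78, 85],  # harmonized to a common 0-100 scale
})
certification = pd.DataFrame({
    "national_id": ["A100", "A101"],
    "exam_pass": [True, False],
})

# Each custodian replaces the direct identifier with the hashed key before sharing.
for df in (admissions, certification):
    df["link_key"] = df.pop("national_id").map(pseudonymize)

linked = admissions.merge(certification, on="link_key")
print(linked)
```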

Most often, linkages happen at the record level, usually anchored to common identifiers pertaining to the individual data of learners, faculty, or patients represented across various datasets. Linking at this micro level requires special attention to concerns of the safety, confidentiality, and privacy of individuals represented within the datasets. In these cases, only the host organization and/or the data controller should have access to identifiable data, facilitating dataset maintenance, updating, and the creation of research extracts. Ultimately, the data used for research should not be openly identifiable at the individual level, and often not at the organizational level either. Care must be taken that aggregation and disaggregation processes do not re-identify individuals (Barocas and Nissenbaum Citation2014). For example, reporting the number of learners who have disclosed a disability and identify as female may produce a very small cell size and would thus be potentially open to re-identification. Thus, there may be a minimum number of data points required before reporting on specific analyses to avoid potential re-identification (Statistics Canada Citation2022).
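As a concrete illustration of the minimum cell-size principle, the sketch below suppresses small cells in a cross-tabulation before reporting. The threshold and variables are hypothetical; actual thresholds come from the applicable disclosure-control policy.

```python
# Suppress small cells in a cross-tabulation before reporting. Toy data;
# the threshold of 5 is illustrative, not drawn from any specific policy.
import pandas as pd

MIN_CELL = 5

cohort = pd.DataFrame({
    "gender": ["F", "F", "M", "M", "M", "F", "M", "F"],
    "disability_disclosed": [True, False, False, False, True, False, False, False],
})

counts = pd.crosstab(cohort["gender"], cohort["disability_disclosed"])

# Mask any cell below the minimum reportable count to reduce re-identification risk.
safe_to_report = counts.mask(counts < MIN_CELL, other=pd.NA)
print(safe_to_report)
```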

Research and linkage of datasets may lead to the creation of new data. Common examples are derived variables, which are computed from other variables in a dataset. For example, ‘course difficulty’ could be computed from the pass rates of courses and used to estimate how challenging the academic pathway of trainees can be. Intellectual property, open-source dissemination, copyrights, and ownership associated with new data, products, or tools developed as a result of research activities and data linkages need to be addressed directly in DSAs to ensure transparency. New datasets should be treated with the same consideration for security and quality as source datasets. Education programs can support scholarship by having clearly articulated data management processes, which require researchers to develop data management plans that address how data will be linked and how the anonymity of individuals and/or organizations is protected. We recommend that scholars verify the relationship between data elements across datasets as part of their data management plans.
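As a toy illustration of a derived variable, the sketch below computes the ‘course difficulty’ example from pass rates; the data and the specific definition (one minus the pass rate) are assumptions for illustration only.

```python
# Derive 'course difficulty' from pass rates. Like any new data element,
# the derivation would be documented in the data dictionary and DSA.
import pandas as pd

records = pd.DataFrame({
    "course": ["anatomy", "anatomy", "pharmacology", "pharmacology"],
    "passed": [True, False, True, True],
})

# Difficulty defined here (illustratively) as 1 - pass rate per course.
difficulty = (1 - records.groupby("course")["passed"].mean()).rename("course_difficulty")
print(difficulty)
```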

Have a plan for responding to mistakes and for communicating with stakeholders

It is inevitable, as in any human enterprise, that mistakes will occur – usually without malicious intent but through human error or neglect. There are, of course, degrees of error, with the most severe being de-anonymization or re-identification of individuals and their sensitive or confidential data. Consider the regular stories in the press about individuals’ personal details being released inadvertently. More worrying still is the possibility of cyber attacks and criminal activity that might seek to hold data for ransom. What matters is how scholars and institutions respond to these errors. Plan for these ‘worst-case’ scenarios with stakeholders. Incorporate specific protocols into DSAs, including temporarily or indefinitely suspending data sharing until the error is addressed. Clear and transparent communication about errors in handling or sharing data, mechanisms for accountability, and quality improvement processes to prevent future errors will be important when errors do occur. Proactive quality assurance of the data sharing and management processes can help avoid errors.

Recommendations for knowledge translation and dissemination

Use KT frameworks to help organize and accelerate translation of findings

We recommend that researchers use a structured approach when communicating the findings of HPE big data research. The translation of knowledge from big data studies should first be clear on the purpose of the study and the unit(s) of analysis targeted (micro, meso, and macro). Second, a translational framework should be used, especially when big data are being used to examine the effects and impact of an educational intervention (Rubio et al. Citation2010). McGaghie (Citation2010) has adapted the National Institutes of Health (U.S.) framework for use in medical education research. We provide a modification in the context of big data studies in medical education (see Table 4). This framework provides clarity as to what translation phase authors should target in their study. Reporting frameworks such as the SQUIRE-EDU guidelines (SQUIRE-EDU Citation2022) should be included in scholarly publications. The choice or combination of reporting frameworks to use will depend, in part, on the study’s primary purpose (Kalkman, Mostert, Udo-Beauvisage, et al. Citation2019). Big data work conducted in the context of continuous quality improvement is especially important to communicate systematically so that results can influence systems and systems can feed back into data sources. Ultimately, the sustainability of big data approaches depends on communicating the value of big data scholarship to stakeholders. Being transparent about the changes to practice, advances to theory, and benefits to stakeholders is a key imperative.

Table 4. NIH Stages for medical education (adapted from McGaghie Citation2010).

Engage stakeholders and data owners

Scholars need to be attentive to the possible sensitivities big data exposes, such as differential attainment for particular groups of individuals or deficits in training programs. One approach to addressing the appropriateness of results and interpretations is to give affected stakeholders an opportunity to provide feedback during all stages of the research. In many cases, this may involve partnership and co-creation of the project. It may also be a feature of governance models – institutions and individuals sharing data may reasonably expect to understand the results and implications. Of course, academic freedom must also be protected. One potential of big data studies in HPE is to explore sensitive issues, including those around identity, societal inequality, and institutional practices. If the study meets rigorous methodologic standards, all authors have an ethical obligation to report the results, with concerns and cautions included.

Shape the interpretation with humility and care

It is important to practice humility in drawing conclusions and implications in all scholarship. This is not unique to big data studies. Some medical education studies tend to overstate or over-extrapolate the impact of their findings. With the increasing sophistication of analysis techniques and data, prescriptive recommendations based on analytics or algorithms are simultaneously much more feasible and much more dangerous. HPE big data studies will most often involve cohorts (‘populations’) of learners (ten Cate et al. Citation2020). Herein, the application of epidemiologically sound principles for extrapolating population-level data to individual learners will be crucial. Applying results from large populations to single learners is always precarious. A large national dataset may uncover small differences between groups that are statistically significant but not meaningful to future outcomes, as in the case of the US Medical College Admission Test vis-à-vis ethnicity and race (Davis et al. Citation2013). As with all studies, researchers must determine a priori the threshold for educational significance using theory and the available literature; a determination that upholds the ethical imperative to not limit learners’ autonomy through the interpretation and translation of big data findings. Other advances in open data sharing can also enhance the quality of interpretations.

Recommendations for advancing the quality of scholarship in HPE

Engage new resources, enhanced or novel skills, disciplines, and leadership in HPE

Big data scholarship requires institutional resources to bring data together, plus the support of knowledgeable experts who can ensure effective, sustainable practices are developed. Building the foundations and infrastructure for this type of research is not glamorous, but it is necessary work (Demchenko et al. Citation2013). Funding needs to support researchers and institutions as they establish governance and develop approaches to identify, access, link, and share data. These skills are not always found in traditional research fields, nor are they part of the formal training for many education scholars. A multidisciplinary approach involving collaborators from across the research enterprise is necessary. Experts in privacy, data management, information technology, and education can each provide essential insight that supports even small projects that share data between two institutions along the continuum of training (De Mauro et al. Citation2016). Approaches and concepts from other disciplines such as ethics and sociology can help address the social and cultural issues related to building trust and capacity (Shilton et al. Citation2021). New disciplinary approaches are also emerging, which use big data but also critique or question dominant big data practices. For example, critical perspectives such as those from the critical digital humanities and other fields can help big data scholarship address issues of bias, power, and equity (Sander Citation2020).

The emerging field of data science has long recognized the various roles necessary to support big data research. The adjacent health services and biomedical engineering research fields are becoming increasingly savvy to the core issues of big data scholarship and may serve as an important resource for HPE scholarship. Continued recruitment of data scientists will likely be necessary as big data becomes more integrated into the substrate of HPE scholarship. Information technology expertise is also vital to building the appropriate digital infrastructure for success.

Bringing together data and the individuals who can technically manage the complexities of data sharing requires investment and leadership. We recommend that HPE leaders develop strategies that promote big data scholarship, support data science scholars, and advocate for using big data to advance strategic goals.

Advance data sharing interoperability through metadata

The information within big data systems that affords system management and allows users to find, record, and share data content is referred to as metadata – or ‘data about data’ (Riley Citation2017). The role of metadata in big data research has drawn the attention of a cross-disciplinary group of scholars who recognize that its standardizing function serves as an important prerequisite for data sharing and high-quality analytics (Sweet and Moulaison Citation2013; Levin, Wanderer and Ehrenfeld Citation2015; Ghiringhelli et al. Citation2017). Descriptive metadata helps the identification and understanding of a resource in big data. Structural metadata indicates how the data components relate to one another. Administrative metadata supports data management and the enactment of intellectual property rights (Riley Citation2017). Taking workplace assessment as an example, descriptive metadata may provide information about when, by whom, and from which device a narrative comment was inputted, while structural metadata could indicate how the assessment records are ordered and linked together via a trainee identifier. Administrative metadata, on the other hand, may facilitate the audit log of the assessment record. Metadata is crucial for data sharing (Sweet and Moulaison Citation2013; Ghiringhelli et al. Citation2017) and data governance. For instance, administrative metadata may illuminate for which elements of data an individual has provided consent for secondary use (Shaw Citation2019).
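To ground the three layers, here is a minimal sketch of how the workplace-based assessment example above might be represented; the schema and field names are illustrative assumptions rather than any published metadata standard.

```python
# Three metadata layers for one workplace-based assessment record (illustrative).
assessment_metadata = {
    "descriptive": {          # identifies and contextualizes the record
        "recorded_at": "2023-06-01T14:32:00Z",
        "assessor_role": "attending physician",
        "device": "mobile app",
    },
    "structural": {           # how records relate to one another
        "trainee_id": "hashed-key-7f3a",   # links records for one trainee
        "sequence_in_rotation": 4,
    },
    "administrative": {       # management, rights, and consent
        "consented_secondary_uses": ["program evaluation"],
        "retention_until": "2030-06-01",
        "audit_log_ref": "wba-audit-0012",
    },
}
```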

Use advancing methods and analysis techniques

Assessment of the methodological rigor of big data analysis is key to supporting its claims. Given the established risks associated with inadequate or inappropriate analysis, the methodological rigor of HPE research involving big data must be justified to ensure the resulting conclusions are sound. Established statistical methods (e.g. regression, factor analysis, and Rasch analysis) may be used, with well-understood approaches to mitigating errors of causal inference and assessing model quality; newer methods (e.g. machine learning and deep learning) are also emerging and may provide new approaches to rigorous research (Tolsgaard et al. Citation2023). These methods include splitting datasets into training, validation, and test sets to appraise how well models developed on training data generalize to 'new' or unseen data. The terminology for model evaluation also differs slightly; for example, models are commonly evaluated in terms of recall (sensitivity) and precision (positive predictive value). A major challenge for all types of modeling based on big data is data drift, which can degrade model performance when models developed on one data set are applied to new populations or settings. In HPE, where performance and learning are both content and context dependent, the generalization of knowledge and results across data sets may be problematic. This makes the use of theory, which serves to define what data to capture, store, and analyze, even more important in HPE big data research (Tolsgaard et al. Citation2020).
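As a minimal illustration of the dataset-splitting approach described above, the following Python sketch uses scikit-learn on synthetic stand-in data; the features, the binary pass/fail outcome, and the split proportions are all hypothetical choices, not a recommended protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: X plays the role of assessment features, y a binary
# outcome (e.g. passing a licensing examination). Purely illustrative.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Split into training (60%), validation (20%), and test (20%) sets so that
# performance can be appraised on data unseen during model development.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# Tune and select models against the validation set; report once on the
# held-out test set. Recall corresponds to sensitivity, precision to the
# positive predictive value.
for name, Xs, ys in [("validation", X_val, y_val), ("test", X_test, y_test)]:
    pred = model.predict(Xs)
    print(f"{name}: recall={recall_score(ys, pred):.2f} "
          f"precision={precision_score(ys, pred):.2f}")
```

Note that even a clean test-set result does not guard against data drift: a model validated this way can still degrade when applied to a new cohort, institution, or curriculum.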

Address bias in the data and use the data to address bias

A significant concern for big data is that it can reproduce biases against particular populations, owing to biases in the original data collection and to historical or structural inequities (O'Neil Citation2016). Approaches to mitigating these biases include adherence to the recommendations listed above in the sections on Ethics and Dissemination. Researchers should also adopt a position of reflexivity as they address sensitive questions (Ibrahim et al. Citation2020). Still, the possibility of bias in the data should not preclude engaging in scholarship: harm can also be caused by omitting data or failing to collect more of it. Indeed, an emerging literature uses big data to advance equity and mitigate bias (Wesson et al. Citation2022).
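One practical way to begin surfacing such bias is to audit model errors by subgroup before results are acted upon. The following is a hypothetical Python sketch with toy data and illustrative column names; in practice, the choice of grouping variable and fairness metric requires careful, context-specific justification.

```python
import pandas as pd

# Hypothetical audit: compare a model's false negative rate across groups.
# Column names and data are toy illustrations only.
df = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "actual":    [1,   0,   1,   1,   1,   0,   1,   0],
    "predicted": [1,   0,   1,   0,   0,   1,   0,   0],
})

# A false negative here is a true positive the model missed, which could
# mean missed support or remediation for learners in that group.
for name, g in df.groupby("group"):
    positives = g[g["actual"] == 1]
    fnr = (positives["predicted"] == 0).mean()
    print(f"group {name}: false negative rate = {fnr:.2f}")
```

A disparity in error rates across groups does not by itself establish the cause of the bias, but it flags where reflexive examination of the data collection and modeling choices is needed.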

Conclusions

Data sharing and big data approaches in HPE are already here and will continue to grow. The potential beneficial impact and value added by examining HPE through this lens create an imperative to engage in this scholarship. At the same time, as the benefits scale, so do the tensions and risks. These risks are deeply concerning and are often the biggest obstacle to undertaking big data scholarship. The recommendations of this consensus statement are therefore intended to provide principles that can help build trust in the quality, value, and ethics of the data sharing that affords big data scholarship. Each collaboration or project will have unique contextual and regulatory issues to address; however, following the above principles and recommendations can assist in navigating them. Adopting these recommendations will support data sharing and scholarship that harnesses the power of big data. This, in turn, can be transformative for the learners, teachers, and patients who will benefit from our scholarship.

Acknowledgements

The authors extend their gratitude to the participants of the workshop and symposium at the 2022 Ottawa Conference.

Disclosure statement

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the article.

Additional information

Funding

The author(s) reported there is no funding associated with the work featured in this article.

Notes on contributors

Kulamakan (Mahan) Kulasegaram

Kulamakan (Mahan) Kulasegaram, PhD, Associate professor, University of Toronto, Toronto, Canada.

Lawrence Grierson

Lawrence Grierson, PhD, Associate professor, McMaster University, Hamilton, Canada.

Cassandra Barber

Cassandra Barber, PhD(c), Maastricht University, Maastricht, Netherlands.

Saad Chahine

Saad Chahine, PhD, Associate professor, Queen’s University, Kingston, Canada.

Fremen Chichen Chou

Fremen Chichen Chou, MD, PhD, China Medical University Hospital, Taichung City, Taiwan.

Jennifer Cleland

Jennifer Cleland, PhD, Professor of Medical Education Research, Lee Kong Chian School of Medicine, Singapore.

Ricky Ellis

Ricky Ellis, MBChB, PhD, Honorary Clinical Senior Lecturer, University of Aberdeen, Aberdeen, United Kingdom.

Eric S. Holmboe

Eric S. Holmboe, MD, Chief, Research, Milestones Development and Evaluation Officer, Accreditation Council for Graduate Medical Education, Chicago, USA.

Martin Pusic

Martin Pusic, MD, PhD, Associate Professor of Pediatrics and Emergency Medicine, Harvard Medical School, Boston, USA.

Daniel Schumacher

Daniel Schumacher, MD, PhD, MEd, Professor of Pediatrics, Cincinnati Children’s Hospital Medical Center/University of Cincinnati College of Medicine, Cincinnati, Ohio, USA.

Martin G. Tolsgaard

Martin G. Tolsgaard, MD, PhD, DMSc, Professor of Obstetrics and Gynecology, Copenhagen Academy for Medical Education and Simulation; University of Copenhagen, Copenhagen, Denmark.

Chin-Chung Tsai

Chin-Chung Tsai, PhD, Professor, National Taiwan Normal University, Taipei, Taiwan.

Elizabeth Wenghofer

Elizabeth Wenghofer, PhD, Professor, School of Kinesiology and Health Sciences, Laurentian University, Sudbury, Canada.

Claire Touchie

Claire Touchie, MD, MHPE, Professor of Medicine, University of Ottawa/The Ottawa Hospital, Ottawa, Ontario, Canada.

References

  • [SQUIRE-EDU] Standards for Quality Improvement Reporting Excellence for Education. 2022. SQUIRE. SQUIRE EDU; [accessed 2022 Jul 6]. squire-statement.org.
  • Ahsaan Shafqat U, Kaur H, Naaz S. 2019. An empirical study of big data: opportunities, challenges and technologies. In: Patnaik S, Ip A, Tavana M, Jain V, editors. New paradigm in decision science and management: advances in intelligent systems and computing. Vol. 1005. Singapore: Springer.
  • Asch DA, Nicholson S, Srinivas SK, Herrin J, Epstein J. 2014. How do you deliver a good obstetrician? Outcome-based evaluation of medical education. Acad Med. 89(1):24–26. doi: 10.1097/ACM.0000000000000067.
  • Barber C, Hammond R, Gula L, Tithecott G, Chahine S. 2018. In search of black swans: identifying students at risk of failing licensing examinations. Acad Med. 93(3):478–485. doi: 10.1097/ACM.0000000000001938.
  • Barocas S, Nissenbaum H. 2014. Big data’s end run around anonymity and consent. In: Jane L, Stodden V, Bender S, Nissenbaum H, editors. Privacy, big data, and the public good. New York (NY): Cambridge University Press; p. 44–75.
  • Bayazit A, Ilgaz H, Gönüllü İ, Erden Ş. 2023. Profiling students via clustering in a flipped clinical skills course using learning analytics. Med Teach. 45(7):724–731. doi: 10.1080/0142159X.2022.2152663.
  • Ben-Porath S, Ben Shahar TH. 2017. Introduction: big data and education: ethical and moral challenges. Theory Res Educ. 15(3):243–248.
  • Braunack-Mayer AJ, Street JM, Tooher R, Feng X, Scharling-Gamba K. 2020. Student and staff perspectives on the use of big data in the tertiary education sector: a scoping review and reflection on the ethical issues. Rev Educ Res. 90(6):788–823. doi: 10.3102/0034654320960213.
  • Burk-Rafel J, Reinstein I, Feng J, Kim MB, Miller LH, Cocks PM, Marin M, Aphinyanaphongs Y. 2021. Development and validation of a machine learning-based decision support tool for residency applicant screening and review. Acad Med. 96(11S):S54–S61. doi: 10.1097/ACM.0000000000004317.
  • Carroll SR, Garba I, Figueroa-Rodríguez OL, Holbrook J, Lovett R, Materechera S, Parsons M, Raseroka K, Rodriguez-Lonebear D, Rowe R, et al. 2020. The CARE principles for indigenous data governance. Data Sci J. 19:1–12. doi: 10.5334/dsj-2020-043.
  • Cathaoir KÓ, Gunnarsdóttir HD, Hartlev M. 2022. The journey of research data: accessing Nordic Health Data for the purposes of developing an algorithm. Med Law Int. 22(1):52–74. doi: 10.1177/09685332211046179.
  • Chahine S, Kulasegaram KM, Wright S, Monteiro S, Grierson LEM, Barber C, Sebok-Syer SS, McConnell M, Yen W, De Champlain A, et al. 2018. A call to investigate the relationship between education and health outcomes using big data. Acad Med. 93(6):829–832. doi: 10.1097/ACM.0000000000002217.
  • Chan T, Sebok-Syer S, Thoma B, Wise A, Sherbino J, Pusic M. 2018. Learning analytics in medical education assessment: the past, the present, and the future. AEM Educ Train. 2(2):178–187.
  • Chen X, Liu CY. 2015. Big data ethics in education: connecting practices and ethical awareness. J Educ Technol Dev Exchange. 8(2):5–9. doi: 10.18785/jetde.0802.05.
  • Cleland JA, Cook DA, Maloney S, Tolsgaard MG. 2022. "Important but risky": attitudes of global thought leaders towards cost and value research in health professions education. Adv Health Sci Educ Theory Pract. 27(4):989–1001. doi: 10.1007/s10459-022-10123-9.
  • Cook DA, Kuper A, Hatala R, Ginsburg S. 2016. When assessment data are words: validity evidence for qualitative educational assessments. Acad Med. 91(10):1359–1369. doi: 10.1097/ACM.0000000000001175.
  • Crawford K, Miltner K, Gray ML. 2014. Critiquing big data: politics, ethics, epistemology: special section introduction. Int J Commun. 8:1663–1672.
  • Davis D, Dorsey JK, Franks RD, Sackett PR, Searcy CA, Zhao X. 2013. Do racial and ethnic group differences in performance on the MCAT exam reflect test bias? Acad Med. 88(5):593–602. doi: 10.1097/ACM.0b013e318286803a.
  • De Mauro A, Greco M, Grimaldi M. 2016. Beyond data scientists: a review of big data skills and job families. In: Proceedings of the 11th International Forum on Knowledge Asset Dynamics (IFKAD 2016): towards a new architecture of knowledge: big data, culture and creativity. IFKAD; p. 1844–1857.
  • Demchenko Y, Grosso P, de Laat C, Membrey P. 2013. Addressing big data issues in scientific data infrastructure. In: 2013 International Conference on Collaboration Technologies and Systems (CTS); May 20–24; San Diego, CA, USA. p. 48–55. doi: 10.1109/CTS.2013.6567203.
  • Dowell J, Cleland J, Fitzpatrick S, McManus C, Nicholson S, Oppé T, Petty-Saphon K, King OS, Smith D, Thornton S, et al. 2018. The UK Medical Education Database (UKMED) what is it? Why and how might you use it? BMC Med Educ. 18(1):6. doi: 10.1186/s12909-017-1115-9.
  • Downing SM, Haladyna TM. 2004. Validity threats: overcoming interference with proposed interpretations of assessment data. Med Educ. 38(3):327–333. doi: 10.1046/j.1365-2923.2004.01777.x.
  • Duong MT, Rauschecker AM, Rudie JD, Chen PH, Cook TS, Bryan RN, Mohan S. 2019. Artificial intelligence for precision education in radiology. Br J Radiol. 92(1103):20190389. doi: 10.1259/bjr.20190389.
  • Ellaway RH, Pusic MV, Galbraith RM, Cameron T. 2014. Developing the role of big data and analytics in health professional education. Med Teach. 36(3):216–222. doi: 10.3109/0142159X.2014.874553.
  • Ellaway RH, Topps D, Pusic M. 2019. Data, big and small: emerging challenges to medical education scholarship. Acad Med. 94(1):31–36. doi: 10.1097/ACM.0000000000002465.
  • Ellis R, Brennan P, Scrimgeour DS, Lee AJ, Cleland J. 2022. Performance at medical school selection correlates with success in part A of the intercollegiate Membership of the Royal College of Surgeons (MRCS) examination. Postgrad Med J. 98(1161):e19. doi: 10.1136/postgradmedj-2021-139748.
  • Ellis R, Brennan PA, Lee AJ, Scrimgeour DS, Cleland J. 2022. Differential attainment at MRCS according to gender, ethnicity, age and socioeconomic factors: a retrospective cohort study. J R Soc Med. 115(7):257–272. doi: 10.1177/01410768221079018.
  • Elouazizi N. 2014. Critical factors in data governance for learning analytics. Learn Anal. 1(3):211–222. doi: 10.18608/jla.2014.13.25.
  • Eva KW, Reiter HI, Rosenfeld J, Trinh K, Wood TJ, Norman GR. 2012. Association between a medical school admission process using the multiple mini-interview and national licensing examination scores. JAMA. 308(21):2233–2240. doi: 10.1001/jama.2012.36914.
  • Ferretti A, Ienca M, Sheehan M, Blasimme A, Dove ES, Farsides B, Friesen P, Kahn J, Karlen W, Kleist P, et al. 2021. Ethics review of big data research: what should stay and what should be reformed? BMC Med Ethics. 22(1):51. doi: 10.1186/s12910-021-00616-4.
  • Florea D, Florea S. 2020. Big data and the ethical implications of data privacy in higher education research. Sustainability. 12(20):8744. doi: 10.3390/su12208744.
  • Fyfe M, Horsburgh J, Blitz J, Chiavaroli N, Kumar S, Cleland J. 2021. The do’s, don’ts and don’t knows of redressing differential attainment related to race/ethnicity in medical schools. Perspect Med Educ. 11(1):1–14. doi: 10.1007/S40037-021-00696-3.
  • Gale TCE, Lambe PJ, Roberts MJ. 2017. Factors associated with junior doctors’ decisions to apply for general practice training programmes in the UK: secondary analysis of data from the UKMED project. BMC Med. 15(1):220. doi: 10.1186/s12916-017-0982-6.
  • Ghiringhelli LM, Carbogno C, Levchenko S, Mohamed F, Huhs G, Lüders M, Oliveira M, Scheffler M. 2017. Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats. Npj Comput Mater. 3(1):1–9. doi: 10.1038/s41524-017-0048-5.
  • Ginsburg S, Stroud L, Lynch M, Melvin L, Kulasegaram K. 2022. Beyond the ratings: gender effects in written comments from clinical teaching assessments. Adv Health Sci Educ Theory Pract. 27(2):355–374. doi: 10.1007/s10459-021-10088-1.
  • Grierson L, Cavanagh A, Youssef A, Lee-Krueger R, McNeill K, Button B, Kulasegaram K. 2023. Inter-institutional data-driven education research: consensus values, principles, and recommendations to guide the ethical sharing of administrative education data in the Canadian medical education research context. Can Med Educ J. 14(5):113–120. doi: 10.36834/cmej.75874.
  • Grierson LEM, Mercuri M, Brailovsky C, Cole G, Abrahams C, Archibald D, Bandiera G, Phillips SP, Stirrett G, Walton JM, et al. 2017. Admission factors associated with international medical graduate certification success: a collaborative retrospective review of postgraduate medical education programs in Ontario. CMAJ Open. 5(4):E785–E790. doi: 10.9778/cmajo.20170073.
  • Han M, Hamstra SJ, Hogan SO, Holmboe E, Harris K, Wallen E, Hickson G, Terhune KP, Brady DW, Trock B, et al. 2023. Trainee physician milestone ratings and patient complaints in early posttraining practice. JAMA Netw Open. 6(4):e237588. doi: 10.1001/jamanetworkopen.2023.7588.
  • Hodges BD, Kuper A. 2012. Theory and practice in the design and conduct of graduate medical education. Acad Med. 87(1):25–33. doi: 10.1097/ACM.0b013e318238e069.
  • Ibrahim SA, Charlson ME, Neill DB. 2020. Big data analytics and the struggle for equity in health care: the promise and perils. Health Equity. 4(1):99–101. doi: 10.1089/heq.2019.0112.
  • Jerant A, Fancher T, Fenton JJ, Fiscella K, Sousa F, Franks P, Henderson M. 2015. How medical school applicant race, ethnicity, and socioeconomic status relate to multiple mini-interview-based admissions outcomes: findings from one medical school. Acad Med. 90(12):1667–1674. doi: 10.1097/ACM.0000000000000766.
  • Jerant A, Fancher T, Henderson MC, Griffin EJ, Hall TR, Kelly CJ, Peterson EM, Franks P. 2021. Associations of postbaccalaureate coursework with underrepresented race/ethnicity, academic performance, and primary care training among matriculants at five California medical schools. J Health Care Poor Underserved. 32(2):971–986. doi: 10.1353/hpu.2021.0075.
  • Kalkman S, Mostert M, Gerlinger C, van Delden JJM, van Thiel GJMW. 2019. Responsible data sharing in international health research: a systematic review of principles and norms. BMC Med Ethics. 20(1):21. doi: 10.1186/s12910-019-0359-9.
  • Kalkman S, Mostert M, Udo-Beauvisage N, van Delden JJ, van Thiel GJ. 2019. Responsible data sharing in a big data-driven translational research platform: lessons learned. BMC Med Inform Decis Mak. 19(1):283. doi: 10.1186/s12911-019-1001-y.
  • Kalkman S, van Delden J, Banerjee A, Tyl B, Mostert M, van Thiel G. 2022. Patients’ and public views and attitudes towards the sharing of health data for research: a narrative review of the empirical evidence. J Med Ethics. 48(1):3–13. doi: 10.1136/medethics-2019-105651.
  • Knight S, Buckingham Shum S. 2017. Theory and learning analytics. In: Lang C, Siemens G, Wise AF, et al., editors. Handbook of learning analytics. Vancouver (BC): Society for Learning Analytics Research (SoLAR). doi: 10.18608/hla22.
  • Krieger N. 2020. Measures of racism, sexism, heterosexism, and gender binarism for health equity research: from structural injustice to embodied harm—an ecosocial analysis. Annu Rev Public Health. 41(1):37–62. doi: 10.1146/annurev-publhealth-040119-094017.
  • Kush RD, Warzel D, Kush MA, Sherman A, Navarro EA, Fitzmartin R, Pétavy F, Galvez J, Becnel LB, Zhou FL, et al. 2020. FAIR data sharing: the roles of common data elements and harmonization. J Biomed Inform. 107:103421. doi: 10.1016/j.jbi.2020.103421.
  • Lawrence N. 2017. Data readiness levels. arXiv:1705.02245.
  • Lefaivre S, Behan B, Vaccarino A, Evans K, Dharsee M, Gee T, Dafnas C, Mikkelsen T, Theriault E. 2019. Big data needs big governance: best practices from Brain-CODE, the Ontario-Brain Institute’s neuroinformatics platform. Front Genet. 10:191. doi: 10.3389/fgene.2019.00191.
  • Levin MA, Wanderer JP, Ehrenfeld JM. 2015. Data, big data, and metadata in anesthesiology. Anesth Analg. 121(6):1661–1667. doi: 10.1213/ANE.0000000000000716.
  • Luan H, Tsai CC. 2021. A review of using machine learning approaches for precision education. Educ Technol Soc. 24(1):250–266.
  • Markus ML, Topi H. 2015. Big data, big decisions for science, society, and business. Arlington (VA): National Science Foundation.
  • Masters K, Taylor D, Loda T, Herrmann-Werner A. 2022. AMEE Guide to ethical teaching in online medical education: AMEE Guide No. 146. Med Teach. 44(11):1194–1208. doi: 10.1080/0142159X.2022.2057286.
  • Masters K. 2019. Artificial intelligence in medical education. Med Teach. 41(9):976–980. doi: 10.1080/0142159X.2019.1595557.
  • Mazzocchi F. 2015. Could big data be the end of theory in science? A few remarks on the epistemology of data-driven science. EMBO Rep. 16(10):1250–1255. doi: 10.15252/embr.201541001.
  • McGaghie WC. 2010. Medical education research as translational science. Sci Transl Med. 2(19):19cm8. doi: 10.1126/scitranslmed.3000679.
  • Metcalf J, Crawford K. 2016. Where are human subjects in big data research? The emerging ethics divide. Big Data Soc. 3(1):205395171665021. doi: 10.1177/2053951716650211.
  • Mihailescu M, Neiterman EA. 2019. A scoping review of the literature on the current mental health status of physicians and physicians-in-training in North America. BMC Public Health. 19(1):1363. doi: 10.1186/s12889-019-7661-9.
  • Norcini JJ, Boulet JR, Opalek A, Dauphinee WD. 2020. Specialty board certification rate as an outcome metric for GME training institutions: a relationship with quality of care. Eval Health Prof. 43(3):143–148. doi: 10.1177/0163278718796128.
  • Nundy S, Cooper LA, Mate KS. 2022. The quintuple aim for health care improvement: a new imperative to advance health equity. JAMA. 327(6):521–522. doi: 10.1001/jama.2021.25181.
  • O’Neil C. 2016. Weapons of math destruction: how big data increases inequality and threatens democracy. New York (NY): Crown Publishing Group.
  • O'Mara DA, Canny BJ, Rothnie IP, Wilson IG, Barnard J, Davies L. 2015. The Australian Medical Schools Assessment Collaboration: benchmarking the preclinical performance of medical students. Med J Aust. 202(2):95–98. doi: 10.5694/mja14.00772.
  • Ona FF, Amutah-Onukagha NN, Asemamaw R, Schlaff AL. 2020. Struggles and tensions in antiracism education in medical school: lessons learned. Acad Med. 95:S163–S168. doi: 10.1097/ACM.0000000000003696.
  • Piwowar HA, Becich MJ, Bilofsky H, Crowley RS, on behalf of the caBIG Data Sharing and Intellectual Capital Workspace. 2008. Towards a data sharing culture: recommendations for leadership from academic health centers. PLoS Med. 5(9):e183. doi: 10.1371/journal.pmed.0050183.
  • Polanin J, Terzian M. 2018. A data-sharing agreement helps to increase researchers’ willingness to share primary data: results from a randomized controlled trial. J Clin Epidemiol. 106:60–69. doi: 10.1016/j.jclinepi.2018.10.006.
  • Pusic MV, Birnbaum RJ, Thoma B, Hamstra SJ, Cavalcanti RB, Warm EJ, Janssen A, Shaw T. 2023. Frameworks for integrating learning analytics with the electronic health record. J Contin Educ Health Prof. 43(1):52–59. doi: 10.1097/CEH.0000000000000444.
  • Regan PM, Jesse M. 2019. Ethical challenges of edtech, big data and personalized learning: twenty-first century student sorting and tracking. Ethics Inf Technol. 21(3):167–179. doi: 10.1007/s10676-018-9492-2.
  • Reiter HI, Lockyer J, Ziola B, Courneya CA, Eva K, Canadian Multiple Mini-Interview Research Alliance (CaMMIRA). 2012. Should efforts in favor of medical student diversity be focused during admissions or farther upstream? Acad Med. 87(4):443–448. doi: 10.1097/ACM.0b013e318248f7f3.
  • Riley J. 2017. Understanding metadata. Washington (DC): National Information Standards Organization; p. 23. http://www.niso.org/publications/press/UnderstandingMetadata.pdf
  • Rubio DM, Schoenbaum EE, Lee LS, Schteingart DE, Marantz PR, Anderson KE, Platt LD, Baez A, Esposito K. 2010. Defining translational research: implications for training. Acad Med. 85(3):470–475. doi: 10.1097/ACM.0b013e3181ccd618.
  • Sander I. 2020. What is critical big data literacy and how can it be implemented? Internet Policy Rev. 9(2):1–22. doi: 10.14763/2020.2.1479.
  • Schumacher DJ, West DC, Schwartz A, Li S, Millstein L, Griego EC, Turner TL, Herman BE, Englander R, Hemond J, for the Association of Pediatric Program Directors Longitudinal Educational Assessment Research Network General Pediatrics EPAs Study Group. 2020. Longitudinal assessment of resident performance using entrustable professional activities: a multi-site study. JAMA Netw Open. 3(1):e1919316. doi: 10.1001/jamanetworkopen.2019.19316.
  • Shaw DM. 2019. Defining data donation after death: metadata, families, directives, guardians and the route to big consent. In: Krutzinna J, Floridi L, editors. The ethics of medical data donation. Cham: Springer; p. 151–159.
  • Shilton K, Moss E, Gilbert SA, Bietz MJ, Fiesler C, Metcalf J, Vitak J, Zimmer M. 2021. Excavating awareness and power in data science: a manifesto for trustworthy pervasive data research. Big Data Soc. 8(2):205395172110407. doi: 10.1177/20539517211040759.
  • Smirnova A, Chahine S, Milani C, Schuh A, Sebok-Syer SS, Swartz JL, Wilhite JA, Kalet A, Durning SJ, Lombarts KMJMH, et al. 2023. Using resident-sensitive quality measures derived from electronic health record data to assess residents’ performance in pediatric emergency medicine. Acad Med. 98(3):367–375. doi: 10.1097/ACM.0000000000005084.
  • Smith BK, Yamazaki K, Tekian A, Holmboe E, Hamstra SJ, Mitchell EL, Park YS. 2021. The use of learning analytics to enable detection of underperforming trainees: an analysis of National Vascular Surgery Trainee ACGME Milestones Assessment Data. Ann Surg. 277(4):e971–e977. doi: 10.1097/SLA.0000000000005243.
  • Someh I, Davern M, Breidbach CF, Shanks G. 2019. Ethical issues in big data analytics: a stakeholder perspective. Commun Assoc Inform Syst. 44(1):718–747. doi: 10.17705/1CAIS.04434.
  • Statistics Canada. 2022. [accessed 2022 Jul 31]. https://www150.statcan.gc.ca/n1/pub/75-202-x/2009000/know-savoir-eng.htm.
  • Sweet LE, Moulaison LH. 2013. Electronic health records data and metadata: challenges for big data in the United States. Big Data. 1(4):245–251. doi: 10.1089/big.2013.0023.
  • Tamblyn R, Abrahamowicz M, Dauphinee D, Wenghofer E, Jacques A, Klass D, Smee S, Blackmore D, Winslade N, Girard N, et al. 2007. Physician scores on a national clinical skills examination as predictors of complaints to medical regulatory authorities. JAMA. 298(9):993–1001. doi: 10.1001/jama.298.9.993.
  • Te Mana Raraunga – Māori Data Sovereignty Network. 2022. [accessed 2022 Aug 1]. https://www.temanararaunga.maori.nz/.
  • ten Cate O, Dahdal S, Lambert T, Neubauer F, Pless A, Pohlmann PF, van Rijen H, Gurtner C. 2020. Ten caveats of learning analytics in health professions education: a consumer’s perspective. Med Teach. 42(6):673–678. doi: 10.1080/0142159X.2020.1733505.
  • The First Nations Information Governance Centre. 2014. Ownership, Control, Access and Possession (OCAP™): the path to first nations information governance. Ottawa: The First Nations Information Governance Centre.
  • Thelen AE, George BC, Burkhardt JC, Khamees D, Haas MR, Weinstein D. 2023. Improving graduate medical education by aggregating data across the medical education continuum. Acad Med. doi: 10.1097/ACM.0000000000005313.
  • Thoma B, Ellaway RH, Chan TM. 2021. From Utopia through dystopia: charting a course for learning analytics in competency-based medical education. Acad Med. 96(7S):S89–S95. doi: 10.1097/ACM.0000000000004092.
  • Tolsgaard MG, Boscardin CK, Park YS, Cuddy MM, Sebok-Syer SS. 2020. The role of data science and machine learning in health professions education: practical applications, theoretical contributions, and epistemic beliefs. Adv Health Sci Educ Theory Pract. 25(5):1057–1086. doi: 10.1007/s10459-020-10009-8.
  • Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. 2023. The fundamentals of artificial intelligence in medical education research: AMEE Guide No. 156. Med Teach. 45(6):565–573. doi: 10.1080/0142159X.2023.2180340.
  • Triola MM, Pusic MV. 2012. The education data warehouse: a transformative tool for health education research. J Grad Med Educ. 4(1):113–115. doi: 10.4300/JGME-D-11-00312.1.
  • Vellinga A, Cormican M, Hanahoe B, Bennett K, Murphy AW. 2011. Opt-out as an acceptable method of obtaining consent in medical research: a short report. BMC Med Res Methodol. 11(1):40. doi: 10.1186/1471-2288-11-40.
  • Walter M, Lovett R, Maher B, Williamson B, Prehn J, Gawaian B, Lee V. 2021. Indigenous data sovereignty in the era of big data and open data. Aust J Social Issues. 56(2):143–156. doi: 10.1002/ajs4.141.
  • Wenghofer E, Klass D, Abrahamowicz M, Dauphinee D, Jacques A, Smee S, Blackmore D, Winslade N, Reidel K, Bartman I, et al. 2009. Doctor scores on national qualifying examinations predict quality of care in future practice. Med Educ. 43(12):1166–1173. doi: 10.1111/j.1365-2923.2009.03534.x.
  • Wenghofer EF, Hogenbirk JC, Timony PE. 2017. Impact of the rural pipeline in medical education: practice locations of recently graduated family physicians in Ontario. Hum Resour Health. 15(1):16. doi: 10.1186/s12960-017-0191-6.
  • Wesson P, Hswen Y, Valdes G, Stojanovski K, Handley MA. 2022. Risks and opportunities to ensure equity in the application of big data research in public health. Annu Rev Public Health. 43(1):59–78. doi: 10.1146/annurev-publhealth-051920-110928.
  • West CP, Dyrbye LN, Erwin PJ, Shanafelt TD. 2016. Interventions to prevent and reduce physician burnout: a systematic review and meta-analysis. Lancet. 388(10057):2272–2281. doi: 10.1016/S0140-6736(16)31279-X.
  • Whitehead CR, Hodges BD, Austin Z. 2013. Captive on a carousel: discourses of 'new’ in medical education 1910–2010. Adv Health Sci Educ Theory Pract. 18(4):755–768. doi: 10.1007/s10459-012-9414-8.
  • Wilhite JA, Altshuler L, Zabar S, Gillespie C, Kalet A. 2020. Development and maintenance of a medical education research registry. BMC Med Educ. 20(1):199. doi: 10.1186/s12909-020-02113-5.
  • Wise AF, Shaffer DW. 2015. Why theory matters more than ever in the age of big data. Learn Anal. 2(2):5–13. doi: 10.18608/jla.2015.22.2.
  • Wise AF, Vytasek J. 2017. Handbook of learning analytics. Vancouver (BC): Society for Learning Analytics Research (SoLAR).
  • Yang CC, Chen IY, Ogata H. 2021. Toward precision education. Educ Technol Soc. 24(1):152–163.
