OPEN PEER COMMENTARIES

Structural Disparities in Data Science: A Prolegomenon for the Future of Machine Learning

This article refers to:
Identifying Ethical Considerations for Machine Learning Healthcare Applications

As disparities and data science researchers, we write in response to Char and colleagues' (2020) paper, "Identifying Ethical Considerations for Machine Learning Healthcare Applications." While their overall interest in establishing an ethical framework for machine learning (ML) is welcome, the focus on the pipeline of ML research and development risks reifying existing approaches and neglecting alternative voices and methods. We contend that taking health disparities and bias as the starting point produces a richer discussion of the impact, ethics, and outcomes of ML approaches.

Char and colleagues raise the issue of diverse stakeholders in their paper, but they never make clear what diversity entails or what structural disparities imply for data science and machine learning. Structural disparities and bias take many forms and occur along many dimensions. Race, ethnicity, sex, socioeconomic status, sexual orientation, gender identity, and disability are among the most widely acknowledged axes along which structural disparities are experienced. In addition, it is critical to acknowledge that it is often at the intersections of these identities that the most toxic forms of bias and inequity emerge (Milburn et al. 2019).

One key example from recent experience is the report of racial bias in a commercial insurance algorithm (Benjamin 2019; Obermeyer et al. 2019). Briefly, the algorithm was designed to predict costs of care as a proxy for health care needs. Because our current health care system spends fewer resources on Black patients than on White patients with the same medical risk, the Black patients tended to be much sicker while providers spent less on their care. The tautological result was that these Black patients ended up being triaged to lower levels of support in hospitals that used the algorithm. In the wake of this finding, the company moved to adjust its risk calculation, but the overall impact of this biased algorithm on the care provided by its member hospitals is unknown.
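
To make the mechanism concrete, the sketch below is a synthetic simulation, not the commercial algorithm or the Obermeyer et al. data: it shows how a model trained on cost as the label will systematically under-select members of a group that receives less spending at the same level of underlying need. The disparity factor, feature names, and cutoff are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 20_000

# Latent medical need, identical in distribution for both groups.
need = rng.gamma(shape=2.0, scale=1.0, size=n)
group = rng.integers(0, 2, size=n)                 # two illustrative groups, 0 and 1

# Assumed disparity: the same need generates less spending for group 1.
spend_factor = np.where(group == 1, 0.7, 1.0)
cost = need * spend_factor + rng.normal(0.0, 0.2, size=n)

# Features available to the model: a noisy clinical signal of need plus prior
# utilization, which already carries the spending disparity.
lab_signal = need + rng.normal(0.0, 0.5, size=n)
prior_utilization = need * spend_factor + rng.normal(0.0, 0.2, size=n)
X = np.column_stack([lab_signal, prior_utilization])

# Train on cost (the proxy label) and triage the top 10% of predicted scores.
risk_score = LinearRegression().fit(X, cost).predict(X)
selected = risk_score >= np.quantile(risk_score, 0.90)

# At the same score cutoff, the under-spent group is selected less often and is
# sicker when it is selected.
for g in (0, 1):
    mask = group == g
    print(f"group {g}: share selected = {selected[mask].mean():.1%}, "
          f"mean need among selected = {need[selected & mask].mean():.2f}")
```

Running this simulation shows the under-spent group both selected less often and sicker when selected, mirroring the pattern reported in the published audit.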

In response to the rapid development of artificial intelligence (AI) models focused on the COVID-19 pandemic, concerns have been raised about the propagation of biased algorithms (Röösli et al. 2020). Röösli and colleagues argue for the use of bias assessment tools such as PROBAST (Wolff et al. 2019) and for greater regulatory oversight of these tools as a means of mitigating bias. While these moves are welcome and potentially beneficial, they are still largely centered on practices within the data science community and, as with Char and colleagues' paper, seem to look for solutions within the narrow confines of the AI and data science community's standards.

We feel it is essential to foreground attention to health disparities and equity in the pipeline model that Char and colleagues advocate. First, it is critical that data scientists partner with health equity researchers at the beginning of their work. While errors may happen, the current pattern of algorithm development followed by correction should be the exception and not the norm. Early partnership between researchers can prevent biased models from being released. As the insurance example shows, these tools have impacts soon after release, and allowing bias to cause injury has real costs for populations already facing disparities. Another rationale for partnership at the outset is the acknowledgment that the field of data science, and the technology sector more generally, has a history of poor minority representation (Myers 2018), especially in leadership roles that often make choices about research and product design (Curtis 2019).

In considering the ways that bias can enter the data, it is important to consider the types of data at play. In our research using data from the electronic health record (EHR), we broadly consider data as either structured (e.g., laboratory values, discrete fields) or unstructured (e.g., textual notes, narrative reports, images). Both types of data are prone to bias. Vyas and colleagues have compiled a list of race corrections that have been undertaken in some fields of medicine (Vyas et al. 2020). Most of these involve structured data. For example, one risk model predicted a higher estimated glomerular filtration rate (eGFR) in Black patients, by as much as 20% over White counterparts. The concern is that this misclassifies Black patients as having better kidney function and may delay referral to a kidney specialist. Some hospitals have stopped the practice of race-based eGFR in the wake of critical reports and assessments (Zoler 2020). Race-based eGFR is just one example of a biased model derived from structured data, and its use at scale within risk calculators or other clinical decision support systems can over- or under-estimate risk in large numbers of individuals when automated. The science underlying structured data can then create co-dependencies once integrated into widely available software tools or repositories.
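
As a concrete illustration of how such a structured-data race correction operates: the paragraph above does not name the specific model, but the IDMS-traceable MDRD study equation is a well-known example whose 1.212 race multiplier raises reported eGFR by roughly 21%, matching the figure cited. In the sketch below, the example patient and the referral threshold of 30 mL/min/1.73 m² are hypothetical values chosen only to show how the same laboratory result can land on opposite sides of a clinical cutoff.

```python
def egfr_mdrd(scr_mg_dl: float, age: int, female: bool, black: bool) -> float:
    """Estimated GFR (mL/min/1.73 m^2), IDMS-traceable MDRD study equation."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212   # the race coefficient at issue
    return egfr

REFERRAL_THRESHOLD = 30.0   # illustrative nephrology-referral cutoff

# Same serum creatinine, age, and sex; only the race flag differs.
without_correction = egfr_mdrd(scr_mg_dl=2.4, age=55, female=False, black=False)
with_correction = egfr_mdrd(scr_mg_dl=2.4, age=55, female=False, black=True)

print(f"without race coefficient: {without_correction:.1f} "
      f"(refer: {without_correction < REFERRAL_THRESHOLD})")
print(f"with race coefficient:    {with_correction:.1f} "
      f"(refer: {with_correction < REFERRAL_THRESHOLD})")
```

With these inputs the uncorrected estimate falls below the cutoff while the race-adjusted estimate does not, which is precisely the delayed-referral concern described above.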

For unstructured data such as clinical notes, we should draw attention to biased language in describing patient experience, symptomatology, and characteristics. Our own work has focused on developing risk classifiers to identify cases of substance misuse from notes collected in the EHR during usual care. We have tried to look carefully at the potential for bias in subgroups affected by our models while validating the models' overall discrimination and calibration against a manually screened reference cohort of patients. In working with health disparities researchers, we have come to understand that these measures alone may not be sufficient, but we are limited to the data available in the EHR. Better data are therefore needed to assess both explicit and implicit provider bias as well as bias perceived by patients themselves. The methods from equity research, and the right data to support those approaches, are needed to inform the fairness of our models. Since these methods require data not readily available in the data warehouses of health systems, there is an impetus to change our data governance so that we may better examine bias in our new era of digital health. Such an approach is time consuming but represents one of the few ways that we can begin to triangulate the structural and interpersonal biases that are inherent in our systems of care.
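
The kind of subgroup validation described above can be sketched in a few lines. The example below is a minimal sketch, not our production pipeline; the column names and data source are hypothetical. It computes discrimination (AUROC), the Brier score, and calibration-in-the-large within each demographic subgroup rather than only overall.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score, brier_score_loss

# Expected columns: y_true (0/1 reference label from manual screening),
# y_prob (model probability), and a demographic attribute such as race.
df = pd.read_csv("validation_cohort.csv")

print(f"overall: AUROC={roc_auc_score(df.y_true, df.y_prob):.3f}, "
      f"Brier={brier_score_loss(df.y_true, df.y_prob):.3f}")

for group, sub in df.groupby("race"):
    if sub.y_true.nunique() < 2:        # AUROC is undefined without both classes
        continue
    auroc = roc_auc_score(sub.y_true, sub.y_prob)
    brier = brier_score_loss(sub.y_true, sub.y_prob)
    # Calibration-in-the-large: mean predicted risk vs. observed event rate.
    print(f"{group}: n={len(sub)}, AUROC={auroc:.3f}, Brier={brier:.3f}, "
          f"mean predicted={sub.y_prob.mean():.3f}, observed={sub.y_true.mean():.3f}")
```

A subgroup whose observed event rate diverges from its mean predicted risk, or whose AUROC lags the overall figure, is exactly the signal that overall metrics can mask.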

The greater challenge is what Rajkomar and colleagues articulate in trying to incorporate concepts of distributive justice and fairness into machine learning (Rajkomar et al. 2018). Such approaches require additional effort and time, and to some degree they run against the tendency of academic researchers and technology developers to be the first to produce a novel model. Only by rewarding just and fair models that are developed to reduce disparities and improve equity can we hope to promote higher quality science. This could be encouraged by having academic journals require authors to address these issues during peer review prior to publication of ML models.
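
One way such fairness concepts become operational is through group metrics. The minimal sketch below checks "equal opportunity," one criterion discussed in that literature, which asks that true positive rates be similar across groups at a given decision threshold; the function names, threshold, and toy data are ours, for illustration only.

```python
import numpy as np

def true_positive_rate(y_true, y_prob, threshold=0.5):
    """Share of true cases flagged by the model at the given threshold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) >= threshold
    positives = y_true == 1
    return y_pred[positives].mean() if positives.any() else float("nan")

def equal_opportunity_gap(y_true, y_prob, group, threshold=0.5):
    """Largest difference in true positive rate across groups (0 = parity)."""
    y_true, y_prob, group = map(np.asarray, (y_true, y_prob, group))
    tprs = [true_positive_rate(y_true[group == g], y_prob[group == g], threshold)
            for g in np.unique(group)]
    return max(tprs) - min(tprs)

# Toy example: the model catches true cases far more often in group "a".
gap = equal_opportunity_gap(y_true=[1, 1, 0, 1, 0, 1],
                            y_prob=[0.9, 0.7, 0.2, 0.8, 0.6, 0.3],
                            group=["a", "a", "a", "b", "b", "b"])
print(f"equal-opportunity gap: {gap:.2f}")   # 0.50 with this toy data
```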

Finally, there is a need to build new pathways for underrepresented minorities to become part of the ML community and contribute toward the goal of advancing high-quality science at the intersection of equity and data science. Only by developing better and more inclusive pipelines can we hope to create a diverse workforce with expertise in both data analytics and equity research, able to discern bias during development and prevent the propagation of biased tools throughout systems.

DISCLOSURE STATEMENT

Matthew M. Churpek declares a patent pending (ARCD. P0535US.P2) for a risk model, research support from EarlySense, and NIH grant R01 GM123193. No conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by the National Center for Advancing Translational Sciences [KL2TR002387, UL1TR002389], National Institute of General Medical Sciences [R01GM123193], National Institute on Alcohol Abuse and Alcoholism [K23AA024503], National Institute on Drug Abuse [R01DA041071, UG1DA049467], National Institute on Minority Health and Health Disparities [U54MD010711].

REFERENCES