Brief Report

Transforming big data into computational models for personalized medicine and health care

Abstract

Health care systems generate a huge volume of different types of data. Due to the complexity and challenges inherent in studying medical information, it is not yet possible to create a comprehensive model capable of considering all the aspects of health care systems. There are different points of view regarding what the most efficient approaches toward utilization of this data would be. In this paper, we describe the potential role of big data approaches in improving health care systems and review the most common challenges facing the utilization of health care big data.

Introduction

The term “big data” is increasingly used in discussions of the analysis of very large amounts of information. Big data, including medical data, are characterized by volume, variety, velocity, and veracity. Here, volume refers to the size of the data, variety to the different types and sources of data, velocity to the speed of data generation, and veracity to the quality of the data, or the uncertainty introduced by factors such as noise, artifacts, and missing data. In the health care system, many resources provide data that could be useful for a range of applications [1]. These resources include randomized controlled clinical trials, wearable devices (eg, clothing and accessories incorporating sensors that measure activity or parameters such as blood pressure), video streams (eg, a video-based system for detecting fall events in elderly persons living alone at home), personal genomic services, imaging devices, and social media or Internet searches. The applications include drug and medical device safety surveillance, quality of care and performance measurement, making of diagnoses and prediction of prognosis, population management, decision support and precision medicine, and public health and research applications [2,3].

Over the last decade, medical researchers have taken the heterogeneity of data into account in their work: the genetics of subjects have been studied as a function of epistasis, and family history and personal life events have been used to predict clinical evolution. Big data technology should expand this fascinating field of multivariate research and overcome the inability of existing approaches to gather, share, and use information comprehensively within the health care system [2]. To utilize health care big data, research groups and organizations have designed and implemented many frameworks and methods. One of the most established frameworks is Hadoop, which supports the analysis of large data sets. It has been used to implement various applications, such as disease prediction in patients, diagnosis of cancer, patient emergency alerts, generation of disease decision rules, medical data quality assessment, and personalized recommendation systems [4-10].
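
Hadoop is commonly programmed through the MapReduce model, in which records are mapped to key-value pairs, grouped by key, and reduced to a summary per key. The following is a minimal in-memory sketch of that pattern in plain Python, not the Hadoop API itself; the record layout, the diagnosis codes, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical records: (patient_id, diagnosis_code). Not real patient data.
RECORDS = [
    ("p1", "I21"),
    ("p2", "E11"),
    ("p3", "I21"),
    ("p4", "E11"),
    ("p5", "I21"),
]


def map_record(record):
    """Map step: emit one (diagnosis_code, 1) pair per record."""
    _patient_id, code = record
    return [(code, 1)]


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: sum the counts collected for one key."""
    return key, sum(values)


def count_diagnoses(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_record(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_group(k, v) for k, v in grouped.items())


counts = count_diagnoses(RECORDS)
```

In a real cluster the map and reduce steps would run in parallel on different machines over partitions of the data; here they run sequentially, which is enough to show how the computation is split.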

In precision medicine, a patient's unique characteristics are used to tailor treatment in a manner that might be more elaborate than the standard course. For example, cardiologists currently use an algorithm that, for a given patient, predicts the occurrence of a myocardial infarction within 5 or 10 years based on body weight, arterial pressure, smoking status, blood lipid analysis results, and personal and family cardiovascular history. Precision medicine can be used in the diagnosis and prevention of diseases such as cancer, owing to advances in next-generation sequencing (NGS), liquid biopsy technology, computational biology methods, high-throughput functional screening, and analytical approaches [11].

In the abovementioned domains, big data mining techniques have led to interesting results. For example, in some tasks the performance of such techniques is comparable to that of medical experts. It will be interesting to follow studies that compare the efficiency of these mining techniques with usual clinical management.

In this article, we briefly review data analysis methods for health care systems and examine challenges facing the utilization of this data.

Computational approaches toward personalized medicine

Although the concept of personalized medicine is not new, the emergence of powerful analytical tools has recently opened new avenues to predictive, preventive, participatory, and personalized medicine, known as P4 medicine [12]. The hope is to reduce cost and improve the quality of care. Personalized medicine was involved in more than 25% of novel drugs approved by the US Food and Drug Administration (FDA) in 2015 [13], which shows that personalized medicine is becoming a substantial component of treatment products.

Research groups have investigated different aspects of personalized medicine, such as diagnosis, prognosis, and pharmacogenomics, through computational approaches or through improving or revising standards and regulations. Many of these research efforts, such as the “Baseline Study” project by Google Inc., the Cancer Genome Atlas, and the 100,000 Genomes Project (100KGP), focus on high-throughput genomic analysis to achieve personalized health care by developing computational methods [11,14,15]. Genomic mutations can be exploited in the development of drugs that target a protein to treat disease.

By analyzing large amounts of data, Forkan et al showed that there is a trend or pattern in each individual patient's data [16]. A use case of their model identified the true abnormal conditions of patients with variations in blood pressure and heart rate. Vidyasagar reviewed machine learning techniques for predicting drug response and found that there are biomarkers, even some without biological significance, that could predict a drug response [17]. Krishnan and Westhead, in a study applying machine learning and probabilistic approaches to the prediction of the functional effects of single-nucleotide polymorphisms (SNPs), found that machine learning methods could outperform probabilistic methods [18]. Clinical variables such as race (white vs nonwhite), intensive care unit (ICU) type (medical vs surgical), sex, and age have been integrated into multivariate logistic regression models to estimate a personalized initial dose of heparin [19]. Using these models, investigators observed statistically significant associations between sub- and supratherapeutic activated partial thromboplastin time (aPTT), the aforementioned clinical variables, heparin dose, and Sequential Organ Failure Assessment (SOFA) scores, with an area under the curve (AUC; also called area under a receiver operating characteristic [ROC] curve, a two-dimensional depiction of classifier performance) of 0.78 and 0.79, respectively.
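
To make the AUC figures above more concrete, the sketch below computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is a generic illustration and not the procedure used in the cited heparin study; the function name `roc_auc`, the labels, and the scores are assumptions made up for the example.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Uses the pairwise definition: the fraction of (positive, negative)
    pairs in which the positive has the higher score; ties count 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier outputs, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values near 0.78 indicate useful but imperfect discrimination. The pairwise loop is quadratic in the number of samples; for large data sets a sort-based computation would be preferable.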

None of the state-of-the-art big data-driven approaches has reported an accuracy (the ratio between correctly identified/classified samples and the total number of samples) of 100%, probably because of challenges such as missing data, the quality of data, and variations in experimental results, which are addressed in the next section.
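
The definition of accuracy given above can be written out directly. The following is a small sketch under the assumption that true and predicted class labels are available as two equal-length lists; the function name and the example labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Ratio of correctly classified samples to the total number of samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: four of the five predictions match the truth.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
```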

Challenges

Besides the general challenges inherent to the analysis of big data, such as missing data, erroneous or imprecise data, and heterogeneous data, employing big data in health care systems imposes new challenges. These include the lack of reliability and repeatability of some (but by no means all) biological data; issues of privacy, ownership (ie, determining the owner or owners of data), and confidentiality; inadequate data from randomized controlled clinical trials; and low quality of data in general [1,17,18]. To address technical challenges such as missing and imprecise data, statistical as well as machine learning methods have been investigated [20-26]. However, there is no unique solution to these problems; as with other approaches, the efficacy of statistical and machine learning methods needs to be proven for new medical applications.
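
As an illustration of the simplest statistical treatment of missing data, the sketch below fills each missing value with the mean of the observed values in the same column. This is only a baseline and is not the method of any cited study, which use more elaborate techniques such as multiple imputation and tensor completion; the function name, the use of `None` to mark missing entries, and the measurements are assumptions for the example.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    `rows` is a list of equal-length lists of floats or None.
    """
    if not rows:
        return []

    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))

    # Build new rows so the input is left unchanged.
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical measurements: [systolic blood pressure, heart rate].
rows = [
    [120.0, 70.0],
    [None, 80.0],
    [140.0, None],
]
filled = impute_column_means(rows)
```

Mean imputation keeps the column averages intact but shrinks the variance and ignores relationships between variables, which is one reason why its suitability has to be checked for each medical application.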

Another challenge is disparity in ethnic and socioeconomic status, which results in inequalities in health care; indeed, the use of “omic” technologies is costly and might not be affordable for resource-poor populations. Integrating molecular pathology, epidemiology, and social sciences could be a strategy to explore health disparities linked to social environments [27]. However, such future studies will influence the global health setting only if their results are reflected in political and economic decisions.

To develop disease-specific models applicable to personalizing therapeutic interventions, we need to incorporate biomarkers (indicators of normal biological processes, pathogenic processes, or pharmacological responses to therapeutic intervention [12]) from DNA sequencing and improve the quality of data. However, in some diseases, such as cancer, cell heterogeneity in a single tumor makes detection of low-level mutations difficult, and a chemotherapy selected on the basis of specific genetic characteristics of that patient's cancer might be impractical [28]. To reveal a correlation between results of DNA studies and disease type, more samples from different cells at different locations would be required, a procedure with low feasibility [28].

Another challenge is the lack of knowledge about the human system. From a big data perspective, our understanding of the function of each part of this system needs to be converted into computational models and then integrated with other models of the human body. Understanding the biological networks and molecular processes, and thus the treatment outcome, in neuropsychiatric disorders has been severely hampered by limited access to the brain. Major big data projects such as BRAIN (Brain Research through Advancing Innovative Neurotechnologies), HBP (Human Brain Project), and TVB (The Virtual Brain) [10] have been undertaken to enable investigators to fully understand the activity and connectivity of neuronal systems. However, these projects are far from complete, and various aspects of brain functionality may remain unresolved. For instance, understanding placebo effects at the psychological level, as well as in terms of neuroimaging and neurobiological/physiological changes, is an ongoing and fascinating field of research.

Discussion and conclusion

With technological advances, different research groups and organizations are generating and using increasingly complex and diverse data sets in health care systems. However, as the human system is very complex, a comprehensive model is required in order to achieve P4 medicine. To develop such a model, new sensors, methods, platforms, and unique biomarkers for diagnosis and therapeutic outcome prediction are required [29]. There is still a need for devices and sensors able to provide good-quality reports of relevant information on patient health. For instance, no thoroughly validated device for measuring cardiac output is currently available [30]. To design a personalized model applicable to P4 medicine, more investment is required toward understanding the human body and relevant correlations so that it can be described with computational models. Moreover, in order to design an accurate model, more studies investigating the influence of parameters such as environmental factors, family history, and lifestyle on health are warranted. However, this might be particularly challenging in the fields of neurology and psychiatry.

The authors would like to thank Craig Biwer and Samuel Habbo-Gavin for their valuable comments.

REFERENCES

  1. Alemayehu D, Berger M. Big data: transforming drug development and health policy decision making. Health Serv Outcomes Res Method. 2016 Mar 5. Epub ahead of print. doi:10.1007/s10742-016-0144-x
  2. Belle A, Thiagarajan R, Soroushmehr SM, Navidi F, Beard D, Najarian K. Big data analytics in healthcare. Biomed Res Int. 2015;2015:370194. doi:10.1155/2015/370194
  3. Rumsfeld J, Joynt K, Maddox T. Big data analytics to improve cardiovascular care: promise and challenges. Nat Rev Cardiol. 2016;13(6):350-359. PMID: 27009423
  4. Kuo MH, Chrimes D, Moa B, Hu W. Design and construction of a big data analytics framework for health applications. IEEE/ACM Trans Comput Biol Bioinform. 2016;13(3):549-556. PMID: 27295638
  5. Istephan S, Siadat MR. Unstructured medical image query using big data - an epilepsy case study. J Biomed Inform. 2016;59:218-226. PMID: 26707450
  6. Bonner S, McGough AS, Kureshi I, et al. Data quality assessment and anomaly detection via map/reduce and linked data: a case study in the medical domain. Paper presented at: 2015 IEEE International Conference on Big Data (Big Data); October 29-November 1, 2015; Santa Clara, CA, USA.
  7. Lee B, Jeong E. A design of a Patient-customized healthcare system based on the Hadoop with text mining (PHSHT) for an efficient disease management and prediction. Int J Software Eng Applications. 2014;8(8):131-150.
  8. Zhang S, Dong Y, Chen X, Wang S. Personalized recommendation system on Hadoop and HBase. In: Chen W, Yin G, Zhao G, eds. Big Data Technology and Applications. Singapore; 2016:34-45. Communications in Computer and Information Science; vol 590.
  9. Chennamsetty H, Chalasani S, Riley D. Predictive analytics on electronic health records (EHRs) using Hadoop and Hive. Paper presented at: 2015 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT); March 5-7, 2015; Coimbatore, India.
  10. Falcon MI, Jirsa V, Solodkin A. A new neuroinformatics approach to personalized medicine in neurology. Curr Opin Neurol. 2016;29(4):429-436. PMID: 27224088
  11. Kensler T, Spira A, Garber J, et al. Transforming cancer prevention through precision medicine and immune-oncology. Cancer Prev Res (Phila). 2016;9(1):2-10. PMID: 26744449
  12. Hood L, Friend S. Predictive, personalized, preventive, participatory (P4) cancer medicine. Nat Rev Clin Oncol. 2011;8(3):184-187. PMID: 21364692
  13. Nice EC. From proteomics to personalized medicine: the road ahead. Expert Rev Proteomics. 2016;13(4):341-343. PMID: 26905403
  14. Ibrahim R, Pasic M, Yousef GM. Omics for personalized medicine: defining the current we swim in. Expert Rev Mol Diagn. 2016;16(7):719-722. PMID: 26959799
  15. Vicini P, Fields O, Lai E, et al. Precision medicine in the age of big data: the present and future role of large-scale unbiased sequencing in drug discovery and development. Clin Pharmacol Ther. 2016;99(2):198-207. PMID: 26536838
  16. Forkan A, Khalil I, Ibaida A, Tari Z. BDCaM: big data for context-aware monitoring - a personalized knowledge discovery framework for assisted healthcare. IEEE Trans Cloud Comput. 2015;99:1.
  17. Vidyasagar M. Identifying predictive features in drug response using machine learning: opportunities and challenges. Annu Rev Pharmacol Toxicol. 2015;55:15-34. PMID: 25423479
  18. Krishnan V, Westhead D. A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function. Bioinformatics. 2003;19(17):2199-2209. PMID: 14630648
  19. Ghassemi M, Richter S, Eche I, Chen T, Danziger J, Celi L. A data-driven approach to optimized medication dosing: a focus on heparin. Intensive Care Med. 2014;40(9):1332-1339. PMID: 25091788
  20. Wang Y, Chen R, Ghosh J, et al. Rubik: knowledge guided tensor factorization and completion for health data analytics. Proc 21st ACM SIGKDD Intl Conference Knowledge Discovery Data Mining; Sydney, Australia; KDD'15. 2015:1265-1274.
  21. Zhang Z, Fang H, Wang H. Multiple imputation based clustering validation (MIV) for big longitudinal trial data with missing values in eHealth. J Med Syst. 2016;40(6):146. PMID: 27126063
  22. Özdemir V, Dove E, Gürsoy U, et al. Personalized medicine beyond genomics: alternative futures in big data - proteomics, environtome and the social proteome. J Neural Transm (Vienna). 2015 Dec 8. Epub ahead of print. doi:10.1007/s00702-015-1489-y
  23. Priya M, Kumar PR. A novel intelligent approach for predicting atherosclerotic individuals from big data for healthcare. Int J Production Res. 2015;53(24):7517-7532.
  24. Lange K, Papp JC, Sinsheimer JS, Sobel EM. Next-generation statistical genetics: modeling, penalization, and optimization in high-dimensional data. Annu Rev Stat Appl. 2014;1(1):279-300.
  25. Mardani M, Mateos G, Giannakis GB. Subspace learning and imputation for streaming big data matrices and tensors. IEEE Trans Signal Process. 2015;63(10):2663-2677.
  26. Jerez JM, Molina I, Garcia-Laencina PJ, et al. Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artif Intell Med. 2010;50(2):105-115. PMID: 20638252
  27. Nishi A, Milner D, Giovannucci E, et al. Integration of molecular pathology, epidemiology and social science for global precision medicine. Expert Rev Mol Diagn. 2015;16(1):11-23. PMID: 26636627
  28. Kruglyak KM, Lin E, Ong FS. Next-generation sequencing and applications to the diagnosis and treatment of lung cancer. Adv Exp Med Biol. 2016;890:123-136.
  29. Byrling J, Andersson B, Marko-Varga G, Andersson R. Cholangiocarcinoma - current classification and challenges towards personalised medicine. Scand J Gastroenterol. 2016;51(6):641-643. PMID: 26806118
  30. Johnson A, Ghassemi M, Nemati S, Niehaus K, Clifton D, Clifford G. Machine learning and decision support in critical care. Proc IEEE. 2016;104(2):444-466.