ORIGINAL RESEARCH

Ensemble Learning for Higher Diagnostic Precision in Schizophrenia Using Peripheral Blood Gene Expression Profile

Pages 923-936 | Received 09 Nov 2023, Accepted 12 Mar 2024, Published online: 02 May 2024
 

Abstract

Introduction

Stigma accounts for a significant part of the burden of schizophrenia (SCZ); reducing false-positive diagnoses would therefore be liberating for individuals with SCZ and desirable for clinicians. The stigmatization associated with schizophrenia underscores the need for high-precision diagnosis. In this study, we present an ensemble learning-based approach for high-precision diagnosis of SCZ using peripheral blood gene expression profiles.

Methodology

Two machine learning (ML) models, a support vector machine (SVM) and prediction analysis for microarrays (PAM), were developed using differentially expressed genes (DEGs) as features. Samples were classified with a voting ensemble classifier combining SVM and PAM. Further, the models trained on microarray data were used to classify RNA sequencing (RNA-Seq) samples from our case-control study (Pune-SCZ) to assess cross-platform compatibility.
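The following is a minimal sketch of how a two-model voting ensemble can raise precision, not the authors' implementation (their analysis scripts are in R). It assumes a unanimous voting rule, under which a sample is labelled SCZ only when both base models call it SCZ; the abstract does not state the actual voting rule. The function names `unanimous_vote` and `precision` and all labels below are hypothetical toy values chosen for illustration.

```python
from typing import List, Sequence


def unanimous_vote(svm_calls: Sequence[int], pam_calls: Sequence[int]) -> List[int]:
    """Label a sample SCZ (1) only when both base models call it SCZ.

    Assumed voting rule for illustration; the source does not specify it.
    """
    if len(svm_calls) != len(pam_calls):
        raise ValueError("both models must score the same samples")
    return [int(s == 1 and p == 1) for s, p in zip(svm_calls, pam_calls)]


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Precision = TP / (TP + FP), with SCZ (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        return float("nan")  # no positive calls, so precision is undefined
    return tp / (tp + fp)


# Hypothetical toy labels (1 = SCZ, 0 = control); not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
svm_calls = [1, 1, 1, 0, 1, 0, 0, 0]  # one false positive (index 4)
pam_calls = [1, 1, 0, 1, 0, 1, 0, 0]  # one false positive (index 5)

ensemble = unanimous_vote(svm_calls, pam_calls)

print("SVM precision:     ", precision(y_true, svm_calls))
print("PAM precision:     ", precision(y_true, pam_calls))
print("Ensemble precision:", precision(y_true, ensemble))
```

In this toy example the two base models make their false-positive calls on different control samples, so requiring agreement removes both. The cost is that true SCZ samples called by only one model are also dropped, so recall falls as precision rises.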

Results

Ensemble learning yielded a significantly higher precision of 80.41% (SD: 0.04) than the individual models (SVM-radial: 71.69%, SD: 0.04; PAM: 77.20%, SD: 0.02). Classification of the RNA-Seq samples from our case-control study (Pune-SCZ) yielded a moderate precision (59.92%, SD: 0.05). The feature genes used for model building were enriched for biological processes such as response to stress, regulation of the immune system, and metabolism of organic nitrogen compounds. The network analysis identified RBX1, CUL4B, DDB1, PRPF19, and COPS4 as hub genes.

Conclusion

In summary, this study developed robust models for higher diagnostic precision in psychiatric disorders. Future efforts will be directed towards multi-omic integration and developing “explainable” diagnostic models.

Abbreviations

ANOVA, Analysis of Variance; AUROC, Area under the Receiver Operating Characteristic curve; CNT, Healthy controls; CPM, Counts per Million; DEG, Differentially Expressed Genes; DGEA, Differential Gene Expression Analysis; DSM-5, Diagnostic and Statistical Manual of Mental Disorders, 5th Edition; GEO, Gene Expression Omnibus; HISAT2, Hierarchical Indexing for Spliced Alignment of Transcripts; limma, linear models for microarray data; MCC, Maximal Clique Centrality; ML, Machine learning; PAM, Prediction Analysis for Microarrays; PANSS, Positive And Negative Syndrome Scale; PAST, PAleontological STatistics; PBMCs, Peripheral blood mononuclear cells; PPI, Protein-Protein Interaction; RC, Raw Counts; SCID-5-RV, Structured Clinical Interview for the DSM-5 Research Version; SCZ, Schizophrenia; SD, Standard Deviation; STRING, Search Tool for the Retrieval of Interacting Genes/proteins; SVM, Support Vector Machine; TPM, Transcripts per Million.

Data Sharing Statement

The datasets generated for this study can be found in INDA-CA, INCARP000275.

The R scripts used for the analysis are available on GitHub (https://github.com/macdlab/2023_VW_SCZ_Ensemble).

Compliance with Ethical Standards

Two independent ethical committees approved the study protocol: the KEM Hospital Research Centre Ethics Committee (KEMHRC ID No. 2001) and Symbiosis International (Deemed University) Independent Ethics Committee (SIU/IEC/99). Written informed consent was obtained from all participants. For participants with schizophrenia, written informed consent was supported by written informed consent of a spouse or a first-degree relative aged 18 and above. Parents, siblings, and children were considered first-degree relatives. Clinical interviews were administered by a trained psychiatrist and a psychologist in private. The diagnosis was confirmed by a senior psychiatrist. The identity of the participants was protected by using a unique identification number. The data collected in the study are securely stored with restricted access. Any data sharing with other researchers will prioritize participant confidentiality, ensuring identities remain undisclosed. All the participants were compensated for their travel and time.

Acknowledgments

We sincerely thank all the participants, their parents, relatives, and caretakers for their time and generous participation in making this project possible. We thank Deepa Raut, phlebotomist at KEMHRC, for her valuable contribution to the blood collection process. We would also like to thank Paul Tooney (Associate Professor, New Castle University, Australia) for sharing data on request. VVW thanks UGC, New Delhi, for the research fellowship.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Disclosure

The authors state that there is no conflict of interest.

Additional information

Funding

The study was funded by an intramural research grant (MjRP/19-20/1516) from Symbiosis Centre for Research & Innovation (SCRI), SIU, Pune, India.