Research Article

Using Saliva Epigenetic Data to Develop and Validate a Multivariable Predictor of Esophageal Cancer Status

Pages 109-125 | Received 10 Jul 2023, Accepted 04 Jan 2024, Published online: 16 Jan 2024

Figures & data

Figure 1. CONSORT diagram.
Figure 2. Plot of the Kolmogorov–Smirnov distance metric for biological duplicates, showing distances between duplicates (x-axis) and distances from each array to a summarized average array (y-axis).
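As a rough illustration of the distance named in the Figure 2 caption, the two-sample Kolmogorov–Smirnov statistic is the largest vertical gap between two empirical cumulative distribution functions. The sketch below is a minimal standard-library version under stated assumptions: the function name `ks_distance` and the sample values are hypothetical, and this is not the authors' pipeline.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute
    difference between the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    # The maximum gap is always attained at one of the observed values,
    # so it is enough to evaluate both empirical CDFs there.
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


# Hypothetical methylation beta values for a duplicate pair (illustration only).
duplicate_1 = [0.1, 0.4, 0.7]
duplicate_2 = [0.2, 0.4, 0.9]

# Element-wise mean of the two arrays, standing in for a summarized average array.
average_array = [(x + y) / 2 for x, y in zip(duplicate_1, duplicate_2)]

between_duplicates = ks_distance(duplicate_1, duplicate_2)
to_average = ks_distance(duplicate_1, average_array)
```

In a plot of this kind, each duplicate pair would contribute its between-duplicate distance on one axis and its distance to the average array on the other.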

Table 1. Risk stratification of patients and controls in the discovery, testing and independent validation datasets.

Figure 3. Heat map plot of p-values of first ten principal component analysis components (A) before and (B) after plate-fitted residual correction for experiment batch, array row, age, sex, disease diagnosis and epithelial cell content.

The components were statistically tested against experiment, array plate row, age, sex, disease status and epithelial cell content. The plate-fitted residuals show p = 1.00. Blue = low p-value; red = high p-value.

Figure 4. Module membership after repeated classification with modification of five samples (3%) of the discovery cohort.

Table 2. Area under the curve for training set and the two holdout sets.
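For readers unfamiliar with the metric reported in Table 2, the area under the ROC curve can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the example labels and scores are hypothetical and are not taken from the study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison
    (equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # A case scoring above a control counts as 1, a tie as 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = case, 0 = control) and classifier scores.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
```

The pairwise form is quadratic in sample size, which is acceptable for cohorts of a few hundred samples; a rank-based implementation would be the usual choice for larger sets.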

Figure 5. Receiver operating characteristic curve of the most successful classifiers for the discovery set, testing set and validation dataset.

Table 3. Area under the curve using only probes and only covariates of age and sex.

Supplemental material

Supplemental Figure 1


Supplemental Figure 2


Supplemental Table 1


Data sharing statement

Raw data, processed data and accompanying metadata have been deposited to the Gene Expression Omnibus (GEO) database under the accession code GSE232332.