Research Paper

Whole human genome 5’-mC methylation analysis using long read nanopore sequencing

Pages 1961-1975 | Received 22 Oct 2021, Accepted 29 Jun 2022, Published online: 20 Jul 2022

Figures & data

Figure 1. Diagrammatic representation of the general workflow used for benchmarking 5’-mC methylation values obtained by nanopore sequencing against 450k microarray, DREAM, and RRBS. The numbers indicate the total number of CpG sites available at each step of the workflow. The analysis tools used in the workflow (liftOver [29], Bedtools [30], corrr [31], corrgram [32], annotatr [35], VennDiagram [36], and ggplot2 [37]) are represented by grey boxes.

Figure 2. Distribution of methylation values obtained by nanopore sequencing, 450k microarray, and DREAM, according to CpG genomic annotation. a, nanopore_ngmlr; b, nanopore_minimap2; c, 450k microarray; d, DREAM. In CpG islands, the majority of interrogated CpGs are unmethylated. In CpG shores, unmethylated and methylated CpG sites are evenly balanced, whereas in CpG shelves and inter-CGI regions methylated CpG sites predominate. Plots were generated using the ggplot2 package [37] in R.
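The kind of summary described in this caption can be approximated with a short script. The following is a minimal sketch in Python, not the authors' R/ggplot2 workflow. The cut-offs `UNMETHYLATED_MAX` and `METHYLATED_MIN`, the function names, and the toy values are all assumptions introduced for illustration; the caption does not state which thresholds, if any, were used.

```python
from collections import Counter, defaultdict

# Hypothetical cut-offs (assumption): the caption gives no thresholds.
UNMETHYLATED_MAX = 0.2
METHYLATED_MIN = 0.8


def classify(beta):
    """Assign a methylation fraction in [0, 1] to a coarse state."""
    if beta <= UNMETHYLATED_MAX:
        return "unmethylated"
    if beta >= METHYLATED_MIN:
        return "methylated"
    return "intermediate"


def summarize(sites):
    """Return, per annotation, the fraction of CpG sites in each state.

    `sites` is an iterable of (annotation, methylation_fraction) pairs.
    """
    counts = defaultdict(Counter)
    for annotation, beta in sites:
        counts[annotation][classify(beta)] += 1
    return {
        annotation: {state: n / sum(c.values()) for state, n in c.items()}
        for annotation, c in counts.items()
    }


# Toy values for illustration only; these are not data from the study.
toy_sites = [
    ("island", 0.02), ("island", 0.05), ("island", 0.91),
    ("shore", 0.10), ("shore", 0.88),
    ("shelf", 0.85), ("shelf", 0.93), ("shelf", 0.40),
    ("inter_cgi", 0.95), ("inter_cgi", 0.81),
]
summary = summarize(toy_sites)
for annotation, fractions in summary.items():
    print(annotation, fractions)
```

Running the same summary once per dataset would give one table per panel (a to d), which could then be compared across platforms.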

Figure 3. Correlation analysis of methylation values between nanopore sequencing, DREAM, and 450k microarray. The CpG sites interrogated by nanopore_ngmlr, nanopore_minimap2, 450k microarray, and DREAM were intersected pairwise and also serially, yielding a set of 5416 CpG sites common to all datasets. Pearson correlations were determined for each pair of methylation datasets. The p-value obtained for each correlation was <2.2e-16. a, Venn diagram showing the number of CpG sites resulting from the intersection of the various datasets. b, Heat plot indicating Pearson correlation values (left) and the corresponding number of CpGs (right) for pairwise comparisons of distinct datasets. c, Heat plot indicating Pearson correlation values between pairs of datasets for the same set of overlapping CpG sites (5416). The plots were generated using the VennDiagram [36] (v1.6.20) and ggplot2 [37] packages in R.
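The intersection and correlation steps described in this caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' pipeline, which the captions attribute to Bedtools and R packages. Each dataset is assumed to be a mapping from a `(chromosome, position)` key to a methylation fraction; the function names, dataset labels, and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def common_sites(datasets):
    """CpG sites present in every dataset (the serial intersection)."""
    return set.intersection(*(set(d) for d in datasets.values()))


def pairwise_correlations(datasets, sites=None):
    """Map each dataset pair to (Pearson r, number of shared sites).

    If `sites` is given, every pair is restricted to that set, as in
    panel c; otherwise each pair uses its own overlap, as in panel b.
    """
    results = {}
    for a, b in combinations(sorted(datasets), 2):
        shared = datasets[a].keys() & datasets[b].keys()
        if sites is not None:
            shared &= sites
        shared = sorted(shared)
        r = pearson([datasets[a][s] for s in shared],
                    [datasets[b][s] for s in shared])
        results[(a, b)] = (r, len(shared))
    return results


# Toy values for illustration only; these are not data from the study.
datasets = {
    "nanopore": {("chr1", 100): 0.1, ("chr1", 200): 0.5,
                 ("chr1", 300): 0.9, ("chr2", 50): 0.3},
    "array450k": {("chr1", 100): 0.2, ("chr1", 200): 0.6,
                  ("chr1", 300): 1.0, ("chr3", 10): 0.7},
    "dream": {("chr1", 100): 0.9, ("chr1", 300): 0.1, ("chr2", 50): 0.5},
}
common = common_sites(datasets)
per_pair = pairwise_correlations(datasets)
on_common = pairwise_correlations(datasets, sites=common)
```

Keeping the per-pair overlap and the common-set restriction as two calls mirrors the distinction between panels b and c: the first uses as many sites as each pair shares, while the second compares all pairs on an identical set of sites.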

Figure 4. Correlation analysis of methylation frequencies between nanopore sequencing and 450k microarray for overlapping CpG sites within different genomic regions of the hg38 human genome reference sequence. Pearson correlation values and corresponding plots are presented for four different genomic contexts. The plots were generated using the corrgram package [32] in R.

