Research Paper

Innovative construction of the first reliable catalogue of bovine circular RNAs

Pages 52-74 | Accepted 26 Jun 2024, Published online: 11 Jul 2024

Figures & data

Figure 1. Analytical pipeline used to characterize circRNAs.

The main part of this pipeline (in brown) combines two steps performed with CircDetector (CD) and several small manual processes (in green). The input files for CD are shown in grey frames and the output files in double brown frames. The first part of the pipeline, shown horizontally at the top, is managed exclusively by CD. For this detection step, users can select parameters to exclude sporadic circularization events and loci that are too small to be reliable circRNAs. In addition to the main CD detection output file (detection.bed), CD produces a second file reporting all STAR mapping statistics. In a second step, CD identifies several types of circRNAs. In our approach, we retained three lists provided by CD (exonic circRNAs, intronic circRNAs, and sub-exonic circRNAs deriving from genes annotated in the gtf_file as small non-coding (snc) RNA). We defined other_circRNAs as those not included in any of these three lists. The lists of circRNAs retained after manual verification are shown in green rectangles. The circRNAs excluded by this manual curation do not join the other_circRNAs list but are declassified (symbolized by a trashcan). For more details and examples, see M&M_Adoc-1-4.
The source code of the CD is available from https://github.com/GenEpi-GenPhySE/circRNA.git.
Mini-fastq files were constructed from the Chimeric.out.junction files derived from the STAR-SE alignments (top left, in pink). These files represent in-silico enrichment of reads spanning a circular junction. The analyses with two other tools (represented by yellow cylinders) were integrated into the pipeline: CIRCexplorer-2 (CE2, top) and CIRI2 (top right).
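The detection filters and the definition of other_circRNAs described in the Figure 1 legend can be sketched as follows. This is not CD's code: the `Candidate` record, its fields and the function names are hypothetical, the thresholds (70 nt, 5 junction reads in at least one sample) are taken from the Figure 2 legend, and treating "minimum genomic size of 70 nt" as an inclusive bound is an assumption.

```python
from dataclasses import dataclass

# Thresholds from the Figure 2 legend; the record layout below is hypothetical.
MIN_GENOMIC_SIZE_NT = 70
MIN_JUNCTION_READS = 5


@dataclass(frozen=True)
class Candidate:
    circ_id: str             # hypothetical identifier, e.g. "chr1:1000-2500:+"
    start: int               # 1-based genomic start of the circularized locus
    end: int                 # 1-based genomic end (inclusive)
    reads_per_sample: tuple  # junction-spanning read counts, one value per sample


def passes_detection(c: Candidate) -> bool:
    """Keep loci of at least 70 nt supported by >= 5 junction reads in at least one sample."""
    size = c.end - c.start + 1
    best = max(c.reads_per_sample, default=0)
    return size >= MIN_GENOMIC_SIZE_NT and best >= MIN_JUNCTION_READS


def other_circrnas(detected, exonic, intronic, sub_exonic_snc):
    """other_circRNAs: detected circRNAs absent from the three lists reported by CD."""
    return set(detected) - (set(exonic) | set(intronic) | set(sub_exonic_snc))


candidates = [
    Candidate("a", 1_000, 2_500, (0, 7, 1)),  # large enough, 7 reads in one sample: kept
    Candidate("b", 5_000, 5_040, (9, 9, 9)),  # only 41 nt: too small
    Candidate("c", 8_000, 9_000, (2, 3, 4)),  # never reaches 5 reads in a sample: sporadic
]
kept = [c.circ_id for c in candidates if passes_detection(c)]
print(kept)  # ['a']
```

Note that the read threshold is checked per sample rather than on the total: candidate "c" has 9 junction reads overall but is still rejected.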
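The legend does not say exactly how reads were selected for the mini-fastq files. A minimal sketch of such an in-silico enrichment might look like this. It assumes the standard STAR Chimeric.out.junction column layout (donor chromosome, first intron base and strand; acceptor chromosome, first intron base and strand; junction type; two repeat lengths; read name), and it assumes that a back-splice can be recognised as an acceptor lying upstream of the donor on the same chromosome and strand. The function names are hypothetical.

```python
def backsplice_read_names(junction_lines):
    """Return names of reads whose chimeric junction looks like a back-splice."""
    names = set()
    for line in junction_lines:
        fields = line.rstrip("\n").split("\t")
        # skip comments, blank lines and a possible column-header row
        if len(fields) < 10 or not fields[1].isdigit():
            continue
        donor_chr, donor_pos, donor_strand = fields[0], int(fields[1]), fields[2]
        acc_chr, acc_pos, acc_strand = fields[3], int(fields[4]), fields[5]
        if int(fields[6]) < 0:  # junction type -1: junction lies between the mates
            continue
        if donor_chr != acc_chr or donor_strand != acc_strand:
            continue
        # back-splice (assumption): acceptor upstream of donor on the transcribed strand
        upstream = acc_pos < donor_pos if donor_strand == "+" else acc_pos > donor_pos
        if upstream:
            names.add(fields[9])
    return names


def subset_fastq(fastq_lines, wanted):
    """Yield 4-line FASTQ records whose read name is in `wanted` (well-formed input assumed)."""
    lines = iter(fastq_lines)
    for header in lines:
        if not header.strip():
            continue
        seq, plus, qual = next(lines), next(lines), next(lines)
        name = header[1:].split()[0]
        if name.endswith(("/1", "/2")):  # drop a mate suffix, if present
            name = name[:-2]
        if name in wanted:
            yield header, seq, plus, qual


# Toy junction lines; columns 11-14 are placeholder values
junctions = [
    "chr1\t1000\t+\tchr1\t500\t+\t1\t0\t0\treadA\t480\t20M80S\t900\t80M20S",
    "chr1\t1000\t+\tchr1\t2000\t+\t1\t0\t0\treadB\t980\t20M80S\t2000\t80M20S",
    "chr2\t1500\t-\tchr2\t3000\t-\t2\t0\t0\treadC\t3000\t30M70S\t1400\t70M30S",
    "chr1\t1000\t+\tchr1\t500\t+\t-1\t0\t0\treadD\t480\t100M\t900\t100M",
]
wanted = backsplice_read_names(junctions)
fastq = ["@readA", "ACGT", "+", "IIII", "@readB", "TTTT", "+", "IIII"]
print(sorted(wanted))                               # ['readA', 'readC']
print([r[0] for r in subset_fastq(fastq, wanted)])  # ['@readA']
```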

Figure 2. Overview of circRNAs detected in the 117 samples considered.

(A) circRNAs detected in 117T (symbolized by the green frame). (A1) 61,083 circRNAs were retained after characterization by CD (minimum genomic size of 70 nt and presence attested by at least 5 reads supporting the circular junction in at least one sample). After examining the annotation suggested by CD, we placed 36,215 circRNAs in a new category, other_circRNAs, for further analysis (symbolized by the orange rectangle). (A2) We retained 23,926 exonic circRNAs (blue disc), 191 ciRNAs, 146 intron circles and 108 sub-exonic circRNAs from snc genes (represented by three black discs). (B) Distribution of the 61,083 circRNAs by type. (C) Number of circular RNAs. The first histograms on the left concern the full virtual sample, named 117T. The other seven histograms consider data from six individual samples from six different tissues. NN = neonate. The three deeply sequenced samples are marked with a green label 'XL sequencing' above the histograms. (C1) Number of circRNAs validated by the detection of at least 5 CCRs in the considered sample. (C2) Number of circRNAs validated per million uniquely mapped reads. (D) Distribution of expression by circRNA type. (E) Distribution of the 6,982 non-redundant circRNAs by type.
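The counts in panels A1 and A2 can be reconciled with simple arithmetic. This is only a sketch: the categories sum to 60,586, leaving 497 of the 61,083 circRNAs unassigned. Reading that remainder as the circRNAs declassified during manual curation (Figure 1) is our assumption; the legend does not state it.

```python
# Counts as reported in the Figure 2 legend (panels A1 and A2)
total_after_cd = 61_083
retained = {
    "other_circRNAs": 36_215,
    "exonic": 23_926,
    "ciRNAs": 191,
    "intron_circles": 146,
    "sub_exonic_snc": 108,
}
classified = sum(retained.values())
unaccounted = total_after_cd - classified  # assumed: declassified by manual curation
print(classified, unaccounted)  # 60586 497
```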

Figure 3. Analysis of circRNAs detected in mRNAseq.

(A) Among the circRNAs detected in the 63 total-RNAseq datasets (63T, symbolized by the purple frame), we recognized 17,025 exonic circRNAs, 194 intronic circRNAs, and 17,956 other_circRNAs already identified in the 117T datasets. (B) In the 63 mRNAseq datasets (63 m), 4,579 circRNAs were detected (represented by three red triangles), of which 2,901 (63.4%) had never been described before, i.e. had not been identified in 117T. No miscellaneous circRNAs were detected in 63 m (represented by three black discs). Among the 4,341 other_circRNAs identified in 63 m, 2,812 are novel. Among the 86 exonic circRNAs identified in 63 m and already detected in 117T, 10 had not been detected in 63T.
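"Novel" in panel B is a set difference: circRNAs detected in 63 m but absent from 117T. The sketch below uses hypothetical identifiers for the toy sets and then reproduces the reported percentage from the legend's own counts.

```python
def novelty(detected, reference):
    """Count and percentage of circRNAs in `detected` that are absent from `reference`."""
    novel = set(detected) - set(reference)
    return len(novel), 100 * len(novel) / len(detected)


# Toy sets with hypothetical identifiers
n, pct = novelty({"c1", "c2", "c3", "c4"}, {"c1", "c9"})
print(n, pct)  # 3 75.0

# Percentage reported in panel B, recomputed from its counts
print(round(100 * 2_901 / 4_579, 1))  # 63.4
```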

Figure 4. Analysis of circRNAs detected by CIRI2.

(A) CircRNAs detected by CIRI2 in 117T are represented by a green circle. Those already detected and annotated by CD are highlighted by a blue rectangle (CD-exonic circRNAs) and by an orange rectangle (CD-other_circRNAs). (B) Among the CIRI-circRNAs from 117T, we identified 20,531 exonic circRNAs already identified by CD, 15 miscellaneous circRNAs (represented by three black discs) and 10,081 CIRI-other_circRNAs. (C) 1,560 CIRI-circRNAs were detected* in 63 m (represented by a red triangle). (D) 707 CIRI-circRNAs were detected* and validated** in 63 m (represented by two red triangles, corresponding to exonic circRNAs and other_circRNAs respectively). No miscellaneous circRNAs were detected in 63 m (represented by one black disc).
*CIRI2 retained a circRNA as 'detected' when at least two reads spanned the circular junction in at least one individual dataset. **For the circRNAs detected in 63 m by CIRI2, we considered as 'validated' only those supported by at least five reads spanning the circular junction in at least one individual dataset.
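The two footnote thresholds can be expressed as a small classifier. This is a sketch under the assumption that counts are available per individual dataset; the function name is hypothetical. Both thresholds apply to the best single dataset, not to the sum across datasets.

```python
DETECTED_MIN_READS = 2   # CIRI2 'detected' threshold (footnote *)
VALIDATED_MIN_READS = 5  # 'validated' threshold used for 63 m (footnote **)


def ciri2_status(reads_per_dataset):
    """Classify a CIRI2 circRNA from its junction-read counts in individual datasets."""
    best = max(reads_per_dataset, default=0)
    if best >= VALIDATED_MIN_READS:
        return "validated"
    if best >= DETECTED_MIN_READS:
        return "detected"
    return "not detected"


print([ciri2_status(c) for c in [(1, 1, 1), (0, 3, 2), (6, 0, 0)]])
# ['not detected', 'detected', 'validated']
```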

Figure 5. Analyses of the possible presence of 23,737 exonic circRNAs in bovine tissues/samples.

The left part considers all available samples for 15 + 3 tissues; the box plot in the right part considers the 117 samples. (A) and (B) represent the number of exonic circRNAs per million uniquely mapped reads. (C) is dedicated to the observed expression, defined as the number of CCRs per million uniquely mapped reads. We defined 'notable expression' as expression above 0.05. To make these three diagrams easier to read, they are also available in large format in Res_Adoc-6.
The three tissue samples from neonate animals (jejunum-female, rumen-male, pancreas-male) that were sequenced at great depth are indistinguishable from the others.
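The per-million normalization and the 'notable expression' cut-off can be written out directly. In this sketch the function names and the 40-million-read sample are hypothetical. Because the threshold is strict ("above 0.05"), such a sample would need at least 3 CCRs for a circRNA to count as notably expressed.

```python
NOTABLE_EXPRESSION = 0.05  # threshold from the Figure 5 legend


def per_million(count, uniquely_mapped_reads):
    """Scale a count (CCRs, or number of circRNAs) to 'per million uniquely mapped reads'."""
    return count * 1_000_000 / uniquely_mapped_reads


def is_notable(ccr, uniquely_mapped_reads):
    """True when expression (CCRs per million uniquely mapped reads) is strictly above 0.05."""
    return per_million(ccr, uniquely_mapped_reads) > NOTABLE_EXPRESSION


# Hypothetical sample with 40 million uniquely mapped reads
print(per_million(3, 40_000_000), is_notable(3, 40_000_000))  # 0.075 True
print(per_million(1, 40_000_000), is_notable(1, 40_000_000))  # 0.025 False
```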

Figure 6. Expression analysis of 23,737 exonic circRNAs in four samples.

The expression of a circRNA is defined as the number of CCRs per million uniquely mapped reads. (A) Transcriptome composition and comparison of the four samples. (B) Schematic representation of the four individual transcriptomes at the same scale. Other analyses concerning the jejunum neonate female and the adult testis are shown in .

Figure 7. Benchmarking of 23,737 reliable exonic circRNAs and additional exonic circRNAs.

(A) A list of 23,737 reliable exonic circRNAs was established and extensively benchmarked. All were validated by CD (at least 5 reads spanning the circular junction in at least one of the 117 samples analysed). (A1) Details of this benchmarking. To complete the analyses performed in this study, we used the list of exonic circRNAs validated with CE2+CIRI2 published in 2021 [Citation27]. (A2) Histograms comparing the composition of the list of 23,737 reliable exonic circRNAs, the list of 189 unreliable exonic circRNAs and the Top-150 panel with 1,749 reliable exonic circRNAs.
(B) A second list of exonic circRNAs was constructed by merging (1) the 3,830 exonic circRNAs validated by CIRI2 in 117T or by CE2 in MFQ117 and not found in the list of those validated by CD (23,737 reliable + 189 unreliable), and (2) the 3,834 exonic circRNAs validated by CE2+CIRI2 in a study involving 33 samples from three tissues published in 2021 and not found in the first list. (B1) With the classical threshold of 5 CCRs applied to the CE2 data, the number of validated exonic circRNAs was 15,075. (B2) We considered 10 CCRs to be a more appropriate threshold for CE2, which reduced the number of validated exonic circRNAs to 5,756. This led to the proposal of an additional list of 9,206 exonic circRNAs.
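The two-part merge in panel B and the CE2 threshold choice are set operations. A minimal sketch follows. It assumes that "validated" means reaching the CCR threshold in at least one sample (as for CD in panel A), and it follows the legend literally in filtering part (2) only against part (1). All identifiers and function names are hypothetical.

```python
def validated(counts, min_ccr):
    """IDs whose junction-read count reaches min_ccr in at least one sample."""
    return {cid for cid, per_sample in counts.items() if max(per_sample, default=0) >= min_ccr}


def additional_exonic(ciri2_117t, ce2_mfq117, cd_validated, published_2021):
    """Two-part merge described in panel B; returns (part 1, part 2)."""
    part1 = (set(ciri2_117t) | set(ce2_mfq117)) - set(cd_validated)
    part2 = set(published_2021) - part1
    return part1, part2


# Hypothetical CE2 counts per sample
ce2_counts = {"e1": (12, 0), "e2": (6, 4), "e3": (3, 2)}
print(sorted(validated(ce2_counts, 5)), sorted(validated(ce2_counts, 10)))
# ['e1', 'e2'] ['e1']

part1, part2 = additional_exonic({"e4"}, validated(ce2_counts, 10), {"e1"}, {"e4", "e5"})
print(sorted(part1), sorted(part2))  # ['e4'] ['e5']
```

Raising the CE2 threshold from 5 to 10 CCRs shrinks the CE2 contribution before the merge, which is the effect reported between panels B1 and B2.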

Figure 8. Hierarchical clustering analysis (HCA).

This HCA was built using the 'ward' agglomeration method with Pearson correlation as the distance, on the expression of 1,749 exonic circRNAs (panel top-150, composition in Ext_Atab-6) in 96 samples. Each sample is labelled 'tissue-age-sex', where age = N (neonate), J (juvenile) or A (adult). When the clustering corresponds exactly to the expected grouping by tissue, the corresponding tissue is underlined in green (5 or 6 animals) or in yellow (2 or 4 animals).
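The legend does not name the software or the exact correlation-to-distance transform. A pure-Python sketch of the idea, assuming the distance is d = 1 − r and that Ward's Lance–Williams update is applied to these dissimilarities as given (the behaviour of R's `hclust` with method "ward.D"; "ward.D2" would square them first). Sample labels and function names are hypothetical, and profiles are assumed non-constant.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles (non-constant profiles assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ward_hca(profiles):
    """Agglomerative clustering of samples with Ward's Lance-Williams update on d = 1 - r.

    profiles: {sample_label: [expression of each circRNA]}
    Returns a list of (members_of_new_cluster, merge_height) in merge order.
    """
    labels = list(profiles)
    members = {i: (lab,) for i, lab in enumerate(labels)}
    sizes = {i: 1 for i in members}
    dist = {(i, j): 1 - pearson(profiles[labels[i]], profiles[labels[j]])
            for i in range(len(labels)) for j in range(i + 1, len(labels))}

    def d(a, b):
        return dist[(a, b)] if a < b else dist[(b, a)]

    merges, next_id = [], len(labels)
    while len(members) > 1:
        (a, b), height = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        na, nb = sizes.pop(a), sizes.pop(b)
        for k in members:
            if k not in (a, b):
                nk = sizes[k]
                # Lance-Williams update for Ward's method
                dist[(k, next_id)] = ((na + nk) * d(k, a) + (nb + nk) * d(k, b)
                                      - nk * height) / (na + nb + nk)
        for key in [key for key in dist if a in key or b in key]:
            del dist[key]  # drop distances involving the merged clusters
        members[next_id] = members.pop(a) + members.pop(b)
        sizes[next_id] = na + nb
        merges.append((members[next_id], height))
        next_id += 1
    return merges


samples = {  # hypothetical 'tissue-age-sex' labels with toy profiles
    "LIV-A-f": [1, 2, 3, 4],
    "LIV-A-m": [1, 2, 3, 5],
    "TES-A-m": [4, 3, 2, 1],
    "TES-J-m": [5, 3, 2, 1],
}
for group, h in ward_hca(samples):
    print(sorted(group), round(h, 4))
```

For 96 samples a dedicated library would be preferable; the sketch only shows how the linkage and the correlation distance fit together.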

Figure 9. Principal component analyses (PCA).

Both PCAs were built on the expression of 1,749 exonic circRNAs (panel top-150). The plots show the individual factor maps: dimensions 1 and 2 on the left and dimensions 3 and 4 on the right. The readability of the labels on these plots has been manually improved. Samples from neonates are labelled -N, or -Nm- or -Nf when sex is relevant. Samples from juveniles are labelled -J, or -Jmc- or -Jf when sex is relevant (castrated male and female, respectively). Samples from adults are labelled -A, or -Am- or -Af when sex is relevant. (A) Six tissues were considered: uterus (UT), uterine horn (Uh), ovary (OV), adrenal gland (AD), pituitary gland (PG), and testis (Tes). (B) Only samples from five tissues were considered (the six PG samples were removed).
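The legend does not state which software or scaling was used. A minimal pure-Python sketch of a PCA on a samples × circRNAs expression matrix, assuming column-centred, unscaled data and extracting the leading axes by power iteration with deflation. The function names are hypothetical, and `k` is assumed not to exceed the rank of the data. The sample scores on axes 1 and 2 are what an individual factor map plots.

```python
import math
import random


def centre_columns(rows):
    """Subtract the mean of each column (circRNA) from every sample row."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(x):
    """Sample covariance matrix (p x p) of column-centred data."""
    n, p = len(x), len(x[0])
    return [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1) for b in range(p)]
            for a in range(p)]


def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def normalise(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def principal_components(rows, k=2, iters=1000, seed=0):
    """Top-k principal axes by power iteration with deflation, plus sample scores."""
    x = centre_columns(rows)
    c = covariance(x)
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = normalise([rng.gauss(0.0, 1.0) for _ in range(len(c))])
        for _ in range(iters):
            v = normalise(mat_vec(c, v))
        lam = sum(a * b for a, b in zip(v, mat_vec(c, v)))  # Rayleigh quotient: eigenvalue
        axes.append(v)
        # deflation: remove the axis just found so the next pass finds the following one
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(len(v))] for a in range(len(v))]
    scores = [[sum(xi * vi for xi, vi in zip(row, v)) for v in axes] for row in x]
    return axes, scores


# Toy matrix: 5 samples x 3 circRNAs, with most variance shared by the first two columns
rows = [[2, 2, 0.1], [1, 1, -0.1], [0, 0, 0.1], [-1, -1, -0.1], [-2, -2, 0.0]]
axes, scores = principal_components(rows, k=2)
```

Axis signs are arbitrary, as in any PCA, so only the relative positions of samples on a factor map are meaningful.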

Figure 10. List and characteristics of different events leading to the formation of a circular junction.

Six (hypothetical) events leading to the identification of artificial circRNAs are listed on an orange or yellow background. Backsplicing leading to exonic circRNAs is described on a green background.
Additional information: (1) The transcript containing the ‘circular junction’ exists but is not circular. (2) The transcript containing the circular junction is not present. (3) The cDNA containing the circular junction is not present. (4) The transcript containing the circular junction is present and is circular. (5) In addition, the junction may have been created after RNase-R action.
Supplemental material

Data availability statement

All data obtained concerning exonic and intronic circRNAs are available in several tables (all Ext_Atab) deposited in a Dataverse repository (doi: 10.57745/XORQHK). The list of other_circRNAs is not available, as we were unable to distinguish between reliable and unreliable other_circRNAs. The list of exons generated by the BovReg consortium and used in this study is available upon request from CC. Datasets generated by the BovReg consortium and analysed during the current study are listed in Atab-1. We built 117 paired mini-fastq files (R1 and R2) providing all sequences needed for a global and rapid characterization of the circular RNAs present in bovine tissues (doi: 10.57745/IUJ40P).