685
Views
0
CrossRef citations to date
0
Altmetric
Research Article

Exome-wide comparative analyses revealed differentiating genomic regions for performance traits in Indian native buffaloes

, ORCID Icon, , , , & show all

Abstract

In India, 20 breeds of buffalo have been identified and registered, yet limited studies have been conducted to explore the performance potential of these breeds, especially in the Indian native breeds. This study is a maiden attempt to delineate the important variants and unique genes through exome sequencing for milk yield, milk composition, fertility, and adaptation traits in Indian local breeds of buffalo. In the present study, whole exome sequencing was performed on Chhattisgarhi (n = 3), Chilika (n = 4), Gojri (n = 3), and Murrah (n = 4) buffalo breeds and after stringent quality control, 4333, 6829, 4130, and 4854 InDels were revealed, respectively. Exome-wide FST along 100-kb sliding windows detected 27, 98, 38, and 35 outlier windows in Chhattisgarhi, Chilika, Gojri, and Murrah, respectively. The comparative exome analysis of InDels and subsequent gene ontology revealed unique breed specific genes for milk yield (CAMSAP3), milk composition (CLCN1, NUDT3), fertility (PTGER3) and adaptation (KCNA3, TH) traits. Study provides insight into mechanism of how these breeds have evolved under natural selection, the impact of these events on their respective genomes, and their importance in maintaining purity of these breeds for the traits under study. Additionally, this result will underwrite to the genetic acquaintance of these breeds for breeding application, and in understanding of evolution of these Indian local breeds.

Introduction

In India, there are 20 distinct registered buffalo breeds (Bubalus bubalis) flourishing under many agro-ecological conditions with various breeding, and management techniques.Citation1 However, they probably vary in their agro-economic traits and adaptation to local environments. These breeds are settled in not so salubrious conditions under tropical and subtropical parts of the country experiencing hot and humid climatic conditions. These buffaloes catered to variety of specialized uses, including dairy, draft, meat, and manure.Citation2 Each unique breed is likely to harbor genetic variants specific to above functions and to adaptation. These could be a valuable genetic resource for locating genes that are sustained under long-term natural selection.Citation3–5 Such a diverse germplasm that has undergone an extensive adaptation to a wide range of agro-climatic conditions. This is believed to be result of accumulation of advantageous mutations through natural selection and survival of fittest. Exome-wide comparative analysis can be utilized to comprehend how these spontaneous mutations have modified the genetic structure of these populations over a long term.

Indian local and unique germplasm like Chhattisgarhi, Gojri, and Chilika breeds from India have not yet been explored fully. To date, only a few studies are reported in Murrah breed such as identification of genes for lactation and fertility traits using GWAS (ANKRD44, CCSER1, GRIA3, etc),Citation6 several dairy candidate genes identification using ddRAD sequencing (PLCE1 and CACNB2),Citation7 and selection signature for milk production and immunity using ddRAD sequencing (SLC37A1, PDE9A, etc).Citation8 Few studies in Chilika, Gojri and Chhattisgarhi buffalo have been done earlier by our group namely, comparative expression profiling studies of heat stress genes (GLUT-1 and HSP60) in Chilika buffalo,Citation9 polymorphic study on Exon-40 of FASN gene in Gojri and Chhattisgarhi.Citation10

However, these native breeds have been reported to be strikingly different. This could be due to the difference in their utility and might have been selected accordingly. The Chhattisgarhi breed exhibits dual-purpose characteristics, evincing a daily yield of 4 kg of milk in hot and humid conditions, withstanding a low-input system. Nevertheless, in females there is delayed age at first calving that varies from 60 to 72 months.Citation11 The "Bathaan system" is practiced during the paddy sowing period, in their breeding tract. Gojri is a multi-utility and semi-migratory breed that is primarily raised by the Gujjar community of Punjab and Himachal Pradesh. They are uniquely adapted to grazing and agricultural operations on hilly terrain due to their sure footing.Citation12 The Chilika buffalo, an indigenous and insular breed, exhibits significant adaptability to brackish water and is geographically secluded.Citation13 Reared under extensive management systems, these dual-purpose animals serve as a vital source of subsistence for farmers in the region, owing to their capacity to endure high temperatures and humidity. Average milk yield of Chilika buffalo is between 2-6 liters per day. The dairy products derived from their milk boast a commendable shelf life.Citation14 Recent advances in next generation sequencing (NGS) technology and sequence analysis tools have allowed sequencing to become a feasible tool to explore potential variants underlying the above phenotypic traits in livestock populations.

Fast and inexpensive sequencing of exome and genome analysis makes population based genetic study an attractive option. These new techniques enable the investigator to evaluate genome-wide variation underlying performance attributes in buffalo populations. Further these tools aid in comparing the genetic makeup of these populations and cataloguing the change in frequencies of the selected alleles adapted to diverse climatic zones.Citation15 Estimating the degree of genetic differentiation among populations would help to understand the genome and to make inferences about their evolutionary history which is also one of the objectives of population genetic studies.Citation16,Citation17 Population differentiation probably occurs due to long-term natural selection, migration, mutation, and genetic drift in a population offering them unique genetic structure. The most widely used population genetic statistical tools includes admixture analysis, principal components analysis (PCA),Citation18 fixation index (FST), nucleotide diversity (π) and Tajima’s D (TD) by utilizing the commonly used bioinformatical software.

One of the many sequencing tools, whole exome sequencing technique aims solitary on highly enriched exome part of the genome. Exome is best-characterized region that correlates to phenotypes as it explores highly interpretable allelic variants underlying the economic traits. According to various studies it can be concluded that exome data set is most suitable to execute evolutionary studies, as both recent and ancient genetic differences, change in allele frequencies, and identification of the functional variant or gene responsible for adaptation could be marked in a population.Citation19–25 Functional variants like InDels can be precisely extracted from exome data set and are considered second best after single nucleotide polymorphisms-based data.Citation26

As far as we are aware, no comprehensive whole exome analysis has been done so far to scrutinize the genetic structure and population differentiation in these local buffalo germplasms of India. It is hypothesized that the meticulous exploration of exome will uncover the large number of variants and candidate gene that have been naturally selected for their performance. This present study was conducted to explore population structure and differentiation under selection and to further explore insights into the biological mechanisms underlying these attributes specific to Chhattisgarhi, Chilika, Gojri, and Murrah breeds of buffalo using whole exome based sequencing technique.

Material and methods

Sample collection and whole-exome sequencing

In the present study, a total of four rare buffalo germplasm were taken, namely Chhattisgarhi (n = 3), Chilika (n = 4), Gojri (n = 3), and Murrah (n = 4) from their respective native breeding tracts. According to their geographical distribution, breeds under study are located into four different regions namely, central hilly and plateau region of Chhattisgarh (Chhattisgarhi), eastern-costal region of Odisha (Chilika), northern sub-Himalayan region (Gojri), and northern plain region of parts of Haryana (Murrah). For each breed, sampling was done ensuring that entire breeding tract area was covered. Pre-capture libraries were created from whole genomic DNA from 14 blood samples of buffalo by utilizing All Exon Bovine ((Agilent SureSelect (catalog no. 5067- 5582)) capture kit according to the manufacturer’s instructions. In-solution capture method was used for genomic DNA fragmentation, template preparation and exome capture. 150 × 2 bp reads were generated using Illumina NovaSeq 6000 platform.

Alignment statistic and variant calling

Raw data files were quality checked in FastQC for each sample and sequence reads (FASTQ files) were later trimmed for adapters. Then aligned to the reference genome Bubalus bubalis (accession ID GCF_019923935.1 (www.ncbi.nlm.nih.gov)) using Burrows-Wheeler Aligner tool.Citation27 Samtools were utilized to create BAM filesCitation28 thereby duplicate reads were removed using Samblaster.Citation29 Per-base depth of coverage in the whole exome apprehended areas in each alignment file was assessed using depth of coverage utility of the Genome Analysis Toolkit (GATK).Citation30

Base Quality Score Recalibration (BQSR) was carried out prior to variant calling using GATK’s BaseRecalibrator. GATK’s AnlyzeCovariates tool was used for recalibration around the InDels. Then, Haplotype Caller tool of GATK was used for calling the InDels under the usual recommendations for whole exome sequencing.Citation31 CNNScoreVariants was used to score the filtered variants. We chose the InDels that met our quality control requirements, which included a call rate of more than 99% and a HWE with a P value more than 10−4.

Linkage disequilibrium pruning of datasets and population structure analysis

By applying –indep-pairwise algorithm of plink (version 1.9),Citation32 Linkage Disequilibrium (LD) pruning was implemented that filtered the LD areas having r2 values more than 0.1. It was performed using multiple sliding window widths to remove related markers to find the optimal parameters (the suggested admixture procedure sliding window size is 50 kb, and an extended 10,000 kb) for STRUCTURE and PCA analysis. For 10,000 kb window, r2 threshold of 0.1 to allow pruning of InDels was applied.

Admixture analysis was performed using STRUCTURE software (v2.3.4), that clusters using Bayesian approachCitation33 in order to investigate admixture between the populations. After 20,000 burn-ins for 10 iterations, 30,000 MCMC runs were carried out, and the most frequent representation was chosen as the final instance of the population structure. PCA was utilized to investigate the genetic connection between studied breeds of buffaloes. PCA was performed using PLINK v1.9. For each population, the LD pruned data was used for PCA analysis. PCA plots were visualized by the ggplot2 package in R software.Citation34

Population genetic analysis

The average number of nucleotide changes per site on a population’s chromosomes is measured as nucleotide diversity. π = ∑ijxixjπij; where i,j €{1, …, n}, and n is the total number of sample sequences. Additionally, xi and xj stand for the ith and jth sequences respective frequencies, while πij stands for the amount of nucleotide changes per nucleotide site between the ith and jth sequences. π of each breed over the entire sequence was performed using VCFtools (in windows of 100 kb)Citation35 and visualized using R software. To test the significance of the nucleotide diversity parameter for each of the population, one-way ANOVA (analysis of variance) was implemented. There is comparison between average number of pairwise nucleotide differences (θT = π) with the total number of segregating sites (θw = S/an), such that D = (θT - θw)/√Var (θT - θw), is known as Tajima’s D statistic.Citation36 Tajima’s D analysis was performed using VCFtools, with the parameter “–Tajima’s D” (in windows of 100 kb).Citation37

Using exome-wide FST values, genetic differentiation analyses across populations

Finally, examination of population differentiation by estimating exome-wide FST, using Weir and Cockerham method (1984), which is based on Wright’s F-statistics.Citation38,Citation39 FST is a descriptive statistic, extensively applied, and is executed to estimate genetic differentiation between populations under study.Citation2,Citation3,Citation40 Previously, the weighted FST values were calculated for each variant using VCFtools with the parameter “–weir-fst-pop group1 –weir-fst-pop group2”.Citation41 Here, group1 indicates the evaluation of FST for the particular population and group 2 comprises of remaining three breeds population. For each comparison, FST values were then averaged over sliding window of 100-kb along with step size of 25-kb. To detect accurately highly differentiating loci, FST values were Z-transformed as follows ZFST = ((FST - µFST)/ΣFST). Finally, the top 0.5% of windows with extraordinarily high FST values (top 0.5% of ZFST distributions) were candidate selection zones that probably subject to natural selection.

Gene annotation, enrichment, and network analysis

Based on ZFST values, a gene was considered to have evidence of selection if it spanned an outlier genomic window. Genome annotation was done using NCBI’s Genome Data Viewer. Gene enrichment analysis was performed using PANTHER v.14.1.Citation42 Examination of genes that were enriched under functional categories of GO-slim Biological Process term (BP) and panther pathways as being overrepresented by PANTHER using fisher’s exact test with false discovery rate correction at 5%. Gene network analysis was also used as a corresponding method to investigate how these reported genes were connected functionally by using STRING v11.Citation43

Results

Sample collection, exome sequencing, and variant calling

Exome sequencing produced approximate 15 to 21 Gbp/sample of paired-end raw reads with a length of 159-bp from four buffalo breeds. Around 45.5–85.5 million clean paired-end reads were attained per sample per breed post quality control criteria (). Almost all the clean reads were mapped to the most recent buffalo reference genome (Bubalus bubalis (GCF_019923935.1)), which was around 98.50% of the total ().

Table 1. Summary of sequence data of all four buffalo breeds.

Table 2. Summary of sequencing and mapping results of all four buffalo breeds.

Approximately, 63327 in Chhattisgarhi, 115765 in Chilika, 57371 in Gojri and 105306 in Murrah breed InDels were extracted. Based on the stringent thresholds, 4333, 6829, 4130, and 4854 InDels were retained for further processing in Chhattisgarhi, Chilika, Gojri, and Murrah breeds, respectively. The largest number (3878) of unique InDels was detected in the Chilika breed and Gojri breed displayed the lowest number (1457) (). The length of InDels ranged from −135 bp (deletion) to +130 bp (insertion) in Chhattisgarhi, −104 bp (deletion) to +137 bp (insertion) in Chilika, −116 bp (deletion) to +127 bp (insertion) in Gojri and −90 bp (deletion) to +104 bp (insertion) in Murrah breeds. However, maximum number of the identified InDels have the lengths less than 4 bp. Strikingly, high probability of detection of InDels was present on several chromosomes, such as on chromosomes 2, 14, and 17 in Chhattisgarhi, on chromosomes 4,5,6 and 23 in Gojri, on chromosomes 3, 4,5,9 and 19 in Murrah, and on chromosomes 6, 9, 17 and 18 in Chilika breed ().

Figure 1. (a) Venn diagram depicting common and unique InDels in all four breeds of buffalo. (b) Exome-wide probability of InDels detection in four breeds of buffalo. The content of the buffalo reference genome sequence was represented by the outermost circle. Proceeding from outer to inner of the circle showed the exome-wide spreading of InDels in Chhattisgarhi, Gojri, Murrah, and Chilika, respectively.

Figure 1. (a) Venn diagram depicting common and unique InDels in all four breeds of buffalo. (b) Exome-wide probability of InDels detection in four breeds of buffalo. The content of the buffalo reference genome sequence was represented by the outermost circle. Proceeding from outer to inner of the circle showed the exome-wide spreading of InDels in Chhattisgarhi, Gojri, Murrah, and Chilika, respectively.

LD Pruning of datasets and population structure analysis

Around 442, 769, 420, and 542 InDels were retained in Chhattisgarhi, Chilika, Gojri and Murrah breeds post LD pruning. A maximum of three populations were proposed by the number of potential sub-populations assessed using the Evanno K technique, where modal value of delta K at true K was missing (). Genetic structure of all the four breeds was inferred by STRUCTURE analysis (). At K = 2, Chilika breed clustered separately, while rest of three populations were grouped into single population. At K = 3, Murrah clustered separately from the other two breeds that were grouped as single population. At K = 4, all four populations were genetically distinct and clustered into four different breeds without any admixture. PCA result illustrates a more scattered pattern of all four breeds. The first eigenvector (PC1) explained 21.8% and subsequently PC2 and PC3 explained 18.2% and 16.2% of the total variance, respectively ().

Figure 2. Population structure and relationships of Chhattisgarhi, Chilika, Gojri and Murrah buffaloes. (a) The optimal number of clusters present in the analyzed buffalo germplasms was estimated using the Evanno technique employing estimates of the number of sub-populations (K) using various statistics. (b) Buffalo breeds model-based clustering with K = 2 to 4 using the STRUCTURE programme. (c) Using principal component analysis (PCA), the comparison of PC1 and PC2 and PC3 is shown.

Figure 2. Population structure and relationships of Chhattisgarhi, Chilika, Gojri and Murrah buffaloes. (a) The optimal number of clusters present in the analyzed buffalo germplasms was estimated using the Evanno technique employing estimates of the number of sub-populations (K) using various statistics. (b) Buffalo breeds model-based clustering with K = 2 to 4 using the STRUCTURE programme. (c) Using principal component analysis (PCA), the comparison of PC1 and PC2 and PC3 is shown.

Population genetic analysis

Average nucleotide diversity of Chilika, Chhattisgarhi, and Murrah was significantly higher in comparison to Gojri population (p < 0.01) (). However, the average nucleotide diversity of Chilika, Chhattisgarhi and Murrah was significantly similar (p < 0.01) (). Chromosome-wise analysis depict maximum nucleotide diversity on chromosome number 18 and 23 in Chhattisgarhi, on chromosome number 18 in Chilika, on chromosome number 12 and 18 in the Gojri breed, and chromosome number 21 in Murrah. Mean values of TD were around zero for all four breeds () suggesting populations were under slight balancing selection. Chilika buffalo showed more positive value (TD = 0.267374), followed by Gojri (TD = 0.25392), subsequently by Chhattisgarhi (TD = 0.22114) and least showed by Murrah (TD = 0.21571).

Figure 3. Summary statistics for population genetics. (a) Histogram showing frequency of different nucleotide diversity values throughout chromosome. Summary statistics for population genetics. (b) Box plots displaying distribution of nucleotide diversity, measured exome-wide across 100-kb windows in all four breeds. (c) Histogram showing frequency of different Tajima’s D values throughout exome.

Figure 3. Summary statistics for population genetics. (a) Histogram showing frequency of different nucleotide diversity values throughout chromosome. Summary statistics for population genetics. (b) Box plots displaying distribution of nucleotide diversity, measured exome-wide across 100-kb windows in all four breeds. (c) Histogram showing frequency of different Tajima’s D values throughout exome.

Table 3. The summary of nucleotide diversity in all four buffalo breed populations.

Genetic differentiation analysis across populations using global FST values

To minimize false positives, only the top 0.5% of windows with high ZFST values were identified as distinguishing areas in the empirical distributions for each breed. Therefore, 27, 98, 38, and 35 outlier windows (ZFST ≥ 2.64, 2.67, 2.55, and 2.86; corresponding FST ≥ 0.343, 0.334, 0.343, and 0.342) were detected in the Chhattisgarhi, Chilika, Gojri and Murrah buffalo, respectively (). Comparing their outlier windows, it was depicted that most of these differentiating loci were breed-unique, portraying different phenotypic evolutions under adaptations to the local environments. Summary of identified genes for various performance traits under the studied breeds is depicted in .

Figure 4. Whole exome distribution of ZFST values estimated in 100-kb sliding windows with 25-kb increments in all four breeds of buffalo.

Figure 4. Whole exome distribution of ZFST values estimated in 100-kb sliding windows with 25-kb increments in all four breeds of buffalo.

Table 4. Summary of genes identified for performance traits in all four studied breeds.

Gene annotation, enrichment, and network analysis

After the genome annotation, a total of 38 (Chhattisgarhi), 37 (Chilika), 23 (Gojri), and 29 (Murrah) genes were found in the putative outlier windows of each breed. However, under GO-slim biological processes and panther pathways, none of the genes were significantly enriched, after correcting for fisher’s exact test. These genes were classified into many biological processes, top 10 biological processes like the dopamine biosynthetic process (GO:000658) metabolic process (GO:0008152), cellular process (GO:0065007), etc., is presented in . Using STRING database, from annotated 127 genes, 41 were found to be clustered in the main network (). The 41 genes that make up the primary gene network were identified by the centrality analysis as having the most connections with various other candidate genes. depicts top 10 genes showing maximum number of interactions. Based on the results of population structure and differentiation, biological functions of the genes, and data from research that have been published, many genes were potentially accountable for the performance attributes in these unique buffalo breeds and are presented in greater detail below.

Figure 5. (a) Summary of the top 10 GO-slim biological processes into which the candidate genes were classified. (b) Network analysis of genes identified in present study in all four breeds.

Figure 5. (a) Summary of the top 10 GO-slim biological processes into which the candidate genes were classified. (b) Network analysis of genes identified in present study in all four breeds.

Table 5. Top 10 genes based on the number of interactions identified in the STRING database.

Discussion

Buffaloes have spread widely around the world as a result of human migration and the growth of agricultural commerce, especially in Asia and developing nations like India. They can endure tough environments and circumstances while assisting low- and middle-income farmers in maintaining their way of life.Citation44 Present study was laid with objective to explore large number of variants and genes that are under long-term natural selection for performance traits that will provide valuable genetic resource for improvement in these breeds in future. This is the foremost whole-exome based comparative study including diverse individuals from all four breeds. For population genetic analysis, with decreasing sequencing cost, ddRAD is now almost obsolete as it results in much missing data depicting biased results.Citation45 Pre-selected SNPs in the array for the BEADCHIP dataset may not fundamentally be a uniform representation of the populations under study. Also, an enlarged percentage of ancestry informative markers in the data will estimate biased results for statistics related to population genetic studies.Citation25 With whole genome data, the identification of lot many uninformative variants (neutral or weak effect) along with the associated cost and complexity of data analysis makes its accessibility limited.Citation46 Foremost, whole exome data is not affected by any of the biases discussed above and is most preferred for population based genetical studies.Citation24,Citation25,Citation47

Wider prospective of genome understanding of how the natural selection has shaped the pattern of genetic variation that not only explores insights into mechanism of evolutionary changes but also sightsee population structure associated with complex traits that further elucidate genetic evaluation, environmental adaptation, and protection of the genetic resources.Citation48,Citation49,Citation50–53 So far limited studies in Chhattisgarhi, Chilika, Gojri, and a few in Murrah have been carried out for performance attributes.Citation6,Citation9,Citation10,Citation54,Citation55 The most significant qualities of buffalo are their performance aspects, which are also among their most significant economic traits. Quantitative qualities such as performance are regulated by polygenes.Citation56 In a DNA sequence, insertions and deletions (indels) are additions or deletions of one or more nucleotides. InDels cause similar levels of variation as SNPs and have widely variable effects on gene expression and thus can modify the function of genes.Citation57 InDels are thought to encompass more polymorphic base pairs than SNP.Citation58,Citation59 InDels are being widely used in population genetics studies.Citation60–62

In present study, 3 to 4 samples per breed were taken ensuring sample collection was from different parts of their entire breeding tract. Reportedly, very small sample size of n = 2 is optimum to get precise estimate of FST in population genomics of non-model organism using NGS markers.Citation63 Small sample sizes (n = 3 to 4) of random individuals for particular breed are required to yield reliable estimates of differentiation with high precision when using high-throughput markers.Citation64,Citation65 Limited sample size in present study did not only reduce the cost of our research project but also enabled us to analyze greater variety of populations at the same price. Further, it provided a significant benefit in population genetic research on breeds like the Chilika whose sample collection is tedious.

The foundation of certain population genetic methods, like admixture analysis and PCA, is the premise of linkage equilibrium. Due to the fact that closely connected markers cause some haplotypes to occur more frequently than predicted due to LD and because huge blocks of markers in complete LD can significantly affect the eigenvector/eigenvalue structure, it is crucial to remove linked markers from datasets.Citation66 This is crucial for the exome dataset in particular since genes may be crammed into tiny, transcriptionally active euchromatin areas of the chromosome. Modal value of delta K was vague at true K. This might be due to less sample size used in our study. Later, STRUCTURE analysis clearly distinguished Chilika from other population at K = 2, indicating Chilika buffalo have unique evolutionary features. Chilika is primarily riverine but few hybrids (riverine x swamp) were also reported in the region.Citation67 This might be the probable reason that Chilika got separated from the other three riverine breeds. The rest of the 3 populations might have a closer ancestral relationship so were assigned to one population. At K = 3, Murrah separates itself, and there was no genetic differentiation between Chhattisgarhi and Gojri breeds. At K = 4, STRUCTURE assigned all four populations into four different breeds without any admixture. This indicated that studied populations were quite differentiated. Probable reason for this might be due to their different geographical locations, living environment of breeds, and long-term natural mating state, with minimal artificial selection apart from Murrah breed. Our result of clear genetic differentiation between Gojri and Murrah breed was consistent with previous studies where Gojri breed was having pure blood of 99.4% of Gojri with 0.4% of admixture from Murrah breed, indicating Gojri is a rare breed with high genetic variability.Citation55 To further elucidate the genetic relationship between studied populations, PCA was explored. The total variance explained by all three eigenvectors was around 56.2%, which substantially contributes to variation. Because each breed sample was taken from different regions of their breeding tract, it is possible that the PCA analysis of these breeds showed a more scattered pattern indicating the more genetic distance between individuals of the same breed.

In present study, Gojri showed a reduced level of nucleotide diversity compared to other three breeds. This might be probably due to reason that individuals of Gojri are more genetically related and thus within population genetic diversity is less. Rest three breeds had almost similar level of nucleotide diversity that might be likely influenced by natural and environmental selection in each region, thus resulting in different genomic characteristics. TD statistics result depicts the presence of slight balancing selection in all breeds population. The slight balancing status in the studied population might be probably elucidated by the nonoccurrence of any breeding related program in Chilika, Chhattisgarhi, and Gojri during the past years. Similar results were shown in Mangalarga Marchador horses that were under strong balancing selection.Citation67 After the above analyses, unequivocal evidence of pronounced genetic variation exists in the breeds under study. Thus, further FST statistics was explored for population differentiation associated with performance traits. Despite the possibility of false positive results from genome scans based on population differentiation, such as FST, this technique is most effective for identifying differentiating loci in populations with unknown phenotypes.Citation68–71 Several breed-specific unique and highly differentiating loci were identified in the present study.

Candidate genes associated with milk yield

Genomic region encompassing CAMSAP3 (Calmodulin regulated spectrin associated protein family member 3) (ZFST=2.85) was found to be involved in a variety of GO-slim BP including cellular component organization (GO:0022613), complex of protein subunit organization (GO:0043933), ribosomal large subunit biogenesis, etc., in Chhattisgarhi breed. Previous studies suggested that CAMSAP3 was found to be related with milk yield in Egyptian buffaloes.Citation72 CAMSAP3 played central role impacting the lactation process in Bos taurus, Ovis aries, and Bubalus bubalis.Citation73 Average milk yield was reported to be 4 kg per day with 7 to 12 months of lactation length in Chhattisgarhi.Citation11 CAMSAP3 could be one probable reason for minorly affecting milk yield in this breed.

Milk protein associated candidate gene

Whey protein and casein make up the majority of the milk proteins, and their synthesis is controlled by a number of hormones and biological processes. Several genes were identified as differentiating genomic regions under natural selection. Among these, CLCN1 (Chloride channel protein) (ZFST=2.64) encodes ion channel, CASP2 (Caspase 2) (ZFST=2.64) encodes protease in Chhattisgarhi, NUDT3 (Diphosphoinositol polyphosphate phosphohydrolase 1) (ZFST=2.73) encodes phosphatase in Chilika and LPAR6 (Lysophosphatidic acid receptor 6) (ZFST=2.93) encodes G protein-coupled receptor in Murrah. These genes were enriched under GO-slim BP like cellular process (GO:0009987). CASP2 and CLCN1 both were informed to be related with protein yield in milk of Holstein bovine of North American.Citation74 These genes might be related with increase in nutritional quality of milk in Chhattisgarhi.

Previous study reported that NUDT3 responsive mRNAs were discovered to have a role in cell motility, and the stability of integrin 6 and lipocalin-2 mRNAs was directly correlated with NUDT3 decapping.Citation75 NUDT3 was previously associated with 305-day yield of milk by impacting protein of milk production through its involvement in mRNA decay in Holstein cows.Citation76 There is a huge demand for milk of Chilika due to its nutritional qualities, and most probable reason might be they solely feed on natural vegetation that are highly nutritionally enriched. This gene could also be one of the reasons for promoting nutritional value of Chilika milk. LPAR6 gene was identified to be the most promising gene that influences milk protein content in Chinese Holstein.Citation77 Therefore, LPAR6 might be most likely responsible for milk protein percentage in Murrah buffaloes.

Candidate genes associated with milk fat

Milk contains 99% milk triglycerides, a superior natural fat that is crucial to milk’s composition.Citation78 Genomic region differentiated in Murrah breed encompassed FADS1 (fatty acid desaturase 1) (ZFST=2.86). FADS1 adds double bond at 5th and 6th position of long-chain polyunsaturated fatty acids in Canadian Holstein cattle’s milk.Citation79 This gene in accordance with previous studies, might be related with milk fat yield in Murrah.

Reproduction trait associated candidate genes

Several genes were recognized in highly differentiating exonic regions under selection. These were enriched to various GO-slim BP like cell morphogenesis involved in differentiation, developmental process, cellular process, cell-substrate adhesion, actin cytoskeleton reorganization, etc. Among these, DBI (Acyl-CoA-binding domain-containing protein 5) (ZFST=2.76) encodes transfer/carrier protein in Gojri and PTGER3 (Prostaglandin E2 receptor EP3 subtype) (ZFST=2.86) encodes G-protein coupled receptor and PARVG (Parvin gamma) (ZFST=2.86) encodes actin and actin related protein in Murrah breed. Previous studies reported DBI gene was involved in the regulation of mitochondrial steroidogenesis and was predominantly found immunodetected in cells with secretory activity, such as hepatocytes, steroid-producing cells in the adrenal cortex, and testis, etc in mammals.Citation80 PTGER3 gene was related to fertility traits in bovinesCitation81 and PARVG gene was associated to a number of viable embryos in dairy cattle.Citation82 These genes might be related with reproductive processes in Gojri and Murrah breeds.

Candidate genes associated with saline adaptation in Chilika buffalo

Genomic regions encompassing few genes that were enriched to GO-slim BP like regulation of biological process, chemical synaptic transmission modulation, cellular processes, ion transmembrane transport, potassium ion transmembrane transport, etc. HMGA1 (High mobility group AT-hook 1) (ZFST=2.73) encodes endodeoxyribonuclease, GRM4 (Glutamate metabotropic receptor 4) (ZFST=2.67) encodes G-protein coupled receptor, and KCNA3 (Potassium voltage-gated channel subfamily A member 3) (ZFST=2.67) encodes voltage-gated ion channel in Chilika breed. HMGA1 reported to affect human height and making it feasible for certain anthropometric features like adult height.Citation83,Citation84 Chilika has the unique ability of a behavioral pattern of walking into the marshy area without any difficulty.Citation54 HMGA1 might be responsible for anthropometric features adapted to walking into marshy lands. GRM4 was earlier said for regulating mineral content in the muscle of Nellore cattle.Citation85 Normal effects of muscle mineral composition on nerve impulse transmission, osmotic pressure, bone and skeletal tissue function, and the preservation of acid-base equilibrium. GRM4 may be responsible for such homeostasis under saline conditions. KCNA3 was previously reported to be associated with adaptation to alkaline waters in Amur Ide.Citation86 This gene might be one of the reasons for homeostasis and osmoregulation in the Chilika breed inhabiting saline conditions.

Candidate genes associated with a longer shelf life of Indian Chilika curd

Highly differentiating genomic region encompassing RPS28 (40S ribosomal protein S28) (ZFST=2.73) that encodes for ribosomal protein was identified in Chilika breed. This gene was enriched for GO-slim BP like cellular aromatic compound metabolism, RNA metabolism, and nucleobase-containing chemical metabolism, etc. Earlier studies suggested that RPS28 gene expression pattern recommends the possible higher antioxidant activity of milk in Kashmiri cattle.Citation87 Dairy products like curd and yogurt derived from the milk of the Chilika breed have a longer shelf life. primarily, Chilika curd possesses beneficial microflora like Lactobacillus, Lactococcus, Streptococcus, Leuconostoc, and yeast which shows anti-fungal activity by the secreting number of anti-fungal compounds such as 3- hydroxyl fatty acid, caproic acid, etc. Subsequently, Chilika curd is enriched with thermo-tolerant lactobacilli inoculum, Dendrocin- an antifungal protein isolated from a fresh bamboo shoot.Citation14,Citation88 This gene might also be one of the reasons for imparting longer shelf life of the curd prepared from milk of Chilika buffalo.

Candidate genes associated with high altitude adaptation

Few differentiating genomic regions that were under selection were enriched for GO-Slim BP like cellular process, tyrosine metabolism, catecholamine metabolism, cellular biogenic amine metabolism, organic substance metabolism, etc. APBB2 (Amyloid beta precursor protein binding family B member 2) (ZFST=2.55), TH (Tyrosine 3-hydroxylase) (ZFST=2.55) encodes oxidoreductase and CTU1 (Cytoplasmic tRNA 2-thiolation protein 1) genes in Gojri buffalo were retrieved from this part of exome. Few genes were previously reported to be associated with altitude adaptation namely ALKBH8 (Alkylated DNA repair protein alkB homolog 8) (ZFST=2.76), INS (insulin) (ZFST=2.55) encode growth factor, IGF2 (insulin-like growth factor 2) (ZFST=2.55), CALD1 (Caldesmon 1) (ZFST=2.76), HABP2 (Hyaluronan Binding Protein 2) (ZFST=2.55) were also identified in our study located to this region of genome in Gojri. ALKBH8 controls the expression of selenocysteine-protein to prevent damage from reactive oxygen species in mice (ROS).Citation89 Low oxygen pressure and an increase in the production of reactive oxygen are caused by high altitude exposure, and these changes are frequently accompanied by an increase in oxidative damage to lipids, proteins, and DNA. Gojri are semi-migratory and move to top hills to avoid peak summers of plain areas. Change in altitude from plain to hills might leads to low oxygen pressure which ultimately generates ROS.

ALKBH8 gene might be probably responsible for preventing any damage caused by free radicals. INS gene regulates insulin biosynthesis in pancreatic beta cells in humans.Citation90 IGF2 was reported to encode an essential growth factor and plays an important role in muscle growth in different developmental stages in Chinese Qincuan beef cattle breed.Citation91 CALD1 expressed in smooth muscle, it inhibits vascular smooth muscle tone in people, regulates smooth muscle contraction.Citation92 HABP2 was related to cell growth and proliferation in Brazilian Nellore beef cattle.Citation93 TH was reported to be activated in the adrenal medulla during the time of prolonged stress in bovine adrenal medullary cultured cells.Citation94 CTU1 was said to have an impact on both production and non-production qualities, making it a crucial marker for a dairy cattle selection.Citation95 APBB2 involved in amyloid precursor protein processing and involvement in cell cycle regulation in different cells lines.Citation96 Dry and non-pregnant Gojri, migrate from the plains/foothills of Punjab and Himachal Pradesh to the top hills of Kangra and Chamba divisions of Himachal Pradesh. Typically, it takes them 10 to 15 days to finish their voyage. Highland has a frigid climate, which is feasible for buffaloes to manifest symptoms of heat and have a high conception rate. Most buffaloes used to be pregnant when they traveled back. The migratory path is extremely undulating and challenging to traverse. Its most difficult to avoid toppling down and slipping when pregnant and moving downhill. These buffaloes have the peculiar feature of “sure footing”. This makes them walk and graze hilltops without slipping. Identified genes might be responsible for the development and strengthening of muscles of limbs and other body parts enabling laid-back mobility even during pregnancy.

Candidate genes related to immunity

Two genes were identified to be under natural selection, LCN2 (Neutrophil gelatinase-associated lipocalin) encoding transfer/carrier protein in Chilika and ITGB1BP1 (Integrin beta-1-binding protein 1) (ZFST=2.86) in Murrah. These were enriched for the pathways like the control of cellular processes, the organization or biogenesis of cellular components, etc. LCN2 reported to act as a natural bacteriostatic agent against bacterial infection and inflammatory conditions in mammals.Citation97,Citation98 Chilika has a very low mortality rate without any medications. This gene might be one of the most probable causes of low mortality in this breed. ITGB1BP1 reportedly associated with disease immune mechanisms and adaptation to the tropical environment in Nigerian cattle.Citation99 This gene might be related to the adaptation of Murrah buffaloes to tropical climatic conditions and resistance to various diseases occurring in such climates.

Further, gene to gene interaction indicated that NUDT3 interacted (confidence score = 0.613) with RPS28 (increase in the shelf life of curd candidate) gene. INS interacted with a confidence score of 0.416 with the TH gene and with a confidence score of 0.424 with the LCN2 candidate gene. KCNA3 interacted (confidence score = 0.559) with the RPS28 candidate gene. KCNA3 is a voltage-gated ion channel gene. ALKBH8 interacted with a confidence score of 0.953 with the CTU1 candidate gene. ALKBH8 gene prevents free radical-based damage to the cells. This suggests that most of identified genes might have interrelated functions. Few genes were interacted with one or two or with none of the genes suggesting these genes might have independent functions or might be interacting with the other genes that are not included in this interaction.

Conclusion

Exome sequencing, which is preferred for population genetic analysis has provided evidence of overall patterns of genetic structure and differentiation, dissecting apart the genetic variants and genomic regions showing signals of natural selection specific to each of the studied breeds. Exome sequencing and analysis unmasked several genes that may have role in the regulation of performance and adaptation traits in the studied rare (Gojri, Chilika and Chhattisgarhi) buffalo breeds. These genes can be further validated with candidate gene approach to establish their roles in regulating the traits. The results of this study will enable breeders and geneticists in planning breeding and conservation programs in Chhattisgarhi, Chilika, and Gojri that later will pave the way for establishing the set of genomic markers which can be utilized in execution of genomic selection and breeding in the studied breeds.

Author contributions

VV conceptualized the research work, sample collection, critical evaluation and editing of the manuscript. SC and VU analyzed the data. VU and VV wrote the manuscript. AS, RG, VD and VP visualized the manuscript. The article’s submission was approved by all authors who contributed to it.

Acknowledgements

The Directors of ICAR-NDRI, Karnal, and ICAR-CIRB, Hisar are thanked by the authors for providing the necessary assistance and funding required to complete the work.

Disclosure statement

The study’s authors affirm that there were no financial or commercial ties that may be viewed as having a possible conflict of interest.

Additional information

Funding

The authors acknowledge the funding and assistance provided by the director of ICAR-NDRI, Karnal, in order to complete the work.

References

  • https://nbagr.icar.gov.in/en/home/.
  • Bonhomme M, Chevalet C, Servin B, et al. Detecting selection in population trees: the Lewontin and Krakauer test extended. Genetics. 2010;186(1):241–262.
  • Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. Natural selection has driven population differentiation in modern humans. Nat Genet. 2008;40(3):340–345.
  • Coyne JA, Orr HA. Speciation. Sunderland, MA: Sinauer Associates Incorporated; 2004.
  • Fisher RA. The Genetical Theory of Natural Selection. Oxford: The Clarendon Press; 1930.
  • Vohra V, Chhotaray S, Gowane G, et al. Genome-wide association studies in Indian Buffalo revealed genomic regions for lactation and fertility. Front Genet. 2021;12:696109.
  • Surya T, Vineeth MR, Sivalingam J, et al. Genomewide identification and annotation of SNPs in Bubalus bubalis. Genomics. 2019;111(6):1695–1698.
  • Tyagi SK, Mehrotra A, Singh A, et al. Comparative signatures of selection analyses identify loci under positive selection in the Murrah Buffalo of India. Front Genet. 2021;12:673697.
  • Singh R, Rajesh C, Mishra SK, et al. Comparative expression profiling of heat-stress tolerance associated HSP60 and GLUT-1 genes in Indian buffaloes. Indian J Dairy Sci. 2018;71(2):183–186.
  • Vohra V, Kumar M, Chopra A, Niranjan SK, Mishra AK, Kataria RS. Polymorphism in exon-40 of FASN gene in lesser-known buffalo breeds of India. J Anim Rese. 2015;5(2):325–328.
  • Vohra V, Singh M, Mukherjee K, Kataria RS. Identification and characterization of Chhattisgarhi buffalo population in India. Indian J Anim Sci. 2017;87(2):182–185.
  • Vohra V, Niranjan SK, Mishra AK, Kataria RS. Monograph on buffalo genetic resources of India-Gojri. National Bureau of Animal Genetic Resources, Karnal. 2015;(88):1-37.
  • Patro BN, Mishra PK, Rao PK. Chilika buffaloes in Orissa: a unique germplasm. Anim Genet Resour Inf. 2003;33:73–79.
  • Nanda DK, Singh R, Tomar SK, et al. Indian Chilika curd–A potential dairy product for Geographical Indication registration. Indian J Tradit Knowl. 2013;12(4):707–713.
  • Hellenthal G, Busby GB, Band G, et al. A genetic atlas of human admixture history. Science. 2014;343(6172):747–751.
  • Luikart G, England PR, Tallmon D, Jordan S, Taberlet P. The power and promise of population genomics: from genotyping to genome typing. Nat Rev Genet. 2003;4(12):981–994.
  • Charlesworth B. Elements of Evolutionary Genetics. Englewood, CO: Roberts Publishers; 2010.
  • Novembre J, Stephens M. Interpreting principal component analyses of spatial population genetic variation. Nat Genet. 2008;40(5):646–649.
  • Tennessen JA, Madeoy J, Akey JM. Signatures of positive selection apparent in a small sample of human exomes. Genome Res. 2010;20(10):1327–1334.
  • Yi X, Liang Y, Huerta-Sanchez E, Jin X, Cuo ZXP, Pool JE, Xu X. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 2010;329(5987):75–78.
  • Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet. 2010;19(R2):R145–R151.
  • Mathieson I, McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat Genet. 2012;44(3):243–246.
  • Nelson SC, Crouch JM, Bamshad MJ, Tabor HK, Yu JH. Use of metaphors about exome and whole genome sequencing. Am J Med Genet A. 2016;170A(5):1127–1133.
  • Song S, Yao N, Yang M, et al. Exome sequencing reveals genetic differentiation due to high-altitude adaptation in the Tibetan cashmere goat (Capra hircus). BMC Genomics. 2016;17(1):122.
  • Maroti Z, Boldogkői Z, Tombácz D, Snyder M, Kalmár T. Evaluation of whole exome sequencing as an alternative to BeadChip and whole genome sequencing in human population genetic analysis. BMC Genomics. 2018;19(1):778.
  • Lin M, Whitmire S, Chen J, Farrel A, Shi X, Guo J. Effects of short indels on protein structure and function in human genomes. Sci Rep. 2017;7(1):9313.
  • Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760.
  • Li H, Handsaker B, Wysoker A, et al. 1000 genome project data processing subgroup. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–2079.
  • Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics. 2014;30(17):2503–2505.
  • McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–1303.
  • DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491–498.
  • Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575.
  • Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155(2):945–959.
  • Wickham H. Ggplot2: Elegant Graphics for Data Analysis. 2nd ed. New York: Springer; 2009.
  • Nei M, Li WH. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci U S A. 1979;76(10):5269–5273.
  • Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123(3):585–595.
  • Schmidt D, Pool J. The effect of population history on the distribution of the Tajima’s D statistic. Population English Edition. 2002:1–8.
  • Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution. 1984;38(6):1358–1370.
  • Wright S. The genetical structure of populations. Ann Eugen. 1951;15(4):323–354.
  • Fariello MI, Boitard S, Naya H, SanCristobal M, Servin B. Detecting signatures of selection through haplotype differentiation among hierarchically structured populations. Genetics. 2013;193(3):929–941.
  • Danecek P, Auton A, Abecasis G, et al. 1000 genomes project analysis group. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–2158.
  • Mi H, Huang X, Muruganujan A, et al. PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017;45(D1):D183–D189.
  • Szklarczyk D, Franceschini A, Wyder S, et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(Database issue):D447–D452.
  • Borghese A. Buffalo production and research. REU Technical Series 2005;67:1–315.
  • Friel J, Bombarely A, Fornell CD, Luque F, Fernández-Ocaña AM. Comparative analysis of genotyping by sequencing and whole-genome sequencing methods in diversity studies of Olea europaea L. Plants. 2021;10(11):2514.
  • Gibbon A, Saunders CJ, Collins M, Gamieldien J, September AV. Defining the molecular signatures of Achilles tendinopathy and anterior cruciate ligament ruptures: A whole-exome sequencing approach. PLoS One. 2018;13(10):e0205860.
  • Shen Y, Ha W, Zeng W, Queen D, Liu L. Exome sequencing identifes novel mutation signatures of UV radiation and trichostatin A in primary human keratinocytes. Sci Rep. 2020;10(1):4943.
  • Otto SP. Detecting the form of selection from DNA sequence data. Trends Genet. 2000;16(12):526–529.
  • Przeworski M, Hudson RR, Di Rienzo A. Adjusting the focus on human variation. Trends Genet. 2000;16(7):296–302.
  • Vitalis R, Dawson K, Boursot P. Interpretation of variation across marker loci as evidence of selection. Genetics. 2001;158(4):1811–1823.
  • Nielsen R. Statistical tests of selective neutrality in the age of genomics. Heredity (Edinb). 2001;86(Pt 6):641–647.
  • Groeneveld LF, Lenstra JA, Eding H, Globaldiv Consortium, et al. Genetic diversity in farm animals–a review. Anim Genet. 2010;41 Suppl 1(s1):6–31.
  • Bruford MW, Ginja C, Hoffmann I, et al. Prospects and challenges for the conservation of farm animal genomic resources, 2015-2025. Front Genet. 2015;6:314.
  • Singh R, Mishra SK, Gurao A, et al. Current status and unique attributes of Indian Chilika bufalo for adaptation to brackish water ecology. Trop Anim Health Prod. 2021;53(6):544.
  • Vohra V, Singh NP, Chhotaray S, Raina VS, Chopra A, Kataria RS. Morphometric and microsatellite-based comparative genetic diversity analysis in Bubalus bubalis from North India. PeerJ. 2021;9:e11846.
  • Barton NH, Keightley PD. Understanding quantitative genetic variation. Nat Rev Genet. 2002;3(1):11–21.
  • Ng PC, Levy S, Huang J, et al. Genetic variation in an individual human exome. PLoS Genet. 2008;4(8):e1000160.
  • Mullaney JM, Mills RE, Pittard WS, Devine SE. Small insertions and deletions (INDELs) in human genomes. Hum Mol Genet. 2010;19(R2):R131–R136.
  • Cheng Z, Ventura M, She X, et al. A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature. 2005;437(7055):88–93.
  • Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet. 2006;38(1):82–85.
  • Wu K, Yang M, Liu H, Tao Y, Mei J, Zhao Y. Genetic analysis and molecular characterization of Chinese sesame (Sesamum indicum L.) cultivars using Insertion-Deletion (InDel) and Simple Sequence Repeat (SSR) markers. BMC Genet. 2014;15(1):35.
  • Zhou G, Zhang Q, Tan C, Zhang XQ, Li C. Development of genome-wide InDel markers and their integration with SSR, DArT and SNP markers in single barley map. BMC Genomics. 2015;16(1):804.
  • Nazareno AG, Bemmels JB, Dick CW, Lohmann LG. Minimum sample sizes for population genomics: an empirical study from an Amazonian plant species. Mol Ecol Resour. 2017;17(6):1136–1147.
  • Willing EM, Dreyer C, Van Oosterhout C. Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. 2012.
  • Qu WM, Liang N, Wu ZK, Zhao YG, Chu D. Minimum sample sizes for invasion genomics: Empirical investigation in an invasive whitefly. Ecol Evol. 2020;10(1):38–49.
  • Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190.
  • Singh R, Lava Kumar S, Mishra SK, et al. Mitochondrial sequence‐based evolutionary analysis of riverine-swamp hybrid bufaloes of India indicates novel maternal diferentiation and domestication patterns. Anim Genet. 2020;51(3):476–482.
  • Santos WB, Schettini GP, Maiorano AM, et al. Genome-wide scans for signatures of selection in Mangalarga Marchador horses using high-throughput SNP genotyping. BMC Genomics. 2021;22(1):737.
  • Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010;20(3):393–402.
  • Narum SR, Hess JE. Comparison of FST outlier tests for SNP loci under selection. Mol Ecol Resour. 2011;11 Suppl 1(s1):184–194.
  • Hoban S, Kelley JL, Lotterhos KE, et al. Finding the genomic basis of local adaptation: pitfalls, practical solutions, and future directions. Am Nat. 2016;188(4):379–397.
  • Du C, Deng T, Zhou Y, et al. Systematic analyses for candidate genes of milk production traits in water buffalo (Bubalus Bubalis). Anim Genet. 2019;50(3):207–216.
  • Farhadian M, Rafat SA, Panahi B, Mayack C. Weighted gene co-expression network analysis identifies modules and functionally enriched pathways in the lactation process. Sci Rep. 2021;11(1):2367.
  • Pedrosa VB, Schenkel FS, Chen SY, et al. Genomewide association ana lyses of lactation persistency and milk production traits in Holstein cattle based on imputed whole-genome sequence data. Genes (Basel). 2021;12(11):1830.
  • Grudzien‐Nogalska E, Kiledjian M. New insights into decapping enzymes and selective mRNA decay. Wiley Interdiscip Rev: RNA. 2017;8(1):e1379.
  • Clancey E, Kiser JN, Moraes JGN, Dalton JC, Spencer TE, Neibergs HL. Genome-wide association analysis and gene set enrichment analysis with SNP data identify genes associated with 305-day milk yield in Holstein dairy cows. Anim Genet. 2019;50(3):254–258.
  • Li C, Cai W, Zhou C, et al. RNA-Seq reveals 10 novel promising candidate genes affecting milk protein concentration in the Chinese Holstein population. Sci Rep. 2016;6(1):26813.
  • Bionaz M, Loor JJ. Gene networks driving bovine milk fat synthesis during the lactation cycle. BMC Genomics. 2008;9(1):366.
  • Ibeagha-Awemu EM, Akwanji KA, Beaudoin F, Zhao X. Associations between variants of FADS genes and omega-3 and omega-6 milk fatty acids of Canadian Holstein cows. BMC Genet. 2014;15(1):25.
  • Gossett RE, Frolov AA, Roths JB, Behnke WD, Kier AB, Schroeder F. Acyl‐CoA binding proteins: Multiplicity and function. Lipids. 1996;31(9):895–918.
  • Nuttinck F, Gall L, Ruffini S, et al. PTGS2-related PGE2 affects oocyte MAPK phosphorylation and meiosis progression in cattle: late effects on early embryonic development. Biol Reprod. 2011;84(6):1248–1257.
  • Capitan A, Michot P, Baur A, et al. Genetic tools to improve reproduction traits in dairy cattle. Reprod Fertil Dev. 2014;27(1):14–21.
  • Lango Allen H, Estrada K, Lettre G, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467(7317):832–838.
  • Berndt SI, Gustafsson S, Mägi R, et al. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat Genet. 2013;45(5):501–512.
  • Afonso J, Shim WJ, Boden M, et al. Repressive epigenetic mechanisms, such as the H3K27me3 histone modification, were predicted to affect muscle gene expression and its mineral content in Nelore cattle. Biochem Biophys Rep. 2023;33:101420.
  • Xu J, Li J-T, Jiang Y, et al. Genomic basis of adaptive evolution: the survival of Amur Ide (Leuciscu s waleckii) in an extremely alkaline environment. Mol Biol Evol. 2017;34(1):145–159.
  • Bhat SA, Ahmad SM, Ibeagha-Awemu EM, et al. Comparative transcriptome analysis of mammary epithelial cells at different stages of lactation reveals wide differences in gene expression and pathways regulating milk synthesis between Jersey and Kashmiri cattle. PLoS One. 2019;14(2):e0211773.
  • Sahoo S, Maji UJ, Mohanty S. Incidence and preliminary characterization of Lactic acid bacteria as potential probiotic strains from an artisanal milk product, Chilika curd of Odisha. IJDS. 2020;73(2):123–130.
  • Endres L, Begley U, Clark R, et al. Alkbh8 regulates selenocysteine-protein expression to protect against reactive oxygen species damage. PLoS One. 2015;10(7):e0131335.
  • Liu M, Sun J, Cui J, et al. INS-gene mutations: from genetics and beta cell biology to clinical disease. Mol Aspects Med. 2015;42:3–18.
  • Huang Y-Z, Zhan Z-Y, Sun Y-J, et al. Intragenic DNA methylation status down-regulates bovine IGF2 gene expression in different developmental stages. Gene. 2014;534(2):356–361.
  • Humphrey MB, Herrera-Sosa H, Gonzalez G, Lee R, Bryan J. Cloning of cDNAs encoding human caldesmons. Gene. 1992;112(2):197–204.
  • Mudadu MA, Porto-Neto LR, Mokry FB, et al. Genomic structure and marker-derived gene networks for growth and meat quality traits of Brazilian Nelore beef cattle. BMC Genomics. 2016;17(1):492.
  • Stachowiak MK, Hong JS, Viveros OH. Coordinate and differential regulation of phenylethanolamine N-methyltransferase, tyrosine hydroxylase and proenkephalin mRNAs by neural and hormonal mechanisms in cultured bovine adrenal medullary cells. Brain Res. 1990;510(2):277–288.
  • Xiang R, MacLeod IM, Bolormaa S, Goddard ME. Genome-wide comparative analyses of correlated and uncorrelated phenotypes identify major pleiotropic variants in dairy cattle. Sci Rep. 2017;7(1):9248.
  • Penna I, Vassallo I, Nizzari M, et al. A novel snRNA-like transcript affects amyloidogenesis and cell cycle progression through perturbation of Fe65L1 (APBB2) alternative splicing. Biochim Biophys Acta. 2013;1833(6):1511–1526.
  • Goetz DH, Holmes MA, Borregaard N, Bluhm ME, Raymond KN, Strong RK. The neutrophil lipocalin NGAL is a bacteriostatic agent that interferes with siderophore-mediated iron acquisition. Mol Cell. 2002;10(5):1033–1043.
  • Shashidharamurthy R, Machiah D, Aitken JD, et al. Differential role of lipocalin 2 during immune complex–mediated acute and chronic inflammation in mice. Arthritis Rheum. 2013;65(4):1064–1073.
  • Mauki DH, Tijjani A, Ma C, et al. Genome-wide investigations reveal the population structure and selection signatures of Nigerian cattle adaptation in the sub-Saharan tropics. BMC Genomics. 2022;23(1):306.