Whole genome resequencing of large yellow croaker populations
We collected a total of 198 large yellow croaker individuals (Table S1). Of these, 50 individuals were captured in the Zhoushan Sea (the red dot in Fig. 1a) and 48 individuals had been farmed in Zhoushan (the orange dot in Fig. 1a). A further 38 individuals were captured in the Ningde Sea (the blue dot in Fig. 1a). and 62 individuals had been farmed in Ningde (the green dot in Fig. 1a). We performed whole-genome resequencing of these 198 large yellow croaker individuals. We obtained 1.42 Penta base-pairs of genomic DNA, representing about 11 × sequencing depth of the genome per individual. Raw reads were trimmed and aligned to the genome sequence. After variant calling and filtering, a total of 6,302,244 single nucleotide polymorphisms (SNPs) were identified. Using this SNP information, we performed the following population genomic analyses.
Population structure and relationship of large yellow croaker. (a) Geographic map indicating the sample origins of the large yellow croaker in this study. The gross appearance of a large yellow croaker individual is shown in the picture. The sampling area is highlighted by the red broken line. The dots of different color stand for different population. The number of individuals is given in parentheses after the population name. The geographical maps were generated by using R packages of maps v3.3.0 (https://cran.r-project.org/web/packages/maps) and mapdata v2.3.0 (https://cran.r-project.org/web/packages/mapdata). (b) PCA plot (PC1 and PC2) showing the genetic structure of the 198 large yellow croaker individuals. The degrees of explained variance is given in parentheses. Colors reflect the geographic regions in (a). (c) UMAP of the 198 large yellow croaker individuals. Colors reflect the geographic regions in (a).
Genetic population structure of the large yellow croaker individuals
In order to examine the genetic population structure of the large yellow croaker individuals, we performed principal component analysis (PCA). In the first component of the PCA, the Zhoushan farmed population separated from the other three populations (Fig. 1b). In the second component of the PCA, the Zhoushan sea-captured population formed a cluster. Also, the Ningde farmed population formed a cluster. The Ningde sea-captured population had a wider distribution than the other populations. Then, we performed uniform manifold approximation and projection (UMAP), a non-linear dimensionality method (Fig. 1c). The result of UMAP is similar to the result of PCA. UMAP showed that the Zhoushan farmed population formed a distinct cluster, and the Zhoushan sea-captured population and Ningde farmed population formed more scattered clusters. UMAP also showed that the Ningde sea-captured population had a wider distribution than the other populations.
The evolutionary history of the individuals was inferred with the neighbour-joining (NJ) tree. The NJ tree contains two large groups (Fig. 2a). The first group was formed by the individuals of the Zhoushan farmed population plus several individuals of the Zhoushan sea-captured population. The other group was formed by the individuals in the other three groups. In this group, individuals of the Zhoushan sea-captured formed a distinct cluster from the individuals of the Ningde sea-captured population and those of the Ningde farmed population. The individuals of the Ningde sea-captured population and those of the Ningde farmed population together formed several small groups.
Neighbor-joining tree and admixture analysis using whole-genome SNP data. (a) Neighbor-joining tree of the 198 large yellow croaker individuals. The color scheme follows Fig. 1. The scale bar represents pairwise distances between different individuals. (b) Cross-validation error in the admixture analysis. The x-axis represents K values and the y-axis represents the corresponding cross-validation error. The cross-validation error was lowest at K = 3. (c) Admixture plot (K = 2, 3, 4) for the 198 large yellow croaker individuals. Each individual is shown as vertical bar divided into K colors. The color scheme follows Fig. 1. Individuals are arranged by population.
We performed unsupervised clustering analysis with ADMIXTURE to evaluate the relatedness of the populations. Cross-validation error was lowest at K = 3, suggesting that the population genetic structure of our samples is best modelled by considering the admixture of the three genetic components (Fig. 2b). The individuals of the Zhoushan farmed population are composed of relatively uniform genetic components (Fig. 2c). The individuals of the Ningde farmed population are composed of genetic components that are also relatively uniform but different from those of the Zhoushan farmed population. Both the individuals of the Zhoushan sea-captured population and those of the Ningde sea-captured population were a mixture of the three genetic components.
Trends of effective population size
We evaluated the extents of linkage disequilibrium for SNP pairs. The average r2 values of linkage disequilibrium decreased by increasing the marker distance between pairwise SNPs, with a rapidly declining trend observed over the first 500 kb (Fig. 3a). Using this information, we estimated the change of the effective population size over the past 1000 generations (Fig. 3b). All the four populations showed decreasing trends of effective population sizes, suggesting that their genetic diversities remain at a low level.
Trends of effective population sizes. (a) LD decay (r2) from 0 to 4000 kb for four populations. The x-axis represents marker distances between pairwise SNPs. The y-axis represents r2 values of linkage disequilibrium. (b) Effective population sizes of four populations over the past 1000 generations. All of the four populations showed decreasing trends.
Detection of putative genes associated with the adaptation to different sea environments of the Zhoushan Sea and Ningde Sea
To identify the genetic markers to differentiate individuals of the Zhoushan sea-captured and Ningde sea-captured, we calculated fixation index (Fst) values for each SNP. We identified total 819 SNPs as genetic markers (Table S2). To identify the genes associated with adaptation to the different living environments between these two regions, we calculated average Fst values in 40 kb windows with 10 kb steps (Fig. 4). We identified 47 regions with significant Fst values. The total size of these regions is 3.6 Mb. The sizes of the significant regions were between 40 kb to 0.31 Mb. These regions contained 88 genes (Table S3). We categorised the functions of these genes based on their gene ontology (GO) term annotations (Table S4). These genes include those involved in muscle structure development (GO:0061061) such as pdlim3a (pdz and lim domain 3). This gene is located in the region from 26,673,301 to 26,662,947 bp on chromosome 10, and is reported to be highly expressed in muscle and involved in the crosslinking of actin filaments15. We identified three upstream variants of this gene which are located at 26,675,034 bp, 26,675,134 bp, and 26,678,221 bp on chromosome 10 (Fig. 4). We also identified one downstream variant located at 26,660,973 bp on chromosome 10. Besides muscle structure development (GO:0061061), there are also some enriched GO terms such as regulation of response to external stimulus (GO:0032101) and cell–cell signalling (GO:0007267).
Genomic regions associated with regional differentiation of large yellow croaker between Zhoushan sea and Ningde sea. Manhattan plot for average Fst values in 40 kb windows with 10 kb steps between Zhoushan sea-captured population and Ningde sea-captured population. The x-axis represents chromosomal positions and the y-axis represents the average Fst values.
Detection of putative genes under selective sweep between the Zhoushan sea-captured population and farmed population
To identify the genes under selective sweep in the domestication process, we analysed single Fst values for single SNPs and average Fst values in 40 kb windows with 10 kb steps separately both in the Zhoushan and Ningde regions. Between the Zhoushan sea-captured population and farmed population, we identified 23,862 SNPs with significant Fst values by single SNP analysis (Table S5). In the sliding window analysis, the number of significant regions was 317, and the total size of significant regions was 59 Mb (Fig. 5a). The sizes of significant regions were between 40 kb to 8.1 Mb. These regions contain 1709 genes (Table S6). We identified the strong peak of Fst signal on chromosome 11, which contains 423 genes such as hsp90ab1 (heat shock protein 90 alpha family class B member 1). GO analysis showed that genes involved in the regulation of fatty acid oxidation (GO:0031998), the steroid hormone mediated signalling pathway (GO:0043401), fatty acid metabolic processes (GO:0006631), membrane lipid metabolic processes (GO:0006643), regulation of fatty acid metabolic processes (GO:0019217), and long-chain fatty acid transport (GO:0015909). These GO terms include plenty of lipid metabolism-related genes such as ppara (peroxisome proliferator activated receptor alpha), pnpla2 (Patatin like phospholipase domain containing 2). It is worth mentioning that there were plenty of genes related to carbohydrate derivative metabolic processes (GO:1901135) with differences between the Zhoushan sea-captured population and farmed populations (Table S7). Additionally, a number of the growth relative genes include the developmental growth involved in morphogenesis (GO:0060560). Genes were found related to embryo development ending in birth or egg hatching (GO:0009792). Additionally, 47 genes related immune system development (GO:0002520) were obtained, such as taf3 (tata-box binding protein associated factor 3), irf7 (interferon regulatory factor 7) and rps7 (ribosomal protein s7) (Table S7).
Genomic regions associated with domestication of large yellow croaker between Zhoushan sea or Ningde sea. (a) Manhattan plot for average Fst values in 40 kb windows with 10 kb steps between Zhoushan sea-captured and Zhoushan farmed. (b) Manhattan plot for average Fst values in 40 kb windows with 10 kb steps between Ningde sea-captured and Ningde farmed. The x-axis represents chromosomal positions and the y-axis represents the average Fst values.
Moreover, we found that anxa2a (annexin a2a; from 16,718,332 bp to 16,713,531 bp on chromosome 21) have a splice donor site variant at 16,715,408 bp on chromosome 21. This mutation is located at the fifth intron of anxa2a, and is predicted to lead to a premature truncation. The anxa2a gene encodes a phospholipid-binding protein, and is involved in variety of intracellular processes including endocytosis, exocytosis, membrane domain organisation, actin remodelling, signal transduction, protein assembly16. This batch of samples came from breeding selection for a freeze-resistant population. We identified nine downstream mutations (16,713,395 bp, 16,713,442 bp, 16,713,443 bp, 16,713,593 bp, 16,715,408 bp, 16,715,741 bp, 16,716,027 bp, 16,716,216 bp and 16,717,363 bp on chromosome 21) of ice2 (interactor of little elongation complex ELL subunit 2) gene, which is located in the region from 16,727,361 to 16,718,192 bp on chromosome 21. This gene is involved in cold acclimation and determines freezing tolerance17.
Detection of putative genes under selective sweep between the Ningde sea-captured and farmed population
For the Ningde farmed population, we identified 660 SNPs with significant Fst values (Table S8). In the sliding window analysis, the number of significant regions was 42, and the total size of significant regions was 7.8 Mb (Fig. 5b). The sizes of significant regions were between 40 kb to 2.0 Mb. These regions contain 238 genes (Table S9). GO analysis showed identified genes related to the reproduction process such as female gonad development (GO:0008585), i.e. esr1 (estrogen receptor 1), foxo3 (forkhead box O3); the development of primary female sexual characteristics (GO:0046545) and embryonic appendage morphogenesis (GO:0035113), such as mbnl1 (muscle blind like splicing regulator 1); as well as embryonic limb morphogenesis (GO:0030326) and the response to steroid hormones (GO:0048545). Additionally, genes related to digestive tract development (GO:0048565) were enriched, such as hnf1b (hnf1 homeobox b) (Table S10). As per the results of SNPs with the highest Fst analysis between the Ningde sea-captured and farmed population, we identified a downstream variant of esr1, which is located at 9,103,629 bp on chromosome 11. This gene is located in the region from 9,129,853 and 9,108,464 bp on chromosome 11 and encodes estrogen receptor 1, which plays a critical role in responding to steroid hormones (Fig. 5b). Genes involved in visual system development (GO:0150063) such as prox1 (prospero-related homeobox1), nr2e1 (nuclear receptor subfamily 2 group e member 1) and znf513a (zinc finger protein 513a) were also enriched. The znf513a gene is located in the region from 11,664,515 to 11,657,703 bp on chromosome 11 and has a downstream variant located at 11,652,743 bp on this chromosome (Fig. 5b).
Source: Ecology - nature.com