
Genomic characterization between strains selected for death-feigning duration for avoiding attack of a beetle

The present study compared whole-genome DNA sequences of the long strain and of the short strain of T. castaneum against standard genome samples used as references. Resequencing revealed DNA sequence variations from the reference sequence in both strains, and the variations were more frequent in the long strain across the whole genome. Single nucleotide variants (SNV), multi-nucleotide variants (MNV), deletions, insertions, and replacements were detected across the whole genome in both strains. Variants shared by the long and short strains were removed before the analyses. The total number of small variants was larger in the long strain than in the short strain (Fig. 1, Tables S1 and S2). The most frequent type of small variant was SNV, which accounted for 82.7% (93,233/112,783) of small variants in the long strain and 82.8% (13,817/16,697) in the short strain (Fig. 1A). Relative to the reference nucleotide, SNVs occurred most often between adenine and guanine or between cytosine and thymine in both strains (Fig. 1B), with frequencies up to three times those of other base combinations, indicating that transitions were more frequent than transversions. Deletions and insertions ranged from one to nine bases in both strains, with single-base deletions and insertions being frequent (Fig. 1C). Homozygous variants were more frequent than heterozygous variants in all linkage groups, but the ratio of homozygosity to heterozygosity varied among linkage groups (Fig. 1D). Homozygous variants were more frequent in linkage groups 3 (LG3), 5 (LG5) and 7 (LG7) than in other linkage groups in both strains. The ratio of homozygosity to heterozygosity was largest in LGX in the long strain and in LG2 in the short strain.
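The transition/transversion distinction and the SNV proportions reported above can be illustrated with a minimal Python sketch. The function names `classify_snv` and `snv_percentage` are hypothetical helpers introduced here for illustration; they are not part of the authors' pipeline, and only the counts quoted in the text are used.

```python
# Base classes: a substitution within one class is a transition,
# a substitution between the two classes is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref: str, alt: str) -> str:
    """Label a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def snv_percentage(snv_count: int, total_small_variants: int) -> float:
    """Share of SNVs among all small variants, in percent (one decimal)."""
    return round(100 * snv_count / total_small_variants, 1)


# Counts quoted in the text (Fig. 1A).
long_pct = snv_percentage(93_233, 112_783)
short_pct = snv_percentage(13_817, 16_697)
print(long_pct, short_pct)
```

An A-to-G or C-to-T change is labelled a transition, whereas a change such as A-to-C is labelled a transversion.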

Figure 1

Analytical results of small DNA sequence variants at the whole-genome level in long and short strains. Proportions of small variants (SNV, MNV, deletion, insertion, and replacement) in long and short strains (A). The number of small variants is indicated by the diameter of each pie chart. Frequencies of SNVs in long and short strains relative to the reference nucleotide (B). Insertions and deletions ranged from one to nine bases in both long and short strains (C). Frequencies of homozygosity and heterozygosity and their ratio in all linkage groups in long and short strains (D).


The variants were distributed in coding and non-coding regions. Figure 2A shows the result of narrowing the whole-genome variants down to those in genic regions in the long and short strains and then aggregating the variant information by exon, intron, UTR and other regions. In all genic regions, the number of variants was larger in the long strain than in the short strain. Genes containing these variants were then counted in each strain (Fig. 2B). In exon regions, genes with nonsynonymous variants were more numerous in the long strain (3243) than in the short strain (844), and 464 common genes containing different DNA sequence variants between the strains were detected (Fig. 2B). For genes with synonymous variants and for genes with variants in introns or UTRs, the number of genes was consistently larger in the long strain than in the short strain (Fig. 2B). The functions of long-unique, short-unique and common genes with variants were sorted into four categories by enrichment analyses using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) terms (Fig. 2C, Table S3). In the biological process, cellular component, molecular function, and KEGG pathway categories, the characteristics of nonsynonymous variants in long-unique, short-unique and common genes largely did not overlap, indicating that different gene characteristics were selected in each strain. Characteristics of synonymous variants were also sorted. Synonymous variants are not expected to alter the amino acid sequence or the structure of the translated protein; rather, these characteristics may be necessary to maintain the strain and may have been preserved under artificial selection. Variants in introns and UTRs may affect gene expression, but this should be investigated in detail in future studies. Analyses of cis-regulatory elements might be important for understanding the regulation of gene expression, but information on these regions in T. castaneum is not available; therefore, the variants in cis-regulatory elements could not be analyzed.
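Enrichment analyses of GO and KO terms are commonly framed as an over-representation test. The sketch below uses a one-sided hypergeometric test as an illustration only: the text does not state which statistical test or software settings were used, and the function name `hypergeom_enrichment_p` and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """One-sided over-representation p-value, P(X >= hits).

    population: number of background genes
    annotated:  background genes carrying the term of interest
    sample:     genes in the list being tested (e.g. genes with variants)
    hits:       genes in that list carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers, not taken from the study:
# 10 background genes, 4 annotated, 3 genes sampled, 2 of them annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
print(p_value)
```

In practice a multiple-testing correction would be applied across all terms tested; that step is omitted here for brevity.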

Figure 2

Analytical results of the position of small variants across the whole genome in long and short strains. Numbers of variants in genic regions, including exon, intron, UTR and other non-coding regions (A). As shown in parentheses, some ncRNAs and tRNAs were contained in exon, intron, and UTR regions. In the short strain, there were five regions each in the 5′-UTR and the 3′-UTR where two different genes overlap. Numbers of genes with variants in exon, intron and UTR regions in long and short strains (B). Numbers of long-unique, short-unique and common genes are shown by Venn diagrams. Common genes contain variants with different DNA sequences between long and short strains. Enrichment analyses of the function of genes with variants sorted into four categories (biological process, cellular component, molecular function, and KEGG pathway) (C). The heatmap was generated using the R package “gplots” (version 3.1.1, https://cran.r-project.org/web/packages/gplots/index.html). The list for each ontology shows the ID and term. The KO ID consists of a three- or four-letter organism code (the first letter of the genus name followed by the first two or three letters of the species name) and a pathway number. For example, neuroactive ligand-receptor interaction in Tribolium castaneum is shown as “tca04080”.
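The identifier format described in the legend can be split into its two parts with a short Python sketch. The helper name `split_kegg_pathway_id` is hypothetical, and the pattern assumes a five-digit pathway number, as in the “tca04080” example.

```python
import re

# Assumed layout: 3-4 lowercase letters (organism code) + 5 digits (pathway number).
KEGG_PATHWAY_ID = re.compile(r"([a-z]{3,4})(\d{5})")


def split_kegg_pathway_id(pathway_id: str) -> tuple[str, str]:
    """Return (organism_code, pathway_number) for an ID such as 'tca04080'."""
    match = KEGG_PATHWAY_ID.fullmatch(pathway_id)
    if match is None:
        raise ValueError(f"unexpected pathway ID format: {pathway_id!r}")
    return match.group(1), match.group(2)


organism, number = split_kegg_pathway_id("tca04080")
print(organism, number)
```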


To locate genes with variants associated with the duration of death feigning within linkage groups, bulk segregant analysis was carried out (Fig. 3). The red approximate lines of the plot data crossed the green threshold lines (P < 0.05) in linkage groups X, 2, 3, 5, 6, 7, 8, and 9, but not in linkage groups 4 and 10 (Fig. 3A, Table S4). Further, the approximate lines in linkage groups X, 2, 3, 5, 7, 8, and 9 crossed the orange threshold lines (P < 0.01), whereas the line in linkage group 6 did not. Therefore, the genes in linkage groups X, 2, 3, 5, 7, 8, and 9 were candidates for artificial selection on the basis of the duration of death feigning. Enrichment analyses of the gene characteristics extracted KO and GO terms including “neuroactive ligand-receptor interaction” and “G-protein coupled receptor activity”; these terms seem to be associated with monoamine receptor activity (Fig. 3B, Table S5).
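The fitted line in a bulk segregant plot is typically a smoothed per-position statistic. As a rough illustration, the sketch below computes a SNP-index (alternative-allele reads divided by total reads, the statistic used in MutMap-style analyses) and averages it in sliding windows. The window size, step, read counts, and function names are assumptions for illustration; the text does not give the parameters actually used, and the threshold lines are not reproduced here.

```python
def snp_index(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads supporting the non-reference allele at one position."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def sliding_window_mean(positions, values, window, step):
    """Average `values` in half-open windows [start, start + window).

    Returns a list of (window_center, mean) pairs; empty windows are skipped.
    """
    if not positions:
        return []
    results = []
    start, end = min(positions), max(positions)
    while start <= end:
        in_window = [v for p, v in zip(positions, values)
                     if start <= p < start + window]
        if in_window:
            results.append((start + window / 2, sum(in_window) / len(in_window)))
        start += step
    return results


# Hypothetical positions (bp) and read counts on one linkage group.
positions = [0, 50, 150]
indices = [snp_index(12, 12), snp_index(6, 12), snp_index(3, 12)]
smoothed = sliding_window_mean(positions, indices, window=100, step=100)
print(smoothed)
```

Regions where the smoothed value rises above a significance threshold would be the candidate regions in this kind of analysis.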

Figure 3

Analytical results of the MutMap approach in each linkage group (A) and enrichment analysis (B). In (A), the red line indicates an approximate line of the plot data. The green and orange threshold lines indicate 95% and 99% significance, respectively. The dot plot in (B) was generated using the R package “ggplot2” (version 3.3.5, https://cran.r-project.org/web/packages/ggplot2/index.html).


Structural variations including large-scale insertion and deletion (InDel), copy number variation (CNV), and presence/absence variation (PAV) were analyzed at the whole-genome level in both strains (Fig. 4, Tables S6–S11). Large-scale insertions and deletions of 10–393 bases were analyzed, and those of 11–20 bases were the most frequent in both strains (Fig. 4A). In both strains, CNV deletions were more frequent at sizes of 5 to 14 kbases than at the other size scales examined, whereas CNV duplications were consistently rare, with 0 or 1 cases (Fig. 4B). At a larger size scale, up to 7000 kbases, variations of less than 500 kbases were the most frequent in the long strain (Fig. 4C).
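Size classes such as “11–20 bases” correspond to fixed-width bins over variant length. A minimal sketch of that binning follows; the bin width of 10 bases matches the class quoted in the text, while the helper name `size_bin` and the example lengths are hypothetical.

```python
from collections import Counter


def size_bin(length: int, width: int = 10) -> tuple[int, int]:
    """Return the inclusive (lower, upper) size class containing `length`.

    With width=10 the classes are 1-10, 11-20, 21-30, and so on.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    lower = ((length - 1) // width) * width + 1
    return lower, lower + width - 1


# Hypothetical InDel lengths in bases, not taken from the study.
lengths = [12, 15, 20, 25, 393]
histogram = Counter(size_bin(n) for n in lengths)
most_common_bin, count = histogram.most_common(1)[0]
print(most_common_bin, count)
```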

Figure 4

Structural variations including large-scale insertion and deletion (InDel) across the whole genome (A), copy number variation (CNV) (B), and presence/absence variation (PAV) (C) in long and short strains. Large-scale InDels are shown from 10 to 390 bases. CNV duplications and deletions are shown from 4 to 80 kbases. PAVs are shown from 0 to 7000 kbases.


These structural variations are mapped onto each linkage group in Fig. 5A, which shows large-scale insertions and deletions occurring consistently across linkage groups (A and B), less frequent CNV duplications (C), and more frequent CNV deletions (D). Large CNV deletions were present in LG6 and LG7 in the long strain and in LG2 in the short strain. These variations were sorted into GO and KO terms (Fig. 5B, Table S12). The term “neuroactive ligand-receptor interaction” had the largest statistical value in the long strain.

Figure 5

Structural variations on each linkage group in long and short strains (A). The circos plot was drawn using Circos (version 0.69-9, http://circos.ca/). Structural variations include large-scale insertion and deletion, CNV duplication and deletion, and presence/absence variation. The long and short strains are indicated by orange and blue lines, respectively. GO and KO terms for the functions of genes with variants (B).


A protein–protein interaction (PPI) network including enzymes involved in dopamine metabolism was constructed (Fig. 6). Tyrosine hydroxylase (Th) was connected with DOPA decarboxylase (Ddc) and dopamine N-acetyltransferase (Dat), and these enzymes have been reported as differentially expressed genes in the long strain in an RNA-seq analysis (ref. 13). Th also had DNA sequence variations (nonsynonymous variants) in the short strain (Fig. 6). Within the PPI network, proteins with nonsynonymous variants were more frequent in the long strain. Yellow-like protein had nonsynonymous variants in the long strain; it was connected directly with Dat and indirectly with Ddc and Th.
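The direct and indirect connections described above can be expressed as hop counts in an undirected graph. The sketch below encodes only the connections stated in the text (Th with Ddc, Th with Dat, Yellow-like with Dat); the full STRING network in Fig. 6 contains more proteins and possibly more edges, so the edge list and the resulting distances are an illustrative assumption rather than a reproduction of the figure.

```python
from collections import deque

# Edges taken from the text only; the real network is larger.
EDGES = [("Th", "Ddc"), ("Th", "Dat"), ("Yellow-like", "Dat")]


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def hops(adjacency, source, target):
    """Breadth-first search; number of edges on a shortest path, or None."""
    if source not in adjacency or target not in adjacency:
        return None
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


network = build_adjacency(EDGES)
for partner in ("Dat", "Th", "Ddc"):
    print(partner, hops(network, "Yellow-like", partner))
```

A hop count of 1 corresponds to a direct connection, and anything larger to an indirect one.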

Figure 6

Protein–protein interaction (PPI) network including enzymes involved in dopamine metabolism. Lines indicate the relationships between genes. The network map was drawn using STRING (version 11.0, https://string-db.org/). Stars and triangles indicate genes with nonsynonymous variants and differentially expressed genes (DEGs), respectively.


Pathways containing genes with nonsynonymous variants in both long and short strains were examined: “caffeine metabolism (tca00232)” (Fig. 7A), “tyrosine metabolism (tca00350)” (Fig. 7B), “tryptophan metabolism (tca00380)” (Fig. 7C), “metabolism of xenobiotics by cytochrome P450 (tca00980)” (Fig. 7D), “longevity regulating pathway - multiple species (tca04213)” (Fig. 7E), and “circadian rhythm - fly (tca04711)” (Fig. 7F). Tyrosine metabolism and the longevity regulating pathway have been listed as pathways containing focal genes that are differentially expressed between long and short strains, as detected by RNA-seq (ref. 13). The numbers of variants in genes of these pathways were larger in the long strain than in the short strain (Fig. 7).

Figure 7

Functional genes and the frequency of nonsynonymous variants in KEGG pathways. Each pathway map was generated using the R package “Pathview” (version 1.30.1, https://bioconductor.org/packages/release/bioc/html/pathview.html). Caffeine metabolism (A), tyrosine metabolism (B), tryptophan metabolism (C), metabolism of xenobiotics by cytochrome P450 (D), longevity regulating pathway (E) and circadian rhythm (F) are shown. The reference sequence (Tcas5.2) was obtained from the NCBI genome database. The color gradient in each rectangular box represents the nonsynonymous variant burden or the fold change in expression by RNA-seq between the long and short strains (left, long variant burden; center, fold change; right, short variant burden). The upper end of the fold-change scale means that expression is higher in the long strain than in the short strain; the lower end means the opposite.



Source: Ecology - nature.com
