The full-length sequences of PacBio SMRT sequencing
Based on PacBio SMRT sequencing, 3,751,089, 3,434,452, 3,900,180, 8,535,019, and 4,435,846 subreads were generated for root, stem, leaf, flower, and seed, with a N50 of 3040, 3367, 2611, 2198, and 4584 bp, respectively (Table S1; Fig. S1). Subreads were processed to generate circular consensus sequences (CCSs). By detecting the primers and poly(A) tail, 238,196, 232,290, 211,535, 257,905, and 231,877 full-length non-chimeric (FLNC) reads were identified for root, stem, leaf, flower, and seed, with a mean length of 2633, 3070, 2561, 1746, and 3762 bp, respectively (Table S2; Fig. S2). After Iterative Clustering for Error Correction (ICE) clustering, polishing, base correction, de-redundancy, and non-plant sequences filtering, 37,789, 34,034, 38,100, 54,937, and 53,906 unigenes were retained for root, stem, leaf, flower, and seed, respectively, with an average unigene length of 1802–3786 bp and N50 of 2238–4707 bp (Table S2). The length of most unigenes from five organs exceeded 2000 bp, accounting for 68.88% of the total number (Table S3; Fig. 1A). Based on Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment, about 88.1% (single-copy: 353; duplicated: 916) of the 1440 core embryophyte genes were found to be complete (90.6% were present when counting fragmented genes), suggesting the high integrity of the M. micrantha transcriptome (Fig. S3).
Length distribution of unigenes from PacBio SMRT sequencing (A) and Illumina RNA-Seq (B) across five organs.
De novo assembly of Illumina RNA-Seq data
Based on Illumina RNA-Seq, 43.23, 40.27, 41.01, 65.85, and 41.09 million clean reads were obtained for root, stem, leaf, flower, and seed, respectively, with Q20 exceeding 96.72%. Using Trinity software, clean reads were de novo assembled into 124,238, 60,232, 63,370, 93,229, and 66,411 unigenes for root, stem, leaf, flower, and seed. After filtering non-plant sequences, 124,233, 60,232, 63,370, 93,228, and 66,410 unigenes were finally retained for the five organs, respectively (Table S4). The length of most unigenes (84.70%) was shorter than 2000 bp (Table S3). In addition, the average length and N50 of unigenes generated by Illumina RNA-Seq were 1067–1312 bp and 1336–1685 bp, respectively, which were shorter than that from PacBio SMRT sequencing (Table S4; Fig. 1B).
Functional annotation
To obtain a comprehensive functional annotation of M. micrantha transcriptome, unigenes generated by PacBio SMRT sequencing were annotated in seven public databases, including NCBI non-redundant nucleotide sequences (NT), NCBI non-redundant protein sequences (NR), Gene Ontology (GO), Eukaryotic Orthologous Groups (KOG), Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-Prot, and Pfam protein families. For root, stem, leaf, flower, and seed, 35,714 (94.51%), 32,614 (95.83%), 36,134 (94.84%), 49,197 (89.55%), and 50,962 (94.54%) unigenes were annotated to at least one database, respectively, suggesting that our transcriptome is well annotated and that most of unigenes may be functional (Table 1).
Based on NR database annotation, the top three homologous species for the five organs were Cynara cardunculus, Vitis vinifera, and Daucus carota (Fig. S4). The top homologous species was a plant of the Asteraceae family. For the GO function annotation, “binding”, “catalytic activities”, “metabolic process”, “cellular process”, “cell”, and “cell part” were functional categories with the most abundant unigenes (Fig. S5). In addition, numerous unigenes were assigned to “response to stimulus”, “response to biotic stimulus”, and “response to oxidative stress” category (Table S5). Positive response to stress stimuli is an important strategy for invasive plants to adapt to the environment. In the KEGG annotation, the top two pathways with the most abundant unigenes were “carbohydrate metabolism” and “translation”. Furthermore, “energy metabolism” and “environmental adaptation” were also worthy of attention, which are important pathways responsible for energy supply and stress responses (Fig. S6).
TFs identification and AS analysis
Using the iTAK pipeline, 1776 (root), 1293 (stem), 1627 (leaf), 2529 (flower), and 1733 (seed) unigenes were identified as TFs, which were classified into 68 families (Table S6). C3H (884), C2H2 (525), and bHLH (501) were the most abundant TF families (Fig. S7A). In addition, MYB (333) TFs were also found in the five organs. The differential expression levels of the top 15 TF families were further characterized. We found that the top 15 TF families had a certain amount of expression in the five organs of M. micrantha (Fig. S7B).
For root, stem, leaf, flower, and seed, 3300, 2324, 3219, 4730, and 3740 unique transcript models (UniTransModels) were constructed, among which the UniTransModels containing two isoforms were the most common (Fig. S8A). There were 329, 270, 358, 336, and 537 AS events identified in root, stem, leaf, flower, and seed, respectively. Retained introns (RIs) were detected as the most abundant AS event in all five organs, followed by alternative 3′ splice sites (A3) and alternative 5′ splice sites (A5). Mutually exclusive exons (MX) were the least frequent event (Fig. S8B).
Gene expression analysis
The number of unigenes in different expression level intervals was similar across the five organs (Fig. 2A). Using FPKM > 0.3 as the threshold for unigene expression, the total number of unigenes expressed in the five organs was 102,464 (Fig. 2B). Among them, 39,227 unigenes were co-expressed in all five organs. The information of differentially expressed genes (DEGs) identified in pairwise comparisons among the five organs is listed in Table S7. In total, 21,161 DEGs were identified among the five organs (Fig. S9). The number of DEGs between the five organs varied from 3469 (root vs stem) to 10,716 (leaf vs seed) (Fig. 2C). Notably, 933, 428, 1410, 1018, and 1292 DEGs showed significant higher expression in root, stem, leaf, flower, and seed, respectively (Figs. S10 and S11).
Gene expression patterns in five M. micrantha organs. (A) The FPKM interval distribution in the five organs. (B) Venn diagram of the number of unigenes expressed in five organs. (C) Number of differentially expressed genes in each pairwise comparison of five organs.
KEGG enrichment of unigenes with higher expression in each organ
According to the KEGG enrichment analysis results, there were obvious differences in enriched pathways in the five organs (Table S8; Fig. 3). The unigenes with higher expression in root were mainly enriched to defense response and protein processing pathways, such as “plant-pathogen interaction” and “protein processing in endoplasmic reticulum”. In stem, unigenes with higher expression were predominantly enriched to pathways related to the secondary metabolite, sugar, and terpenoid biosynthesis, such as “phenylpropanoid biosynthesis”, “starch and sucrose metabolism”, and “diterpenoid biosynthesis”. In flower, unigenes with higher expression were mainly related to “starch and sucrose metabolism”, “phenylpropanoid biosynthesis”, and “cutin, suberine, and wax biosynthesis”. The unigenes with higher expression in seed were mainly enriched in three fatty acid and sugar metabolism pathways, namely “biosynthesis of unsaturated fatty acids”, “galactose metabolism”, and “amino sugar and nucleotide sugar metabolism”. The unigenes with higher expression in leaf were significantly enriched in photosynthesis pathways, including “photosynthesis-antenna proteins”, “photosynthesis”, “porphyrin and chlorophyll metabolism”, and “carbon fixation in photosynthetic organisms”, which are important for the photosynthesis of M. micrantha.
The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of unigenes with higher expression in each organ. The significantly enriched pathways with corrected p-value (q value) < 0.05 are shown. Number indicates the size of the dot, describing the number of unigenes enriched in the pathway. The color bar represents the q value and indicates significance of the enrichment. The blue represents higher value, and the red represents lower value.
Discovery of gene families related to stress response and adaptation
For root, stem, leaf, flower, and seed, 62, 26, 59, 37, and 12 unigenes were identified as putative TPSs, respectively. Among them, most TPSs belonged to TPS-a subfamily (61 unigenes) and TPS-b subfamily (65), and only a small amount of TPSs were classified as TPS-e (7), TPS-f (1), and TPS-g subfamily (5) (Tables S9 and S10). The neighbor-joining (NJ) tree showed that TPS sequences were classified into six clades, among which TPS sequences from M. micrantha were clustered in TPS-a, TPS-b, TPS-c, TPS-e, and TPS-g clade (Fig. 4). Most sequences of TPS-a and TPS-b subfamily had higher expression in root, leaf, or flower of M. micrantha, and the rest TPS subfamilies were expressed most highly in different organs of M. micrantha.
Phylogenetic analysis of terpene synthase (TPS) sequences from M. micrantha and other angiosperms.
A total of 172 unigenes annotated as GSTs were obtained from five organs, which were classified into Zeta (52 unigenes), Theta (37), Tau (18), Phi (19), glutathionyl-hydroquinone reductases (GHR, 15), dehydroascorbate reductase (DHAR, 11), Lambda (7), and Beta (1) subfamily and some unigenes with undetermined subfamily (Tables S9 and S11). The NJ tree showed that 23 GST sequences were divided into four groups, namely Zeta, GHR, Phi, and Theta subfamily (Fig. 5A). Ten conserved motifs were identified from 23 GST sequences, and motif distribution was obviously different among the four subfamilies (Fig. 5C). Sequences from GHR subfamily were mainly expressed in root, flower, and seed, while sequences distributed in Zeta, Phi, and Theta subfamilies had certain expression levels in the five organs of M. micrantha (Fig. 5B).
Neighbor-joining (NJ) phylogenetic tree, expression level, and conserved motif of glutathione S-transferase (GST). (A) NJ tree. (B) Expression heatmap. (C) Conserved motif.
Pathway related to environmental adaptation
Timely scavenging of ROS under high photosynthetic efficiency and potential stresses is critical for the normal growth and development of M. micrantha. A total of 419 unigenes involved in antioxidant defense system were identified in the five organs, which included superoxide dismutase (SOD, 40 unigenes), catalase (CAT, 155), ascorbate peroxidase (APX, 67), monodehydroascorbate reductase (MDHAR, 19), glutathione peroxidase (GPX, 44), glutathione reductase (GR, 22), glucose-6-phosphate 1-dehydrogenase (G6PDH, 46), and 6-phosphogluconate dehydrogenase (6-PGD, 26) (Table S12). Among them, 78 unigenes were identified as DEGs in the five organs (Fig. 6A). SOD can directly oxidize superoxide radicals (O2•−) to hydrogen peroxide (H2O2), which is further converted into H2O through CAT enzyme. The DEGs responsible for the synthesis of SOD and CAT exhibited higher expression levels in root, stem, leaf, and seed than that in flower. Consistently, the other antioxidant enzymes used for scavenging of toxic H2O2, such as APX, MDHAR, GPX, GR, G6PDH, and 6-PGD, were also be found to be predominantly expressed in root, stem, leaf, and seed (Fig. 6B).
Antioxidant defense system in M. micrantha and the differentially expressed genes (DEGs) involved in the antioxidant defense in five organs of M. micrantha. (A) A simplified diagram of the antioxidant defense system. Numbers in brackets represent DEG numbers. (B) A heatmap of the DEGs involved in the antioxidant defense in five organs.
Plants produce a variety of terpenoids in response to biotic and abiotic stresses. A total of 385 unigenes from the five organs were assigned to MVA and MEP pathways22 (Table S13; Fig. 7). In the MVA pathway, acetyl-CoA acetyltransferase (AACT, 20 unigenes), 3-Hydroxy-3-methylglutaryl-CoA synthase (HMGS, 16), 3-Hydroxy-3-methylglutaryl-CoA reductase (HMGR, 49), mevalonate kinase (MK, 4), phosphomevalonate kinase (PMK, 3), diphosphomevalonate decarboxylase (MPDS, 6), isopentenyl-diphosphate delta-isomerase (IPPI, 13), and farnesyl diphosphate synthase (FPPS, 18) were identified as important components in the synthesis of sesquiterpene precursors. Germacrene D synthase (GDS) acts as a key terpene synthase, which can catalyze the precursors into sesquiterpene germacrene D. In this study, we identified 52 GDSs in five organs (Table S13). In the MEP pathway, 66, 6, 2, 12, 8, 41, 36, 13, 10, and 23 unigenes were annotated as 1-Deoxy-D-xylulose-5-phosphate synthase (DXS), 1-Deoxy-D-xylulose-5-phosphate reductoisomerase (DXR), 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT), 4-(cytidine 5-diphospho)-2-C-methyl-d-erythritol kinase (CMK), 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MDS), 4-Hydroxy-3-methylbut-2-enyl-diphosphate synthase (HDS), 4-Hydroxy-3-methylbut-2-en-1-yl diphosphate reductase (HDR), IPPI, geranyl diphosphate synthase (GPPS), and (3S)-linalool synthase (TPS14), which were responsible for the biosynthesis of monoterpene linalool. Among the 385 unigenes identified in the MVA and MVP pathways, 80 unigenes were determined as DEGs in the five organs. Nearly half of the DEGs (38 DEGs, accounting for 47.50%) displayed lower expression levels in the seed than that in the other four organs, while the number of DEGs expressed most highly was similar in the root (18 DEGs), stem (22), leaf (19), and flower (18) (Fig. 7).
MVA and MEP pathways22 for terpenoid biosynthesis in M. micrantha and the differentially expressed genes (DEGs) involved in the terpenoid biosynthesis in five organs of M. micrantha. Numbers in brackets represent DEG numbers. The heatmap represents the expression level of DEGs in five organs. R: root; S: stem, L: leaf, F: flower, and Se: seed.
Source: Ecology - nature.com