Main features of mycorrhizal genomes
We compared 62 draft genomes from mycorrhizal fungi, including 29 newly released genomes, and predicted 9344–31,291 protein-coding genes per species (see “Methods”, Supplementary Information and Supplementary Data 1). This set includes new genomes from the early diverging fungal clades in the Russulales, Thelephorales, Phallomycetidae, and Cantharellales (Basidiomycota), and Helotiales and Pezizales (Ascomycota). We combined these mycorrhizal fungal genomes with 73 fungal genomes from wood decayers, soil/litter saprotrophs, and root endophytes (Fig. 1 and Supplementary Data 2). There was little variation in the completeness of the gene repertoires, based on Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis (coefficient of variation, c.v. = 7.98), despite variation in assembly contiguity (Fig. 1). Genome size varied greatly within each phylum, with genomes of mycorrhizal fungi being larger than those of saprotrophic species (Figs. 1 and 2, and Supplementary Data 2; P < 0.05, generalized least squares with the Brownian motion model (GLS)). Glomeromycotina had exceptionally large genomes (Figs. 1 and 2, and Supplementary Data 2), with Gigaspora rosea having the largest genome (567 Mb) among the 135 fungi compared27.
Genome and assembly features. The different lifestyles (e.g., ectomycorrhiza) are color coded, see bottom panel. The dotted lines show median values. Genome (Mb): genome size in Mb, TE content (%): coverage (%) of transposable elements (TE) in assemblies, Genes (K): number of predicted genes, Secreted (K): number of predicted secreted proteins, Scaffolds (K): number of scaffolds, L50 (Mb): N50 length, BUSCO (%): genome completeness in % based on BUSCO survey. Source data Fig. 1.
The boxes represent median, upper, and lower quartiles with the whiskers showing minimal and maximal values, and outliers in circle. The small dots show single observations. The number of species per lifestyle is as follows: ectomycorrhizal fungi (n = 45), arbuscular mycorrhizal fungi (n = 10), ericoid mycorrhizal fungi (n = 4), orchid mycorrhizal fungi (n = 3), endophytes (n = 5), pathogens (n = 14), soil/litter saprotrophs (n = 32), and wood decayers (n = 17). Source data Fig. 1.
TEs in mycorrhizal genomes
The main driver of genome inflation appeared to be repeat content (Fig. 3 and Supplementary Data 3; P < 0.05 GLS), such as long terminal repeat retrotransposons (LTRs), which ranged from 0.01 to 46.4% of the assembly (Fig. 3 and Supplementary Data 3). The distribution of TE categories varies between ectomycorrhizal fungal taxa even within the same fungal orders or genera (Fig. 3), indicating that invasions by different TE families took place independently in different clades. However, the total TE coverage in genomes of ectomycorrhizal Ascomycota and Glomeromycota was significantly higher than in Basidiomycota (P < 0.05, GLS; Supplementary Data 3). In Ascomycota, the most abundant TE families are LTRs, such as Gypsy and Copia, and non-LTR I, whereas in Basidiomycota, Gypsy and Copia LTRs, Tad1, helitrons, and Zizuptons are abundant (Fig. 3). Lifestyle has a higher impact than phylogeny on TE coverage with a significantly higher TE content in ectomycorrhizal symbionts compared to other lifestyles for several TE categories (14–35% of total variances, P value <0.05; PERMANOVA) (Supplementary Fig. 1a, b, and Supplementary Data 3 and 4). In Agaricales, a large proportion of LTR insertions occurred during the past 5 million years (Mya) in Amanita rubescens, Laccaria bicolor, Laccaria amethystina, and Cortinarius glaucopus, while the major bursts are more ancient in Tricholoma matsutake (Supplementary Fig. 1c). These LTRs have been proliferating over the past 5–6 Mya for Tuber melanosporum, Tuber aestivum, and Choiromyces venosus, while LTR accumulation in Tuber magnatum occurred between 6 and 14 Mya (ref. 16).
The bubble size is proportional to the coverage of each of TE family (% indicated inside bubbles). The bars show the total coverage per genome. See also Supplementary Data 2–4. Source data are provided as a Source data Fig. 1.
Phylogeny, orthologous, and paralogous genes
Reconstructed phylogenetic relationships and estimated divergence dates are generally consistent with the dates recovered by previous studies. For example, we estimated the age of the most recent common ancestor (MRCA) of the Agaricomycetidae to be 132 Mya (Fig. 4a), while it was estimated to be 149 or 125 Mya in phylogenomic analyses by Floudas et al.26 and Kohler et al.12, but 191–176 Mya in a multigene megaphylogeny by Varga et al.28. We estimated the ages of the MRCAs of Agaricomycetes, Agaricales, Polyporales, Russulales, and Boletales to be 280, 116, 104, 88, and 82 Mya (Fig. 4a), respectively. We estimated the MRCA of Pezizomycotina at 326 Mya (Fig. 4b), while Floudas et al.26 obtained a mean age of 344 Mya.
a Basidiomycota. b Ascomycota. Fungal taxa are displayed according to their phylogeny (left panel). Bubbles with numbers at the tree nodes represent the total number of genes coding for total PCWDEs (intracellular and secreted) for ancestral nodes determined by COMPARE. The bubble size is proportional to the number of PCWDE genes. Heat maps contain the number of genes coding for substrate-specific PCWDEs for the extant species. Relative abundance of genes is represented by a color scale, from the minimum (blue) to maximum (red) number of copies per species. Ectomycorrhizal fungi are framed with a black line. See also Supplementary Data 8 and Supplementary Fig. 5. The tree is a chronogram estimated with r8s on the basis of a maximum likelihood phylogeny inferred with RAxML. The geological timescale (in million year) is indicated at the bottom. Source data are provided as a Source data Fig. 2.
We inferred gene families from the predicted proteomes using gene family clustering. A total of 68,923 and 129,258 gene families were identified in Ascomycota and Basidiomycota, respectively, and were used to infer orthology and paralogy (Supplementary Data 5). The species in our datasets contained substantial novelty in gene content. Species-specific genes ranged from 994 in T. melanosporum to 39,410 in Mycena galopus, for a total of 251,392 and 606,378 taxon-specific genes in Ascomycota and Basidiomycota, respectively.The large number of species-specific genes in M. galopus is the result of a series of striking expansion of gene families.
Functional gene categories encoded by mycorrhizal genomes
Hierarchical clustering of the presence and abundance of different Pfam protein domains identified genome-wide patterns of functional domain content among some of these fungi (Supplementary Fig. 2a). Arbuscular mycorrhizal fungi clustered together with Pfam categories showing a substantial differential abundance in genes encoding proteins with NUDIX, tetratricopeptide repeat, BTB/POZ, Sel1 repeat, ubiquitin, and high-mobility group box domains, corroborating our previous study27. This comparison also emphasizes some of the unique aspects of the genomes from the ectomycorrhizal ascomycete Acephala macrosclerotiorum (Helotiaceae) and ericoid mycorrhizal fungi (e.g., Oidiodendron maius, Meliniomyces species), such as the expansion of genes encoding proteins with FAD and AMP-binding domains, aldehyde dehydrogenases and sugar transporters. They share their Pfam pattern with soil saprotrophic ascomycete species, such as Chalara longipes and the dark septate endophyte (DSE) Phialocephala scopiformis. Ectomycorrhizal fungi in Basidiomycota are grouped with soil saprotrophs and wood decayers, and displayed no specific pattern in their primary and secondary metabolism gene repertoires that may explain their symbiosis-related ability. Similarly, hierarchical clustering of the presence and abundance of the different membrane transporters and transcriptional regulators revealed no specific pattern(s) for ectomycorrhizal fungi (Supplementary Fig. 2b, c).
Predicted secretomes of saprotrophs and symbiotrophs
As secreted proteins play a key role in SOM decomposition and symbiosis development, we compared predicted secretomes, including carbohydrate-active enzymes (CAZymes), lipases, proteases, and other secreted proteins, such as effector-like small secreted proteins (SSPs; Supplementary Data 6). The number of genes encoding secreted proteins represents 4.6–5.7% of the total protein repertoire for Basidiomycota and Ascomycota, respectively (Supplementary Fig. 3). The proportions of secreted proteins within the different protein categories were consistent in both phyla (Supplementary Fig. 3, and Supplementary Data 6 and 7). About half of the secreted proteins were SSPs (Supplementary Fig. 3a and Supplementary Data 7c). Only secreted CAZymes and SSPs showed significant differences in relative abundance among lifestyles (P < 0.01; generalized Campbell and Skillings procedure; Supplementary Fig. 3 and Supplementary Data 8), with orchid and ericoid mycorrhizal symbionts possessing the largest CAZyme repertoires. On the other hand, the average number of secreted proteases and lipases is similar in saprotrophs and symbiotrophs. No expansion of gene families coding for secreted proteases, secreted phosphatases, and phytases were found in ectomycorrhizal fungi (Supplementary Fig. 3c).
Losses of PCWDEs
In Basidiomycota, ectomycorrhizal species contain significantly fewer secreted CAZymes acting on cellulose, hemicellulose, pectins, lignin, suberins, and tannins (P < 0.01, the generalized Campbell and Skillings procedure; Fig. 5, Supplementary Fig. 3c and Supplementary Data 6) than all other ecological guilds, i.e., lifestyles. They are also reduced in secreted auxiliary activity (AA) enzymes (including class II PODs, laccases, lytic polysaccharide monooxygenases (LPMO)), carbohydrate esterases (CE), polysaccharide lyases (PL), and associated cellulose-binding modules (e.g., CBM1; Supplementary Figs. 3–5, and Supplementary Data 7 and 8). Notably, there is no invertase GH32 gene in their genome, except for Cantharellus anzutake, implying that ectomycorrhizal basidiomycetes are unable to use sucrose directly from the plant. Except for Acephala macrosclerotiorum, the number of PCWDEs in ectomycorrhizal Ascomycota is lower than those of most of other groups, excluding yeasts and arbuscular mycorrhizal fungi (Figs. 4 and 5, and Supplementary Figs. 3–5). According to RedoxiBase (http://peroxibase.toulouse.inra.fr), none of the class II PODs of ectomycorrhizal species are ligninolytic (i.e., ligninolytic POD or LiP), except those of Gautieria morchelliformis (see below; Supplementary Fig. 4 and Supplementary Data 6p). Below, we discuss the PCWDEs of ectomycorrhizal fungi, emphasizing newly sampled lineages with previously unknown decomposition capacity.
Bubbles with numbers contain the number of genes coding for a series of secreted PCWDEs needed for cellulose and lignin degradation. Taxa are color coded according to their lifestyle (see bottom panel). See also Supplementary Data 6a, j. Left panel, Ascomycota and Mucoromycota; right panel: Basidiomycota. Source data are provided as a Source data Fig. 3.
Cantharellales arose before the origin of ligninolytic class II POD26,28,29,30. Both saprotrophs (Botryobasidium botryosum, Sistotrema sp.) and orchid symbionts (Tulasnella calospora, Ceratobasidium sp.) in Cantharellales possess large sets of enzymes acting on cellulose, hemicellulose, and pectins, including LPMOs (AA9; Figs. 4 and 5, and Supplementary Figs. 3–5). We sequenced the first genomes of ectomycorrhizal Cantharellales, Hydnum rufescens, and C. anzutake, and found that they have highly reduced repertoires of secreted PCWDEs. In this regard, the ectomycorrhizal symbionts of Cantharellales are more similar to ectomycorrhizal Agaricales and Boletales than to orchid symbionts of Cantharellales. These results provide another independent example of convergent loss of PCWDEs in ectomycorrhizal lineages, and highlight the different decomposition capacities of two guilds of plant symbionts.
In contrast to Cantharellales, Phallomycetidae diverged shortly after the evolution of ligninolytic class II POD. The one previously published genome from this group, the saprotrophic “cannonball fungus”, Sphaerobolus stellatus, has an astonishingly large repertoire of PCWDEs, including 63 class II PODs. The two genomes of ectomycorrhizal Phallomycetidae reported here, G. morchelliformis (Gomphales) and Hysterangium stoloniferum (Hysterangiales)31, have diverse PCWDEs acting on cellulose, hemicellulose, pectins, and lignin, with 31 and five genes encoding ligninolytic class II PODs, respectively (Figs. 4 and 5, Supplementary Figs. 3–5 and Supplementary Data 8). Although many Ramaria species (Gomphales) form ectomycorrhiza, the newly sequenced Ramaria rubella (subgenus Lentoramaria) is likely a litter decomposer, which is consistent with its possession of 13 class II PODs, 6 cellulose-acting LPMOs, and GH6 and GH7 cellobiohydrolases. Thus, a robust suite of PCWDEs appears to be a characteristic of both saprotrophs and ectomycorrhizal species in Phallomycetidae.
Ectomycorrhizal Russulales and Thelephorales, for which we report the first genomes, both have highly reduced complements of PCWDEs. The two Thelephorales species (Thelephora terrestris and T. ganbajun) resemble Boletales in having a highly reduced suite of the major enzymes acting on crystalline cellulose and lignin (three cellulose-acting LPMOs, no class II POD, and no GH6 or GH7). Ectomycorrhizal Russulales (Lactarius quietus, Russula emetica, and Russula ochroleuca) also have small repertoires of PCWDEs (no CBM1, one or two cellulose-acting LPMOs, no GH6 or GH7, no CE1, and no LiP), but they retain one atypical Mn POD gene (Supplementary Data 6p). In contrast, saprotrophic Russulales (e.g., Lentinellus vulpinus) have a typical white rot array of PCWDEs.
In Ascomycota, species in Tuberaceae followed the general symbiotroph trend, except C. venosus, which has a large complement of PCWDEs (Figs. 4a and 5, see ref. 16). Other ectomycorrhizal Pezizales, such as the newly sequenced Wilcoxina mikolae and Trichophaea hybrida (Pyronemataceae), also have few PCWDEs. The desert truffles Kalaharituber pfeilii and Terfezia boudieri (Pezizaceae) both have ectomycorrhiza-like restricted suites of PCWDEs, but they differ in the numbers of genes encoding cellulose-acting LPMOs (ten and one copies, respectively) and they encode one copy of GH32 invertase. Like other ectomycorrhizal species, A. macrosclerotiorum contained no class II PODs, but had a large set of PCWDEs (282 genes, including 17 cellulose-acting LPMOs, and genes encoding GH6, GH7, GH32, GH45, PL1, and PL3; Figs. 4 and 5, Supplementary Figs. 3–5 and Supplementary Data 8). Closely related taxa to A. macrosclerotiorum mainly contain DSEs, such as P. scopiformis, saprotrophs, and pathogens, which are characterized by a larger set of PCWDEs compared to ectomycorrhizal fungi. To assess whether this large repertoire of PCWDEs is expressed during symbiosis development, we carried out transcript profiling of Pinus sylvestris–A. macrosclerotiorum ectomycorrhizas using RNA-Seq. The fungal symbiont develops a dense Hartig net within the root cortex, but no mantle sheath (Supplementary Fig. 6a). Transcript profiling of ectomycorrhizas showed that most of the PCWDEs are not induced in symbiotic tissues (Supplementary Fig. 6b and Supplementary Data 16), suggesting a tight transcriptional control of the PCWDE gene expression.
MCWDEs are retained in ectomycorrhizal fungi
We explored the genetic capabilities of the sequenced fungi to decompose microbial (i.e., bacterial and fungal) cell walls by comparing PCWDE to MCWDE repertoires (Figs. 4 and 5, Supplementary Figs. 3–5 and Supplementary Data 7). The proportion of MCWDE (acting on chitin, glucans, mannans, and peptidoglycans) in ectomycorrhizal fungi is similar to that in saprotrophs. Thus, ectomycorrhizal fungi have generally retained MCWDE genes (e.g., chitinases, ß-1,3-glucanases) although they have lost most PCWDEs (e.g., endo- and exocellulases). Ectomycorrhizal fungi may use secreted MCWDE to scavenge nitrogen compounds trapped in SOM (e.g., chitin) by selectively using these hydrolytic enzymes in addition to oxidative mechanisms32. Some of these chitin-, glucan-, and mannan-active enzymes are likely involved in fungal cell wall remodeling during complex developmental processes, such as ectomycorrhiza and sporocarp formation10,11,33.
Mycorrhizal development is driven by gene co-option
We used a phylostratigraphic approach to characterize the evolutionary origins of ectomycorrhizal lineages on the basis of gene functions in extant organisms. We examined the age distribution of genes induced at different stages of ectomycorrhiza establishment, so-called symbiosis-induced genes, by defining phylogenetic ages (phylostrata), that correspond to internal nodes of the tree along the lineage leading from the root to the symbiotic species for which transcriptomic data are available (Fig. 6, Supplementary Fig. 7 and Supplementary Data 9). In Ascomycota and Basidiomycota, an average of 74% and 67% of the ectomycorrhiza-induced genes predated the evolution of ectomycorrhizal symbiosis, respectively (Fig. 6). Approximately 6% and 18% of these genes were already present in the MRCAs of the Ascomycota and that of the Basidiomycota (e.g., for L. bicolor, hydrophobins, GH131, GH28, and CBM1_GH5), respectively (Fig. 6 and Supplementary Fig. 7). No specific phylostrata was enriched in symbiosis-induced genes (Fisher’s exact test, P < 0.005; Supplementary Data 9b), suggesting that there was no distinguished period during ectomycorrhiza evolution caracterized by an excess number of gene birth events. These observations imply that symbiosis-induced genes have mostly been co-opted for ectomycorrhiza development during evolution from saprotrophic ancestors. However, the phylostratigraphic analysis also suggested that an average of 9–22 % (in Ascomycota; Fig. 6a) and 19–20 % (in Basidiomycota; Fig. 6b) of ectomycorrhiza-induced genes are restricted to specific mycorrhizal lineages. Given our current taxon sampling, it is difficult to distinguish genes that coincidentally evolved with mycorrhiza formation from species-specific orphan genes, except in the case of three clades that comprise more than a single ectomycorrhizal species, Boletales, the genus Laccaria and Tuberaceae, which comprise more than a single ectomycorrhizal species. In these clades, the proportion of symbiosis-induced genes that map to the origin of ectomycorrhiza showed a large variation, from 0 to 0.8% in the Boletales (two species), to 10% in Laccaria, and 14% in the Tuberaceae (Fig. 6).
Pie charts on the time-calibrated trees represent the number of gene clusters containing ectomycorrhiza-specific upregulated genes from the species compared. Numbers and percentage of genes mapping to phylostrata are shown on the right of the trees. Ectomycorrhizal species used for the comparison are in green boxes. Upregulated genes were selected according to FDR adjusted P value < 0.05. a Ascomycota, the fold change (FC) used for defining ectomycorrhiza-specific upregulated genes was FC ≥ 5. b Basidiomycota, the FC used for defining ectomycorrhiza-specific upregulated genes was FC ≥ 5. See Supplementary Data 9 for the identified phylostrata containing ectomycorrhiza-upregulated genes and enrichment statistics. Source data are provided as a Source data Fig. 4.
To test our phylostratigraphic results, we assessed the evolutionary conservation of 5917 ectomycorrhiza-induced transcripts identified in ten different ectomycorrhizal interactions among the 135 studied fungal genomes (Fig. 7a). We found that 15.2 % of the 5917 symbiosis-induced genes are shared by all species in our dataset (clusters IV and V). Most encode proteins involved in core metabolic or signaling functions. Genes from cluster III are conserved within Basidiomycota only, while genes from cluster VII are shared by Ascomycota only. Most of them have no known function (i.e., no KOG domain). Altogether, 31.2% of these symbiosis-induced genes are species specific (cluster VI). Most of these genes encode proteins with unknown KOG functions or mycorrhiza-induced small secreted proteins (MiSSPs). The proportion of species-specific, ectomycorrhiza-induced genes is therefore substantial, as reported earlier10,12,16 and suggested by our phylostratigraphic analysis.
We analyzed genes coding for ectomycorrhiza-induced proteins, secreted proteins and ectomycorrhiza-induced secreted proteins among 135 fungi. Heat maps show BLASTP sequence similarity of 5917 ectomycorrhiza-induced proteins (a), 6669 genes coding for secreted proteins (b), and 1028 ectomycorrhiza-induced secreted proteins (c) from ten ectomycorrhizal fungi, with available symbiotic transcriptomes among 135 genomes of Basidiomycota and Ascomycota. The heat maps depict a double-hierarchical clustering of protein sequences encoded by symbiosis-upregulated genes, genes coding for secreted proteins and genes coding for symbiosis-upregulated secreted proteins (rows, fold change ≥ 5 in symbiotic tissues compared to free-living mycelium, false discovery rate-corrected P ≤ 0.05); right panel, functional categories (KOG) are given for each cluster of sequences in % as bargrams. Clusters I–VIII correspond to group of sequences sharing the same level of protein sequence similarity based on BlastP. The distribution of transcripts belonging to Ascomycota and Basidiomycota in each cluster is shown as pie charts. Data were visualized and clustered using R (package HeatPlus97). The hierarchical clustering was done by using a Euclidian distance and Ward clustering method. Color scale on the left (white to red) shows the % of sequence identity according to BLASTP. The symbiosis-induced genes were retrieved from the ectomycorrhizal transcriptomes of ten species: A. macrosclerotiorum EW76-UTF0540, A. muscaria Koide, C. geophilum 1.58, H. cylindrosporum h7, L. bicolor S238N, P. involutus ATCC 200175, P. croceum F1598, P. microcarpus 441, T. matsutake 945, and T. magnatum (see Supplementary Data 10, 13 and 14). GEO accession codes are provided in the “Methods/Phylostratigraphy” section. Source data are provided as a Source data Fig. 5.
Next, we examined whether the same or different gene families were co-opted in independent ectomycorrhizal lineages, i.e., whether co-option was convergent or divergent during fungal evolution. We quantified convergence by the extent of overlap among conserved ectomycorrhiza-induced gene families within independent lineages. Conserved ectomycorrhiza-induced genes showed little or no overlap among the analyzed species (Fig. 6 and Supplementary Data 9). For example, there were only eight clusters of ectomycorrhiza-induced genes common to at least four Basidiomycota symbionts (Supplementary Data 9). Similarly, only 12 gene clusters were shared by at least three Ascomycota symbionts (Supplementary Data 9). This low overlap between co-opted genes suggests that independently evolved ectomycorrhizal lineages recruited different ancestral gene families for symbiosis, in addition to a likewise unique set of novel genes, that evolved after the origins of symbiosis12.
Symbiosis-induced secreted proteins in symbionts
The expression of genes encoding secreted and symbiosis-induced secreted proteins from ectomycorrhizal fungi was measured by RNA-Seq profiling in ten ectomycorrhizal associations (Supplementary Methods). By using a BLASTP-based analysis, we assessed the evolutionary conservation of the expressed 6669 secreted proteins (Fig. 7b) and 1028 symbiosis-induced secreted proteins (Fig. 7c) among 135 fungal species (Supplementary Datas 12, 13 and 14). A substantial proportion of the secreted proteins are species-specific SSPs (cluster III, Fig. 7b). In addition, we found that 38.1% of symbiosis-induced secreted proteins are shared by all species of fungi (clusters I and III). Most code for core metabolic or signaling functions and CAZymes. Genes from cluster II (9.3%) are conserved within Basidiomycota only. Genes from cluster IV (21.4%) are conserved mainly in Ascomycota, showed a lower similarity in Basidiomycota and are poorly conserved in Glomeromycota. They encode MiSSPs, CAZymes, proteins with unknown KOG function and proteins of signaling and metabolic pathways. Altogether, 31.2% of these 1028 symbiosis-induced genes are mostly species specific (clusters V–VII). Most of these genes encode MiSSPs, proteins with unknown KOG functions and proteins of signaling pathways. The phylostratigraphic analysis of secreted proteins and MiSSPs corroborated these results (Supplementary Fig. 7).
Source: Ecology - nature.com