The MiDAS global consortium was established in 2018 to coordinate the sampling and collection of metadata from WWTPs across the globe (Supplementary Data 1). Samples were obtained in duplicate from 740 WWTPs in 425 cities in 31 countries on six continents (Fig. 1a). The majority of the WWTPs were configured with the activated sludge process (69.7%) (Fig. 1b), and these were the main focus of the subsequent analyses. Nevertheless, WWTPs based on biofilters, moving bed bioreactors (MBBR), membrane bioreactors (MBR), and granular sludge were also sampled to cover the microbial diversity in other types of WWTPs. The activated sludge plants were designed for carbon removal only (C; 22.1%), carbon removal with nitrification (C,N; 9.5%), carbon removal with nitrification and denitrification (C,N,DN; 40.9%), and carbon removal with nitrogen removal and enhanced biological phosphorus removal, EBPR (C,N,DN,P; 21.7%) (Fig. 1c). The first type represents the simplest design, whereas the last represents the most advanced process type, with varying oxic and anoxic stages or compartments.
MiDAS 4: a global 16S rRNA gene catalogue and taxonomy for WWTPs
Microbial community profiling at high taxonomic resolution (genus- and species-level) using 16S rRNA gene amplicon sequencing requires a reference database with high-identity reference sequences (≥99% sequence identity) for the majority of the bacteria in the samples and a complete seven-rank taxonomy (domain to species) for all reference sequences16,20. To create such a database for bacteria in WWTPs globally, we applied synthetic long-read full-length 16S rRNA gene sequencing20,21 on samples from all WWTPs included in this study.
More than 5.2 million full-length 16S rRNA gene sequences were obtained after quality filtering and primer trimming. The sequences were processed with AutoTax20 to yield 80,557 full-length 16S rRNA gene amplicon sequence variants (FL-ASVs). These reference sequences were added to our previous MiDAS 3 database16, providing a combined database (MiDAS 4) with a total of 90,164 unique, chimera-free FL-ASV reference sequences. The absence of detectable chimeric sequences is a unique feature of the database and is achieved by attaching unique molecular identifiers (UMIs) to each end of the original template molecules before any PCR amplification steps21. This allows true biological sequences to be separated from chimeras already during the synthetic long-read assembly20,21. The novelty of the FL-ASVs was determined based on the percent identity shared with their closest relatives in the SILVA 138 SSURef NR99 database and the threshold for each taxonomic rank proposed by Yarza et al.22. Out of all FL-ASVs, 88% had relatives above the genus-level threshold (≥94.5% identity) and 56% above the species-level threshold (≥98.7% identity) (Fig. 2 and Table 1).
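The rank-wise novelty assessment can be illustrated with a short Python sketch. It assumes that the percent identity of each FL-ASV to its closest SILVA relative has already been computed (the alignment step is not shown) and uses the median thresholds from Yarza et al. for all six ranks; only the genus (94.5%) and species (98.7%) values are quoted above, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Median identity thresholds (%) per rank from Yarza et al. (2014),
# ordered from the most to the least stringent.
YARZA_THRESHOLDS = [
    ("species", 98.7),
    ("genus", 94.5),
    ("family", 86.5),
    ("order", 82.0),
    ("class", 78.5),
    ("phylum", 75.0),
]


def lowest_shared_rank(identity: float) -> Optional[str]:
    """Lowest rank whose threshold the best-hit identity (%) reaches.

    Returns None when the identity is below even the phylum threshold,
    i.e. the sequence would be flagged as a putative novel phylum.
    """
    for rank, threshold in YARZA_THRESHOLDS:
        if identity >= threshold:
            return rank
    return None


def novelty_summary(identities: List[float]) -> Dict[str, float]:
    """Fraction of sequences whose closest relative reaches each rank threshold."""
    n = len(identities)
    return {
        rank: sum(identity >= threshold for identity in identities) / n
        for rank, threshold in YARZA_THRESHOLDS
    }


if __name__ == "__main__":
    best_hits = [99.6, 97.2, 93.0, 80.1]  # made-up example identities
    print([lowest_shared_rank(x) for x in best_hits])
    print(novelty_summary(best_hits))
```

Under this scheme, the "88% above the genus-level threshold" figure corresponds to `novelty_summary(...)["genus"]` over all FL-ASVs.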
MiDAS 4 provides placeholder names for many environmental taxa
Although only a small percentage of the reference sequences in MiDAS 4 represented new putative taxa at higher ranks (phylum, class, or order) according to the sequence identity thresholds proposed by Yarza et al.22, a large number of sequences lacked lower-rank taxonomic classifications and were assigned de novo placeholder names by AutoTax20 (Fig. 2 and Table 1). In total, de novo taxonomic names were generated by AutoTax for 26 phyla (30.6% of observed), 83 classes (37.2% of observed), 297 orders (46.8% of observed), and more than 8000 genera (86.3% of observed). Without the de novo taxonomy, we would not be able to discuss these taxa across studies and unveil their potential roles in wastewater treatment systems.
Phylum-specific phylogenetic trees were created to determine whether the FL-ASV reference sequences assigned to de novo phyla were actual phyla or simply artefacts of the naive sequence identity-based assignment of de novo placeholder taxonomies (Supplementary Fig. 1a). The majority (65 FL-ASVs) created deep branches from within the Alphaproteobacteria together with 16S rRNA gene sequences from mitochondria, suggesting they represented divergent mitochondrial genes rather than true novel phyla. We also observed several FL-ASVs assigned to de novo phyla that branched from the classes Parcubacteria (3 FL-ASVs) and Microgenomatis (22 FL-ASVs) within the Patescibacteria phylum. These two classes were originally proposed as superphyla due to an unusually high rate of evolution of their 16S rRNA genes23,24. It is, therefore, likely that these de novo phyla are also artefacts of the simple taxonomy assignment approach, which does not take different evolutionary rates into account20. Most of the class- and order-level novelty was found within the Patescibacteria, Proteobacteria, Firmicutes, Planctomycetota and Verrucomicrobiota (Supplementary Fig. 1b). At the family- and genus-level, we also observed many de novo taxa affiliated to Bacteroidota, Bdellovibrionota and Chloroflexi.
MiDAS 4 provides a common taxonomy for the field
The performance of the MiDAS 4 database was evaluated based on an independent amplicon dataset from the Global Water Microbiome Consortium (GWMC) project2, which covers ~1200 samples from 269 WWTPs. The raw GWMC amplicon data of the 16S rRNA gene V4 region was resolved into ASVs, and the percent identity to their best hits in MiDAS 4 and other reference databases was calculated (Fig. 3). The MiDAS 4 database had high-identity hits (≥99% identity) for 72.0 ± 9.5% (mean ± SD) of GWMC ASVs with ≥0.01% relative abundance, compared to 57.9 ± 8.5% for the SILVA 138 SSURef NR99 database, which was the best of the universal reference databases (Fig. 3). The relative abundance cutoff selects taxa that likely have a quantitative impact on the ecosystem while filtering out the rare biosphere, which includes many bacteria introduced with the influent wastewaters25. Similar analyses of ASVs obtained from the samples included in this study showed, not surprisingly, even better performance, with high-identity hits for 90.7 ± 7.9% of V1–V3 ASVs and 90.0 ± 6.6% of V4 ASVs with ≥0.01% relative abundance, compared to 60.6 ± 11.9% and 73.9 ± 10.3% for SILVA (Supplementary Fig. 2a). Although the sampling of WWTPs was focused on activated sludge plants, the MiDAS 4 database also includes high-identity references for most ASVs in other plant types (granules, biofilters, etc.) (Supplementary Fig. 2b). This suggests that most taxa were shared across plant types, although often at different relative abundances.
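The high-identity metric can be sketched as follows, assuming that each sample is represented as a list of (relative abundance %, best-hit identity %) pairs and that the per-sample percentages are summarised as mean ± SD across samples, which is how we read the reported values. The function names are illustrative and not part of any MiDAS tool.

```python
import statistics
from typing import Iterable, List, Tuple

Sample = List[Tuple[float, float]]  # (relative abundance %, best-hit identity %)


def high_identity_fraction(asvs: Sample, min_abundance: float = 0.01,
                           min_identity: float = 99.0) -> float:
    """Percent of ASVs at or above min_abundance whose best hit is >= min_identity."""
    identities = [ident for abund, ident in asvs if abund >= min_abundance]
    if not identities:
        raise ValueError("no ASVs pass the abundance cutoff")
    hits = sum(ident >= min_identity for ident in identities)
    return 100 * hits / len(identities)


def summarise(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Mean and sample standard deviation of the per-sample percentages."""
    per_sample = [high_identity_fraction(s) for s in samples]
    return statistics.mean(per_sample), statistics.stdev(per_sample)
```

Running `summarise` once per reference database (with identities from aligning against that database) gives directly comparable mean ± SD values.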
Using MiDAS 4 with the SINTAX classifier, it was possible to obtain genus-level classifications for 75.0 ± 6.9% of the GWMC ASVs with ≥0.01% relative abundance (Fig. 3). In comparison, SILVA 138 SSURef NR99, which was the best of the universal reference databases, could only classify 31.4 ± 4.2% of the ASVs to genus-level. When MiDAS 4 was used to classify amplicons from this study, we obtained genus-level classification for 92.0 ± 4.0% of V1–V3 ASVs and 84.8 ± 3.6% of V4 ASVs (Supplementary Fig. 2a). This is close to the theoretical limit set by the phylogenetic signal provided by each amplicon region analyzed20. Improved classifications were also observed for archaeal V4 ASVs (93.3 ± 10.6% for MiDAS 4 vs 69.3 ± 21.3% for SILVA), although no additional archaeal reference sequences were added to the MiDAS database in this study.
MiDAS 4 was also able to assign species-level classifications to 40.8 ± 7.1% of the GWMC ASVs. In contrast, the 16S rRNA gene reference database obtained from GTDB SSU r89, which is the only universal reference database that contains a comprehensive species-level taxonomy, only classified 9.9 ± 2.0% of the ASVs (Fig. 3). For the ASVs created in this study, MiDAS 4 provided a species-level classification for 68.4 ± 6.1% of the V1–V3 and 48.5 ± 6.0% of the V4 ASVs (Supplementary Fig. 2a).
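Genus- and species-level classification rates can be computed from SINTAX output along these lines. The sketch assumes tab-separated result lines with the query label first and the confidence-annotated prediction (e.g. `g:Nitrospira(0.97)`) second, and an assumed bootstrap cutoff of 0.8, since the cutoff is not stated here; filtering to ASVs with ≥0.01% relative abundance would be done beforehand. The function names are hypothetical.

```python
import re
from typing import Iterable, Set

# One "rank:name(confidence)" token, e.g. "g:Nitrospira(0.97)".
TOKEN = re.compile(r"([dkpcofgs]):([^,()]+)\(([0-9.]+)\)")


def ranks_passing(prediction: str, cutoff: float = 0.8) -> Set[str]:
    """Single-letter rank codes whose bootstrap confidence is >= cutoff."""
    return {rank for rank, _name, conf in TOKEN.findall(prediction)
            if float(conf) >= cutoff}


def classification_rate(lines: Iterable[str], rank: str,
                        cutoff: float = 0.8) -> float:
    """Percent of SINTAX result lines classified at `rank` ("g", "s", ...)."""
    records = [ln.rstrip("\n").split("\t") for ln in lines if ln.strip()]
    if not records:
        raise ValueError("no SINTAX records")
    hits = sum(
        rank in ranks_passing(fields[1] if len(fields) > 1 else "", cutoff)
        for fields in records
    )
    return 100 * hits / len(records)
```

Because SINTAX confidences do not increase towards lower ranks, an ASV counted at species-level will normally also be counted at genus-level, so the species rate is expected to be the lower of the two.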
Based on the large number of WWTPs sampled, their diversity, and the independent evaluation based on the GWMC dataset2, we expect that the MiDAS 4 reference database covers the large majority of bacteria in WWTPs worldwide. The MiDAS 4 taxonomy should therefore act as a shared vocabulary for wastewater treatment microbiologists, providing opportunities for cross-study comparisons and ecological studies at high taxonomic resolution.
Comparison of the V1–V3 and V4 primer sets for community profiling of WWTPs
Before investigating what factors shape the activated sludge microbiota, we compared short-read amplicon data created for all activated sludge samples belonging to the four main process types (C; C,N; C,N,DN and C,N,DN,P) collected in the Global MiDAS project using two commonly used primer sets that target the V1–V3 or V4 variable region of the 16S rRNA gene. The V1–V3 primers were chosen because the corresponding region of the 16S rRNA gene provides the highest taxonomic resolution of common short-read amplicons20,26, and these primers have previously shown good agreement with metagenomic data and quantitative fluorescence in situ hybridisation (FISH) results for wastewater treatment systems17. The V4 region has a lower phylogenetic signal, but the primers used for amplification have better theoretical coverage of the bacterial diversity in the SILVA database20,26.
The majority of genera (62%) showed less than a twofold difference in relative abundance between the two primer sets, and the rest were preferentially detected with either the V1–V3 or the V4 primers (19% each) (Fig. 4). We observed that several genera of known importance detected in high abundance by V1–V3 were hardly observed by V4, including Acidovorax, Rhodoferax, Ca. Villigracilis, Sphaerotilus and Leptothrix. Similarly, we observed genera abundant with V4 but strongly underestimated by V1–V3, such as Acinetobacter and Prosthecobacter. A complete list of differentially detected genera (Supplementary Data 2) serves as a valuable tool, in combination with in silico primer evaluation, for deciding which primer pair to use for targeted studies of specific taxa.
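The twofold criterion can be expressed as a log2 ratio of mean relative abundances. This minimal sketch assumes a small pseudocount (here 1e-4 %) to handle genera absent from one dataset; the pseudocount, the labels and the function names are our assumptions, not taken from the study.

```python
import math
from collections import Counter
from typing import Dict, Tuple


def primer_preference(v1v3: float, v4: float, pseudocount: float = 1e-4) -> str:
    """Label a genus by the fold difference of its mean relative abundance (%)."""
    log2_ratio = math.log2((v1v3 + pseudocount) / (v4 + pseudocount))
    if log2_ratio >= 1:      # at least twofold higher with V1-V3
        return "V1-V3"
    if log2_ratio <= -1:     # at least twofold higher with V4
        return "V4"
    return "similar"         # less than twofold difference


def summarise_preferences(genera: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Percent of genera in each preference class; values are (V1-V3, V4) means."""
    counts = Counter(primer_preference(a, b) for a, b in genera.values())
    return {label: 100 * n / len(genera) for label, n in counts.items()}
```

Applied to all genera, `summarise_preferences` would yield the kind of 62/19/19 split reported above.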
Because the V1–V3 primers provide better classification rates at the genus- and species-level (Supplementary Fig. 2a), we primarily focused on this dataset for the following analyses. It should be noted that the V1–V3 primer set performs poorly on anammox bacteria27,28 and does not target archaea at all. To determine the importance of these groups, we estimated their relative read abundance using the V4 amplicon data. Ca. Brocadia and Ca. Anammoximicrobium were the only anammox genera detected, and the latter never exceeded 0.6% relative abundance. Ca. Brocadia was observed in MBBRs and granular sludge in anammox reactors with relative read abundances reaching 29%, but it was below 0.1% relative abundance in all but two of the activated sludge samples investigated. For archaea, the relative read abundance was generally low (median = 0.18%) but high in a few WWTPs (up to 11.7%), so archaea should not be neglected in these cases.
Process and environmental factors affecting the activated sludge microbiota
Alpha diversity analysis revealed that the rarefied (10,000 reads per sample) richness and diversity in activated sludge plants were most strongly affected by process type, industrial load and continent (Supplementary Fig. 3 and Supplementary Note 1). The richness and diversity increased with the complexity of the treatment process, as found in other studies, reflecting the increased number of niches29. In contrast, they decreased with high industrial loads, presumably because industrial wastewater is often less complex and therefore promotes the growth of fewer specialised species7. The effect of continent is presumably caused by the unavoidably unbalanced sampling of WWTPs and is confounded by the effects of plant types and industrial loads.
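Rarefaction and two simple alpha-diversity measures can be sketched in plain Python. Subsampling without replacement to 10,000 reads matches the depth stated above, whereas the use of observed richness and the Shannon index as the diversity measure, the fixed random seed and the function names are assumptions for illustration.

```python
import math
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int = 10_000, seed: int = 0) -> Counter:
    """Subsample reads without replacement to a fixed sequencing depth."""
    if sum(counts.values()) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    # Expand to one entry per read; simple but memory-hungry for deep samples.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return Counter(random.Random(seed).sample(reads, depth))


def richness(counts: Dict[str, int]) -> int:
    """Number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts: Dict[str, int]) -> float:
    """Shannon diversity index H' using natural logarithms."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)
```

Rarefying first makes richness comparable across samples sequenced to different depths, at the cost of discarding reads; samples below the depth are excluded rather than padded.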
Distance decay relationship (DDR) analyses were used to determine the effect of geographic distance on the microbial community similarity of activated sludge plants with the four main process types (Supplementary Fig. 4 and Supplementary Note 2). We found that distance decay was only effective within shorter geographical distances (<2500 km), which suggests that the microbiota was partly shaped by immigrating bacteria from the source community as recently observed25. In addition, we observed low similarity between geographically separated samples (>2500 km) at the ASV-level, but higher similarities with OTUs clustered at 97% and even more at the genus-level. This suggests that many ASVs are geographically restricted and functionally redundant in the activated sludge microbiota, so different strains or species from the same genus across the world may provide similar functions.
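A distance decay relationship can be approximated by pairing great-circle distances between plants with community similarities and fitting a line to the pairs within a distance cutoff. In this sketch, the haversine distance, Bray-Curtis similarity and an ordinary least-squares slope are assumed choices for illustration; the exact DDR methodology is given in Supplementary Note 2 and may differ.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Site = Tuple[float, float, Dict[str, float]]  # (latitude, longitude, profile)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def bray_curtis_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """1 - Bray-Curtis dissimilarity between two abundance profiles."""
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in set(a) | set(b))
    return 2 * shared / (sum(a.values()) + sum(b.values()))


def ddr_slope(sites: List[Site], max_km: float = 2500.0) -> float:
    """Least-squares slope of similarity against distance for pairs < max_km apart."""
    xs, ys = [], []
    for (lat1, lon1, p1), (lat2, lon2, p2) in combinations(sites, 2):
        d = haversine_km(lat1, lon1, lat2, lon2)
        if d < max_km:
            xs.append(d)
            ys.append(bray_curtis_similarity(p1, p2))
    if len(xs) < 2:
        raise ValueError("need at least two site pairs within max_km")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all pairwise distances are identical")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx  # similarity change per km; negative means distance decay
```

Computing the profiles at ASV, 97% OTU and genus level and comparing the resulting slopes mirrors the resolution-dependent contrast described above.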
To gain a deeper understanding of the factors that shape the activated sludge microbiota, we examined the genus-level taxonomic beta-diversity using principal coordinate analysis (PCoA) and permutational multivariate analysis of variance (PERMANOVA) (Fig. 5 and Supplementary Note 3). We chose taxonomic diversity instead of phylogenetic diversity (UniFrac) because many of the important traits are categorical (yes/no) and only conserved at lower taxonomic ranks (genus/species). The analysis was made at the genus-level due to the high classification rate achieved with MiDAS 4 and because genera were less affected by DDR than ASVs. We found that the overall microbial community was most strongly affected by continent and temperature in the WWTPs. However, process type, industrial load and climate zone also had significant impacts. The percentage of total variation explained by each parameter was generally low, indicating that the global WWTP microbiota represents a continuous distribution rather than distinct states, as observed for the human gut microbiota30.
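The PERMANOVA pseudo-F statistic and the proportion of variation explained (R²) can be computed directly from a distance matrix by partitioning sums of squared distances, following Anderson's formulation. The sketch below is a plain-Python illustration with a label-permutation p-value; it is not the implementation used in the study, assumes at least two groups with within-group variation, and the function names are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _scaled_ss(dist: Matrix, idx: List[int]) -> float:
    """Sum of squared distances over all pairs in idx, divided by len(idx)."""
    total = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
    return total / len(idx)


def pseudo_f(dist: Matrix, labels: Sequence[Hashable]) -> Tuple[float, float]:
    """PERMANOVA pseudo-F and R² (fraction of variation explained by the grouping)."""
    n = len(labels)
    groups: Dict[Hashable, List[int]] = {}
    for i, g in enumerate(labels):
        groups.setdefault(g, []).append(i)
    a = len(groups)
    ss_total = _scaled_ss(dist, list(range(n)))
    ss_within = sum(_scaled_ss(dist, idx) for idx in groups.values())
    ss_between = ss_total - ss_within
    f = (ss_between / (a - 1)) / (ss_within / (n - a))
    return f, ss_between / ss_total


def permanova(dist: Matrix, labels: Sequence[Hashable],
              permutations: int = 999, seed: int = 0) -> Tuple[float, float, float]:
    """Observed pseudo-F, R² and a permutation p-value from shuffled labels."""
    f_obs, r2 = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    exceed = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            exceed += 1
    return f_obs, r2, (exceed + 1) / (permutations + 1)
```

The R² returned here is the "percentage of total variation explained" referred to above; a significant p-value with a small R² is exactly the pattern that points to a continuous rather than clustered distribution.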
Genera selected for by process type and temperature
Redundancy analyses (RDA) were used to identify which genera were the strongest indicators for specific processes and/or environmental conditions. RDA was performed on both V1–V3 (Supplementary Fig. 5) and V4 (Supplementary Fig. 6) amplicon datasets to ensure that essential taxa were not missed due to primer bias. We here highlight the results for process type and temperature. Results for the other parameters and RDA scores for all analyses can be found in Supplementary Note 4 and Supplementary Data 3, respectively.
The RDA analyses of process types revealed that genera commonly involved in nitrification (Nitrosomonas and Nitrospira), denitrification (Rhodoferax, Sulfuritalea) and the polyphosphate-accumulating organisms (PAOs) (Tetrasphaera, Ca. Accumulibacter and Dechloromonas) were strongly enriched in more advanced process types, along with the de novo taxa midas_g_17 (family: Saprospiraceae), midas_g_72 (class: Anaerolineae) and midas_g_300 (order: Sphingobacteriales). Conversely, carbon removal plants were enriched with Hydrogenophaga and Prevotella, the filamentous genera Sphaerotilus and Thiothrix, and the glycogen-accumulating organisms (GAOs) Ca. Competibacter and Defluviicoccus. Specific to the EBPR plants was an increased abundance of known PAOs (see above) and of Azospira, Propionivibrio, Propioniciclava, Ca. Amarolinea and the de novo taxa midas_g_399 (class: Actinobacteria), midas_g_384 (family: Saprospiraceae) and midas_g_945 (class: Elusimicrobia). The latter genera should be considered targets for further characterisation as potential PAOs or GAOs.
The RDA based on temperature showed that high temperatures were associated with an increased abundance of Ca. Competibacter, Thauera, Defluviicoccus, Azospira, Rhodoplanes, Ottowia and Phaeodactylibacter, whereas lower temperatures favoured the presence of Flavobacterium, Tetrasphaera, Ferruginibacter, Trichococcus, Ca. Epiflobacter and Acinetobacter. These differences suggest that plants with similar designs and operations may have differences in community structure depending on prevailing temperature conditions.
Core and conditional rare or abundant taxa in the global activated sludge microbiota
Core taxa are commonly defined in complex communities based on how frequently specific taxa are observed in samples from a well-defined habitat31. In addition, an abundance threshold can be applied to select those taxa that are likely to have a quantitative impact on ecosystem functioning6. We here used three frequency thresholds for core taxa with >0.1% relative abundance: presence in 80% (strict core), 50% (general core) and 20% (loose core) of all activated sludge plants (Fig. 6a).
In addition to the core taxa, we also identified conditionally rare or abundant taxa (CRAT)32 (Fig. 6b). These are taxa typically present in low abundance but occasionally become prevalent, including taxa related to process disturbances, such as bacteria causing activated sludge foaming or those associated with the degradation of specific residues in industrial wastewater. CRAT have only been studied in a single WWTP treating brewery wastewater, despite their potential effect on performance32,33. CRAT are here defined as taxa which are not part of the core, but present in at least one WWTP with a relative abundance above 1%.
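The core and CRAT definitions translate into a simple rule applied to each taxon's relative abundances across plants. This sketch assumes that a frequency threshold is met when at least the stated fraction of plants exceed 0.1% (inclusive comparison on the fraction) and that CRAT require >1% in at least one plant; the function name and labels are hypothetical.

```python
from typing import Dict, List


def classify_taxa(abundance: Dict[str, List[float]],
                  min_abund: float = 0.1, crat_abund: float = 1.0) -> Dict[str, str]:
    """Assign each taxon to a core tier, CRAT or 'other'.

    abundance maps taxon -> relative abundances (%) across all plants.
    """
    tiers = [("strict core", 0.8), ("general core", 0.5), ("loose core", 0.2)]
    result = {}
    for taxon, values in abundance.items():
        # Fraction of plants where the taxon exceeds the abundance threshold.
        freq = sum(v > min_abund for v in values) / len(values)
        label = next((name for name, cut in tiers if freq >= cut), None)
        if label is None:
            # Not core: CRAT if it ever exceeds the CRAT abundance threshold.
            label = "CRAT" if max(values) > crat_abund else "other"
        result[taxon] = label
    return result
```

Checking the tiers from strict to loose ensures each taxon receives only its most stringent core label, and CRAT are by construction disjoint from the core.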
Core taxa and CRAT were identified for both the V1–V3 and V4 amplicon data to ensure that critical taxa were not missed due to primer bias. We identified 250 core genera (15 strict, 65 general and 170 loose) and 715 CRAT genera (Supplementary Data 4). The strict core genera (Fig. 7) mainly contained genera with versatile metabolisms found in several environments, including Flavobacterium, Novosphingobium and Haliangium. The general core (Fig. 7) included many known bacteria associated with nitrification (Nitrosomonas and Nitrospira), polyphosphate accumulation (Tetrasphaera, Ca. Accumulibacter) and glycogen accumulation (Ca. Competibacter). The loose core contained well-known filamentous bacteria (Ca. Microthrix, Ca. Promineofilum, Ca. Sarcinithrix, Gordonia, Kouleothrix and Thiothrix), but also Nitrotoga, a less common nitrifier in WWTPs.
Because MiDAS 4 allowed for species-level classification, we also identified core and CRAT species based on the same criteria as for genera (Supplementary Fig. 7 and Supplementary Data 4). This revealed 113 core species (0 strict, 9 general and 104 loose). The general core species (Fig. 7) included Nitrospira defluvii and Tetrasphaera midas_s_5, a common nitrifier and PAO, respectively. Arcobacter midas_s_2255, a potential pathogen commonly abundant in the influent wastewater, was also part of the general core34. The loose core contained additional species associated with nitrification (Nitrosomonas midas_s_139 and Nitrospira nitrosa), polyphosphate accumulation (Ca. Accumulibacter phosphatis, Dechloromonas midas_s_173, Tetrasphaera midas_s_45), as well as known filamentous species (Ca. Microthrix parvicella and midas_s_2 (recently named Ca. M. subdominans35), Ca. Villigracilis midas_s_471 and midas_s_9223, Leptothrix midas_s_884). In addition to the core species, we identified 1417 CRAT species. As CRAT are generally found in low abundance and the current study does not include time series or influent data, we cannot draw firm conclusions about their general implications for the ecosystem. However, they may be present due to short-term mass immigration25 or specific operational conditions36 and, in both cases, potentially affect plant operation. They should therefore be considered important targets for further investigations, together with the core taxa.
Many core taxa and CRAT can only be identified with MiDAS 4
The core taxa and CRAT included a large proportion of MiDAS 4 de novo taxa. At the genus-level, 106/250 (42%) of the core genera and 500/715 (70%) of the CRAT genera had MiDAS placeholder names. At the species-level, the proportion was even higher: placeholder names were assigned to 101/113 (89%) of the core species and 1352/1417 (95%) of the CRAT species. This highlights the importance of a comprehensive taxonomy that includes the uncultured environmental taxa.
The core and CRAT taxa cover the majority of the global activated sludge microbiota
Although the core taxa and CRAT represent a small fraction of the total diversity observed in the MiDAS 4 reference database, they accounted for the majority of the observed global activated sludge microbiota (Fig. 6c, d). Accumulated read abundance estimates ranged from 57–68% for the core genera and 11–13% for the CRAT, and combined they accounted for 68–79% of total read abundance in the WWTPs depending on process types. The core taxa represented a larger proportion of the activated sludge microbiota for the more advanced process types, which likely reflects the requirement of more versatile bacteria associated with the alternating redox conditions in these types of WWTPs. The remaining fraction, 21–32%, consisted of 6–8% unclassified genera and genera present in very low abundance, presumably with minor importance for the plant performance. The species-level core taxa and CRAT represented 11–24% and 24–33% accumulated read abundance, respectively. Combined, they accounted for almost 50% of the observed microbiota.
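Accumulated read abundance per category can be obtained by summing each plant's relative abundances within a category and averaging across plants. The sketch assumes per-plant profiles in percent and a precomputed mapping from taxon to category (core, CRAT and so on); in practice it would be run separately for each process type, and the names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def accumulated_abundance(profiles: List[Dict[str, float]],
                          category_of: Dict[str, str]) -> Dict[str, float]:
    """Mean over plants of the summed relative abundance (%) in each category.

    Taxa missing from category_of are counted as 'other'.
    """
    totals: Dict[str, float] = defaultdict(float)
    for profile in profiles:
        for taxon, abund in profile.items():
            totals[category_of.get(taxon, "other")] += abund
    return {cat: total / len(profiles) for cat, total in totals.items()}
```

Grouping the profiles by process type before calling the function would yield ranges such as the 57–68% (core) and 11–13% (CRAT) reported above.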
Global diversity within important functional guilds
The general change from simple to advanced WWTPs with nutrient removal and the transition to water resource recovery facilities (WRRFs) requires increased knowledge about the bacteria responsible for the removal and recovery of nutrients, so we examined the global diversity of well-described nitrifiers, denitrifiers, PAOs and GAOs (Fig. 8). GAOs were included because they may compete with the PAOs for nutrients and thereby interfere with the biological recovery of phosphorus37. Because MiDAS 4 provided species-level resolution for a large proportion of activated sludge microbiota, we also investigated the species-level diversity within genera affiliated with the functional guilds. A complete overview of species in all genera detected in this global study is provided in the MiDAS field guide (https://www.midasfieldguide.org/guide).
Nitrosomonas and potential comammox Nitrospira were the only abundant (≥0.1% average relative abundance) genera found among ammonia-oxidising bacteria (AOBs), whereas both Nitrospira and Nitrotoga were abundant among the nitrite oxidisers (NOBs), with Nitrospira being the most abundant across all countries (Fig. 8). Nitrobacter was not detected, and Nitrosospira was detected in only a few plants in very low abundance (≤0.01% average relative abundance). At the species-level, each genus had 2–5 abundant species (Supplementary Fig. 8). The most abundant and widespread Nitrosomonas species was midas_s_139. However, midas_s_11707 and midas_s_11733 were dominant in a few countries. For Nitrospira, the most abundant species in nearly all countries was N. defluvii. ASVs classified as the comammox N. nitrosa38,39 were also common in many countries across the world. However, because the comammox trait is not phylogenetically conserved at the 16S rRNA gene level38,39, we cannot conclude that these ASVs represent true comammox bacteria. For Nitrotoga, only two species were detected with notable abundance, midas_s_181 and midas_s_9575. Ammonia-oxidising archaea (AOAs) were not detected with MiDAS 4 due to the lack of reference sequences and because AOAs are not targeted by the V1–V3 primer pair. However, analyses of our V4 amplicon dataset classified with the SILVA database revealed a considerable relative read abundance of AOAs in Malaysia and the Philippines, but absence or low abundance of AOAs in other countries (Supplementary Fig. 9). Other studies have occasionally found AOAs across the world, but generally in lower abundance than AOBs40,41,42. To ensure detection of AOAs with MiDAS 4, we anticipate adding external reference sequences for AOAs in a future release of the database.
Denitrifying bacteria are very common in advanced activated sludge plants but are generally poorly described. Among the known genera, Rhodoferax, Zoogloea and Thauera were the most abundant (Fig. 8). Zoogloea and Thauera are well-known floc formers, sometimes causing unwanted slime formation43. Rhodoferax was the most common denitrifier in Europe, whereas Thauera dominated in Asia. Many denitrifiers could not be classified at the species-level (Supplementary Fig. 10), likely due to highly conserved 16S rRNA genes. An exception was Zoogloea, where midas_s_1080 and Z. caeni were the most abundant species worldwide.
EBPR is performed by PAOs, with three genera recognised as important in full-scale WWTPs: Tetrasphaera, Dechloromonas and Ca. Accumulibacter13. According to relative read abundance, all three were found in EBPR plants globally, with Tetrasphaera as the most prevalent (Fig. 8). Dechloromonas was also abundant in nitrifying and denitrifying plants without EBPR, indicating a more diverse ecology. Four recognised GAOs were found globally: Ca. Competibacter, Defluviicoccus, Propionivibrio and Micropruina, with Ca. Competibacter being the most abundant (Fig. 8). Only a few species (2–6 species) in each genus were dominant across the world for both PAOs (Supplementary Fig. 11) and GAOs (Supplementary Fig. 12), except for Ca. Competibacter, which covered ~20 abundant but country-specific species. Among PAOs, the abundant species were Tetrasphaera midas_s_5, Dechloromonas midas_s_173 (recently named D. phosphorivorans), Ca. Accumulibacter midas_s_315, Ca. A. phosphatis and Ca. A. aalborgensis. Interestingly, some of the most abundant PAOs and GAOs were also abundant in the simple process design with C-removal only, indicating more versatile metabolisms.
Global diversity of filamentous bacteria
Filamentous bacteria are essential for creating strong activated sludge flocs. However, in large numbers, they can also lead to loose flocs and poor settling properties. This is known as bulking, a major operational problem in many WWTPs. Many can also form foam on top of process tanks due to their hydrophobic surfaces. Presently, approximately 20 genera are known to contain filamentous species44, and among those, the most abundant are Ca. Microthrix, Leptothrix, Ca. Villigracilis, Trichococcus and Sphaerotilus (Fig. 9). They are all well known from studies on mitigation of poor settling properties in WWTPs. Interestingly, Leptothrix, Sphaerotilus and Ca. Villigracilis belong to the genera whose abundance estimates depended strongly on the primers, with V4 underestimating their abundance (Fig. 4). Ca. Microthrix and Leptothrix were strongly associated with continents, being most common in Europe and less common in Asia and North America (Fig. 9).
Many of the filamentous bacteria were linked to specific process types (Supplementary Fig. 13); e.g. Ca. Microthrix was not observed in WWTPs with carbon removal only, and Ca. Amarolinea was only abundant in plants with nutrient removal. The number of abundant species within the genera was generally low, with one species in Trichococcus, two in Ca. Microthrix and approximately five in Leptothrix and Ca. Villigracilis (Supplementary Fig. 14). Only five abundant species were observed for Sphaerotilus. However, a substantial fraction of unclassified ASVs was also observed, demonstrating that certain species within this genus are poorly resolved based on the 16S rRNA gene. Ca. Promineofilum was also poorly resolved at the species-level (Supplementary Fig. 15).
Conclusion and perspectives
We present a worldwide collaborative effort to produce MiDAS 4, an ASV-resolved full-length 16S rRNA gene reference database, which covers more than 31,000 species and enables genus- to species-level resolution in microbial community profiling studies. MiDAS 4 covers the vast majority of WWTP bacteria globally and offers a much-needed common taxonomy for the field, laying the foundation for comprehensive linking of microbial taxa in the ecosystem with their functional traits. Presently, hundreds of studies are undertaken to combine engineering and microbial aspects of full-scale WWTPs. However, most ASVs or OTUs in these studies are classified at poor taxonomic resolution (family-level or above) due to the use of incomplete universal reference databases. Because many important functional traits are only conserved at high taxonomic resolution (genus- or species-level), this strongly hampers our ability to transfer taxa-specific knowledge from one study to another. This will change with MiDAS 4, and we expect that reprocessing of data from earlier studies may reveal new perspectives on wastewater treatment microbiology. Our online Global MiDAS Field Guide presents the data generated in this study and summarises present knowledge about all taxa. We encourage researchers within the field to contribute new knowledge to MiDAS using the contact link on the MiDAS website (https://www.midasfieldguide.org/guide/contact).
The global microbiota of activated sludge plants has been predicted to harbour a massive diversity with up to one billion species2. However, most of these occur at very low abundance and are of little importance for the treatment process. By focusing only on the abundant taxa, we can see that this number is much smaller, i.e., ~1000 genera and 1500 species. We consider these taxa functionally the most important globally, representing a "most wanted list" for future studies. Some taxa are abundant in most WWTPs (core taxa), and others are occasionally abundant in fewer plants (CRAT). The CRAT have received little attention in the field of wastewater treatment, but they can be of profound importance for WWTP performance. Both groups have a high fraction of poorly characterised species. The high taxonomic resolution provided by MiDAS 4 enables us to identify samples where these important core taxa occur in high abundance. This provides an ideal starting point for obtaining high-quality metagenome-assembled genomes (MAGs), isolating pure cultures and carrying out targeted culture-independent studies to uncover their physiological and ecological roles.
Among the known functional guilds, such as nitrifiers or polyphosphate-accumulating organisms, the same genera were found worldwide, with only a few abundant species in each genus. There were differences in the community structure, and the abundance of dominant species was mainly shaped by process type, temperature, and in some cases, continent. This discovery sends an important message to the field: relatively few species are abundant worldwide, so research or operational results can reliably be transferred from one geographical region to another, stimulating the transition from WWTPs to more sustainable WRRFs.
The relatively low number of uncharacterised abundant species also shows that it is within our reach to describe them all in terms of identity, physiology, ecology and dynamics, providing the necessary knowledge for informed process optimisation and management. The number of poorly described genera (i.e. those with only a MiDAS placeholder genus name) was 88 among the 250 core genera (35%) and more than 89% at the species-level, so there is still some work to do to link their identities and function. An important step in this direction is the visualisation of the populations. With the comprehensive set of FL-ASVs, it is possible to design highly specific FISH probes, and to critically evaluate the old probes. In the Danish WWTPs, we have successfully done this for groups in the Acidobacteriota42 based on the MiDAS 3 database18. Our recent retrieval of more than 1000 high-quality MAGs from Danish WWTPs with advanced process design is also an important step to link identity to function43. The HQ-MAGs can be linked directly to MiDAS 4 as they contain complete 16S rRNA genes. They cover 62% (156/250) of the core genera and 61% (69/113) of the core species identified in this study. These MAGs may also form the basis for further studies to link identity and function, e.g. by applying metatranscriptomics44 and other in situ techniques such as FISH combined with Raman45,46, guided by the “most wanted” list provided in this study. We expect that MiDAS 4 will have significant implications for future microbial ecology studies in wastewater treatment systems.
Source: Ecology - nature.com