

    Evolution of snow algae, from cosmopolitans to endemics, revealed by DNA analysis of ancient ice

    Classification of snow algae in the ice core based on ITS2 sequences

We used high-throughput sequencing to obtain algal DNA sequences from 19 layers of an ice core drilled on a glacier in central Asia, spanning the present to 8000 years ago (Fig. 1 and Table S1). In total, 17,016 unique sequences (phylotypes) of the fast-evolving nuclear rDNA internal transcribed spacer 2 (ITS2) region were obtained from the ice core. These were clustered into 290 OTUs at a ≥98% nucleotide (nt) sequence identity threshold. The ITS2 sequences were classified at the species level according to the genetic species concept. This concept is based on secondary-structure differences in the ITS2 region, which correlate with the boundaries of most biological species [38]. Based on their secondary structures and BLASTn results, the ice core ITS2 sequences were classified into 24 subgroups: 17 chlorophycean, 5 trebouxiophycean, and 2 ulvophycean (Fig. S1 and Tables S3–S4). The 17 chlorophycean subgroups comprised 10 subgroups of the Chloromonadinia clade and 1 subgroup of the Monadinia clade (recently treated as the genus Microglena [54]). They also included 3 subgroups of the Reinhardtinia clade, 2 subgroups of the Stephanosphaerinia clade, and 1 subgroup corresponding to an unnamed group related to Ploeotila sp. CCCryo 086-99 (for the clade names, see [55]). Although the Chloromonadinia clade contains snow species belonging to both Chloromonas and Chlainomonas, all 10 Chloromonadinia subgroups were assigned to Chloromonas. The 5 trebouxiophycean subgroups comprised 2 subgroups of the Chlorella group and 1 subgroup each of the Raphidonema, Trebouxia, and Neocystis groups. The 2 ulvophycean subgroups were closely related to the genera Chamaetrichon and Planophila, respectively.
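The path from reads to unique sequences (phylotypes) to OTUs defined at ≥98% identity can be sketched in two steps. First, identical reads are collapsed and singletons are discarded. Second, the unique sequences are grouped by greedy, abundance-ordered centroid clustering, similar in spirit to the greedy clustering in tools such as USEARCH. This is a minimal sketch, not the authors' pipeline. The identity measure (edit distance divided by the length of the longer sequence), the function names, and the toy sequences are assumptions. Real tools also use their own alignment scoring and chimera filtering.

```python
from collections import Counter
from typing import Dict, Iterable, List


def dereplicate(reads: Iterable[str], min_size: int = 2) -> Dict[str, int]:
    """Collapse identical reads into unique sequences with read counts,
    discarding those seen fewer than `min_size` times (2 removes singletons)."""
    counts = Counter(read.strip().upper() for read in reads)
    # most_common() yields sequences from most to least abundant
    return {seq: n for seq, n in counts.most_common() if n >= min_size}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # insertion into a
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Assumed identity measure: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_otus(abundance: Dict[str, int], threshold: float = 0.98) -> Dict[str, List[str]]:
    """Visit unique sequences by decreasing read count; each joins the first
    centroid with identity >= threshold, otherwise it founds a new OTU."""
    otus: Dict[str, List[str]] = {}
    for seq in sorted(abundance, key=lambda s: (-abundance[s], s)):
        for centroid, members in otus.items():
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            otus[seq] = [seq]  # no centroid close enough: start a new OTU
    return otus


# Toy data: made-up 100-nt sequences and read counts
base = "ACGT" * 25
one_sub = "T" + base[1:]          # 1 substitution: 99% identity to base
many_subs = "G" * 10 + base[10:]  # at least 8 edits: at most 92% identity
reads = [base] * 50 + [one_sub] * 5 + [many_subs] * 3 + ["CCCC" * 25]  # last one is a singleton
uniques = dereplicate(reads)
otus = greedy_otus(uniques)
print(len(uniques), len(otus))  # 3 2
```

Ordering by abundance means the most common sequence in each cluster becomes its centroid. This is why rare variants one or two substitutions away are absorbed rather than founding OTUs of their own.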
Note that Sanguina (‘Chlamydomonas’-snow group B [6]), Ancylonema, and Mesotaenium, snow algal genera found throughout the world [56, 57], were not detected in the ice core samples (Tables S3–S4).

Global distribution of the Raphidonema group

To understand how snow algae form geographically specific population structures, and how they migrate across glaciers and snow fields worldwide, it is necessary to focus on microbial species that inhabit the global cryosphere. Previous work showed that the Raphidonema group and ‘Chlamydomonas’-snow group B (Sanguina) are cosmopolitan at both poles [6]. However, the latter was not detected in the ice core samples examined in this study. Therefore, to elucidate the evolutionary history of the Raphidonema group, we analyzed the ITS2 sequences from the ice core together with glacier-surface samples. These came from both poles [6] and from the mid-latitudes (10 sites sampled in this study) (hereafter, surface samples; Table S2). Members of the Raphidonema group were detected in the older (deep) layers of the ice core and on the glacier surface in central Asia (Fig. S1 and Tables S3–S4), as well as in red snow samples from both poles [6]. In central Asia, the Raphidonema group was found in the Russian, Chinese, and Kyrgyz samples but not in the Japanese or Tajik samples (Fig. S1 and Tables S3–S4). Combining these sequences yielded 893,649 reads and 22,389 unique sequences for subsequent detailed analysis (Tables S5–S6). PERMANOVA showed that the taxonomic composition of the Raphidonema communities differed among the mid-latitude, ice core, Arctic, and Antarctic samples (Table S7). Most of the unique sequences in the Raphidonema group had an endemic distribution (Tables S8–S10).
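PERMANOVA tests whether community composition differs among groups of samples. It compares between-group and within-group dissimilarities and assesses significance by permuting the group labels. The sketch below implements Anderson's pseudo-F with a permutation test in NumPy. The Bray–Curtis dissimilarity, the number of permutations, and the toy count matrix are assumptions, because the excerpt does not state which distance or settings were used. In practice, an established implementation such as `permanova` in scikit-bio or `adonis2` in the R package vegan would normally be used.

```python
import numpy as np


def bray_curtis(counts: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarities between rows (samples) of a count matrix."""
    n = counts.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = (counts[i] + counts[j]).sum()
            d[i, j] = d[j, i] = np.abs(counts[i] - counts[j]).sum() / total if total else 0.0
    return d


def pseudo_f(d: np.ndarray, labels: np.ndarray) -> float:
    """Anderson's PERMANOVA pseudo-F computed from squared distances."""
    n = len(labels)
    groups = np.unique(labels)
    d2 = d ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        block = d2[np.ix_(idx, idx)]
        ss_within += block[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(d: np.ndarray, labels: np.ndarray, n_perm: int = 999, seed: int = 0):
    """Return (pseudo-F, permutation p-value) obtained by shuffling group labels."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(d, labels)
    hits = sum(pseudo_f(d, rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical read counts (rows = samples, columns = phylotypes) for two regions
counts = np.array([[10, 0, 0], [9, 1, 0], [10, 1, 0],
                   [0, 0, 10], [0, 1, 9], [1, 0, 10]])
labels = np.array(["Arctic"] * 3 + ["Antarctic"] * 3)
f_obs, p = permanova(bray_curtis(counts), labels)
print(f"pseudo-F = {f_obs:.1f}, p = {p:.3f}")
```

With only six samples split 3 vs. 3, only 2 of the 20 possible label arrangements reproduce the observed split. The permutation p-value therefore cannot fall much below about 0.1, however different the groups are. Small sample sizes limit the attainable significance in this way.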
On average, 77% of the unique sequences of the Raphidonema group were endemic to a single region (mid-latitude, 96%; Antarctic, 66%; Arctic, 79%). These accounted for 40% of the total sequencing reads (mid-latitude, 77%; Antarctic, 74%; Arctic, 22%) (Fig. 2a, b and Tables S9–S10). This result suggests that most of the unique sequences are endemic, i.e., that their dispersal has been limited to their respective regions [58,59,60,61].

Fig. 2: Distribution types of the Raphidonema group obtained from each region and the ice core based on ITS2 unique sequences. Proportions of unique sequences and numbers of sequencing reads are shown. a Unique sequences from surface snow and ice core samples. b Number of sequencing reads from surface snow and ice core samples. c Unique sequences from the indicated locations within the ice core. d Number of sequencing reads of the unique sequences from the indicated locations within the ice core.

Next, we analyzed the global distribution of the cosmopolitan phylotypes of the Raphidonema group, because a previous study analyzed their distribution only at the poles [6]. Only a small fraction of the unique sequences were distributed in all regions (mid-latitude, 1.4%; Antarctic, 5.6%; Arctic, 3.1%). These sequences accounted for a large proportion of the sequencing reads in the polar regions but only a small proportion in the mid-latitudes (mid-latitude, 2.8%; Antarctic, 20%; Arctic, 55%) (Figs. 2a, b, S2–S3, and Tables S9–S10). The distribution types of the Raphidonema group in each region and in the ice core were similar between the USEARCH and DADA2 analyses (Figs. 2, S4). We note, however, that post-mortem nt substitutions, such as cytosine-to-thymine changes, accumulate in ancient samples over many years of deposition [62]. These substitutions are not included in the DADA2 error model, which leads DADA2 to eliminate minor sequences. We therefore based our analysis on the USEARCH unique sequences.
These results suggest that only a few cosmopolitan snow algae of the Raphidonema group were detected in the mid-latitude samples.

Snow algae of the Raphidonema group were detected in different ice core layers, corresponding to different time periods. The ice core records revealed that the distribution types of the Raphidonema group have not changed significantly over the last 8000 years. A PERMANOVA between the newer (1800–2001 AD) and older (6000–8000 years before present) layers gave p = 0.1924 (Fig. 2c, d). In the ice core, 77% of the unique sequences of the Raphidonema group were detected only in ice core samples, accounting for 23% of the total sequencing reads (Fig. S5). Some of these unique sequences may be artifacts of post-mortem nt substitution or sequencing error. However, we applied sequence quality filtering and removed most artifact sequences by discarding singleton clusters. Most of the unique sequences in the ice core are therefore unlikely to be artifacts and could instead represent endemic phylotypes (Figs. 2a, b, S5).

The cosmopolitan phylotypes were detected throughout the period represented by the ice core samples, in approximately similar proportions in the newer and older layers (Fig. 2c, d). In the ice core samples they accounted for an average of 4.0% (range, 0.2–13%) of the unique sequences. They also accounted for 13% (range across samples, 0.9–81%) of the total sequencing reads (Figs. 2c, d and S5).

Microevolution of cosmopolitan and endemic phylotypes

We analyzed the evolutionary relationships between cosmopolitan and endemic phylotypes of the Raphidonema group across all snow surface and ice core samples. In total, the 22,389 unique sequences of the Raphidonema group were clustered into 170 OTUs at ≥98% nt sequence identity within OTUs. Based on phylogenetic analysis, the OTU sequences were subdivided into five subgroups, Groups A–E (Figs. S6–S11 and Tables S11–S12).
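The three distribution types used in these analyses can be assigned to each unique sequence:

- cosmopolitan: found in the mid-latitudes and at both poles;
- multi-region: found in exactly two of these three regions;
- endemic: found in only one region.

The types can then be tallied per region, both as a share of unique sequences and as a share of reads. The two shares can differ greatly, as in the percentages reported for the Raphidonema group. A minimal sketch: the input layout (sequence mapped to read counts per region), the function names, and the toy counts are hypothetical, and ice core samples are left out for simplicity.

```python
from typing import Dict, Set, Tuple

REGIONS = frozenset({"mid-latitude", "Antarctic", "Arctic"})
TYPES = ("cosmopolitan", "multi-region", "endemic")


def distribution_type(regions: Set[str]) -> str:
    """Classify a phylotype by how many of the three regions it occurs in."""
    n = len(set(regions) & REGIONS)
    if n == 3:
        return "cosmopolitan"
    if n == 2:
        return "multi-region"
    if n == 1:
        return "endemic"
    raise ValueError("phylotype not observed in any region")


def region_summary(occurrences: Dict[str, Dict[str, int]],
                   region: str) -> Dict[str, Tuple[float, float]]:
    """For phylotypes present in `region`, return per type:
    (fraction of unique sequences, fraction of that region's reads)."""
    present = {s: c for s, c in occurrences.items() if c.get(region, 0) > 0}
    total_reads = sum(c[region] for c in present.values())
    seqs = {t: 0 for t in TYPES}
    reads = {t: 0 for t in TYPES}
    for counts in present.values():
        t = distribution_type({r for r, n in counts.items() if n > 0})
        seqs[t] += 1
        reads[t] += counts[region]
    return {t: (seqs[t] / len(present), reads[t] / total_reads) for t in TYPES}


# Hypothetical read counts per region for four unique sequences
occurrences = {
    "seq1": {"Arctic": 50, "Antarctic": 20, "mid-latitude": 5},  # cosmopolitan
    "seq2": {"Arctic": 10},                                      # endemic
    "seq3": {"Arctic": 30, "Antarctic": 4},                      # multi-region
    "seq4": {"Arctic": 10},                                      # endemic
}
print(region_summary(occurrences, "Arctic"))
# {'cosmopolitan': (0.25, 0.5), 'multi-region': (0.25, 0.3), 'endemic': (0.5, 0.2)}
```

In this toy case, endemics make up half of the Arctic unique sequences but only a fifth of the Arctic reads, while one abundant cosmopolitan holds half of the reads. This is the kind of contrast the per-sequence and per-read percentages capture.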
Based on a previous study [63], Groups A–C were assigned to R. sempervirens and Group E to R. nivale. Group D did not correspond to any species examined in that study (Fig. S6).

The phylotypes were categorized into three subsets:

- cosmopolitan phylotypes, found at both poles and in the mid-latitude regions;
- multi-region phylotypes, found in exactly two of the three regions (Antarctic, Arctic, and mid-latitude);
- endemic phylotypes, found in only one of the three regions.

Cosmopolitan phylotypes were found in Groups A, B, and C, which together accounted for 64.6% of the unique sequences. We then analyzed the dispersal of these three groups in detail.

Median-joining (MJ) networks [47] were constructed for the ITS2 sequences in each subgroup. Groups A and C contained all three types of phylotypes (endemic, multi-region, and cosmopolitan). In these groups, the cosmopolitan phylotypes were located at the center of the networks, and the endemic phylotypes appeared to be derived from them (Figs. 3 and S12–S13). Moreover, the outgroup phylotypes were connected directly to the cosmopolitan phylotypes. These findings indicate that the cosmopolitan phylotypes were ancestral and the endemic phylotypes were derived. In contrast, the network of Group B differed markedly in shape from those of Groups A and C. In Group B, the Antarctic endemic phylotypes formed a distinct clade, from which the multi-region phylotypes appeared to have been recently derived. The Arctic endemic phylotypes formed another distinct clade. These two Group B clades split directly from a cosmopolitan phylotype (5.3% of the total sequencing reads).
For Groups A and C, however, major portions of the total sequencing reads belonged to cosmopolitan phylotypes (Group A, 48.2%; Group C, 62.4%). The endemic and multi-region phylotypes were connected directly to these major cosmopolitan phylotypes in a radial, so-called “star-like” pattern [64]. These contrasting network shapes appear to reflect the distinct evolutionary history of each group. We also found that sequences from the ice core did not occupy basal positions in the networks (Figs. 3 and S12–S13). This is because the haplotypes found in the modern samples already existed before the ice core ages, and very few mutations are expected to have accumulated since then. Therefore, the ice core ages were not included in the molecular evolution calculations of our demographic model. Because the phylogenetic networks themselves provide no information on the evolutionary time scale, the ice core samples provide complementary direct evidence on this point. They show that Raphidonema, especially its cosmopolitan phylotypes, grew persistently on snow and ice at least throughout the Holocene, and that their ITS2 sequences have not changed over the last 8000 years.

Fig. 3: Phylogenetic relationships among phylotypes of the Raphidonema group. Phylotype networks for ITS2 sequences within Groups A (a), B (b), and C (c) of the Raphidonema group, which include the cosmopolitan phylotypes in this study. The median-joining method was used. Circles indicate phylotypes; the size of each circle is proportional to the number of unique sequences. Each notch on an edge represents a mutation. Phylotypes are colored according to geographic region. The arrow indicates the outgroup phylotype (see Fig. S6).

If “ancestral” phylotypes are defined as those with a longer history than other, more recently derived phylotypes, then individuals that are not closely related can share the same ancestral phylotype.
In such cases, if genetically distant individuals from various geographical regions share the same ancestral phylotype, they appear to be “cosmopolitan” (Fig. S14a). We need to distinguish these “apparent cosmopolitans” from “true cosmopolitans” that migrate globally. To do so, it is necessary to show that the cosmopolitan and endemic phylotypes have distinct demographic histories, rather than being parts of a continuous population with the same demographic dynamics (Fig. S14). Because phylotype networks cannot quantify the rate(s) of microevolution, we used the coalescent model to quantify phylotype demographics [65]. This approach requires analyzing numerous phylotypes, so we focused on statistical inference based on pairwise comparisons of phylotypes, for which the likelihood can be computed in practice (see Materials and Methods). Histograms of the number of mismatched sites between two phylotypes drawn from a set of phylotypes are shown in Figs. 4 and S15; we refer to these as the pairwise mismatch distribution. For Groups A and C, the distributions for cosmopolitans, multi-regions, and endemics were each unimodal, with the modes ordered from left to right as cosmopolitans, multi-regions, and endemics. Rogers and Harpending [48] noted that this “wave” propagation results from population expansion, which produces large mismatches and shifts the mode to the right (see Fig. 2 of [48]). As time passes, the mode shifts to the left and eventually returns to the origin, i.e., to the pattern of a population that has not undergone an expansion event. Rogers and Harpending obtained an approximate solution for the wave and fitted it to human mitochondrial sequence data. We improved upon their method based on the coalescent model (see Materials and Methods) and applied it to the ITS2 sequence data for snow algae.

Fig. 4: Mismatch distribution based on the number of pairwise differences for each distribution type in the Raphidonema groups. The lines represent the observed number of pairwise differences for each distribution type (cosmopolitan, multi-region, endemic) within Raphidonema Groups A (a), B (b), and C (c). Calculations were performed for all distribution types of Raphidonema Groups A and C, for which various cosmopolitan phylotypes were detected. For Raphidonema Group B, calculations were performed only for the multi-region and endemic phylotypes, because no variation was found among the cosmopolitan phylotypes.

For Group A, fitting a single demographic model to all phylotypes gave a log-likelihood of –414,487. Fitting the model separately to each subset (cosmopolitans, multi-regions, and endemics) gave –341,964. Because the latter is larger, we fit the model to each subset of Group A separately. For Group C, the separate fit likewise gave a larger log-likelihood (–142,106) than the single fit (–218,080). For Group B, in contrast, we fit a single demographic model to all phylotypes, because its log-likelihood (–196,070) was larger than that obtained by fitting the cosmopolitans, multi-regions, and endemics separately (–220,145). These results suggest that cosmopolitans, multi-regions, and endemics experienced different demographic histories in Groups A and C but shared the same demographic history in Group B (Table S13).
These results indicate that the cosmopolitans in Groups A and C are true cosmopolitans, whereas those in Group B can be regarded as apparent cosmopolitans.

The maximum-likelihood (ML) estimates of $\tau = 2ut_0$, $\theta_0 = 2N_0u$, and $\theta_1 = 2N_1u$ are shown with standard deviations in Table S13. The population expanded $t_0$ generations ago, with size $N_0$ before and $N_1$ after the expansion. The mutation rate $u$ was assumed to be $7.9 \times 10^{-8}$ per sequence per generation, and the generation interval was assumed to be 24 days (Materials and Methods). Times in generations are therefore converted to years by multiplying by $24/365$. For the cosmopolitans in Group A, the estimates of $t_0$, $N_0$, and $N_1$ were

$$t_0 = \frac{33.8}{2 \times 7.9 \times 10^{-8}} \times \frac{24}{365} \approx 1.4 \times 10^{7}\ \text{years},$$

$$N_0 = \frac{0.010 \text{ to } 0.108}{2 \times 7.9 \times 10^{-8}} \approx (0.63 \text{ to } 6.8) \times 10^{5},$$

$$N_1 = \frac{0.217}{2 \times 7.9 \times 10^{-8}} \approx 1.4 \times 10^{6},$$

respectively. We computed estimates of $t_0$, $N_0$, and $N_1$ for the other phylotype subsets and groups in the same way (Table S14). For the endemics, the respective values were $9.2 \times 10^{6}$ years, 80, and $2.1 \times 10^{7}$. For the multi-regions, they were $4.6 \times 10^{6}$ years, 139, and $1.5 \times 10^{7}$. We also took into account the minimum and maximum ranges of the mutation rate per generation and of the generation interval. With these ranges, $t_0$ was $3.6 \times 10^{6}$ to $4.0 \times 10^{7}$ years ago for the cosmopolitans and $2.3 \times 10^{6}$ to $2.6 \times 10^{7}$ years ago for the endemics (Table S14). These results suggest that the cosmopolitans existed at least $1.4 \times 10^{7}$ years ago and that the endemics were derived from the cosmopolitans $9.2 \times 10^{6}$ years ago. The endemic population expanded $2.6 \times 10^{5}$-fold ($2.1 \times 10^{7}/80$), which may have resulted from extensive dispersal. The multi-regions tended to resemble the endemics. Note that our demographic model was simplified to avoid overparameterization.
In reality, given the branching patterns of the MJ network, it is plausible that the endemic phylotypes have been derived repeatedly and continuously from the cosmopolitans in multiple lineages, from $9.2 \times 10^{6}$ years ago to the present. Similarly, for Group C, our results suggest that the cosmopolitan population expanded 3.9-fold approximately $3.2 \times 10^{6}$ years ago. They also suggest that the endemics were derived from the cosmopolitans $1.9 \times 10^{5}$ years ago. The endemic population expanded 59-fold. In contrast to those of Groups A and C, the phylotypes of Group B experienced no significant expansion (Supplementary Results). In Groups A and C, the derived endemics (and multi-regions) expanded greatly relative to the ancestral cosmopolitans (Table S14). These extraordinary expansions constitute evidence for local adaptation by the endemic and multi-region populations. In contrast, there was no evidence of local adaptation in Group B. The mismatch distribution of Group B as a whole (multi-regions + endemics) was multimodal (Fig. 4), a pattern characteristic of populations whose size has remained stable for a long period. When such populations finally reach equilibrium, the mismatch distribution becomes exponential (strictly, geometric, its discrete analog) [48]. Based on our ML estimates (Table S14), the historical population of Group B has been stable.
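The observed pairwise mismatch distribution and the expansion “wave” of Rogers and Harpending [48] can be sketched together.

The first two functions tally pairwise differences among aligned phylotypes. They assume equal-length alignments, ignore gap columns, and count each phylotype once regardless of read number.

The last two evaluate the original Rogers–Harpending expectation after a single sudden expansion from $\theta_0$ to $\theta_1$:

$$F_i(\tau) = F_i(\theta_1) + e^{-\tau(\theta_1+1)/\theta_1}\sum_{j=0}^{i}\frac{\tau^j}{j!}\left[F_{i-j}(\theta_0) - F_{i-j}(\theta_1)\right],$$

where $F_i(\theta) = \theta^i/(\theta+1)^{i+1}$ is the geometric equilibrium distribution of a population of constant size. This is the approximation the study says it improved upon, not the study's coalescent-based likelihood. The function names, sequences, and parameter values are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import exp, factorial
from typing import List, Sequence


def mismatches(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b) if x != "-" and y != "-")


def mismatch_distribution(seqs: Sequence[str]) -> List[float]:
    """Relative frequency of k = 0, 1, ..., max differences over all unordered pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two phylotypes")
    counts = Counter(mismatches(a, b) for a, b in combinations(seqs, 2))
    n_pairs = sum(counts.values())
    return [counts.get(k, 0) / n_pairs for k in range(max(counts) + 1)]


def equilibrium(i: int, theta: float) -> float:
    """Geometric mismatch distribution of a constant-size population."""
    return theta ** i / (theta + 1) ** (i + 1)


def sudden_expansion(i: int, tau: float, theta0: float, theta1: float) -> float:
    """Rogers-Harpending expected frequency of i pairwise differences,
    tau = 2*u*t0 time units after a jump from theta0 to theta1."""
    decay = exp(-tau * (theta1 + 1) / theta1)
    wave = sum(tau ** j / factorial(j)
               * (equilibrium(i - j, theta0) - equilibrium(i - j, theta1))
               for j in range(i + 1))
    return equilibrium(i, theta1) + decay * wave


# Observed distribution for a toy set of aligned phylotypes (invented sequences)
observed = mismatch_distribution(["AAAA", "AAAT", "AATT", "TTTT"])
print(observed)  # frequencies of 0..4 differences: 0, 1/3, 1/3, 1/6, 1/6

# Expected distributions under arbitrary illustrative parameters
wave = [sudden_expansion(i, tau=5.0, theta0=0.1, theta1=100.0) for i in range(30)]
mode = max(range(30), key=wave.__getitem__)
print(mode)  # the mode lies away from zero after a large expansion
print(sudden_expansion(3, 5.0, 2.0, 2.0) == equilibrium(3, 2.0))  # True: no expansion
```

Setting $\theta_0 = \theta_1$ removes the wave term entirely, so a population that never expanded keeps the geometric equilibrium shape at every $\tau$. A large expansion instead produces a unimodal wave whose mode sits near $\tau$.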


    Modeling marine cargo traffic to identify countries in Africa with greatest risk of invasion by Anopheles stephensi

    With human movement and globalization, invasive container-breeding vectors are being introduced to new locations and establishing populations there. These vectors are responsible for dengue, Zika, and chikungunya and now, with An. stephensi, malaria. They bring the threat of increased or novel cases of vector-borne disease to places whose health systems may not be prepared.

Anopheles stephensi was first detected on the African continent in Djibouti in 2012 and has since been confirmed in Ethiopia, Somalia, and Sudan. Unlike most malaria vectors, An. stephensi is often found in artificial containers and in urban settings. This unusual ecology, combined with its initial detection in seaports in Djibouti, Somalia, and Sudan, has led scientists to believe that the movement of this vector is likely facilitated by maritime trade.

By modeling inter- and intra-continental maritime connectivity in Africa, we identified and ranked the countries most likely to receive An. stephensi if its introduction is facilitated by maritime movement. Anopheles stephensi was not detected in Africa (Djibouti) until 2012. We therefore analyzed 2011 maritime data to test whether historical data would have identified the first sites of introduction. Specifically, we asked whether the sites with confirmed An. stephensi would rank highly in connectivity to countries where An. stephensi is endemic. Using 2011 maritime connectivity data alone, Djibouti and Sudan were identified as the two countries at highest risk of An. stephensi introduction via marine cargo shipments. As of 2021, these are two of the three African coastal nations where An. stephensi is confirmed to be established. When the 2011 maritime data were combined with the HSI for An. stephensi establishment, the top five countries were the same as with maritime data alone: Sudan, Djibouti, Egypt, Kenya, and Tanzania, in that order.
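The excerpt does not spell out how the maritime index is computed or how it is combined with the HSI. The sketch below therefore only illustrates the general idea: rank destination countries by their maritime connectivity to source countries, then reweight by habitat suitability. Summing connectivity over sources and multiplying by the HSI are assumptions made for illustration. The function name, country names, and values are placeholders, not LSBCI or HSI data.

```python
from typing import Dict, List, Optional


def rank_countries(connectivity: Dict[str, Dict[str, float]],
                   hsi: Optional[Dict[str, float]] = None) -> List[str]:
    """Rank destination countries by total connectivity to source countries.

    Hypothetical scoring: connectivity is summed over all source countries;
    if an HSI table is supplied, the score is multiplied by the destination's
    HSI (countries missing from the table get 0). Ties are broken by name.
    """
    scores = {}
    for country, links in connectivity.items():
        score = sum(links.values())
        if hsi is not None:
            score *= hsi.get(country, 0.0)
        scores[country] = score
    return sorted(scores, key=lambda c: (-scores[c], c))


# Placeholder values only
connectivity = {
    "Country A": {"Source 1": 0.5, "Source 2": 0.2},
    "Country B": {"Source 1": 0.4},
    "Country C": {"Source 2": 0.6},
}
hsi = {"Country A": 0.1, "Country B": 1.0, "Country C": 0.5}
print(rank_countries(connectivity))       # ['Country A', 'Country C', 'Country B']
print(rank_countries(connectivity, hsi))  # ['Country B', 'Country C', 'Country A']
```

The toy example shows how a highly connected but unsuitable destination (Country A) can drop in rank once establishment suitability is taken into account. The published combined analysis may or may not produce such reorderings, depending on how its indices are defined.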
The maritime data indicate the likelihood of introduction, and the HSI indicates the likelihood of establishment. Combined, they indicate the likelihood that An. stephensi is both introduced and able to establish and survive. The results of the combined analyses align with the detections being reported in the Horn of Africa. The 2011 maritime data reinforce the validity of the model, as they point to Sudan and Djibouti, where An. stephensi became established in the following years. Similarly, the HSI data for Ethiopia have aligned closely with detections of the species to date¹⁵. Interestingly, around the time of the initial detection in Djibouti, the Djibouti City port underwent development and organizational change: the government of Djibouti took back administrative control of the port as early as 2012³⁰.

The same method can be applied to maritime trade data from 2020. These data can point to countries at risk of An. stephensi introduction from endemic countries and from coastal African countries with newly introduced populations. Here we provide a prioritization list and heat map of countries for early detection, rapid response, and targeted surveillance of An. stephensi in Africa, based on these data and the HSI (Fig. 4). Further invasion of An. stephensi on the African continent has the potential to reverse the progress made in malaria control over the last century. Anopheles stephensi thrives in urban settings and in containers, in contrast to the rural settings and natural habitats where most Anopheles spp. are found²⁰. The situation in Djibouti may be a harbinger of what is to come if surveillance and control strategies are not initiated immediately¹⁸.

Figure 4. Prioritization Heat Map of African Countries. These 2020 heat maps rank African countries based on maritime connectivity to countries where An. stephensi is endemic. Panel (A) uses the Likelihood of An. stephensi through Maritime Trade Index (LASIMTI) data alone, and panel (B) uses LASIMTI and HSI combined.
Higher-ranking countries, which are at greater risk of An. stephensi introduction, are shown in darker red than lower-ranking countries (lighter red). Countries shaded grey are inland countries without a coast and therefore without data on maritime movement into ports. Countries shaded grey with a checkered pattern have established or endemic An. stephensi populations and are considered source locations for potential An. stephensi introduction in this analysis. Map was generated using MapChart (mapchart.net).

We analyzed maritime data from 2020 with Djibouti and Sudan included as potential source populations for intracontinental introduction of An. stephensi. These data indicate that the five countries at highest risk of maritime introduction are Egypt, Kenya, Mauritius, Tanzania, and Morocco. Targeted larval surveillance near seaports in these countries may therefore clarify whether maritime introductions are occurring. When the 2020 data are combined with the HSI for An. stephensi, the top five countries are instead Egypt, Kenya, Tanzania, Morocco, and Libya. Interestingly, historical reports of An. stephensi in Egypt exist; however, on further identification these specimens were determined to be An. ainshamsi³¹. Egypt has several suitable habitats both along the coast and inland. Renewed surveillance there would therefore show how countries that are highly connected to An. stephensi locations through maritime traffic may experience introductions.

Further field validation of this prioritization list is necessary. An. stephensi may also be introduced through other transportation routes, such as dry ports or airports³², or even dispersed by wind³³. However, the countries highlighted here as highly connected to known An. stephensi locations should be considered seriously at risk, and surveillance should be established urgently to determine whether An.
stephensi introduction has already occurred and to enable early detection. Primary vector surveillance for both Ae. aegypti and An. stephensi is conducted through larval surveys, and the two mosquitoes are commonly detected in the same breeding habitats. It could therefore be beneficial to coordinate with existing Aedes surveillance efforts, gathering data on medically relevant Aedes vectors while also determining whether An. stephensi is present. Similarly, some locations have known An. stephensi populations but lack well-established Aedes programs. There, coordinated surveillance provides an opportunity to conduct malaria and arboviral surveillance of container-breeding mosquitoes simultaneously.

Mapping pinch points or key points of introduction based on the movement of goods and people could make surveillance and control efforts highly targeted. For example, participatory mapping or population mobility data collection may also indicate where targeted An. stephensi surveillance efforts should focus. Such methods have been used to determine routes of human movement for malaria elimination. Several methods for modeling human movement have been proposed in the literature. One in particular, PopCAB, which is often used for communicable diseases, combines quantitative and qualitative data with geospatial information to identify points of control³⁴.

Data on invasive mosquito species have shown that introduction events are rarely one-time occurrences. Population genetics data on Aedes species indicate that reintroductions are very common and can move genes between geographically distinct populations. Through gene flow and introgression, such reintroductions raise the potential for the introduction of insecticide resistance, thermotolerance, and other phenotypic and even behavioral traits³⁵. Djibouti, Sudan, Somalia, and Ethiopia, countries with established invasive populations of An.
stephensi, should continue to monitor invasive populations and points of introduction to control and limit further expansion and adaptation of An. stephensi. Work by Carter et al. has shown that An. stephensi populations in northern and central Ethiopia can be differentiated genetically. This potentially indicates that they result from more than one introduction into Ethiopia from South Asia, further emphasizing the potential role of anthropogenic movement in the introduction of the species¹⁷.

One major limitation of this work is that marine traffic data were not available for Somalia, the third coastal nation where An. stephensi has been confirmed, so Somalia could not be included in this analysis. Its potential impact on maritime trade is unknown, and it should not be excluded as a potential source population. Additionally, this model does not account for countries with An. stephensi populations that have not yet been detected. As new data on An. stephensi expansion become available, more countries will be at higher risk. Other countries with An. stephensi populations, such as Iran, Myanmar, and Iraq, account for lower relative percentages of trade with the African countries analyzed, so they were not included in the analysis. However, Pakistan was included, because genetic similarities have been noted with An. stephensi from Pakistan¹⁰.

Because the analysis is based on maritime traffic, inland countries were not included in this prioritization ranking. Nevertheless, inland countries that share borders with high-risk countries according to the LASIMTI index should also be considered high priority. For example, the 2011 ranking highlights Sudan and Djibouti, both of which border Ethiopia. Examining key land transportation routes between bordering nations, along which people and goods travel, may provide additional insight into the expansion routes of this invasive species.

In Ethiopia, An. stephensi was detected in 2016.
It has largely been detected along major transportation routes. However, most sampling sites have also been located along transport routes, so further data are needed to understand the association between movement and An. stephensi introduction and expansion. Importantly, Ethiopia relies heavily on the ports of Djibouti and Somalia for maritime imports and exports. Surveillance efforts have also revealed that the species is frequently associated with livestock shelters, and An. stephensi specimens are frequently found with livestock bloodmeals¹⁵. Interestingly, An. stephensi was first detected in a livestock quarantine station in the port of Djibouti, and livestock constitutes one of the largest maritime exports from this region. Some countries have high maritime connectivity to An. stephensi locations but no confirmed An. stephensi. These countries could begin larval surveillance near seaports, particularly those handling livestock trade.

Ae. aegypti and Culex coronator have been detected in tires, and Ae. albopictus has spread through the tire and lucky bamboo (Dracaena sanderiana) trade. In the same way, An. stephensi could be carried through maritime trade in a specific good³⁶⁻³⁸. Future examination of the movement of specific goods would help in interpreting potential An. stephensi invasion pathways. Additionally, the types of vessels used to transport particular cargo, such as container, bulk, and livestock ships, could affect An. stephensi survival during transit. Sugar and grain are often shipped in bulk or break-bulk vessels, which store cargo in large unpackaged containers. Container ships transport products in containers sized for land transportation by truck and carry goods such as tires.
Livestock vessels are often multilevel, open-air ships that require more hands working on deck and more water management³⁹.

Using LSBCI data from 2020, we developed a network to highlight how coastal African nations are connected through maritime trade (Fig. 4). This network analysis serves two purposes:

(1) it demonstrates intracontinental maritime connectivity;
(2) through an interactive HTML model (Supplemental File), it highlights the top three countries connected to each country via maritime trade.

For example, suppose An. stephensi is detected and established in a specific coastal African nation such as Djibouti. Selecting the Djibouti node then reveals the three locations at greatest risk of introduction from that source country (Djibouti links to Sudan, Egypt, and Kenya). This can serve as an actionable prioritization list for surveillance whenever An. stephensi is detected in a given country. It also highlights major maritime hubs in Africa that could be targeted for surveillance and control. For example, since the development of this model, An. stephensi has been detected in Nigeria. Using the interactive model, Ghana, Côte d’Ivoire, and Benin were identified as the countries most connected to Nigeria through maritime trade, so surveillance prioritization could consider these locations.

The network analysis reveals the significance of South African trade to the rest of the continent. Because of its distance, South Africa did not rank high in risk of An. stephensi introduction. However, because of South Africa's high centrality, An. stephensi could very easily be introduced there if it were to enter nearby countries. West African countries such as Ghana and Togo, as well as Morocco, are also heavily connected to other parts of Africa. Interestingly, Mauritius appears to be highly significant in this network of African maritime trade.
Based on 2020 maritime data, Mauritius ranks third in likelihood of An. stephensi introduction and has the second-highest centrality rank value (0.159). Considering these factors, Mauritius could serve as an important port of call connecting larger ports throughout Africa or on other continents. The island has longstanding, regular larval surveillance for Aedes spp. This makes it well suited to look for Anopheles larvae as part of its Aedes surveillance, enabling early detection and rapid response to prevent the establishment of An. stephensi. If An. stephensi were to become established in countries with high centrality ranks, its further expansion on the continent could accelerate drastically. These ports could serve as important watchpoints and indicators of An. stephensi incursion into Africa. Anopheles stephensi is often found in habitats shared with Aedes spp. This offers a great opportunity to leverage Aedes arboviral surveillance to initiate the search for An. stephensi, especially in countries with a high potential for introduction through maritime trade.
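The “top three partners” lookup offered by the interactive model, and a simple centrality score, can be sketched on a weighted, undirected network. Its edge weights stand for bilateral maritime connectivity (for example, LSBCI values). This is a sketch under assumptions. The excerpt does not say which centrality measure produced the value of 0.159, so normalized weighted degree (node strength) is used here purely for illustration. The function names, nodes, and weights are placeholders. Libraries such as NetworkX provide other centrality measures if needed.

```python
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], float]  # undirected edge -> connectivity weight


def top_partners(edges: Edges, country: str, k: int = 3) -> List[str]:
    """The k countries with the strongest links to `country` (ties broken by name)."""
    partners: Dict[str, float] = {}
    for (a, b), w in edges.items():
        if a == country:
            partners[b] = w
        elif b == country:
            partners[a] = w
    return sorted(partners, key=lambda p: (-partners[p], p))[:k]


def strength_centrality(edges: Edges) -> Dict[str, float]:
    """Weighted degree of each node, normalized so that all values sum to 1."""
    strength: Dict[str, float] = {}
    for (a, b), w in edges.items():
        strength[a] = strength.get(a, 0.0) + w
        strength[b] = strength.get(b, 0.0) + w
    total = sum(strength.values())
    return {node: s / total for node, s in strength.items()}


# Placeholder network; node names and weights are invented
edges = {("P", "Q"): 5.0, ("P", "R"): 3.0, ("P", "S"): 1.0,
         ("P", "T"): 4.0, ("Q", "R"): 2.0}
print(top_partners(edges, "P"))  # ['Q', 'T', 'R']
centrality = strength_centrality(edges)
print(max(centrality, key=centrality.get))  # P
```

Node strength rewards hubs with many strong links, which fits the idea of a port of call connecting larger ports. Other measures, such as betweenness, would instead emphasize nodes that lie on many shortest routes between other ports.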


    Predicting cascading extinctions and efficient restoration strategies in plant–pollinator networks via generalized positive feedback loops

    The Campbell et al. model provides an excellent framework to identify species whose extinction leads to community collapse and species whose reintroduction can restore the community (see Fig. 2 for an illustration of these processes). Our first objective, finding the effect of species extinction on the rest of the species in an established community, is achievable using the concept of the Logical Domain of Influence (LDOI)41; the LDOI represents the influence of a (set of) fixed node state(s) on the rest of the components in a system. In this section, we first present our proposed method to calculate the LDOI for the Boolean threshold functions governing the Campbell et al. model of plant–pollinator community assembly. Then we verify that the simplified logical functions preserve the LDOI and hence can be used to further analyze the effect of extinction in plant–pollinator networks. Next, we address one of the main questions that motivated this study: can stable motif driver set analysis facilitate the identification of keystone species? We discuss the identification of the driver sets of inactive stable motifs and motif groups and present the results of stabilizing these sets to measure the magnitude of the effect of species extinction on the communities. Lastly, we discuss possible prevention and mitigation measures based on the knowledge acquired from the driver sets of stable motifs and motif groups.

    Figure 2 Illustration of species extinction and restoration in a hypothetical 6-species community. (a) The interaction network (left) and the maximal richness community possible for this network (the community with the most established species). Nodes highlighted in green represent established species. (b) The initial extinction of two species, po_1 and po_2 (left), and the community that results after cascading extinctions (right). Nodes highlighted in grey represent extinct species.
    (c) An intervention to restore pl_2 (left), which induces the restoration of further species, finally leading to a restored community with all the species present (right). Nodes highlighted in teal represent the restored species.

    LDOI in the Boolean threshold model

    The LDOI concept was originally defined on Boolean functions expressed in a disjunctive prime form. Here we extend it to Boolean threshold functions. We implemented it as a breadth-first search on the interaction network, as exemplified in Fig. 3. Assume that we want to find the LDOI of a (set of) node(s) $S_0=\{n_1,\dots,n_N\}$ with specific fixed states $Q(S_0)=\{\sigma_{n_1},\dots,\sigma_{n_N}\}$. Starting from the set $S_0$, the next set of nodes $S_1$ that can acquire a fixed state due to the influence of $Q(S_0)$ consists of the nodes that have an incoming edge from the nodes in $S_0$ in the interaction network. The nodes in $S_1$ are the subject of the first search level. For each node $n_i \in S_0$ and $n'_i \in S_1$, we assume a “worst-case scenario” (i.e., maximal opposition to the effect of $n_i$ on $n'_i$ from the other regulators) to find the possible sufficiency relationships between the two. There are five cases:

    1.

    If $n_i$ is a positive regulator of $n'_i$, then $\sigma_{n_i}=1$ is a candidate for being sufficient for $\sigma_{n'_i}=1$. We assume that all other positive regulators of $n'_i$ that have an unknown state (i.e., are not in $Q(S_0)$) are inactive and all negative regulators of $n'_i$ that have an unknown state are active. If $\sum_j W_{ij} > 0$ under this assumption, then the active state of $n_i$ is sufficient to activate $n'_i$. The virtual node $n'_i$, which corresponds to $\sigma_{n'_i}=1$, is added to $\mathrm{LDOI}(Q(S_0))$.

    2.

    If $n_i$ is a positive regulator of $n'_i$, then $\sigma_{n_i}=0$ is a candidate for being sufficient for $\sigma_{n'_i}=0$. We assume that all other positive regulators of $n'_i$ that have an unknown state are active and all negative regulators of $n'_i$ that have an unknown state are inactive. If $\sum_j W_{ij} \le 0$ under this assumption, then the inactive state of $n_i$ is sufficient to deactivate $n'_i$. The virtual node $\sim n'_i$, which corresponds to $\sigma_{n'_i}=0$, is added to $\mathrm{LDOI}(Q(S_0))$.

    3.

    If $n_i$ is a negative regulator of $n'_i$, then $\sigma_{n_i}=1$ is a candidate for being sufficient for $\sigma_{n'_i}=0$. We assume that all positive regulators of $n'_i$ that have an unknown state are active and all other negative regulators of $n'_i$ that have an unknown state are inactive. If $\sum_j W_{ij} \le 0$ under this assumption, then the active state of $n_i$ is sufficient to deactivate $n'_i$. The virtual node $\sim n'_i$, which corresponds to $\sigma_{n'_i}=0$, is added to $\mathrm{LDOI}(Q(S_0))$.

    4.

    If $n_i$ is a negative regulator of $n'_i$, then $\sigma_{n_i}=0$ is a candidate for being sufficient for $\sigma_{n'_i}=1$. We assume that all positive regulators of $n'_i$ that have an unknown state are inactive and all other negative regulators of $n'_i$ that have an unknown state are active. If $\sum_j W_{ij} > 0$ under this assumption, then the inactive state of $n_i$ is sufficient to activate $n'_i$. The virtual node $n'_i$, which corresponds to $\sigma_{n'_i}=1$, is added to $\mathrm{LDOI}(Q(S_0))$.

    5.

    If none of the previous four sufficiency checks is satisfied, the node $n'_i$ will be visited again in later search levels.
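The five cases above can be condensed into a minimal Python sketch, assuming a threshold rule in which a node is active when the weighted sum of its active regulators is positive (matching the $\sum_j W_{ij} > 0$ test). For a target node, setting every unknown positive regulator to inactive and every unknown negative regulator to active gives the smallest possible input (cases 1 and 4); the opposite assignment gives the largest possible input (cases 2 and 3). The function name `ldoi`, the dictionary layout, and the toy weights are hypothetical illustrations rather than the authors' implementation, and the sketch checks all currently fixed regulators of a target at once instead of one source node at a time.

```python
from collections import deque


def ldoi(weights, fixed):
    """Logical domain of influence of `fixed` in a Boolean threshold network.

    weights: {target: {regulator: W}} with W > 0 for positive and W < 0 for
             negative regulation; a target is active iff the weighted sum of
             its active regulators is > 0 (assumed threshold rule).
    fixed:   {node: 0 or 1}, the source node states Q(S_0).
    Returns the node states fixed as a consequence, excluding the sources.
    """
    # Outgoing edges, so the search can move from regulators to targets.
    targets_of = {}
    for target, regs in weights.items():
        for reg in regs:
            targets_of.setdefault(reg, set()).add(target)

    state = dict(fixed)
    queue = deque(fixed)  # breadth-first: nodes whose state was just fixed
    while queue:
        source = queue.popleft()
        for target in targets_of.get(source, ()):
            if target in state:
                continue  # already fixed (sources are never overridden)
            regs = weights[target]
            known = sum(w * state[r] for r, w in regs.items() if r in state)
            unknown = [w for r, w in regs.items() if r not in state]
            # Worst case for activation: unknown positives off, negatives on.
            min_input = known + sum(w for w in unknown if w < 0)
            # Worst case for deactivation: unknown positives on, negatives off.
            max_input = known + sum(w for w in unknown if w > 0)
            if min_input > 0:
                state[target] = 1  # cases 1 and 4
            elif max_input <= 0:
                state[target] = 0  # cases 2 and 3
            else:
                continue  # case 5: revisited when another regulator is fixed
            queue.append(target)
    return {node: s for node, s in state.items() if node not in fixed}


# Hypothetical toy network: cycle pl1 -> po1 -> pl2 -> po2 -> pl1, plus pl1 -> po2.
toy = {
    "po1": {"pl1": 1},
    "pl2": {"po1": 1},
    "po2": {"pl1": 1, "pl2": 1},
    "pl1": {"po2": 1},
}
print(ldoi(toy, {"pl1": 0}))  # {'po1': 0, 'pl2': 0, 'po2': 0}
```

As in the Fig. 3 walk-through, po2 is not decided when it is first reached (its other regulator pl2 is still unknown) and is fixed only after pl2 is deactivated.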

    The second set of nodes that can be influenced, $S_2$, consists of the nodes that have an incoming edge from the nodes in $S_1$. The algorithm goes over these nodes in the second search level as described above. The search continues level by level until all nodes have been visited (possibly multiple times); each node either acquires a fixed state and is added to the LDOI, or its state is left undetermined at the end of the algorithm. In Fig. 3, we illustrate this search to find $\mathrm{LDOI}(\sim\mathrm{pl}_1)$. The first search level is $S_1=\{\mathrm{po}_1, \mathrm{po}_3\}$; $\sim\mathrm{pl}_1$ is sufficient to deactivate po_3, but not po_1. As a result, $\sim\mathrm{po}_3 \in \mathrm{LDOI}(\sim\mathrm{pl}_1)$. This process continues until all levels are visited, and at the end of the algorithm $\mathrm{LDOI}(\sim\mathrm{pl}_1)=\{\sim\mathrm{po}_3, \sim\mathrm{pl}_2, \sim\mathrm{pl}_3, \sim\mathrm{pl}_4, \sim\mathrm{pl}_5, \sim\mathrm{po}_1, \sim\mathrm{po}_2\}$.

    Figure 3 Breadth-first search of the interaction network to find the LDOI of a (set of) fixed node state(s) in the Boolean threshold functions governing the dynamics of plant–pollinator networks. (a) An interaction network with five plants and three pollinators. (b) The breadth-first search starting from the node state $\sim\mathrm{pl}_1$. The nodes with incoming edges from pl_1 make up $S_1=\{\mathrm{po}_1, \mathrm{po}_3\}$. The second sufficiency check is satisfied for the node state $\sim\mathrm{po}_3$; as a result, $\sim\mathrm{po}_3 \in \mathrm{LDOI}(\sim\mathrm{pl}_1)$. The same process is applied to node po_1, but none of the sufficiency checks is satisfied, so this node will be visited again later. The next level of the search consists of the nodes that have incoming edges from $S_1$, i.e., $S_2=\{\mathrm{pl}_2, \mathrm{pl}_3, \mathrm{pl}_4, \mathrm{pl}_5\}$. The second sufficiency check is satisfied for all of these nodes, and they are all fixed to their inactive states in $\mathrm{LDOI}(\sim\mathrm{pl}_1)$. Lastly, we reach $S_3=\{\mathrm{po}_1, \mathrm{po}_2\}$.
    Node po_1 is reached again, and with both of its positive regulators fixed to their inactive states, the second sufficiency check is satisfied and node po_1 is fixed to its inactive state as well. The same holds for po_2, and hence $\mathrm{LDOI}(\sim\mathrm{pl}_1)=\{\sim\mathrm{po}_3, \sim\mathrm{pl}_2, \sim\mathrm{pl}_3, \sim\mathrm{pl}_4, \sim\mathrm{pl}_5, \sim\mathrm{po}_1, \sim\mathrm{po}_2\}$.

    To measure the accuracy of the simplification method originally introduced in ref. 28, we analyzed logical domains of influence in 6000 networks with 50–70 nodes. These networks are among the largest in our ensembles and have the most complex structures. We randomly selected (sets of) inactive node states, found their LDOIs using both the Boolean threshold functions and the simplified Boolean functions, and compared the two resulting LDOIs. We used 8 single node states and 8 combinations of size 2 to 4 for each network. We found that in all cases the LDOI calculated using the simplified Boolean functions matches the LDOI calculated using the Boolean threshold functions.

    Next, we analyzed (sets of) active node states and their LDOIs in the same ensembles of networks. As in the previous analysis, we used 8 single node states and 8 combinations of size 2 to 4 for each network. Our analysis shows that in 77.1% of the cases the LDOI calculated using the simplified Boolean functions matches the LDOI calculated using the Boolean threshold functions. In 22% of the cases, the LDOI calculated from the simplified Boolean functions contains the LDOI calculated from the threshold functions and also contains extra active node states, overestimating the LDOI by 57.5% on average. These additional members of the LDOI arise because the simplified Boolean functions contain fewer negative regulators than the threshold functions. The guiding principle of the simplification method is that it conserves the probability that $H(x)=1$, i.e., the probability of each node having an active state across all the states it can have.
    In contrast, the probability of propagation of the active state is not necessarily preserved and tends to be higher in the simplified Boolean model; thus, the LDOI of active node states is overestimated in some cases.

    In the remaining cases (about 1%), the LDOI calculated from the simplified Boolean functions does not fully capture the LDOI calculated from the threshold functions. This again is caused by the sparsification of negative edges in the simplified Boolean functions. In the threshold functions, the activation of 4 or more negative regulators of a target node combined with one active positive regulator is sufficient to deactivate the target node; i.e., there might be inactive node states in the LDOI of a set of active node states. However, some of these negative regulators are dropped in the simplified Boolean model, so the inactive state of the target node is not necessarily in the LDOI of the set of active node states in the simplified case. This is the rare mechanism by which the simplified model might underestimate the influence of active node states on the rest of the network.

    In the following section, we analyze the effect of species extinction on the established community, i.e., we look at the LDOI of (sets of) inactive node states. Since the influence of species extinction is measured correctly in the simplified Boolean models, we conclude that these models can be used to further analyze the process of extinction and its ecological implications.

    Stable motif based identification of species whose loss leads to cascading extinctions

    Each stable motif or motif group can have multiple driver sets; stabilization of each driver set leads to the stabilization of the whole motif or motif group.
    In plant–pollinator interaction networks, the stable motifs represent either a sub-community (when the constituent nodes stabilize in their active states) or the simultaneous extinction of all species in the group (when the constituent nodes stabilize in their inactive states). Stabilization of the nodes in the driver set of an inactive stable motif results in the stabilization of all the nodes in the stable motif in their inactive state, i.e., cascading extinction of the constituent species.

    The knowledge gained from stable motif analysis and the network of functional relationships offers insight into the cascading effect of an extinction that constitutes a driver set of an inactive stable motif. The magnitude of this effect depends on (i) the number of nodes that the inactive stable motif contains and (ii) the number of virtual nodes (including motifs and motif groups) corresponding to inactive species that are logically determined by the stabilization of the inactive stable motif.

    To investigate the role of stable motifs in the study of species extinction in plant–pollinator networks, we simulated extinctions that drive inactive stable motifs in 6000 networks with 50–70 nodes. We considered driver sets of size 1, 2, or 3 and implemented them by fixing the corresponding node(s) to the inactive state. As a point of comparison, we also performed a “control” analysis using the same networks with the same size of initial extinction; however, the candidates for initial extinction are inactive node states that do not drive stable motifs or motif groups. Based on the properties of the drivers of stable motifs, one expects the extinction of driver species to be followed by cascading extinctions of other species, while the same does not necessarily hold for non-driver species.
    As a result, we expect to observe greater damage to the original community when driver species become extinct.

    We assume that the “maximal richness community” (the community, i.e., attractor, in which the largest number of species managed to establish) is the subject of species extinction. This maximal richness community results from the stabilization of all active stable motifs. All other attractors that have some established species contain a subset of all active stable motifs and thus contain a subset of the species of the maximal richness community. While for a generic Boolean model with multiple attractors one expects a perturbed version of the model to also have multiple attractors, this specific perturbation of a plant–pollinator model (namely, extinction of species in the maximal richness community) has a single attractor. We prove this by contradiction. Assume there are two separate attractors in the perturbed model, which means that there is at least one node that has opposite states in these two attractors. Note that this bi-stability would be the result of the perturbation and not a property of the original system, as the maximal richness community (an attractor) is the starting point for the introduced extinction. Specifically, the inactive state of the extinct node would have to lead to the stabilization of another node in its active state in one attractor and in its inactive state in the other. The only case in which the stabilization of an inactive node state can result in the stabilization of an active node state is if there is a negative edge from the former to the latter in the interaction network after simplification. Since the Boolean function in Eq. (2) is inhibitor-dominant, the negative regulators that remain in the Boolean model must be in their inactive states in the maximal richness attractor. As they are already inactive (extinct), they are not candidates for extinction.
    The only nodes that are candidates for extinction are the ones that positively regulate other nodes; perturbing the system by fixing these candidates to their inactive states cannot lead to the active state of a target node. In conclusion, bi-stability is not possible.

    We found the new attractor of the system given the (combination of) inactive node state(s) using the functions percolate_and_remove_constants() and trap_spaces() from the pyboolnet Python package. We quantify the effect of the initial extinction(s) on the maximal richness attractor by the percentage change in the number of active species, which we call the damage percentage. Note that choosing the maximal richness community as the reference and starting point allows us to detect the cascading extinctions that follow the initial damage.

    In Fig. 4, the left-column plots show the average damage percentage caused by the extinction of 1 (top panel), 2 (middle panel), or 3 (bottom panel) species that represent driver sets of stable motifs and motif groups, while the right-column plots show the average damage percentage resulting from the extinction of 1, 2, or 3 species that represent non-driver nodes. Comparing the two columns, one notices that stabilization of the driver sets of stable motifs and motif groups leads to considerably larger damage to the communities. This is because stabilization of driver sets ensures the stabilization of entire inactive stable motifs and motif groups and hence ensures cascading extinctions. Comparing the plots in the left column, we see that the larger the driver sets are, the larger the damage to the community becomes. This is because larger driver sets are more likely to stabilize larger stable motifs and motif groups.
    This figure illustrates the significance of stable motifs and their driver sets in the study of species extinction in plant–pollinator communities.

    Figure 4 Histogram plots illustrating the average percentage of the damage caused in an established community after the extinction of species. This analysis is performed over 6000 networks with 50–70 nodes. To study the extinction of species, we started from the maximal richness community and then fixed the nodes that correspond to the focal species to their inactive states. The original extinctions are excluded from the damage percentages. The left-column plots show the average damage percentage caused to the maximal richness community by the extinction of a driver set of size 1 (top), 2 (middle), or 3 (bottom) of an inactive stable motif or motif group. For each network, we determined all the relevant driver sets of one stable motif or motif group, performed the extinction, and calculated the resulting damage; we then calculated the average damage percentage over all data points collected for that network. The right-column plots show the average damage percentage caused to the maximal richness community by the extinction of 1 (top), 2 (middle), or 3 (bottom) randomly chosen non-driver nodes. Each time, a randomly selected combination of non-driver nodes was the subject of simultaneous extinction until all combinations were explored, and we then calculated the average damage percentage over all data points collected for each network. The number of networks that qualify for each of these six categories differs (e.g., some networks have a stable motif with a driver set of size 2 but no stable motif with a driver set of size 3). In the left column, 5529, 3212, and 1980 networks qualified, and in the right column, 5779, 5626, and 5423 networks qualified, respectively. The red lines represent the mean value of all the presented data points in each plot.

    In Fig. 4 (left column), the full driver set of one inactive stable motif or motif group was stabilized. However, the species that become extinct might contain only a subset of a driver set of a stable motif or motif group, i.e., they stabilize only a subset of the inactive node states in the stable motif or motif group. In Fig. 5, we compare the extinction effect caused by the stabilization of a full driver set of four nodes with the effect of the extinction of four nodes that contain a partial driver set, using the batch of the largest networks in this study, i.e., the batch that contains networks with 30 nodes representing plant species and 40 nodes representing pollinator species. We chose this batch because stable motifs and motif groups with a driver set of four node states are highly probable in larger networks. As expected, the stabilization of the complete driver set leads to greater damage. Stabilization of the same number of nodes containing a partial driver set leads to significantly less damage and species loss in the community; the median damage percentage is 22.6% for partial driver sets and 69.2% for full driver sets. We also note that damage of more than 90% occurs rarely and is only possible when a full driver set is stabilized (see Fig. 5, right plot). This suggests that the motif groups that lead to total extinction tend to have a driver set with more than four nodes; in other words, only the simultaneous extinction of five or more species would lead to total community collapse.

    Figure 5 Histogram plots illustrating the average percentage of the damage caused in an established community after the extinction of species. This analysis is performed over 1000 networks with 70 nodes (30 nodes representing plant species and 40 nodes representing pollinator species). The original extinctions are excluded from the damage percentages.
    The left plot shows the average damage percentage caused to the maximal richness community by the extinction of 2 species that are a subset of the 4-node driver set of an inactive stable motif or motif group plus 2 randomly selected non-driver species. The right plot shows the damage percentage caused to the maximal richness community by the extinction of the 4-node driver sets of the same inactive stable motifs and motif groups. Each time, the driver set of one stable motif or motif group was the subject of extinction, and we calculated the average damage percentage over all data points collected for each network. In total, 295 networks qualified for this analysis.

    Motif driver set analysis outperforms structural measures in identifying keystone species

    The literature on ecological networks offers multiple measures that reflect the importance of each species for community stability. One family of such measures is centrality (quantified by network measures such as degree centrality and betweenness centrality). Previous studies45,46 have shown that species (nodes) with higher centrality scores are keystone species in ecological communities (i.e., species whose loss would dramatically change or even destroy the community). The nodes with the highest in-degree centrality (such as pl_2 in Fig. 6a) represent generalist species that can receive beneficial interactions from multiple sources and survive. The nodes with the highest betweenness centrality (such as pl_2 and po_2 in Fig. 6a) represent species that act as connectors and help the community survive. We find that high centrality corresponds to specific patterns in the expanded network: the inactive state of generalist or connector species is often the driver of a cascading extinction. Indeed, stable motif analysis of the expanded network in Fig. 6b confirms that there is an inactive stable motif (highlighted in grey) driven by the minimal set $\{\sim\mathrm{pl}_2\}$.
    The fact that node pl_2 is a stable motif driver means that the extinction of pl_2 leads to the collapse of the whole community.

    To compare the effectiveness of stable motif analysis with that of the more widely studied structural measures in identifying keystone species, we performed an analysis similar to that in the previous section. We compared the magnitude of cascading extinctions following the extinction of stable motif driver nodes and of nodes with high values of previously introduced structural importance measures. Specifically, we used node betweenness centrality, node contribution to nestedness47, and mutualistic species rank (MusRank)22 to find crucial species based on their structural properties. For more details on the definition and adaptation of the latter two measures, see “Methods”. In this analysis, we used each measure to target species in the simplified Boolean models as follows:

    1.

    Betweenness centrality: The 10% of species with the highest betweenness centrality are chosen to be candidates for extinction.

    2.

    Node contribution to nestedness: The species with the most interactions tend to contribute the least to the community nestedness. Targeting them most likely leads to a faster community collapse48. As a result, 10% of species with the lowest contribution to network nestedness are chosen to be candidates for extinction. For more details on this measure, please see “Methods”.

    3.

    Pollinator MusRank: The pollinator species with the highest MusRank importance are more likely to interact with multiple plants, so the 10% of pollinator species with the highest importance are chosen to be candidates for extinction. For more details on this measure, please see “Methods”.

    4.

    Plant MusRank: The plant species with the highest MusRank importance are more likely to interact with multiple pollinators, so the 10% of plant species with the highest importance are chosen to be candidates for extinction.
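Selecting the top 10% of species by betweenness centrality (item 1 above) can be sketched in plain Python with Brandes' algorithm. This is a minimal sketch under stated assumptions: it treats the interaction network as directed and unweighted, uses unnormalized scores, and rounds 10% up to at least one species; the source does not specify these details, and the function names `betweenness` and `top_fraction` are hypothetical. In practice, a library routine such as NetworkX's `betweenness_centrality` would typically be used instead.

```python
import math
from collections import deque


def betweenness(succ):
    """Unnormalized betweenness centrality (Brandes' algorithm) for an
    unweighted directed graph given as {node: [successor, ...]}."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Single-source shortest paths by BFS, counting shortest paths.
        stack, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in succ.get(v, ()):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def top_fraction(succ, fraction=0.1):
    """Nodes in the top `fraction` by betweenness (ties broken by name)."""
    bc = betweenness(succ)
    k = max(1, math.ceil(fraction * len(bc)))  # rounding rule is an assumption
    return sorted(bc, key=lambda n: (-bc[n], n))[:k]


# Hypothetical directed chain pl1 -> po1 -> pl2 -> po2.
chain = {"pl1": ["po1"], "po1": ["pl2"], "pl2": ["po2"]}
print(betweenness(chain))
```

In the chain above, po1 and pl2 each lie on two shortest paths and therefore receive the highest scores, while the end nodes score zero.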

    Figure 7 illustrates the results of this analysis in 6000 networks with 50–70 nodes. In each network, the 1-node, 2-node, and 3-node driver sets of inactive stable motifs were identified and made extinct. In the same networks, the top 10% of nodes according to betweenness centrality, node contribution to nestedness, and node MusRank score were chosen as candidates for extinction. To match the “driver set” data, all choices of 1, 2, or 3 nodes in these sets were explored, and the damage was averaged over each extinction size for each network. We observed the cascading extinction and calculated the damage percentage relative to the maximal richness attractor. The plot represents the collective data over all initial simultaneous extinction sizes of 1, 2, and 3 species.

    Comparing the four methods, one notices that the histograms obtained using stable motif driver sets, node betweenness centrality, and node contribution to nestedness are very similar, showing a peak in the 10–20% damage bin and a long tail that reaches a damage percentage of 80–100%. The MusRank score performs less well in identifying crucial species. Also, the frequency of the higher damage percentages shows that node contribution to nestedness is the closest to the “driver set” method in identifying nodes whose extinction causes the collapse of the whole community, making it the best of the three structural measures. Nevertheless, the driver set method finds keystone species that cannot be identified via structural measures, as the corresponding damage percentage histogram has the most prominent tail at the right edge of the panel. Indeed, stable motif driver sets identified 82%, 80%, and 546% more species whose extinction leads to 60% or higher damage to the community than the methods based on betweenness centrality, node nestedness, and node MusRank score, respectively.

    The reason for the higher effectiveness of driver set analysis is illustrated in Fig. 8, in which the MusRank score and node contribution to nestedness are calculated for two example networks. One can see how these two measures might misidentify less vital species as crucial. In the left column of Fig. 8, MusRank identifies node po_2, highlighted in green, as the most important species. However, this node does not have any outgoing edges, so its extinction does not lead to any cascading extinction. The inability of the MusRank score to consider the direction of edges causes such misidentification. In the right column, the three nodes highlighted in yellow have the lowest contributions to network nestedness. The expanded network shows that these three nodes together are not able to cause full community collapse, while the three-node driver set of the inactive stable motif can. Since the nestedness definition depends on the number of mutual interactions, it might fail to identify some of the keystone nodes that are necessary for the stability of the community (for more details on node nestedness, see “Methods”).

    Previously, it was shown that identifying the stable motifs and their driver sets can successfully steer the system toward a desired attractor or away from unwanted ones37,38,43. Stable motif analysis of the Boolean model offers insight into the dynamical trajectories of the system; hence, control strategies can be developed accordingly. In the next section, we use stable motif driver sets to suggest control methods and analyze their efficiency.

    Figure 6 Generalist species in the interaction network and the expanded network. (a) A simplified network consisting of 3 plant and 3 pollinator species. pl_2 is a generalist species, i.e., it has two incoming edges, indicating that it can survive on either of its sources of pollination, po_1 or po_2. The expanded network in (b) illustrates that the stabilization of the grey stable motif stabilizes all the nodes in their inactive states and hence causes full community collapse.
    $\sim\mathrm{pl}_2$ is the minimal driver set of the grey stable motif, consistent with the strong damage induced by the loss of a generalist species.

    Figure 7 Histogram plots illustrating the performance of driver set analysis versus structural measures in identifying keystone species. The analysis was done on 6000 networks with 50–70 nodes. The starting point is the maximal richness community, i.e., the attractor in which the most species establish. For each network, 1, 2, and 3 node(s) were selected and simultaneously fixed to their inactive states. After the cascading damage, the new attractor is compared to the maximal richness attractor to calculate the damage percentage. The structural measures (betweenness centrality, node nestedness contribution, and node MusRank score) were calculated for all nodes in each network; the top 10% according to the relevant ordering were candidates for being fixed to their inactive states. The network IDs were matched, i.e., only the networks that had candidate nodes according to all four measures for each extinction size are included in this plot. The total number of data points is 6360. The red solid lines represent the mean and the black dashed lines represent the median over all data points in each plot.

    Figure 8 Networks illustrating examples where structural measures fail to identify keystone species. In both columns, simplified networks consisting of 3 plant and 3 pollinator species are presented. The MusRank is calculated for all the nodes in the network in the left column and denoted in the node labels; the corresponding expanded network is shown below it. Node contribution to network nestedness is calculated for all the nodes in the network in the right column and denoted in the node labels; similarly, the corresponding expanded network is shown below it. Note that these two networks have different edges.
    In the left column, the MusRank score identifies node po_2, highlighted in green, as the most important, while the expanded network shows that the extinction of po_2 does not cause any further damage to the community, as this node has no outgoing edges. This is because the MusRank calculation does not consider the directed network and replaces all the directed edges with undirected ones. The MusRank score does not identify po_3 as a crucial species; however, the virtual node $\sim\mathrm{po}_3$, outlined in black in the expanded network, is a driver of a stable motif that has all other nodes in its LDOI, so the extinction of po_3 leads to full community collapse. In the right column, the nodes highlighted in yellow (pl_2, pl_3, and po_2) have the lowest node contributions to nestedness, which predicts that these nodes are likely crucial to the stability of the community. Analyzing the expanded network, one can see that these three nodes together are not able to drive the inactive stable motif highlighted in teal. The minimal driver set for this stable motif, outlined in black, consists of $\{\sim\mathrm{po}_1, \sim\mathrm{po}_2, \sim\mathrm{po}_3\}$; together, these nodes drive the inactive stable motif and cause full community collapse. The nestedness-based measure was not able to capture the significance of nodes po_1 and po_3.

    Damage mitigation measures and strategies for endangered communities

    There are two substantial questions related to managing the damage induced by species extinction: (1) How can the damage be prevented as much as possible? (2) Once the damage happens, the reintroduction of which species can restore the community, and to what extent? In this section, we aim to answer these questions in the context of the Campbell et al. model, implementing stable motif based network control.
This analysis can inform agricultural and ecological strategies employed to prevent and mitigate damage.
Damage prevention
One of the most important questions in ecology is which strategies can prevent or avert extinction damage to a community. In this section we analyze how the knowledge from stable motif analysis and driver sets can be used to minimize the effect of the extinction of keystone species when resources are limited. Each attractor of the original system can have multiple control sets; stabilizing the node states in any one of these control sets ensures that the system reaches that specific attractor. The same information can be used to prevent the system from converging into unwanted attractors. Zañudo et al. showed that by blocking (not allowing to stabilize) the stable motifs that lead to unwanted attractors, one can decrease the probability (sometimes to zero) that the system arrives in those attractors38. To block an attractor, its control sets are identified and the negations of the node states in these control sets are externally imposed. This approach eliminates the undesired attractor; however, new attractors similar to the eliminated one might form. Campbell et al. showed that avoiding such new attractors requires blocking the parent motif, which in this case is the largest strongly connected subgraph of the expanded network that contains the inactive virtual nodes44. Here, we investigate how attractor control based on stable motif blocking can identify the species whose preservation would offer the highest benefit in avoiding catastrophic damage to the community. This information would aid the development of management strategies in plant–pollinator communities.
To avoid all attractors that lead to some degree of species extinction, one needs to block all the driver sets of all inactive stable motifs and motif groups in a given network.
We implemented this in 100 randomly selected networks with 25 plant and 25 pollinator nodes. In these networks, 45.6% of the species in the maximal richness community need to be kept (prevented from extinction) to ensure that no cascading extinctions occur. Management resources are usually limited, so actively monitoring and conserving almost half of the species in a community seems costly and impractical. Hence, we set a more feasible goal: identifying and blocking the driver set(s) of the largest inactive stable motif or motif group in each network. The same 100 networks of 50 nodes are analyzed in this section. We use a relatively small ensemble because the analysis requires identifying all driver sets of the largest inactive stable motif or motif group, which is computationally expensive. For each network, the driver set of the largest inactive stable motif or motif group (which corresponds to the extinction of all the species in that group) is identified and blocked, that is, the corresponding species are not allowed to go extinct. Then the same number of species as in the driver set of that stable motif or motif group are selected and stabilized in their inactive state. We considered all combinations of node extinctions outside the blocked subset, calculated the damage percentage relative to the maximal richness community, and then averaged over all data points for each network. As a control, we repeated the analysis without blocking, using the same initial extinction size for consistency.
Figure 9 shows the results of this analysis for the 100 networks. The left box-and-whiskers plot shows the damage percentage relative to the maximal richness community when blocking is applied. The right plot shows the same quantity when blocking is disabled.
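The damage-percentage protocol described above can be sketched in a few lines. It fixes a set of species outside the blocked subset to inactive, lets extinctions cascade, compares the outcome with the maximal richness community, and averages over all combinations. This is a minimal illustration under loud assumptions. The cascade rule (a species persists only if at least one of its partners persists) is a hypothetical stand-in, not the Campbell et al. threshold model. The six-species community and all function names are invented. Blocked species are modelled simply as protected from extinction.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy community: species -> partners it needs at least one of.
# This dependency rule is NOT the Campbell et al. threshold model; it only
# stands in for "simulate the cascade and read off the new attractor".
DEPENDS_ON = {
    "pl_1": {"po_1", "po_2"},
    "pl_2": {"po_2"},
    "pl_3": {"po_3"},
    "po_1": {"pl_1"},
    "po_2": {"pl_1", "pl_2"},
    "po_3": {"pl_2", "pl_3"},
}


def surviving(removed, protected=frozenset()):
    """Species left after removing `removed` and letting extinctions cascade.

    Protected (blocked) species are never lost.
    """
    alive = set(DEPENDS_ON) - set(removed)
    changed = True
    while changed:
        changed = False
        for sp in sorted(alive):
            if sp in protected:
                continue
            if not DEPENDS_ON[sp] & alive:  # no partner left: species is lost
                alive.discard(sp)
                changed = True
    return alive


def damage_percentage(baseline, removed, protected=frozenset()):
    """Percentage of the baseline (maximal richness) community that is lost."""
    lost = baseline - surviving(removed, protected)
    return 100.0 * len(lost) / len(baseline)


def mean_damage(baseline, k, protected=frozenset()):
    """Average damage over all size-k initial extinctions outside the protected set."""
    candidates = sorted(baseline - set(protected))
    return mean(damage_percentage(baseline, combo, protected)
                for combo in combinations(candidates, k))


baseline = surviving(())                    # toy "maximal richness" community
print(mean_damage(baseline, 1))             # no blocking
print(mean_damage(baseline, 1, {"pl_2"}))   # pl_2 blocked (kept from extinction)
```

In this toy community, blocking pl_2 lowers the average single-extinction damage. This mirrors, on a tiny scale, the comparison between the two box plots.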
The average and median damage percentages are 14.96% and 13.04%, respectively, when the largest inactive stable motif or motif group is blocked, and 24.73% and 20.38% when it is not. The averages differ by about 10 percentage points (24.73% vs. 14.96%), and the left plot has fewer high-damage outliers. Together, these show that preventing the extinction of the species identified by stable motif analysis can considerably reduce catastrophic community damage.
To estimate the fraction of species that would need to be monitored to prevent their extinction, we compared the size of the maximal richness attractor with the size of the driver set of the largest stable motif. The maximal richness community contains on average 32% of the original species pool, approximately 15 out of 50 species. The driver sets of the largest stable motifs had an average size of 2.5 node states over all 100 networks, i.e., about 16.6% of the maximal richness community. In ecological terms, stable motif driver sets can help direct limited conservation resources cost-effectively toward the keystone species that maintain the rest of the community.
Figure 9. Box plots comparing the damage a community suffers in two cases. In the first, the largest inactive stable motif or motif group is completely blocked, i.e., all the drivers of this inactive stable motif or motif group are prevented from stabilizing. In the second, the same stable motif or motif group is allowed to stabilize. This analysis was performed over 100 randomly selected networks that contain 25 plant and 25 pollinator nodes. All the driver sets of an inactive stable motif or motif group are identified. From left to right, the box-and-whiskers plots show the average damage percentage relative to the maximal richness community when the largest inactive stable motif or motif group is blocked and when it is not blocked, respectively.
For the left box-and-whiskers plot, all combinations of inactive node states except the driver sets are considered; for the right plot, all combinations are explored. Because of the combinatorial explosion, this analysis was performed over 100 randomly selected 50-node networks.
Restoration of a group of species
Human preservation efforts have been directed toward community conservation, yet many industrial activities still lead to ecosystem degradation. Ecologists are therefore interested in restoration strategies that can be deployed after a stable community is hit by catastrophic damage, in order to recover biodiversity and the ecosystem functions it provides49. Here we propose that stable motif analysis and the driver sets identified from the expanded network can give insight into restoration measures. In the study of species extinction we examined the inactive stable motifs. Here we focus on the active stable motifs, because our goal is to restore as much biodiversity as possible.
Several network measures have been proposed to identify the species whose reintroduction would considerably restore the community. Two of the most studied strategies are maximising functional complementarity (or diversity) and maximising functional redundancy50. The first strategy targets the restoration of species that provide as many ecosystem functions as possible; this approach results in a community with a maximal number of functions provided by different groups of species. Alternatively, maximising functional redundancy yields a community in which several species perform the same function. The resulting community might have a limited number of functions, but it is robust. Both of these community restoration approaches have been studied extensively (see, e.g., ref. 21).
We hypothesize that restoring the species that constitute driver sets of active stable motifs can help maximise the number of species post-restoration. There is evidence that functional diversity correlates with the number of species in the community51. We therefore compare the post-restoration communities identified by stable motif driving with those obtained by the functional diversity maximisation approach. As discussed in section LDOI in the Boolean threshold model, in some networks the Boolean simplification of the threshold functions overestimates the LDOI of active node states compared with the original threshold functions. We evaluate the negative effects of this overestimation by checking the effectiveness of the restored species in the original threshold model.
This analysis uses the same 6000 networks with 50–70 nodes that were examined in the keystone species analysis (Fig. 7). To create an unbiased starting point, we generate each damaged community by eliminating from the maximal richness community the same number of species as will later be restored. We identify the inactive stable motif or motif group with a driver set size of 1, 2, or 3 node states that causes the most damage to the maximal richness community. We then eliminate the species corresponding to this driver set to reach the most damaged community for the given size of the initial extinction. This community is the starting point for two analyses. In the stable motif driving approach, we stabilized an active stable motif whose driver set has the same size as the initial extinction, which gives a post-restoration community. We then calculated the percentage of the extinct species that were restored.
In the functional diversity maximization approach, we re-introduced the same number of species, selected from the top 10% of species in terms of their contribution to functional diversity.
To calculate the functional diversity of a community, one needs to (1) define and construct a trait matrix, (2) determine the distance (trait dissimilarity) between pairs of species, (3) perform hierarchical clustering based on these distances to create a dendrogram, and (4) calculate the total branch length of the dendrogram, i.e., the sum of the lengths of all its branches51,52. Petchey et al. argued that resource-use traits of plant and pollinator species can be used to classify the organisms into separate functional groups53. Devoto et al. proposed using the adjacency matrix of the interaction network as the trait matrix21. We follow the same approach and use the bipartite adjacency matrix to construct the distance matrix.
Since the networks of the Campbell et al. model are directed, we modify the algorithm to use two separate adjacency matrices. One denotes the edges incoming to plant species and the other the edges incoming to pollinator species. The hierarchical clustering algorithm is run on each matrix separately, giving one dendrogram per adjacency matrix. If extinction occurs in a community, the functional diversity of the surviving community is the total branch length of the subset of the dendrogram that includes only the surviving species. The restoration strategy using this method is to re-introduce the nodes whose branches add the most to the total branch length of this subset, i.e., to maximise the functional diversity of the surviving community54. For more details, see “Methods”.
In each network, the percentage of the extinct species that were restored was calculated and averaged over all data points for each restoration size and each network.
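The four-step functional diversity calculation and the greedy re-introduction rule described above can be sketched as below. The sketch rests on assumptions the text does not state:

- Euclidean distance between adjacency-matrix rows is used as the trait dissimilarity.
- UPGMA (average-linkage) clustering builds the dendrogram.
- The total branch length includes the branches up to the root.

Only a single dendrogram is shown, whereas the text builds one per adjacency matrix (plants and pollinators). All function names and the three-species example are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Trait dissimilarity between two species (Euclidean distance; an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(traits):
    """Average-linkage (UPGMA) dendrogram.

    traits maps species -> trait vector (here, its row of the adjacency matrix).
    Returns (parent, height): parent maps each node to its parent, height maps
    each node to its merge height. Leaves are species names; internal nodes
    are numbered 0, 1, 2, ...
    """
    members = {sp: [sp] for sp in traits}
    height = {sp: 0.0 for sp in traits}
    parent = {}
    next_id = 0

    def linkage(c1, c2):
        # Mean pairwise distance between the leaves of two clusters.
        d = [euclidean(traits[a], traits[b]) for a in members[c1] for b in members[c2]]
        return sum(d) / len(d)

    while len(members) > 1:
        c1, c2 = min(combinations(members, 2), key=lambda pair: linkage(*pair))
        height[next_id] = linkage(c1, c2) / 2.0  # merge height in the dendrogram
        parent[c1] = parent[c2] = next_id
        members[next_id] = members.pop(c1) + members.pop(c2)
        next_id += 1
    return parent, height


def functional_diversity(species, parent, height):
    """Total length of the branches linking `species` to the root of the dendrogram."""
    on_path = set()
    for sp in species:
        node = sp
        while node in parent:  # climb until the root (which has no parent)
            on_path.add(node)
            node = parent[node]
    return sum(height[parent[n]] - height[n] for n in on_path)


def greedy_restore(survivors, extinct, k, parent, height):
    """Re-introduce k extinct species, each time the one that adds most to FD."""
    community, chosen = set(survivors), []
    for _ in range(k):
        candidates = sorted(set(extinct) - set(chosen))
        best = max(candidates,
                   key=lambda sp: functional_diversity(community | {sp}, parent, height))
        chosen.append(best)
        community.add(best)
    return chosen


# Hypothetical adjacency rows (incoming edges) for three plant species.
traits = {"pl_1": (1, 1, 0), "pl_2": (1, 1, 0), "pl_3": (0, 0, 1)}
parent, height = upgma(traits)
print(greedy_restore({"pl_1"}, {"pl_2", "pl_3"}, 1, parent, height))
```

Here pl_2 has exactly the same interactions as the surviving pl_1, so re-introducing it adds no branch length, and the greedy rule prefers pl_3.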
Figure 10 illustrates the results of this investigation. Applied to the simplified Boolean model, the active stable motif driver set method (blue plot) yields a median restoration percentage of 80%. The functional diversity maximization strategy (yellow plot) yields a lower median restoration percentage, 73%, as well as a large number of low-restoration outliers. One might argue that the functional diversity maximization strategy also identifies beneficial species well. However, the active stable motif driver set analysis produces a higher proportion of cases with 80–100% restoration, which indicates that it identifies some of the most effective restorative species that the former method misses. In a minority of cases the simplified Boolean model overestimates the positive impact of the sustained presence of a species (see section LDOI in the Boolean threshold model). We therefore verified the effectiveness of the predicted restoration candidates in the original threshold model. The blue (respectively, yellow) box-and-whiskers plot on the right represents the restoration percentages of the same species as in the left blue (respectively, yellow) plot when these species are restored in the threshold model. The median of the right blue plot is 70%, while the median of the right yellow plot is 63%, preserving the advantage of the stable motif driver sets. The simplified Boolean model does overestimate the restoration effectiveness of certain driver sets: the lower whisker of the right blue plot extends well below that of the left blue plot. Nevertheless, stable motif driver sets are more effective in both comparisons.
Figure 10. Box-and-whiskers plots illustrating the average percentage of the extinct species that are restored following the stable motif driver set restoration strategy (blue) versus the functional diversity-based approach (yellow).
This analysis is performed over 6000 networks with 50–70 nodes. Starting from the maximal richness community, for each network one inactive stable motif with a driver set of 1, 2, or 3 nodes was stabilized to reach a new damaged community. This was repeated until the community with the most extinct species was identified; this community is the starting point for the restoration process using both methods. The pair on the left represents the two methods applied to the simplified Boolean model. For both methods, we identified 1, 2, or 3 influential nodes for community restoration and calculated the percentage of the extinct species that could be restored. The pair on the right represents restoring, in the original threshold model, the same species identified by each method in the previous analysis. In all analyses, the community restoration percentage was averaged over all combinations of the same size, for each network and each method. The IDs of all networks are matched.
Community restoration via attractor control
As illustrated in section “Restoration of a group of species”, stable motif analysis identifies promising and cost-effective group restoration strategies. In this section we aim to go further and identify interventions that can maximally restore a community. Previous stable motif-based network control methods37,38,55 require a search for the smallest set of node states that controls the system once the stable motif stabilization trajectories are identified. This smallest set may not contain a node from each stable motif in the sequence. In this work, however, we know that each stable motif or motif group needs to be controlled individually28, because stabilizing any one motif does not result in the stabilization of another.
As a result, the control set of each attractor is the union of the driver sets of all members of the consistent combination corresponding to that attractor.
In this section we examined this attractor control method by setting as targets the communities that retain 70% or more of the species in the maximal richness community, i.e., these attractors are assumed to be the desired attractors. We then recorded the size of the minimal control set needed to reach each of these attractors. Note that stabilizing each of these control sets guarantees that the system reaches the corresponding attractor38.
For this section, we analyzed 6000 networks with 50–70 nodes. Figure 11 shows box-and-whiskers plots of the size of the minimal set of species that need to be restored. The target communities are classified into three groups based on their percentage of the species in the maximal richness attractor. In half of the cases, restoring either 1 or 2 species restores more than 70% of the maximal richness community. The largest set requires restoring 8 species, but this data point is an outlier. As illustrated, driver set analysis and stable motif-based attractor control can efficiently identify the species that play an influential restorative role and suggest management strategies that are effective at the scale of the whole community. To assess the impact of the LDOI inflation on this result, we took the restoration candidates identified from the control sets of the Boolean model attractors and applied them to the threshold functions of a subset of networks. The comparison of the restoration percentages is shown in Fig. 14.
The first quartile, median, and third quartile values are 78.26%, 86.6%, and 100% for the simplified Boolean model and 43.78%, 72.41%, and 85.71% for the threshold model.
To further compare the restoration results obtained from the two models, we ranked the species by their contribution to community restoration following catastrophic damage. We randomly selected 100 of the largest (70-node) networks, which have the highest probability of a discrepancy between the threshold functions and the simplified Boolean model. In 72% of the cases the two rankings matched completely, and in most of the remaining cases only one species was misplaced in the ranking based on the simplified Boolean model. To conclude, the simplified Boolean model offers a significant advantage, and its drawback can be addressed by a follow-up check against the original threshold functions.
Figure 11. The number of species that need to be restored to save 70% or more of the species in the maximal richness community. This analysis covers 6000 networks with 50–70 nodes. For each network, all the attractors that have 70% or more of the species in the maximal richness attractor are identified and set as the target attractors. The control sets of these attractors are then classified into three groups based on this percentage, as illustrated in the figure. From left to right, the box-and-whiskers plots represent the size of the control sets of attractors that have 70–80%, 80–90%, and 90–100% of the species in the maximal richness attractor, respectively.
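In the attractor-control section, each stable motif or motif group must be controlled individually. The control set of an attractor is therefore the union of one driver set per member of the corresponding consistent combination. The sketch below picks the smallest such union, assuming the alternative driver sets of each motif are already known. The input format, node names and function name are hypothetical. The exhaustive product over alternatives is only practical for small examples.

```python
from itertools import product


def minimal_control_set(driver_sets_per_motif):
    """Smallest union containing one driver set from each stable motif (or motif group).

    driver_sets_per_motif: one list per motif in the consistent combination,
    each list holding that motif's alternative driver sets as sets of
    (node, state) pairs.
    """
    best = None
    for choice in product(*driver_sets_per_motif):
        union = frozenset().union(*choice)  # union of the chosen driver sets
        if best is None or len(union) < len(best):
            best = union
    return best


# Hypothetical consistent combination with two active stable motifs.
motifs = [
    [{("pl_1", 1)}],                              # motif A: one driver set
    [{("po_2", 1)}, {("po_3", 1), ("pl_3", 1)}],  # motif B: two alternatives
]
print(sorted(minimal_control_set(motifs)))
```

The size of the returned set is the number of species to restore for that target attractor, which is the quantity summarized in Figure 11.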

  • in

    Plant nitrogen retention in alpine grasslands of the Tibetan Plateau under multi-level nitrogen addition

    Study site
The field experiment was conducted at Namco Station (30°47’N, 90°58’E, altitude 4730 m) of the Institute of Tibetan Plateau Research, Chinese Academy of Sciences (ITPCAS). The station is located in the alpine steppes of the Tibetan Plateau (TP), China. The experiment was permitted by ITPCAS and complied with local and national guidelines and regulations. From 2006 to 2017, the mean annual temperature (MAT) and mean annual precipitation (MAP) were about −0.6 °C and 406 mm, respectively. Monthly mean temperature varied from −10.8 °C in January to 9.1 °C in July, and most of the precipitation occurred from May to October37,38. During our six years of observations (2010, 2011, 2012, 2013, 2015 and 2017), the growing-season climate (May to September) varied among years. Annual precipitation ranged from 255.9 mm to 493.8 mm and the mean temperature from 6.7 to 7.4 °C. Androsace tapete, Kobresia pygmaea, Stipa purpurea and Leontopodium pusillum were the dominant plant species at the alpine steppe.
Experimental design and treatments
The long-term experiment began in May 2010. Three homogeneous plots were randomly arranged as replicates at the alpine steppe. In each plot, six subplots (~13 m2) were arranged in a circle, with a 2 m buffer zone between adjacent subplots (Appendix S1: Fig. S1). Six N fertilization rates (0, 1, 2, 4, 8, and 16 g N m−2 yr−1) were applied clockwise to the subplots of each plot; the 0 g N m−2 yr−1 subplots served as controls. We sprayed NH4NO3 solution on the first day of each month of the growing season (May to September) each year. After fertilizing, we rinsed residual fertilizer off the plants with a small amount of deionized water (equivalent to no more than 2 mm of rainfall). The control subplots received an equivalent amount of water.
The experiment ran from 2010 to 2017; note that no fertilizer was applied in 2014 and 2016.
Sampling and measurements
The samples were collected with the training and permission of ITPCAS and involved common plant species that are not endangered or protected. Plants were identified following the book by Chen and Yang39. Pictures of the corresponding specimens can be seen on the ITPCAS website (http://itpcas.cas.cn/kxcb/kxtp/nmc_normal_plant/).
Vegetation samples were collected in August 2011, and sampling was repeated at the same time of year in 2012, 2013, 2015 and 2017. We established one 50 × 50 cm quadrat in each subplot, clipped the aboveground biomass (AGB) and sorted the species by family. The biomass was used to estimate aboveground net primary productivity (ANPP, g m−2 yr−1). After the aboveground portion was collected, we took three soil cores (5 cm diameter) at 0–30 cm depth to collect belowground roots. The three cores were pooled into one composite sample, which was used to assess belowground net primary productivity (BNPP, g m−2 yr−1). The roots were cleaned with running water to remove sand and stones.
Both plant and root samples were dried at 75 °C for 48 h and then ground into powder (particle size ~5 μm) with a laboratory mixer mill (MM400, Retsch). To determine the N and C content of the plants, we weighed the samples into tin capsules and measured them with an elemental analyzer (MAT253, Finnigan MAT GmbH, Germany).
Estimation of the critical N rate (Ncr), N retention fraction (NRF), N retention capacity and N-induced C gain
According to the N saturation hypothesis, plant productivity increases gradually with N addition, reaches a maximum at the Ncr, and eventually declines16,17. We took the Ncr to be the rate above which ANPP no longer changed markedly with further N addition (Fig. 1).
We defined the plant N retention fraction (NRF, %; Eq. 1) as the increase in aboveground N storage per unit of N added. The N retention capacity (g N m−2 yr−1; Eq. 2) is the increment of N storage due to exogenous N addition compared with the control40. The equations are as follows:
    $$N\;retention\;fraction = \frac{ANPP_{tr} \times N\;content_{tr} - ANPP_{ck} \times N\;content_{ck}}{N\;rate}$$
    (1)
    $$N\;retention\;capacity = ANPP_{tr} \times N\;content_{tr} - ANPP_{ck} \times N\;content_{ck}$$
    (2)
    where ANPP_tr and N content_tr (%) refer to the treatment (tr) groups, and ANPP_ck and N content_ck refer to the control (ck) groups; the same notation is used in Eqs. 3–5. The N-induced C gain (g C m−2 yr−1; Eq. 3) was estimated as the increment of C storage due to exogenous N addition compared with the control40. The maximum N retention capacity (MNRC, Eq. 4) and maximum N-induced C gain (Eq. 5) are the maximum increments in plant N and C storage caused by exogenous N input, reached at Ncr. The formulas are as follows:
    $$N\text{-}induced\;C\;gain = ANPP_{tr} \times C\;content_{tr} - ANPP_{ck} \times C\;content_{ck}$$
    (3)
    $$MNRC = ANPP_{\max} \times N\;content_{\max} - ANPP_{ck} \times N\;content_{ck}$$
    (4)
    $$Maximum\;N\text{-}induced\;C\;gain = ANPP_{\max} \times C\;content_{\max} - ANPP_{ck} \times C\;content_{ck}$$
    (5)
    where ANPP_max, N content_max and C content_max are the values of ANPP, N content and C content at Ncr, respectively.
Data synthesis
To evaluate N limitation and saturation on the TP more accurately, we searched for papers in the Web of Science (https://www.webofscience.com) and the China National Knowledge Infrastructure (https://www.cnki.net). The search keywords were: (a) N addition, N deposition or N fertilization, and (b) grassland, steppe or meadow. Articles were selected according to the following criteria. First, the experiment must have been conducted in a grassland ecosystem. Second, the experiment must have had at least three N addition levels and a control group. Third, for experiments lasting several years, we analyzed the multi-year average. On this basis, we collected 89 independent experimental cases: 27 on the TP alpine grasslands, 25 in the Inner Mongolia (IM) grasslands and 37 in other terrestrial grasslands (see Appendix S2: Table S1 for details).
We extracted the ANPP data and N addition rates of these cases and estimated Ncr and ANPP_max (Appendix S2: Fig. S2). We then calculated the NRF, N retention and C gain for each group of data for further analysis (Appendix S2: Table S2). Most of the 89 cases did not report N and C content. To enable the calculation, we compiled N and C content from 40 articles covering areas neighboring the cases. We then divided the N and C content into seven intervals according to N addition rate (Appendix S2: Table S3 and Fig. S3). All N addition rates were converted to “g N m−2 yr−1”. All the original data were obtained directly from the text and tables of published papers. If the data were displayed only in graphs, Getdata 2.20 was used to digitize them.
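Equations 1–5 share one form. Each takes plant storage in a treatment (or at Ncr) and subtracts storage in the control; Eq. 1 then divides by the N rate. A minimal Python sketch follows, and its example numbers are invented. Two choices here are our assumptions, not the text's. N and C contents are passed as mass fractions (percent divided by 100), so that storage comes out in g m−2 yr−1. NRF is scaled by 100 to match its stated unit of %.

```python
def storage_increment(anpp, content, anpp_ck, content_ck):
    """Eqs. 2-5: element storage in aboveground biomass minus that in the control.

    ANPP in g m-2 yr-1; content as a mass fraction (e.g. 0.02 for 2%), so the
    result is in g m-2 yr-1. Pass treatment values for Eqs. 2-3 and the values
    at Ncr (ANPP_max, content_max) for Eqs. 4-5.
    """
    return anpp * content - anpp_ck * content_ck


def n_retention_fraction(anpp_tr, n_tr, anpp_ck, n_ck, n_rate):
    """Eq. 1: extra aboveground N storage per unit of N added, as a percentage."""
    return 100.0 * storage_increment(anpp_tr, n_tr, anpp_ck, n_ck) / n_rate


# Hypothetical plot values, for illustration only.
anpp_ck, n_ck, c_ck = 40.0, 0.015, 0.41   # control
anpp_tr, n_tr, c_tr = 60.0, 0.020, 0.42   # 4 g N m-2 yr-1 treatment
print(storage_increment(anpp_tr, n_tr, anpp_ck, n_ck))          # N retention capacity
print(n_retention_fraction(anpp_tr, n_tr, anpp_ck, n_ck, 4.0))  # NRF (%)
print(storage_increment(anpp_tr, c_tr, anpp_ck, c_ck))          # N-induced C gain
```

A single helper covers Eqs. 2–5 because they differ only in which element (N or C) and which ANPP/content pair (treatment or Ncr) is supplied.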
To estimate N retention and C gain on the TP at current N deposition rates and at Ncr in the future, we fitted an exponential relationship to the data from the 27 TP cases and then substituted N rates into the fitted equations (Eq. 6):
    $$y = a \times \left[ 1 - \exp\left( -bx \right) \right].$$
    (6)
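Eq. 6 is linear in a once b is fixed, which allows a simple dependency-free least-squares fit. The sketch below computes a in closed form for each candidate b and picks b by grid search. The grid search stands in for whatever nonlinear regression routine was actually used, and the data points and function names are hypothetical.

```python
import math


def fit_saturating_exponential(x, y, b_grid=None):
    """Least-squares fit of Eq. 6, y = a * (1 - exp(-b * x)).

    For fixed b the model is linear in a, so a has a closed form; b is
    chosen by a grid search. Returns (a, b).
    """
    if b_grid is None:
        b_grid = [i / 1000 for i in range(1, 5001)]  # b in (0, 5]
    best = None
    for b in b_grid:
        f = [1.0 - math.exp(-b * xi) for xi in x]
        denom = sum(fi * fi for fi in f)
        if denom == 0:
            continue  # all x are zero: a cannot be estimated for this b
        a = sum(fi * yi for fi, yi in zip(f, y)) / denom
        sse = sum((yi - a * fi) ** 2 for fi, yi in zip(f, y))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    return best[0], best[1]


def predict(a, b, n_rate):
    """Substitute an N rate into the fitted Eq. 6."""
    return a * (1.0 - math.exp(-b * n_rate))


# Hypothetical (N rate, N retention) pairs, for illustration only.
rates = [1, 2, 4, 8, 16, 32]
retention = [0.9, 1.6, 2.5, 3.3, 3.6, 3.7]
a, b = fit_saturating_exponential(rates, retention)
print(a, b, predict(a, b, 2.0))
```

Here a is the asymptote (the value approached at high N rates) and b controls how quickly it is approached. Substituting a current deposition rate or Ncr into `predict` mirrors the substitution step described in the text.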
    We also included the MAT, MAP, soil C:N ratio, fencing management (fencing or grazing) and grassland type (meadow, steppe or desert steppe) of the experimental sites to explore the drivers of N limitation (Appendix S2: Table S1). When climatic data were missing from an article, MAT and MAP were obtained from WorldClim (http://www.worldclim.org).
In previous studies, species were usually divided into four functional groups (grasses, sedges, legumes and forbs) to study the response of species composition to N addition41. We synthesized 13 TP experimental cases (including our field experiment) from the data synthesis, each including at least three functional groups (see Appendix S2 for detailed references).
Statistical analysis
There were 42 species in our field experiment. We divided them by family into eleven groups: Asteraceae (forbs), Poaceae (grasses), Leguminosae (legumes), Rosaceae (forbs), Boraginaceae (forbs), Caryophyllaceae (forbs), Cyperaceae (sedges), Labiatae (forbs), Primulaceae (forbs), Scrophulariaceae (forbs) and Others. Because the species in the Others group contributed only 1.22% of AGB, we analyzed AGB and foliar stoichiometry for the other ten families (Appendix S1: Table S1). In the Namco steppe, forbs, grasses, sedges and legumes accounted for 78.0%, 7.4%, 8.2% and 5.2% of the AGB, respectively (Appendix S1: Table S1 and Fig. S2). Such a high proportion of forbs suggests that our experiment was conducted on a severely degraded grassland.
For our field data, two-way ANOVAs were used to analyze the effects of year, N fertilization rate and their interaction on species AGB. One-way ANOVAs were used to test the responses of ANPP, BNPP, root:shoot ratio, and species foliar C content, N content and C:N ratio to N addition rate. Duncan’s new multiple range test was used to compare the effects of the different fertilization rates in these ANOVAs.
Prior to these ANOVAs, we tested for homogeneity of variance and log-transformed the data when necessary. Simple regression was used to estimate the relationships of ANPP, NRF, N retention capacity and C gain with N addition rate.
Structural equation modeling (SEM) was used to explore complex relationships among multiple variables. To quantify the contributions of drivers such as climate and soil to Ncr, ANPP, NRF and MNRC, we constructed an SEM based on existing ecological knowledge and the possible relationships between variables. We considered environmental factors (MAT, MAP and soil C:N) and ANPP_ck as explanatory variables, and Ncr, NRF and MNRC as response variables. We included ANPP_ck in the SEM rather than ANPP_max to test whether ANPP without exogenous N input is related to ecosystem N retention under N saturation. This has important implications for assessing N input. Before constructing the SEM, we ruled out collinearity among the factors. In addition, Student’s t-test and one-way ANOVAs were performed to test the effects of fencing management and grassland type, respectively, on the above response variables. The SEM was constructed using the R package “piecewiseSEM”42. Fisher’s C was used to assess model goodness of fit, and AIC was used for model comparison.
Given the influence of extreme values in the data synthesis, we calculated the geometric means of Ncr, NRF, N retention and N-induced C gain. All statistical analyses were performed with SPSS 26.0 and RStudio (Version 1.2.1335) based on R version 3.6.2 (R Core Team, 2019).
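The geometric mean damps the pull of a few extreme cases, which is why it suits the synthesized data. The sketch below illustrates this on invented Ncr values. It only accepts strictly positive inputs, so cases with zero or negative NRF or C gain would need separate handling; the text does not say how such cases were treated. Python's standard library also offers `statistics.geometric_mean` (Python 3.8+).

```python
import math


def geometric_mean(values):
    """Geometric mean; requires a non-empty list of strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical Ncr values (g N m-2 yr-1) with one extreme case.
ncr = [5.0, 8.0, 10.0, 40.0]
print(sum(ncr) / len(ncr))   # arithmetic mean, pulled up by the extreme value
print(geometric_mean(ncr))   # geometric mean, less sensitive to it
```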

  • in

    Smart forest management boosts both carbon storage and bioenergy

    Timothy Searchinger and his colleagues raise concerns that the European Union’s plan to produce energy from biomass could compromise forest carbon stocks and biodiversity (Nature 612, 27–30; 2022). However, it is possible for improved forest management to reconcile increased bioenergy production with the maintenance and restoration of forest ecosystems.
    Competing Interests
    The authors declare no competing interests.

  • in

    Human activities favour prolific life histories in both traded and introduced vertebrates

    Data collection
We obtained trade data from two different sources: the United States Fish and Wildlife Service (USFWS) Law Enforcement Management Information System (LEMIS)31 and the International Union for Conservation of Nature (IUCN) Red List32. We used the former to obtain data on the live wildlife trade in general and the latter for data on the pet trade specifically. We then matched trade data with our previously compiled global-scale datasets of life history traits and introductions in mammals, reptiles and amphibians25,26.
We obtained data on the US live wildlife trade from LEMIS through a Freedom of Information Act request on 12/08/2019. We requested summary data on all US imports and exports of wildlife across all available years (1999–2019) and all trade purposes, including information on species identities and shipment contents (e.g. live individuals, meat, skins, etc.). For each species, we summed the total number of recorded shipments of live individuals (including individuals that died in transit, and live eggs) as a measure of trade frequency. We classified species as in trade if there was at least one shipment of live individuals recorded in the LEMIS database, and as not traded otherwise. The LEMIS dataset is geographically limited to trade by the US, and therefore may not capture the full diversity of species involved in the wildlife trade. For example, the LEMIS database may be missing some species involved in the substantial trade in live wildlife among Southeast Asian countries50. However, the US is one of the dominant players in the global market for live wildlife16, and by summing both imports and exports we capture, to some extent, demand for species in countries beyond the US. Supplementary Fig. 2 illustrates the frequency of trade between the US and countries represented in the US LEMIS dataset.
LEMIS data should be considered a minimum estimate of the diversity of species involved in the wildlife trade. They mostly record only legal trade (although confiscated shipments are recorded), and shipments are sometimes not identified to the species level16,51,53. The LEMIS database also contains some misspelled and incorrectly identified species due to human input errors52. To minimise the effect of misidentified shipments on our species-level classifications of US trade status, we discarded all LEMIS records that were not identified to the species level (i.e. those identified using genus, common or generic names only). When we could not automatically match any records in LEMIS with species in our life history datasets, we manually checked the LEMIS data for synonyms and alternate spellings. Species classified as traded on the basis of a single recorded live shipment in LEMIS are the most vulnerable to species-level misclassification due to misidentified shipments. The vast majority of traded species have multiple shipments recorded in LEMIS (259/312 [83%] of traded mammals, 265/285 [93%] of traded reptiles and 72/75 [96%] of traded amphibians). This reduces the potential impact of shipment-level misidentification on the reliability of species-level trade classifications. However, to investigate the robustness of our findings to possible errors in species identification in LEMIS, we re-ran our key analyses excluding species classified as traded on the basis of a single live shipment. Removing these species gave qualitatively the same effects of life history traits on the probability of trade as in our full sample (Supplementary Tables 25–27). Despite its limitations, LEMIS is an invaluable resource for identifying broad-scale trends in the wildlife trade, since few other countries maintain such detailed records. It is also the only large-scale international trade dataset that includes both CITES- and non-CITES-listed species16,41.
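The species-level trade classification and the single-shipment robustness check described above can be sketched as follows. The record layout is hypothetical: the field names `genus`, `species` and `live` are illustrative and are not claimed to be the actual LEMIS columns. Only the genus-level filter is shown; the manual synonym and spelling checks, and the removal of common-name records, are not reproduced.

```python
from collections import Counter

# Hypothetical record layout: the field names ("genus", "species", "live")
# are illustrative only, not the actual LEMIS column names.
GENUS_ONLY = {"", "sp", "sp.", "spp", "spp."}


def live_shipment_counts(records):
    """Count live shipments per species, skipping records not identified to species."""
    counts = Counter()
    for rec in records:
        epithet = rec["species"].strip().lower()
        if epithet in GENUS_ONLY:
            continue  # genus-level or unidentified record: discarded
        if rec["live"]:  # live individuals, incl. those dead in transit and live eggs
            counts[f"{rec['genus'].strip()} {epithet}"] += 1
    return counts


def trade_status(species_list, counts, exclude_single_shipment=False):
    """Map species -> True (traded) / False (not traded).

    With exclude_single_shipment=True, species traded on the basis of a single
    live shipment are dropped from the sample (the robustness check).
    """
    status = {}
    for sp in species_list:
        n = counts.get(sp, 0)
        if exclude_single_shipment and n == 1:
            continue
        status[sp] = n >= 1
    return status


records = [
    {"genus": "Python", "species": "regius", "live": True},
    {"genus": "Python", "species": "regius", "live": True},
    {"genus": "Pogona", "species": "vitticeps", "live": True},
    {"genus": "Varanus", "species": "sp.", "live": True},     # genus only
    {"genus": "Python", "species": "regius", "live": False},  # e.g. skins
]
counts = live_shipment_counts(records)
sample = ["Python regius", "Pogona vitticeps", "Varanus komodoensis"]
print(trade_status(sample, counts))
print(trade_status(sample, counts, exclude_single_shipment=True))
```

Species absent from the counts are classified as not traded. In the robustness run, single-shipment species are removed from the sample rather than reclassified.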
Including non-CITES-listed species in our analyses is important. CITES-listed species represent only a small minority of those in trade14, and they are likely to be a biased sample in terms of life history traits, since species vulnerable to extinction typically have slower life histories40.
We obtained separate data on the pet trade from the IUCN Red List. The IUCN has assessed the vast majority of mammal, reptile and amphibian species (91%, 79% and 86%, respectively54). We classified a species as involved in the pet trade if its IUCN species account included at least one clear description of involvement in the pet trade; otherwise, we considered it not involved in the pet trade. Although LEMIS records the purpose of trade, it uses broad categories (e.g. ‘Commercial’, ‘Personal’, ‘Breeding in captivity’), none of which refers specifically to, or necessarily equates to, trade for pets. Some previous studies have used LEMIS data as a proxy for the pet trade (e.g. Refs. 15,19); rather than follow that approach, we sought these additional data from the IUCN Red List. The IUCN Red List contains clear textual descriptions of use and trade for many species, allowing us to identify which species are traded specifically for pets32. The IUCN data have further complementary strengths compared with LEMIS, in that they are global in scope and include both legal and illegal trade. We obtained data from the IUCN Red List by manually searching the binomial name of each species in our samples and consulting the ‘Threats’ and ‘Use and Trade’ sections of the species accounts. We classified species as in the pet trade if the information clearly stated this was the case (e.g. “It has been recorded in the pet trade”, “This species appears in the international pet trade”). We discounted descriptions where the information was uncertain (e.g. the species is described as “probably” or “possibly” traded for pets).
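The authors classified pet-trade status by reading each account manually. As an illustration of the decision rule only (clear statements count, hedged ones do not), a hypothetical first-pass text screen might look like the following. The regular expressions and the hedge-word list are assumptions. Any real use would still need manual review of every account, flagged or not.

```python
import re

# Phrases taken to indicate pet-trade involvement (illustrative, not exhaustive).
PET_TERMS = re.compile(r"\bpet trade\b|\b(?:as|for)(?: an?)? pets?\b", re.IGNORECASE)
# Words that make a statement uncertain, so the sentence is discounted.
# Note: "may" will also match the month "May"; this is a crude screen.
HEDGE_TERMS = re.compile(
    r"\b(?:probably|possibly|perhaps|may|might|potentially|suspected)\b", re.IGNORECASE
)


def clearly_in_pet_trade(account_text):
    """True if at least one sentence mentions pets without hedging language."""
    sentences = re.split(r"(?<=[.!?])\s+", account_text.strip())
    return any(PET_TERMS.search(s) and not HEDGE_TERMS.search(s) for s in sentences)


print(clearly_in_pet_trade("It has been recorded in the pet trade."))
print(clearly_in_pet_trade("This species is possibly traded for pets."))
```

The rule works sentence by sentence, so one hedged sentence does not cancel a clear statement made elsewhere in the same account.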
We did not count as pets those species that the IUCN categorises as used for “Pets/display animals, horticulture” but which are used only in zoos or for captive display, such as beluga whales (Delphinapterus leucas). All species described as pets by the IUCN are ‘exotic’, i.e. those without a long history of domestication14, since the IUCN does not list domesticated species.
We matched trade data with our previously published global-scale datasets on life history traits and introductions25,26. Internationally traded species may or may not be released into the wild outside their native range: some may remain in captivity (e.g. in zoos or kept by private owners). We defined a species as introduced if there was at least one reliable record of its release by humans, accidental or intentional, into the wild outside its native range25,26. We included only species with complete data for the same life history traits used in our prior analyses:
- mammals: body mass, gestation period, weaning age, neonatal body mass, litter size, litters per year, age at first reproduction and reproductive lifespan;
- reptiles: body mass, hatchling mass, clutch size, clutches per year, age at sexual maturity, reproductive lifespan and parity;
- amphibians: snout–vent length, egg size, clutch size, age at sexual maturity and reproductive lifespan.
This restriction facilitates direct comparisons with previous results and allows us to account for covariation between life history traits55. Species with complete life history data represent 7.8%, 3.5% and 1.6% of the estimated total number of mammal, reptile and amphibian species, respectively56,57,58. These samples are not random, as they over-represent orders containing many species of interest and utility to humans (e.g. ungulates, primates, crocodilians) (Supplementary Tables 28–30). However, these biases are unlikely to undermine our results, since we examine life history effects on trade and introduction within these samples.
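The introduction definition and the complete-case rule can be expressed as two small filters. The field names (`reliable`, `outside_native_range` and the trait keys) are hypothetical. The trait lists mirror those given in the text.

```python
# Traits required for each taxon (hypothetical column names mirroring the text).
TRAITS = {
    "mammals": ["body_mass", "gestation_period", "weaning_age", "neonatal_body_mass",
                "litter_size", "litters_per_year", "age_first_reproduction",
                "reproductive_lifespan"],
    "reptiles": ["body_mass", "hatchling_mass", "clutch_size", "clutches_per_year",
                 "age_sexual_maturity", "reproductive_lifespan", "parity"],
    "amphibians": ["snout_vent_length", "egg_size", "clutch_size",
                   "age_sexual_maturity", "reproductive_lifespan"],
}


def complete_cases(rows, taxon):
    """Keep species (dict rows) with a non-missing value for every trait used for this taxon."""
    required = TRAITS[taxon]
    return [row for row in rows if all(row.get(trait) is not None for trait in required)]


def is_introduced(release_records):
    """Introduced = at least one reliable record of human release into the wild outside the native range."""
    return any(rec["reliable"] and rec["outside_native_range"] for rec in release_records)


frog = {"species": "A", "snout_vent_length": 50, "egg_size": 2, "clutch_size": 100,
        "age_sexual_maturity": 2, "reproductive_lifespan": 5}
toad = dict(frog, species="B", egg_size=None)  # missing one trait, so it is excluded
print([row["species"] for row in complete_cases([frog, toad], "amphibians")])
```

A species held only in captivity (for example, in zoos) has no release record, so `is_introduced` returns False for it even if it is heavily traded.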
Trade and introduction data do not necessarily cover the same time periods: the US dataset covers only 1999–present, and the IUCN descriptions also typically refer to recent trade. In contrast, our introduction dataset includes both historical and recent introductions25,26. Therefore, the goal of our analyses is not to test causal hypotheses about the direct relationship between trade and introduction. Rather, it is to investigate whether the same life history traits predispose species towards both trade and introduction across diverse taxa, locations and circumstances. When combining the datasets and phylogenies59,60,61,62,63, we resolved species name mismatches by referring to taxonomic information from the IUCN Red List32, the Global Biodiversity Information Facility (GBIF33) and the Integrated Taxonomic Information System (ITIS64). Table 1 summarises the final sample sizes, and Supplementary Table 1 shows the degree of overlap between the trade datasets. Most species in the pet trade are also in the general live wildlife trade, but many more species are traded by the US for general purposes than are involved in the pet trade specifically.
Finally, we obtained data for a proxy measure of species detectability in order to control for a potential confounding effect on relationships between life history traits and introduction. Larger-bodied and longer-lived species may be more likely than smaller, shorter-lived species to be recorded by human observers when introduced. We obtained data on species occurrence records, geographic range size and population density. Following similar approaches (e.g. Refs. 65,66), we assumed that highly detectable species will have a disproportionately large number of recorded observations relative to what would be expected from the size of their geographic ranges and their average population densities.
We obtained occurrence records from the Global Biodiversity Information Facility (GBIF33) via the R package rgbif67, selecting only records resulting from human observation. We obtained range sizes (in decimal degrees squared) from the IUCN Red List32 and processed them for analysis using functions from the rgdal package68, excluding areas of uncertain presence (i.e. limiting ranges to presence code 1, ‘extant’). We obtained population density estimates from the TetraDENSITY database (version 1)34, a global database of population density estimates for terrestrial vertebrates. Most species in TetraDENSITY are represented by estimates from multiple studies (median = 3, range 1–408). We first converted all estimates to individuals/km² to ensure comparability. We then collapsed them to the species level by taking the median value across studies, including all estimates regardless of sampling method to maximise sample size.
Statistical analyses
To investigate relationships between life history traits and trade, we run models treating US or pet trade as the outcome variable and life history traits as the predictors. In all analyses, all life history variables were included in the same models to account for covariation among life history traits55. For US trade, where data on trade frequency are available, we run models treating trade both as a binary variable (traded vs. not traded) and as a count variable (frequency of live shipments, including zero values). For the pet trade, we have no data on trade frequency, so we treat pet trade as a binary variable only. To investigate the effects of life history traits on introduction, we run models in which introduction is the outcome variable and life history traits are the predictors. In introduction models, we include only traded species, running separate models for the set of species in US trade and the set of species in the pet trade.
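Returning to the density data described under Data collection, collapsing TetraDENSITY estimates to one value per species amounts to a unit conversion followed by a per-species median. The unit labels and species names below are hypothetical placeholders, not TetraDENSITY's actual field values. Only the conversion factors (1 km² = 100 ha = 1,000,000 m²) are fixed.

```python
import statistics
from collections import defaultdict

# Multipliers converting each (hypothetical) unit label to individuals per km^2.
# 1 km^2 = 100 ha = 1,000,000 m^2.
TO_PER_KM2 = {"ind/km2": 1.0, "ind/ha": 100.0, "ind/m2": 1_000_000.0}


def species_density(estimates):
    """estimates: iterable of (species, value, unit), one row per study.

    Converts every estimate to individuals/km^2, then takes the median
    across studies, regardless of sampling method.
    """
    by_species = defaultdict(list)
    for species, value, unit in estimates:
        by_species[species].append(value * TO_PER_KM2[unit])
    return {species: statistics.median(values) for species, values in by_species.items()}


rows = [
    ("species_A", 2, "ind/ha"),
    ("species_A", 150, "ind/km2"),
    ("species_A", 0.0005, "ind/m2"),
    ("species_B", 3, "ind/ha"),
]
print(species_density(rows))
```

The median is less sensitive than the mean to a single extreme study. That may matter here, since some species have hundreds of estimates.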
This approach allows us to disentangle effects associated with trade and introduction, and thus to identify the stage(s) at which life history biases emerge. We also run introduction models in which frequency of US trade is included as an additional predictor alongside life history traits, anticipating that highly traded species are more likely to be introduced. Finally, to investigate possible confounding effects of species detectability on relationships between life history traits and introduction, we investigate the effects of number of observations, geographic range size and, where sample sizes allowed, population density on the probability of introduction. If highly detectable species are more likely to be recorded as introduced, we expect a positive effect of the number of observations on the probability of introduction, while accounting for geographic range size and population density. If this effect confounds the relationships between body mass or lifespan and introduction, the effects of these life history traits on the probability of introduction should disappear when detectability measures are included in the models alongside life history traits. All analyses were conducted in the R statistical programming environment (version 4.2.0)69. Plots were coloured using palettes from the viridis package70.
To estimate the effects of predictor variables, we fit generalized linear mixed models (GLMMs) using Markov chain Monte Carlo (MCMC) estimation, implemented in the MCMCglmm package35,36. For analyses with binary outcome variables (traded vs. not traded, introduced vs. not introduced), we fit probit models. For analyses with US trade frequency as the outcome variable, we fit hurdle models. Hurdle models estimate two latent variables: the probability that the outcome is zero (on the logit scale), and the frequency of non-zero outcomes, modelled with a Poisson distribution truncated at zero71.
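The sketch below makes the two parts of a hurdle model concrete. It writes out the per-observation log-likelihood for a count y, assuming a logit link for the probability of a zero and a log link for the mean of the zero-truncated Poisson part. It is a textbook formulation for illustration, not MCMCglmm's internal code. The hypothetical helper `flip_to_nonzero` shows why reversing the sign of the binary-part coefficients turns effects on P(zero) into effects on P(non-zero): logit(1 − p) = −logit(p).

```python
import math


def hurdle_poisson_loglik(y, eta_zero, eta_count):
    """Log-likelihood of one count y under a logit hurdle with a zero-truncated Poisson.

    eta_zero:  linear predictor for logit(P(y == 0)).
    eta_count: linear predictor for log(lambda) of the Poisson count part.
    """
    p_zero = 1.0 / (1.0 + math.exp(-eta_zero))
    if y == 0:
        return math.log(p_zero)
    lam = math.exp(eta_count)
    log_poisson = y * eta_count - lam - math.lgamma(y + 1)
    # Zero truncation: renormalise the Poisson over y >= 1 by dividing by 1 - exp(-lam).
    log_truncation = math.log(-math.expm1(-lam))
    return math.log(1.0 - p_zero) + log_poisson - log_truncation


def flip_to_nonzero(beta_zero):
    """Since logit(1 - p) = -logit(p), negating a coefficient on P(zero) gives its effect on P(non-zero)."""
    return -beta_zero


# The probabilities over all counts should sum to 1 (the tail beyond 199 is negligible here).
total = sum(math.exp(hurdle_poisson_loglik(y, 0.3, 1.2)) for y in range(200))
print(total)
```

Because the zero and non-zero parts have separate linear predictors, one predictor can raise the chance that a species is traded at all while having a different effect on how often it is traded.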
This method therefore allows us to estimate the effects of life history traits on both the probability and the frequency of trade in the same model. The binary component of a hurdle model estimates the probability that outcomes are zero. For ease of interpretation, when reporting results we reverse the sign of the coefficients from the binary component, so that effects can be interpreted in terms of the probability that the outcome is not zero. Therefore, predictors with consistent effects on the probability and frequency of trade in hurdle models will have the same sign. For example, if litter size has a positive effect on both the probability and the frequency of trade, both coefficients for litter size from the hurdle model will be positive.
Datasets comprising biological measures from multiple related species violate the fundamental statistical assumption that observations are independent of one another, since closely related species are more phenotypically similar than expected by chance because of their shared evolutionary history72. To account for the non-independence of species due to shared ancestry, we included a phylogenetic random effect in all models, represented by a variance-covariance (VCV) matrix derived from the phylogeny. The off-diagonal elements of the VCV matrix contain the amount of shared evolutionary history for each pair of species35,37,38, based on the branch lengths of the phylogeny (here proportional to time)59,61,62,63. This approach allows us to estimate phylogenetic signal using the heritability (H²) parameter, which measures the proportion of total variance in the latent variable attributable to the phylogeny35,37,38. Heritability is interpreted in the same way as Pagel’s λ in phylogenetic generalized least squares regression35,37,38,72.
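The VCV matrix and the heritability ratio described above can be sketched with a toy tree. Each tip is described by the list of edges on its path from the root. This is a hypothetical encoding chosen for brevity; real analyses would read a Newick tree with phylogenetics software. H² is computed as phylogenetic variance over total latent-scale variance. The optional `link_variance` term is an assumption. Whether and how a link-specific variance is added depends on the link and on how the model is parameterised; π²/3, the variance of the standard logistic distribution, is one example.

```python
def phylo_vcv(paths, edge_lengths):
    """Phylogenetic variance-covariance matrix under Brownian motion (unit rate).

    paths: tip -> list of edge ids from the root to that tip.
    edge_lengths: edge id -> branch length (here, time).
    Entry (i, j) is the total length of edges shared by tips i and j,
    so diagonal entries are root-to-tip distances.
    """
    tips = list(paths)
    edge_sets = {tip: set(paths[tip]) for tip in tips}
    matrix = [[sum(edge_lengths[e] for e in edge_sets[a] & edge_sets[b]) for b in tips]
              for a in tips]
    return tips, matrix


def heritability(var_phylo, var_residual, link_variance=0.0):
    """H^2: share of latent-scale variance attributable to the phylogenetic random effect."""
    return var_phylo / (var_phylo + var_residual + link_variance)


# Toy ultrametric tree ((A:1, B:1):2, C:3); "AB" is the stem shared by A and B.
paths = {"A": ["AB", "A"], "B": ["AB", "B"], "C": ["C"]}
lengths = {"AB": 2.0, "A": 1.0, "B": 1.0, "C": 3.0}
tips, vcv = phylo_vcv(paths, lengths)
for tip, row in zip(tips, vcv):
    print(tip, row)
print(heritability(0.6, 0.4))
```

A and B share the 2-unit stem, so their covariance is 2. C shares no history with either, so its off-diagonal entries are 0. This is the pattern the phylogenetic random effect uses to down-weight information from close relatives.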
Specifically, phylogenetic signal is constrained between 0 and 1. A value of 0 indicates no phylogenetic effect, so that species can be treated as independent. A value of 1 indicates that similarity between species is directly proportional to their amount of shared evolutionary history35,38,72. Because hurdle models estimate two latent variables, for each hurdle model we report two heritability estimates, one for the binary and one for the Poisson component. All continuous independent variables were log10-transformed because of their positively skewed distributions. GLMMs do not require normally distributed predictor variables. However, log-transforming positively skewed life history predictors in phylogenetic comparative analyses allows us to model life history evolution on proportional rather than absolute scales. This is important because it facilitates biologically meaningful comparisons between species across large scales of life history variation73. Further, log-transforming positively skewed predictors helps to meet the assumptions of the underlying Brownian motion model of evolutionary change, which assumes that phenotypic change along branches of the phylogeny is normally distributed74.
We calculated variance inflation factors (VIFs) using functions from the car R package75 to check for multicollinearity between predictor variables. Where any model reported a VIF of 5 or above, indicating potentially problematic levels of collinearity76, we re-ran the model after removing the variable with the highest VIF. We repeated this iteratively until all the remaining variables had VIFs below 5.


    Genetic and ecological drivers of molt in a migratory bird

    Stefansson, S. O., Björnsson, B. T., Ebbesson, L. O. E. & McCormick, S. D. Smoltification. In Fish Larval Physiology (eds Finn, R. N. & Kapoor, B. G.) 639–681 (CRC Press, 2020).Chapter 

    Google Scholar 
    Kaleka, A. S., Kaur, N. & Bali, G. K. Larval development and molting. In Edible Insects (ed. Mikkola, H.) 17 (IntechOpen, 2019).
    Google Scholar 
    Butler, L. K. & Rohwer, V. G. Feathers and molt. in Ornithology: Foundation, Analysis, and Application (eds Morrison, M. L. et al.) 242–270 (JHU Press, 2018).
    Google Scholar 
    Swaddle, J. P., Witter, M. S., Cuthill, I. C., Budden, A. & McCowen, P. Plumage condition affects flight performance in common starlings: Implications for developmental homeostasis, abrasion and moult. J. Avian Biol. 27, 103–111 (1996).Article 

    Google Scholar 
    Norris, D. R., Marra, P. P., Montgomerie, R., Kyser, T. K. & Ratcliffe, L. M. Reproductive effort, molting latitude, and feather color in a migratory songbird. Science 306, 2249–2250 (2004).Article 
    ADS 
    CAS 

    Google Scholar 
    Delhey, K., Peters, A. & Kempenaers, B. Cosmetic coloration in birds: Occurrence, function, and evolution. Am. Nat. 169, S145–S158 (2007).Article 

    Google Scholar 
    Tomotani, B. M. & Muijres, F. T. A songbird compensates for wing molt during escape flights by reducing the molt gap and increasing angle of attack. J. Exp. Biol. 222, 195396 (2019).Article 

    Google Scholar 
    Galván, I., Negro, J. J., Rodriguez, A. & Carrascal, L. M. On showy dwarfs and sober giants: Body size as a constraint for the evolution of bird plumage colouration. Acta Ornithol. 48, 65–80 (2013).Article 

    Google Scholar 
    Speakman, J. R. & Król, E. Maximal heat dissipation capacity and hyperthermia risk: Neglected key factors in the ecology of endotherms. J. Anim. Ecol. 79, 726–746 (2010).
    Google Scholar 
    Wolf, B. O. & Walsberg, G. E. The role of the plumage in heat transfer processes of birds. Am. Zool. 40, 575–584 (2000).
    Google Scholar 
    Berthold, P. & Querner, U. Genetic basis of moult, wing length, and body weight in a migratory bird species, Sylvia atricapilla. Experientia 38, 801–802 (1982).Article 

    Google Scholar 
    Gwinner, E., Neusser, V., Engl, D., Schmidl, D. & Bals, L. Haltung, Zucht und Eiaufzucht afrikanischer und europäischer Schwarzkehlchen Saxicola torquata. Gefied. Welt 111, 118–120 (1987).
    Google Scholar 
    Berthold, P. & Querner, U. Microevolutionary aspects of bird migration based on experimental results. Isr. J. Ecol. Evol. 41, 377–385 (1995).
    Google Scholar 
    Helm, B. & Gwinner, E. Timing of postjuvenal molt in African (Saxicola torquata axillaris) and European (Saxicola torquata rubicola) stonechats: Effects of genetic and environmental factors. Auk 116, 589–603 (1999).Article 

    Google Scholar 
    Helm, B. & Gwinner, E. Timing of molt as a buffer in the avian annual cycle. Acta Zool. Sin. 52, 703–706 (2006).
    Google Scholar 
    Rohwer, S., Ricklefs, R. E., Rohwer, V. G. & Copple, M. M. Allometry of the duration of flight feather molt in birds. PLoS Biol. 7, e1000132 (2009).Article 

    Google Scholar 
    Jenni, L. & Winkler, R. The Biology of Moult in Birds (Bloomsbury Publishing, 2020).
    Google Scholar 
    Tonra, C. M. & Reudink, M. W. Expanding the traditional definition of molt-migration. Auk Ornithol. Adv. 135, 1123–1132 (2018).
    Google Scholar 
    Rohwer, S., Butler, L. K., Froehlich, D. R., Greenberg, R. & Marra, P. P. Ecology and demography of east–west differences in molt scheduling of Neotropical migrant passerines. Birds Two Worlds Ecol. Evol. Migr. (R. Greenb. PP Marra, Eds.). Johns Hopkins Univ. Press. Balt. Maryl., 87–105 (2005).Bensch, S., Åkesson, S. & Irwin, D. E. The use of AFLP to find an informative SNP: Genetic differences across a migratory divide in willow warblers. Mol. Ecol. 11, 2359–2366 (2002).Article 
    CAS 

    Google Scholar 
    Ruegg, K. Genetic, morphological, and ecological characterization of a hybrid zone that spans a migratory divide. Evol. Int. J. Org. Evol. 62, 452–466 (2008).Article 

    Google Scholar 
    Delmore, K. E., Fox, J. W. & Irwin, D. E. Dramatic intraspecific differences in migratory routes, stopover sites and wintering areas, revealed using light-level geolocators. Proc. R. Soc. B Biol. Sci. 279, 4582–4589 (2012).Article 

    Google Scholar 
    Delmore, K. E. et al. Individual variability and versatility in an eco-evolutionary model of avian migration. Proc. R. Soc. B 287, 20201339 (2020).Article 

    Google Scholar 
    Procházka, P. et al. Across a migratory divide: divergent migration directions and non-breeding grounds of Eurasian reed warblers revealed by geolocators and stable isotopes. J. Avian Biol. 49, 012516 (2018).Article 

    Google Scholar 
    Bensch, S., Grahn, M., Müller, N., Gay, L. & Åkesson, S. Genetic, morphological, and feather isotope variation of migratory willow warblers show gradual divergence in a ring. Mol. Ecol. 18, 3087–3096 (2009).Article 

    Google Scholar 
    Rohwer, S. & Irwin, D. E. Molt, orientation, and avian speciation. Auk 128, 419–425 (2011).Article 

    Google Scholar 
    Pageau, C., Sonnleitner, J., Tonra, C. M., Shaikh, M. & Reudink, M. W. Evolution of winter molting strategies in European and North American migratory passerines. Ecol. Evol. 11, 13247–13258 (2021).Article 

    Google Scholar 
    Butler, L. K., Rohwer, S. & Rogers, M. Prebasic molt and molt-related movements in Ash-throated Flycatchers. Condor 108, 647–660 (2006).Article 

    Google Scholar 
    Barry, J. H., Butler, L. K., Rohwer, S. & Rohwer, V. G. Documenting molt-migration in Western Kingbird (Tyrannus verticalis) using two measures of collecting effort. Auk 126, 260–267 (2009).Article 

    Google Scholar 
    Hobson, K. A. & Wassenaar, L. I. Linking breeding and wintering grounds of neotropical migrant songbirds using stable hydrogen isotopic analysis of feathers. Oecologia 109, 142–148 (1996).Article 
    ADS 
    CAS 

    Google Scholar 
    Hobson, K. A. & Wassenaar, L. I. Tracking Animal Migration with Stable Isotopes (Academic Press, 2018).
    Google Scholar 
    Rubenstein, D. R. & Hobson, K. A. From birds to butterflies: Animal movement patterns and stable isotopes. Trends Ecol. Evol. 19, 256–263 (2004).Article 

    Google Scholar 
    Bearhop, S. et al. Assortative mating as a mechanism for rapid evolution of a migratory divide. Science 310, 502–504 (2005).Article 
    ADS 
    CAS 

    Google Scholar 
    Eppig, J. T. et al. The mouse genome database (MGD): Comprehensive resource for genetics and genomics of the laboratory mouse. Nucleic Acids Res. 40, D881–D886 (2012).Article 
    CAS 

    Google Scholar 
    Contina, A., Bridge, E. S. & Kelly, J. F. Exploring novel candidate genes from the mouse genome informatics database: Potential implications for avian migration research. Integr. Zool. 11, 240 (2016).Article 

    Google Scholar 
    Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).Article 
    CAS 

    Google Scholar 
    Thompson, C. W. Is the Painted Bunting actually two species? Problems determining species limits between allopatric populations. Condor 93, 987–1000 (1991).Article 

    Google Scholar 
    Contina, A., Bridge, E. S., Seavy, N. E., Duckles, J. M. & Kelly, J. F. Using geologgers to investigate bimodal isotope patterns in Painted Buntings (Passerina ciris). Auk 130, 265 (2013).Article 

    Google Scholar 
    Besozzi, E., Chew, B., Allen, D. C. & Contina, A. Stable isotope analysis of an aberrant Painted Bunting (Passerina ciris) feather suggests post-molt movements. Wilson J. Ornithol. 133, 151 (2021).Article 

    Google Scholar 
    Sharp, A. et al. Spatial and Temporal Scale-Dependence of the Strength of Migratory Connectivity in a North American Passerine. https://assets.researchsquare.com/files/rs-1483049/v1/72236b63-952d-4870-89e7-461056b8625b.pdf?c=1648893558 (2022).Pyle, P. et al. Temporal, spatial, and annual variation in the occurrence of molt-migrant passerines in the Mexican monsoon region. Condor 111, 583–590 (2009).Article 

    Google Scholar 
    Bridge, E. S., Fudickar, A. M., Kelly, J. F., Contina, A. & Rohwer, S. Causes of bimodal stable isotope signatures in the feathers of a molt-migrant songbird. Can. J. Zool. 89, 951 (2011).Article 
    CAS 

    Google Scholar 
    Seutin, G., White, B. N. & Boag, P. T. Preservation of avian blood and tissue samples for DNA analyses. Can. J. Zool. 69, 82–90 (1991).Article 
    CAS 

    Google Scholar 
    Ali, O. A. et al. RAD capture rapture: Flexible and efficient sequence-based genotyping. Genetics 202, 389–400 (2016).Article 
    CAS 

    Google Scholar 
    Contina, A. et al. Characterization of SNP markers for the Painted Bunting (Passerina ciris) and their relevance in population differentiation and genome evolution studies. Conserv. Genet. Resour. 11, 5–10 (2019).Article 
    ADS 

    Google Scholar 
    Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: An analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140 (2013).Article 

    Google Scholar 
    Parker, P., Li, B., Li, H. & Wang, J. The genome of Darwin’s Finch (Geospiza fortis). Gigascience 10, 100040 (2012).
    Google Scholar 
    Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).Article 
    CAS 

    Google Scholar 
    Van der Auwera, G. A. et al. From FastQ data to high-confidence variant calls: The genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinform. 43, 1–33 (2013).
    Google Scholar 
    McKenna, A. et al. The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).Article 
    CAS 

    Google Scholar 
    Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).Article 
    CAS 

    Google Scholar 
    Anderson, E. genoscapeRtools: Tools for Building Migratory Bird Genoscapes (2019).Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).Article 
    CAS 

    Google Scholar 
    Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).Article 
    CAS 

    Google Scholar 
    Alexander, D. H. & Lange, K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinform. 12, 246 (2011).Article 

    Google Scholar 
    Francis, R. M. pophelper: An R package and web app to analyse and visualize population structure. Mol. Ecol. Resour. 17, 27–32 (2017).Article 
    CAS 

    Google Scholar 
    Chew, B., Kelly, J. & Contina, A. Stable isotopes in avian research: a step by step protocol to feather sample preparation for stable isotope analysis of carbon (δ13C), nitrogen (δ15N), and hydrogen (δ2H). Version 1.1. https://doi.org/10.17504/protocols.io.z2uf8ew (2019).Wassenaar, L. I. & Hobson, K. A. Comparative equilibration and online technique for determination of non-exchangeable hydrogen of keratins for use in animal migration studies. Isotopes Environ. Health Stud. 39(3), 211–217 (2003).Article 
    CAS 

    Google Scholar 
    Bowen, G. J., Wassenaar, L. I. & Hobson, K. A. Global application of stable hydrogen and oxygen isotopes to wildlife forensics. Oecologia 143, 337–348 (2005).Article 
    ADS 

    Google Scholar 
    R Core Team: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2021).Wassenaar, L. I. & Hobson, K. A. Stable-hydrogen isotope heterogeneity in keratinous materials: Mass spectrometry and migratory wildlife tissue subsampling strategies. Rapid Commun. Mass Spectrom. 20, 2505–2510 (2006).Article 
    ADS 
    CAS 

    Google Scholar 
    Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).Article 
    CAS 

    Google Scholar 
    Guan, Y. & Stephens, M. Bayesian variable selection regression for genome-wide association studies and other large-scale problems. Ann. Appl. Stat. 5, 455 (2011).Article 
    MathSciNet 
    MATH 

    Google Scholar 
    Marchini, J., Cardon, L. R., Phillips, M. S. & Donnelly, P. The effects of human population structure on large genetic association studies. Nat. Genet. 36, 512–517 (2004).Article 
    CAS 

    Google Scholar 
    Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).Article 
    CAS 

    Google Scholar 
    Chaves, J. A. et al. Genomic variation at the tips of the adaptive radiation of Darwin’s finches. Mol. Ecol. 25, 5282–5295 (2016).Article 
    CAS 

    Google Scholar 
    Zhang, Y.-W. et al. mrMLM v4.0.2: An R platform for multi-locus genome-wide association studies. Genom. Proteom. Bioinform. 18, 481–487 (2020).Article 

    Google Scholar 
    Grabherr, M. G. et al. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics 26, 1145–1151 (2010).Article 
    CAS 

    Google Scholar 
    Quinlan, A. R. & Hall, I. M. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).Article 
    CAS 

    Google Scholar 
    Ellis, N., Smith, S. J. & Pitcher, C. R. Gradient forests: Calculating importance gradients on physical predictors. Ecology 93, 156–168 (2012).Article 

    Google Scholar 
    Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G. & Jarvis, A. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. 25, 1965–1978 (2005).Article 

    Google Scholar 
    Anderson, E. C. snps2assays: Prepare SNP Assay Orders from ddRAD or RAD Loci (2015).Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).Article 
    CAS 

    Google Scholar 
    Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).Article 

    Google Scholar 
    Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).Article 
    CAS 

    Google Scholar 
    Ruegg, K. et al. Ecological genomics predicts climate vulnerability in an endangered southwestern songbird. Ecol. Lett. 21, 1085–1096 (2018).Article 

    Google Scholar 
    Bay, R. A. et al. Genomic signals of selection predict climate-driven population declines in a migratory bird. Science 359, 83–86 (2018).Article 
    ADS 
    CAS 

    Google Scholar 
    Hedenström, A. Adaptations to migration in birds: Behavioural strategies, morphology and scaling effects. Philos. Trans. R. Soc. B Biol. Sci. 363, 287–299 (2008).Article 

    Google Scholar 
    Buehler, D. M. & Piersma, T. Travelling on a budget: Predictions and ecological evidence for bottlenecks in the annual cycle of long-distance migrants. Philos. Trans. R. Soc. B Biol. Sci. 363, 247–266 (2008).Article 

    Google Scholar 
    Schieltz, P. C. & Murphy, M. E. The contribution of insulation changes to the energy cost of avian molt. Can. J. Zool. 75, 396–400 (1997).Article 

    Google Scholar 
    Carling, M. D. & Thomassen, H. A. The role of environmental heterogeneity in maintaining reproductive isolation between hybridizing Passerina (Aves: Cardinalidae) buntings. Int. J. Ecol. 2012, 1–11 (2012).Article 

    Google Scholar 
    Irwin, D. E. Incipient ring speciation revealed by a migratory divide. Mol. Ecol. 18, 2923–2925 (2009).Article 

    Google Scholar 
    Thomas, D. W., Blondel, J., Perret, P., Lambrechts, M. M. & Speakman, J. R. Energetic and fitness costs of mismatching resource supply and demand in seasonally breeding birds. Science 291, 2598–2600 (2001).Article 
    ADS 
    CAS 

    Google Scholar 
    Rohwer, V. G., Rohwer, S. & Ortiz-Ramirez, M. F. Molt biology of resident and migrant birds of the monsoon region of west Mexico. Ornitol. Neotrop. 20, 565–584 (2009).
    Google Scholar 
    Bensch, S., Andersson, T. & Åkesson, S. Morphological and molecular variation across a migratory divide in willow warblers, Phylloscopus trochilus. Evolution 53, 1925–1935 (1999).Article 

    Google Scholar 
    Turbek, S. P., Scordato, E. S. C. & Safran, R. J. The role of seasonal migration in population divergence and reproductive isolation. Trends Ecol. Evol. 33, 164–175 (2018).Article 

    Google Scholar 
    Scordato, E. S. C. et al. Migratory divides coincide with reproductive barriers across replicated avian hybrid zones above the Tibetan Plateau. Ecol. Lett. 23, 231–241 (2020).Article 

    Google Scholar 
    Battey, C. J. et al. A migratory divide in the Painted Bunting (Passerina ciris). Am. Nat. 191, 259–268 (2018).Article 
    CAS 

    Google Scholar 
    Contina, A. et al. Genetic structure of the Painted Bunting and its implications for conservation of migratory populations. Ibis 161, 372 (2019).
    Butler, L. K. The grass is always greener: Do monsoon rains matter for molt of the Vermilion Flycatcher (Pyrocephalus rubinus)? Auk 130, 297–307 (2013).
    Turbek, S. P. et al. A migratory divide spanning two continents is associated with genomic and ecological divergence. Evolution 76, 722 (2022).
    Dietz, M. W., Daan, S. & Masman, D. Energy requirements for molt in the kestrel Falco tinnunculus. Physiol. Zool. 65, 1217–1235 (1992).
    Vézina, F., Gustowska, A., Jalvingh, K. M., Chastel, O. & Piersma, T. Hormonal correlates and thermoregulatory consequences of molting on metabolic rate in a northerly wintering shorebird. Physiol. Biochem. Zool. 82, 129–142 (2009).
    Bazzi, G. et al. Candidate genes have sex-specific effects on timing of spring migration and moult speed in a long-distance migratory bird. Curr. Zool. 63, 479–486 (2017).
    Busby, L. et al. Sonic hedgehog specifies flight feather positional information in avian wings. Development 147, 188821 (2020).
    Eichberger, T. et al. GLI2-specific transcriptional activation of the bone morphogenetic protein/Activin antagonist Follistatin in human epidermal cells. J. Biol. Chem. 283, 12426–12437 (2008).
    Matzuk, M. M. et al. Multiple defects and perinatal death in mice deficient in follistatin. Nature 374, 360–363 (1995).
    Patel, K., Makarenkova, H. & Jung, H.-S. The role of long range, local and direct signalling molecules during chick feather bud development involving the BMPs, follistatin and the Eph receptor tyrosine kinase Eph-A4. Mech. Dev. 86, 51–62 (1999).
    Nakamura, M. et al. Control of pelage hair follicle development and cycling by complex interactions between follistatin and activin. FASEB J. 17, 1–22 (2003).
    Pays, L., Charvet, I., Hemming, F. J. & Saxod, R. Close link between cutaneous nerve pattern development and feather morphogenesis demonstrated by experimental production of neo-apteria and ectopic feathers: Implication of chondroitin sulphate proteoglycans and other matrix molecules. Anat. Embryol. 195, 457–466 (1997).
    Pyle, P., Saracco, J. F. & DeSante, D. F. Evidence of widespread movements from breeding to molting grounds by North American landbirds. Auk Ornithol. Adv. 135, 506–520 (2018).
    De Mita, S. et al. Detecting selection along environmental gradients: Analysis of eight methods and their effectiveness for outbreeding and selfing populations. Mol. Ecol. 22, 1383–1399 (2013).
    Lotterhos, K. E. & Whitlock, M. C. Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Mol. Ecol. 23, 2178–2192 (2014).
    Frichot, E., Schoville, S. D., de Villemereuil, P., Gaggiotti, O. E. & François, O. Detecting adaptive evolution based on association with ecological gradients: Orientation matters! Heredity (Edinb.) 115, 22–28 (2015).
    Trivedi, A. K. et al. Temperature alters the hypothalamic transcription of photoperiod responsive genes in induction of seasonal response in migratory redheaded buntings. Mol. Cell. Endocrinol. 493, 110454 (2019).

  • in

    Tracking microbes in extreme environments

    In 2008, I was investigating the methane bubbling up on the beaches and in shallow waters of Mocha Island, off the coast of central Chile. I became intrigued by how microorganisms could thrive in methane-rich areas and changed my research focus from marine biology to extreme environments. I wanted to understand how methane acts as a source of energy and carbon for microbes.

    Since then, I have explored a number of bizarre environments. In 2010, I went in a submarine down to 200 metres in the Black Sea, one of the world’s largest anoxic water bodies. There, I found mats of filamentous bacteria that survive on sulfur compounds.

    In 2017, I studied the microbes in Canada’s tailings ponds, artificial lakes of water, sand and clay waste that are left behind after petroleum extraction. And in 2022, I sampled the microorganisms living in 100 °C Antarctic hot springs.

    I came home to Chile in 2018 and began collaborating with an international team researching the geomicrobiology of thermal features, including hot springs, geysers and volcanoes. After travelling with the group to Argentina’s active volcanic region, I got funding to explore the microbial communities that exist beneath hydrothermal vents in southern Chile, where the oceanic crust is subducting beneath the continental plate.

    In this image, I am in the Atacama Desert in South America, the driest non-polar desert on the planet. I am measuring 80–100 °C steam released from a fumarole containing yellow sulfur, which crystallizes at its opening as the vapour cools. I also sampled sub-surface microbes that are flushed out with the fluids. We’ll sequence their DNA to assess the microbial communities and their biological interactions.

    My goal is to learn more about subsurface microbes in extreme environments. I want to understand how microbial forces shaped the planet and how these communities might shift in the future with climate change.