
Common and distinctive genomic features of Klebsiella pneumoniae thriving in the natural environment or in clinical settings

Genome collection and phylogenetic analysis

The study examined the genomes of 139 isolates, 61 from environmental samples (ENV) and 78 from clinical samples (CLI) (Supplementary Table 1, Supplementary Fig. 1), originating from 21 countries: USA (23/139, 17%), UK, Portugal and Spain (each 15/139, 11%), China (14/139, 10%), Germany (13/139, 9%), Thailand (11/139, 8%) and other countries (each < 8 isolates; 33/139, 24%). Redundant genomes were excluded by ensuring that the two genomes of each pair sharing 100% Average Nucleotide Identity (ANI) originated from a different country or sample type. Although deposited in GenBank as K. pneumoniae, 13 genomes yielded ANIb values of 93–95% with the other 126, which shared 98–100% among themselves (Supplementary Table 2). That group of 13 strains was later reclassified by GenBank as K. variicola (n = 6) and K. quasipneumoniae (n = 7)20. The determination of the K. pneumoniae Pasteur multi-locus sequence types (STs) of the 139 genomes resulted in 62 STs, 8 of which (ST11, ST14, ST15, ST37, ST45, ST147, ST348 and ST437) included CLI and ENV genomes, 23 STs included only CLI strains and 32 STs only ENV strains (Table 1, Supplementary Table 1, and Supplementary Fig. 2). The predominant STs among CLI genomes were ST147 (18%), ST11, ST23, and ST258 (each 8%), and among ENV genomes were ST14 (8%), ST895 and ST3128 (each 7%) (Table 1 and Supplementary Table 1). Unique STs (n = 36), each corresponding to a single genome, were observed in CLI (n = 13) and in ENV (n = 24) isolates in significantly different proportions (Fisher’s Exact test, p < 0.05). Among these were the genomes later identified as K. variicola (n = 6, 1 CLI and 5 ENV) and as K. quasipneumoniae (n = 7, 1 CLI and 6 ENV), all of which had unique STs (Supplementary Fig. 2). The decision to retain the non-K. pneumoniae genomes in the study was justified by the fact that they belong to the same species complex and that their inclusion avoided a disproportion in the number of CLI and ENV genomes that might bias the results. 
Possible biases in the results due to the inclusion of these genomes were also critically assessed. The dendrogram representing the ANIb values between pairs of strains clustered the genomes according to ST although, in some cases, such as ST11, ST23, ST37, ST258 and ST392, a single ST was split into different groups (Supplementary Fig. 3).
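The redundancy criterion described above (a pair sharing 100% ANI is acceptable only if the two genomes differ in country or sample type) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Genome` record, the `redundant_pairs` function, the genome names and the ANI values are all hypothetical and serve only to make the rule concrete.

```python
from typing import NamedTuple


class Genome(NamedTuple):
    # Hypothetical metadata record; field names are illustrative only.
    country: str
    sample_type: str  # "CLI" or "ENV"


def redundant_pairs(genomes, ani, threshold=100.0):
    """Return pairs with ANI >= threshold that share BOTH country and sample type.

    genomes: dict mapping genome name -> Genome
    ani:     dict mapping (name_a, name_b) -> ANI in percent (either key order)
    """
    flagged = []
    names = sorted(genomes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            value = ani.get((a, b), ani.get((b, a)))
            if value is None or value < threshold:
                continue  # pair not compared, or not identical
            if (genomes[a].country == genomes[b].country
                    and genomes[a].sample_type == genomes[b].sample_type):
                flagged.append((a, b))
    return flagged


# Made-up example: G1/G2 are identical and share origin, so they are flagged;
# G1/G3 are identical but differ in sample type, so both would be kept.
toy_genomes = {
    "G1": Genome("Portugal", "ENV"),
    "G2": Genome("Portugal", "ENV"),
    "G3": Genome("Portugal", "CLI"),
}
toy_ani = {("G1", "G2"): 100.0, ("G1", "G3"): 100.0, ("G2", "G3"): 99.2}
print(redundant_pairs(toy_genomes, toy_ani))
```

Which member of a flagged pair is then discarded is a separate choice that the text does not specify, so the sketch only reports the pairs.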

Table 1 Summary of the features of the Klebsiella spp. genomes used in this study.

Six of the 8 STs that included CLI and ENV genomes were distributed across different countries. Specifically, ENV isolates with the same ST as CLI isolates were observed in ST11 (1 ENV in Japan and 6 CLI in Germany, China, USA and Spain), ST14 (5 ENV in Algeria and 2 CLI in USA), ST15 (2 ENV in Portugal and 4 CLI in Portugal, Nepal, USA and China), ST37 (2 ENV in Thailand and 2 CLI in USA and China), ST45 (2 ENV in UK and 1 CLI in Thailand) and ST147 (2 ENV in Portugal and Switzerland and 14 CLI in Portugal, Germany, United Arab Emirates, Thailand, Pakistan and Spain) (Supplementary Fig. 2). Also, some STs represented by more than one genome were reported in a single country (USA, ST16, n = 2 CLI; USA, ST941, n = 2 CLI; Portugal, ST348, n = 2 ENV, n = 1 CLI; Spain, ST392, n = 4 CLI; Spain, ST326, n = 4 CLI; Spain, ST405, n = 2 CLI; Germany, ST3128, n = 4 CLI) and, in most cases, were reported by the same authors. This latter situation was observed for ST258 in USA (n = 6 CLI) and ST437 (n = 2 ENV, n = 1 CLI) in Brazil (Supplementary Fig. 2). A phylogenetic analysis was performed based on a total of 2704 monocopy core genes of the 139 isolates, relying on the generalized time-reversible (GTR) model, the most suitable to assess the evolution in both CLI and ENV groups (Fig. 1). As expected, the isolates grouped preferentially according to ST, and the core-genome-based phylogenetic analyses performed individually for CLI and ENV genomes showed an identical organization (Supplementary Fig. 4). The core-genome-based phylogenetic tree showed that, in a few cases, isolates of CLI and ENV origin shared high sequence identity in these core genes (100%). This was observed in ST15 and in ST348 in isolates from Portugal, in ST437 in isolates from Brazil, in ST147 in isolates from Portugal (ENV) and Germany (CLI), or Thailand (CLI) and China (ENV), as well as in ST11 with isolates from China (CLI) and Japan (ENV) (Fig. 1). 
An intriguing situation was observed for isolate KP120, whose MLST typing indicated ST23, whereas the core genome and ANI analyses placed this isolate in a cluster composed of ST37 isolates. The possibility that this is a case of contamination was excluded since, after the first check of the genomes, all genomes with > 5% contamination were eliminated from the analyses.

Figure 1

Phylogenetic trees based on the concatenated nucleotide sequences of the 2704 monocopy core genes defined in the (A) K. pneumoniae sensu stricto (n = 126), (B) K. quasipneumoniae and (C) K. variicola genomes analysed. A phylogenetic tree was constructed based on 2,542,200 bp using the GTR evolutionary model, which was determined to be the model that best fitted the data; (A), (B) and (C) represent zoomed sections according to the species. The labels indicate the name of the strain genome, the sequence type and the country of isolation. Grey circles on the nodes indicate bootstrap values above 70%. Red and green circles indicate clinical and environmental genomes, respectively.

The genomes were further analyzed to infer features that might be associated with environmental versus clinical origin, species or STs. The scoring of genes in the pan-genome of the 139 genomes21 showed no significant differences in gene annotation and frequency between CLI and ENV genomes. As expected, this analysis revealed the genes exclusive to each of the three species (p < 0.05, Bonferroni): 292 in K. pneumoniae, 99 in K. quasipneumoniae, and 1638 in K. variicola (Supplementary Table 3). Among the genes exclusive to K. pneumoniae, more than half (150) were annotated as hypothetical proteins, and others were related to metabolism, virulence and type IV secretion systems, among others (Supplementary Table 4). The comparison of each of the most abundant STs (Table 1) with all the others did not reveal significant differences in pangenome composition. Among the STs that included CLI and ENV genomes, gene annotations with significantly different frequency between CLI and ENV genomes were detected only in ST14 isolates (Bonferroni, p < 0.05). A deeper examination of ST14, which comprised 2 CLI and 5 ENV genomes, showed that among the 307 annotations that distinguished the two groups, most (235) were hypothetical proteins, the remainder being related to metabolism, arsenic, chloramphenicol or aminoglycoside resistance, as well as transposons, only present in the CLI genomes (Supplementary Table 4).

Pangenome and clinical vs. environmental analyses

A similar number of open reading frames (ORFs) was detected in the CLI and ENV pangenomes (a total of 12,133 genes, 10,210 in CLI and 10,288 in ENV) (Supplementary Fig. 5). The dendrogram representing the matrix of presence/absence of genes and the respective heatmap (Supplementary Fig. 6 and Supplementary Fig. 7) showed that ENV and CLI genomes clustered together. These observations suggest that the pangenome was closely related among members of the same phylogenetic group and geography or CLI vs. ENV origin. The CLI and ENV core genes included functional categories mostly related to genetic information processing (587/2713, 21% CLI; 435/2319, 19% ENV), environmental information processing (298/2713, 11% CLI; 261/2319, 11% ENV), signalling and cellular processes (298/2713, 11% CLI; 281/2319, 12% ENV) and carbohydrate metabolism (288/2713, 11% CLI; 249/2319, 11% ENV) (Supplementary Fig. 8). In some cases, the array of accessory genes contributed to subdividing a single ST into distinct clusters, suggesting that the profile of gene acquisition was not explained by origin or phylogenetic group alone. This was observed, for example, for ST11, ST147, ST258, ST392, and ST437 (Supplementary Fig. 7). The absence of significant pan-genome associations with CLI or ENV genomes motivated a targeted analysis of clinically relevant traits in both groups. Therefore, the genomes were further compared based on the presence/absence of alleles of 237 genes related to antibiotic and metal resistance, plasmid replicon type, virulence, efflux systems, oxidative stress and quorum sensing. These 237 genes were screened in the CLI and ENV genomes to estimate the prevalence of genes, the prevalence of the respective genetic variants (represented by sequences that differed in at least one nucleotide) and the intra-gene diversity index.

Regarding gene prevalence, 3 metal resistance genes (pbrA, pbrBC and pbrR), 37 virulence genes (iro, clb, iuc, among others), 13 antibiotic resistance genes (rmt, mef, mcr, among others) and 14 plasmid replicon types (IncHI2, IncQ1, IncU, IncX3, psL483, among others) were detected only in CLI genomes (Supplementary Fig. 9). Only one of the genomes harbouring these exclusive genes, specifically the one carrying the plasmid replicon type psL483, was affiliated to K. quasipneumoniae (Supplementary Table 5). In contrast, one antibiotic resistance gene and six plasmid replicon types were detected exclusively in ENV genomes (Supplementary Fig. 9). Only one of these genomes was affiliated to K. quasipneumoniae, the only environmental genome (1/61 genomes) harbouring the replicon types Col(IMG531) and Col(IRGK) (Supplementary Table 5). These differences apart, 20 genes in these categories were detected in all genomes (antibiotic resistance n = 1, virulence n = 1, efflux systems n = 8, oxidative stress n = 5, quorum sensing n = 5) and most of the other genes (162/237) were detected in both groups (Supplementary Table 5 and Supplementary Table 6). However, some of the genes common to both groups presented significantly different prevalence values (p < 0.05). The genes related to antibiotic resistance (blaKPC, blaOXA, blaTEM and aac(6′)-Ib-cr), metal resistance (mer, ter) and virulence (yersiniabactin: fyuA, irp, ybt) were significantly more prevalent among CLI genomes, irrespective of the inclusion in the analysis of K. variicola strain KP071 (blaOXA, blaTEM, aac(6′)-Ib-cr and ter) and K. quasipneumoniae strain KP125 (blaTEM) (Fig. 2A). In turn, the resistance gene blaOKP-A and the replicon types Col(MGD2) and Col(BS512) were significantly more frequent in ENV genomes (Fig. 2A and Supplementary Table 5). However, this result was attributed to the fact that 6 out of the 7 K. quasipneumoniae genomes harboured the gene blaOKP-A, which is intrinsic to this species20. 
A dendrogram based on the presence/absence of the 237 genes clustered the genomes in agreement with the sequence types and/or geography, and only three genomes, one ENV and two CLI, of distinct STs were identical in this profile (Supplementary Fig. 10). This analysis also showed that a single ST could be subdivided according to geography, e.g. ST14 genomes were divided into Algeria and USA subgroups, a bias that may be related to the distinct pattern of gene association. Also, the ST147 genomes were split into groups from Germany, Portugal, Thailand or United Arab Emirates (Supplementary Fig. 10). Genomes in which the lowest number of the screened genes was detected (< 30% of 237) corresponded to 21 CLI (of 13 STs) and 25 ENV (of 17 STs) genomes. These genomes typically contained metal resistance genes (pco, sil, ars) in ENV genomes, or antibiotic resistance (blaKPC) and virulence genes (yersiniabactin) in CLI genomes. Genomes with the highest number of the screened genes (> 50% of the 237) corresponded to 3 clinical genomes (ST14, USA and two ST23, China) (Supplementary Table 6).
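The gene prevalence values discussed above follow the definition given in the Fig. 2 legend: 100 × (number of clinical or environmental genomes containing the gene / total number of clinical or environmental genomes). A minimal sketch of that calculation, together with a helper that lists genes found in only one group, is given below. The function names, the genome names and the toy presence/absence data are hypothetical; only the formula comes from the text.

```python
def gene_prevalence(presence, groups, group, gene):
    """Prevalence (%) of `gene` among the genomes labelled `group`.

    presence: dict mapping genome name -> set of detected gene names
    groups:   dict mapping genome name -> "CLI" or "ENV"
    """
    members = [g for g, label in groups.items() if label == group]
    if not members:
        raise ValueError(f"no genomes labelled {group!r}")
    carriers = sum(1 for g in members if gene in presence.get(g, set()))
    return 100.0 * carriers / len(members)


def exclusive_genes(presence, groups, group):
    """Genes detected in at least one genome of `group` and in no other genome."""
    inside, outside = set(), set()
    for genome, genes in presence.items():
        (inside if groups[genome] == group else outside).update(genes)
    return inside - outside


# Made-up presence/absence data for four hypothetical genomes.
toy_groups = {"C1": "CLI", "C2": "CLI", "E1": "ENV", "E2": "ENV"}
toy_presence = {
    "C1": {"blaKPC", "oxyR"},
    "C2": {"oxyR"},
    "E1": {"pco", "oxyR"},
    "E2": {"pco", "oxyR"},
}
print(gene_prevalence(toy_presence, toy_groups, "CLI", "blaKPC"))
print(sorted(exclusive_genes(toy_presence, toy_groups, "ENV")))
```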

Figure 2

Statistically significant differences observed between clinical and environmental K. pneumoniae and closest related species genomes analysed. (A) Prevalence (%) of genes (Fisher’s exact test and p-value < 0.05); (B) The Shannon diversity index was determined for the alleles of each gene and the genes with significant differences between clinical and environmental genomes were identified (p-value < 0.05). The prevalence (%) of the alleles of these genes, meaning variants of a single gene that differ in at least one nucleotide, is presented. The prevalence of the genes or of the gene-alleles was determined based on the following formula: Prevalence (%) = 100 × (Number of clinical or environmental genomes containing the gene A/total number of clinical or environmental genomes) or 100 × (Number of observed variants of gene A in clinical or environmental genomes/total number of observations of the gene A in clinical or environmental genomes). *Indicates statistically significant differences between clinical and environmental within K. pneumoniae genomes. Some genes such as blaLEN and blaOKP-A were only observed in the K. variicola and K. quasipneumoniae species, respectively. AR antibiotic resistance, MR metal resistance, Vir virulence, Plasm plasmids, ES efflux systems, OS oxidative stress, QS quorum sensing.


Regarding the prevalence of alleles of each of the 237 genes, 2661 gene variants were detected (1600 in CLI and 1648 in ENV genomes) (Supplementary Table 7). The rationale of this analysis was to assess genetic variation, irrespective of the implications for the phenotype. The highest number of gene variants was observed for virulence, especially for capsule-related genes (wzc n = 55, wzi n = 64), quorum sensing (e.g. lsrB n = 74, tqsA n = 83) and oxidative stress (e.g. msrQ n = 37, oxyR n = 37), and was also observed in K. quasipneumoniae and K. variicola genomes (Supplementary Table 5). For 29 genes, distributed across all analysed categories, a statistically significant difference (p < 0.05) in allele prevalence was observed between CLI and ENV (Supplementary Fig. 9). Twenty-six out of those 29 were common to both CLI and ENV, and these were related to antibiotic resistance (n = 4), metal resistance (n = 2), virulence (n = 10), efflux systems (n = 3), oxidative stress (n = 2), quorum sensing (n = 4) and plasmid replicon type (n = 1) (Supplementary Fig. 9). The prevalence of 15 out of these 26 alleles was significantly different, irrespective of the inclusion of K. quasipneumoniae and K. variicola in the analysis (e.g. aadA, blaOXA, ibpB, terC, among others). For the other 11 genes out of those 26, K. quasipneumoniae and K. variicola were responsible for the differences observed between CLI and ENV genomes.

Regarding the intra-gene diversity index, for 65 out of the 237 genes, CLI and ENV genomes yielded significantly different (p < 0.05) diversity indices (Fig. 2B and Supplementary Table 8). Thirty-nine out of the 65 were significantly more diverse among ENV, specifically for antibiotic resistance (n = 3), metal resistance (n = 4), virulence (n = 7), efflux systems (n = 10), oxidative stress (n = 8), quorum sensing (n = 6) and plasmid replicon type (n = 1). Twenty-six out of the 65 were significantly more diverse among CLI, specifically for antibiotic resistance (n = 9), metal resistance (n = 2), virulence (n = 12), and plasmid replicon type (n = 3). These differences in diversity indices were maintained when K. quasipneumoniae and K. variicola genomes were excluded from the analysis, except for 25 genes (Fig. 2B).
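To make the intra-gene diversity index concrete, the sketch below computes a Shannon index over the alleles of one gene observed in a group of genomes. The Fig. 2 legend states that the Shannon diversity index was determined for the alleles of each gene; the use of the natural logarithm, the function name and the allele labels here are assumptions, and the statistical comparison between CLI and ENV indices is not reproduced because the text does not describe it.

```python
import math
from collections import Counter


def shannon_index(alleles):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the observed alleles.

    alleles: iterable with one allele label per genome carrying the gene.
    Natural logarithm is an assumption; the source does not state the base.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # gene not observed in this group
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Made-up allele calls for one gene in two hypothetical groups of genomes.
cli_alleles = ["a1", "a1", "a1", "a2"]   # dominated by one variant
env_alleles = ["a1", "a2", "a3", "a4"]   # four equally frequent variants
print(shannon_index(cli_alleles), shannon_index(env_alleles))
```

With equally frequent variants the index reaches its maximum, ln(k) for k alleles, which is why a group with many rare variants scores higher than one dominated by a single variant.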

Capsular genes are important virulence factors in Klebsiella spp., whose products may be determinant for ubiquity and gene flow10, justifying the serotyping of the capsular (K) and lipopolysaccharide (O) antigens (Supplementary Fig. 11 and Supplementary Table 9). A total of 50 K antigens were detected, the most predominant being KL64 (13/139), KL2 (8/139), and KL1, KL102, KL15, KL25, and KL62 (each 6/139) (Supplementary Table 9). The antigen types KL64, KL2, KL102 and KL15 were detected in CLI and in ENV genomes, although KL64 (12/13) was significantly more frequent in the former (Fisher’s Exact test, p < 0.05). KL1, reported to be associated with hypervirulent K. pneumoniae22,23,24, was only detected in CLI genomes, mainly of ST23 (5/6). KL2 antigens, also associated with hypervirulent strains22,23,24, were detected in 8 genomes, in ST14 (n = 7) and in ST15 (n = 1). In total, 10 O antigens were detected, O1/O2v1 (55/139), O1/O2v2 (33/139), O4 (14/139) and OL101 (10/139) being the most predominant (Supplementary Table 9). These antigens were detected in both CLI and ENV genomes, although O1/O2v2 was more frequent in clinical genomes (Fisher’s Exact test, p < 0.05).
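The Fisher's exact tests reported above compare how often a feature occurs in CLI versus ENV genomes. The sketch below shows a two-sided test on a 2 × 2 table using only the Python standard library. The function name is illustrative, and the table layout for KL64 (12 of 78 CLI genomes and 1 of 61 ENV genomes carrying the antigen) is a reading of the counts given in the text, not a table taken from the authors' analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x feature-positive genomes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Assumed layout: rows are CLI and ENV, columns are KL64 present / absent.
p = fisher_exact_two_sided(12, 78 - 12, 1, 61 - 1)
print(f"KL64, CLI vs ENV: p = {p:.4f}")
```

Under this assumed layout the test agrees with the reported outcome (p < 0.05).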


Source: Ecology - nature.com
