Intraspecific variability at the chromosomal ends
The first step of our investigation was the detection and delimitation of the TIRs. De novo assemblies built from next-generation sequencing (NGS) data cannot resolve such large intragenomic duplications (i.e., duplications larger than the read size). However, the size of TIRs can be deduced from the analysis of read coverage36. Indeed, a twofold quantity of sequencing reads aligns to regions of the assembly that are actually duplicated in the genome. Mapping the bulk reads onto the genomic sequence produced by the assembler is thus an efficient way to precisely detect the limits and extent of the terminal duplications (see Materials and Methods section).
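The coverage reasoning above can be illustrated with a minimal sketch. It assumes that the two TIR copies were collapsed into a single copy at one end of the contig, that per-base read depth has already been extracted from the read mapping, and that the duplication covers less than half of the contig. The function name, window size and ratio threshold are hypothetical choices for illustration, not the parameters used in this study.

```python
from statistics import median


def estimate_tir_length(coverage, window=1000, ratio_threshold=1.5):
    """Estimate the length of a collapsed terminal duplication.

    coverage: per-base read depth, ordered from the contig end inward.
    window and ratio_threshold are illustrative assumptions.
    Returns the length (in bases, rounded to the window size) of the
    terminal region whose depth is close to twice the baseline.
    """
    if len(coverage) < 2 * window:
        raise ValueError("coverage track is too short for the chosen window")
    # The contig-wide median approximates single-copy depth as long as
    # the duplicated region spans less than half of the contig.
    baseline = median(coverage)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    tir_length = 0
    for start in range(0, len(coverage) - window + 1, window):
        window_depth = median(coverage[start:start + window])
        if window_depth / baseline >= ratio_threshold:
            tir_length = start + window
        else:
            break  # first window back at single-copy depth: TIR border
    return tir_length
```

The boundary is only resolved to the nearest window; a finer estimate would require a smaller window or a per-base inspection around the reported border.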
The size of the TIRs ranged from 303 kb to 579 kb for chromosomes (Table 1). In addition, among the seven extrachromosomal elements detected in individuals of the population, three (pRLB1-9.2, pS1D4-20.1 and pS1D4-14.1) were linear. The first two possessed TIRs of 24 kb and 68 kb, respectively, but no TIR was detected for pS1D4-14.1. Telomeres were found at the extremities of all three replicons (see below).
This variability of the chromosomal TIR size revealed an intense plasticity of the terminal regions of the chromosome, and this correlated with the different phylogenomic sub-clades (Fig. 1). Hence, while the size of TIRs is conserved between strains belonging to sub-clade I (e.g., S1D4-20, RLB3-5, RLB1-8, and S1D4-14), it is highly variable within or between the other sub-clades (Table 1). For example, TIRs of strain RLB1-9 were shorter by 90 kb compared to those of its sister strains (i.e., S1A1-3 and S1A1-8) and lacked the distal regions of the chromosome, but without the loss of linearity (not shown). Further recombination events (translocation, inversion and indels, labelled respectively A, B, C in Fig. 2) were revealed by global comparison of the TIRs of the 11 genomes using MAUVE (Fig. 2). Some regions are unique to one strain, suggesting that an insertion has occurred (e.g. Fig. 2, region C). Since this DNA region (16.7 kb) is not present elsewhere in the 11 genomes, this suggests that the insertion was acquired through a horizontal gene transfer event.

Telomere and terminal protein distribution among the Streptomyces population. A maximum likelihood phylogenomic tree of the 11 strains was built based on 5,149,602 nucleotide positions. Heavy lines indicate branches supported by bootstrap values >80% (100 replicates). According to the tree topology and bootstrap values, three phylogenomic sub-clades (I, II and III) were defined. The strains belonging to the same phylogenomic sub-clade are connected with a red bar. The yellow bars connect strains harbouring identical or very closely related telomere sequences (more than 92% sequence identity). The dashed yellow lines indicate incongruences between the phylogenomic tree and the distribution of similar telomeres. The coloured backgrounds represent the distribution of the chromosomal telomere loops and of the different tap–tpg/gtpB–gtpA systems. The light blue background encompasses strains for which only the gtpB–gtpA system was identified. Their chromosomal telomeres are typified by a 5′GGA3′ loop. A parsimonious evolutionary scenario suggests that the ancestor of the population shared the same combination. The turquoise background encompasses strains that also have telomeres with a 5′GGA3′ loop and a gtpB-gtpA operon, but with an additional type II tap-tpg operon. The light red background represents strains having the ancestral gtpB-gtpA operon, but also an additional type I tap-tpg operon and telomeres typified by a 5′CTTG3′ loop. The red arrow indicates a potential telomere replacement with a plasmid at the root of this sub-clade.

Identification of rearrangement events within the Terminal Inverted Repeats. The TIR sequences of the 11 strains were compared and visualized using the progressive Mauve algorithm for Windows. The alignment was performed on collinear sequences in which the S1A1-7 TIR was used as the reference sequence. Boxes with identical colors represent local collinear blocks (LCBs), indicating homologous DNA regions shared by two or more chromosomes without sequence rearrangements. Lines are drawn between LCBs in two adjacent species to indicate homologous regions. The placement of a block below the axis indicates an inversion event. Examples of rearrangement events occurring in the TIRs are highlighted by red circles: (A) translocation event, (B) inversion event, (C) indel event.
Regardless of the nature of the recombination event, it generally modifies a single chromosomal end and consequently disrupts the TIRs. However, since we identify duplicated sequences, it is likely that a mechanism is in place to maintain two homogeneous copies of the TIR in a chromosome. Figure 3 depicts the potential recombination events required to maintain the chromosomal ends in a homogeneous state. This mechanism is reminiscent of Break Induced Replication (BIR), which rescues broken chromosomes by recopying the intact arm through to the end (including the telomere); it is likely operating between the TIRs and maintaining identical TIR sequences. It also contributes to TIR size variability: if the break point is located upstream or downstream of the TIR border, the size of the TIRs may increase or decrease, respectively (Fig. 3a depicts a TIR increase). This was shown to be a powerful mechanism generating a high variability in TIR sizes under laboratory conditions for S. ambofaciens10. When an insertion occurs in a duplicated region (Fig. 3b), the same mechanism may lead to conversion of the original TIRs.

Inter-chromosomal arm recombination scenarios leading to homogenization of TIRs. Double-headed arrows represent Streptomyces linear replicons. The TIRs are highlighted with a light gray frame and the telomeres are represented by colored circles. The yellow flash symbols represent a DNA double strand break (DSB) requiring an upstream Break Induced Replication (BIR) event to rescue the broken replicon and to keep linearity. The BIR event uses the other arm as a template and copies it to the end, including the telomere. If the DSB occurs within a TIR, the BIR will not change its size or sequence (not represented). (a) If the DSB occurs upstream of the TIR, the BIR event will lead to the homogenization and extension in size of the TIRs. (b) An insertion/deletion event (indel) occurred (1) in one TIR (yellow triangle) before a DSB event. As before, the BIR event (2) upstream of the TIR will lead to a size increase, but will also propagate the indel event by homogenization. (c) In this case, a telomere replacement occurred in one arm (1) before the DSB. Following the BIR event (2), the arm homogenization will lead to a change of the telomere sequence.
Given these results, the presence of TIRs at the ends of the Streptomyces chromosome appears to be a consequence of terminal recombinational activity. Reciprocally, their presence may also help rescue double-strand breaks occurring in the terminal part of the chromosome by providing an intact substrate for recombination repair37. Furthermore, terminal duplication may have functional consequences, such as expression of specific gene functions (e.g. specialized metabolite biosynthetic genes38), or may help in the maintenance of a terminal cohesive structure such as the ‘racket-frame’ structure39.
Identification of homologous recombination in the TIRs
We inferred homologous recombination (HR) events by scanning the aligned sequences of the TIRs of the Streptomyces population with the RDP4 program. In total, 45 unique events were detected across the 11 genomes (Fig. 4). Strains of the same sub-clade mostly share the same HR events, whereas other strains exhibit specific HR events. Remarkably, RLB3-17, S1A1-7 and S1D4-23 account for 30 of the 45 unique HR events, providing evidence of the evolutionary history of the population. Common HR events within a sub-clade likely occurred in a recent common ancestor and spread vertically in these strains, whereas other strains may have accumulated increasing numbers of recombination events since the origin of the population. Streptomyces have already been shown to be recombinogenic, either at the genus40 or at the population level41. These previous studies were performed by calculating recombination frequencies with seven housekeeping genes (3,910 bp) located in the core genome. Here, owing to the similarity of the strains, HR events could be visualized between colinear TIRs (circa 408,938 bp). RDP suggests the most probable donor of a recombining DNA sequence, and thus here the potential donor within our population. In one extreme case, the TIR of RLB3-17 seemed to have recombined several times with strains S1A1-7, S1D4-23, RLB3-6 and strains of sub-clade I. This results in a mosaic structure confirming that the terminal regions are highly recombinogenic. It also highlights the massive gene flux previously described at the population level23 and shows that the strains of this population have experienced many gene transfer events.

RDP4 analysis for DNA recombination and transfer within TIRs of the Streptomyces population. Each bar represents the TIR sequence of one strain, identified by its name and a number from 1 to 11. Regions of recombination events on each TIR are represented by a light shade of the corresponding TIR color. Sequence fragments from potential donors (parents from which the recombinant sequence is derived) are depicted under each recombination region with the number corresponding to the donor strain. If the potential donor does not belong to the 11 strains, the fragment is assigned as unknown.
Telomere switching
Considering the high frequency of insertion and deletion events in the TIRs, we questioned the variability of the DNA extremities themselves, i.e., the telomere motifs, within our population. While insertions/deletions in TIRs require at least two recombination events to take place, the replacement of the most distal regions may require only a single cross-over event. This terminal exchange results in the formation of a hybrid chromosome (i.e. with two different telomeres), which is further homogenized by inter-chromosomal arm recombination as depicted in Fig. 3c.
In our work, no specific sequencing approach was used to isolate and sequence the telomeres. However, in order to sequence the extremities of linear replicons, genomic DNA was initially prepared using a proteinase K step ensuring the degradation of terminal proteins bound to DNA. To get as close as possible to the chromosomal end, we set out to walk along the chromosome towards the extremity by mining the sequencing reads (Illumina). This approach extended the previously published genomic sequences by a few to several tens of nucleotides (see materials and methods), and in silico analyses (mfold) (Fig. 5c) of the last 180 nucleotides of each sequence revealed DNA hairpins and loops specific to Streptomyces telomeres. Although we cannot rule out that the very last terminal nucleotides may still be missing in the final assemblies, this approach identified telomeric sequences with confidence for all chromosomes with the exception of RLB1-9. Regarding the other ten strains, four different telomere sequences were identified (Fig. 5a) with five to eight palindromic stems of variable length capped with conserved loop sequences (5′GGA3′ or 5′CTTG3′). Within the phylogenomic sub-clades I and II, respective strains shared identical telomere sequences (Fig. 5b), while sequence identities declined to about 30% between sub-clades, making the sequences barely possible to align. In contrast, the telomeres of strains S1D4-23 and RLB3-6, which form sub-clade III, shared only weak identity (65%), whereas they were more closely related to the telomere sequences of RLB3-17 and S1A1-7, respectively, which do not belong to sub-clade III (Fig. 5c). For example, the S1D4-23 and RLB3-17 telomere sequences aligned almost perfectly (93% sequence identity) and exhibited only two mismatches, which were compensatory mutations preserving the stem structures.
Thus, telomere sequences defined two new telomere sub-divisions, IIIa with strains S1A1-7 and RLB3-6 and IIIb with strains RLB3-17 and S1D4-23, that were incongruent with the phylogenomic analysis (Fig. 5b). These data strongly support the hypothesis of telomere exchange within populations and, in this case, that two of the strains (RLB3-6 and S1D4-23) acquired a new telomere, possibly from strains S1A1-7 or RLB3-17.
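The stem-loop features discussed above (palindromic stems capped with a 5′GGA3′ or 5′CTTG3′ loop) can be located with a simple scan, sketched below. This is not the mfold analysis used in this work: it is a simplified illustration that assumes the stem directly flanks the loop motif, allows only Watson-Crick pairs, and ignores thermodynamics. The function name and the minimum stem length are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_capped_stem_loops(seq, loops=("GGA", "CTTG"), min_stem=4):
    """Report stems that directly flank a given loop motif.

    Returns a sorted list of (start, end, loop, stem_length) tuples,
    where seq[start:end] spans the whole stem-loop (end exclusive).
    Only perfect Watson-Crick pairing is considered (an assumption).
    """
    seq = seq.upper()
    hits = []
    for loop in loops:
        start = seq.find(loop)
        while start != -1:
            left = start - 1          # first base 5' of the loop
            right = start + len(loop)  # first base 3' of the loop
            stem = 0
            # Extend the stem outward while the flanking bases pair.
            while (left >= 0 and right < len(seq)
                   and COMPLEMENT.get(seq[left]) == seq[right]):
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                hits.append((left + 1, right, loop, stem))
            start = seq.find(loop, start + 1)
    return sorted(hits)
```

Under this model, a compensatory mutation pair (for example a G-C pair replaced by an A-T pair at the same stem position) leaves the reported stem length unchanged, whereas a single uncompensated mismatch shortens it.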

Comparison of the different chromosomal and plasmidic telomeres. (a) The terminal 180 nucleotide sequences from the 10 chromosomes and 2 linear plasmids were aligned. Four different groups, designated in the table as telomere sub-clades, were defined according to their sequence identities. Palindromic sequences are boxed and numbered in Arabic numerals above the sequences and the telomere loops are highlighted in yellow. Compensatory nucleotide changes within palindromes IV and VI of the RLB3-17 telomere are highlighted in blue. The telomere sequence of plasmid pS1D4-20.1 could not be aligned with the others and is not presented. (b) Unrooted NJ phylogenetic tree built with the different telomere sequences. Positions with <80% site coverage in the alignment were eliminated, leaving a total of 164 nucleotide positions in the final dataset. Bootstrap percentages are indicated on the branches. (c) The predicted secondary structures for a representative sequence of each telomere sub-clade are represented. Two different loops (CTTG and GGA) can be observed at the top of the hairpin structures.
None of the telomeres showed a significant nucleotide identity with the ‘archetypal’ telomeres (not shown). In contrast, telomeres of sub-clade I showed a strong identity (87%) with telomeres of linear plasmids, including one of 92 kb from Streptomyces dengpaensis strain XZHG99 (GenBank accession number CP026653.1). The latter exhibits the end palindrome I (13 nt, 5′CCCGCTCCGCGGG3′) conserved in the archetypal telomere. Due to the limitations outlined above, we cannot rule out the presence of this palindrome at the ends of sub-clade I telomeres. Indeed, the last nucleotides of our sequences match the very first ones of the palindrome I sequence. However, since (i) there is no sequence homology with S. coelicolor and (ii) the loops of the stems are capped with 5′CTTG3′ motifs instead of the classical 5′GCA3′ sheared pairing motif, we concluded that telomeres of sub-clade I constitute a new type of non-archetypal telomere. Further, the ends (over 50 nt) of the telomeres of sub-clade IIIa strains showed a strong homology with the atypical telomere of S. griseus 13350 and share with it the same sequence at the top of the stems (5′GGA3′). Finally, telomeres of sub-clade II showed 75% nucleotide identity (over the last 3′ 150 nt of the telomere) with the ends of the Streptomyces sp. SirexAA-E chromosome, and possess 6 stems capped with 5′GNA3′ loops (mostly 5′GGA3′).
In addition to the chromosomes, the telomere structures of the three linear plasmids (pRLB1-9.2, 106 kb; pS1D4-20.1, 394 kb; pS1D4-14.1, 112 kb) were identified. The telomere of pRLB1-9.2 possesses 5′GGA3′ loops (5 of 8 stem-loops, all sharing the classical G-A sheared pairing), that of pS1D4-14.1 has a 5′CTTG3′ loop at the top of five of the last six stem-loops, and that of pS1D4-20.1 is typified by an original 5′GCA3′ loop sequence (at the top of the last 3 of the 5 stem-loops). The novelty of this telomere was confirmed by the fact that no identity could be found with any sequence of the nr database.
The different telomere sequences in the population suggest that various recombination events occurred during the recent evolutionary history of the population. Hence, telomeres of strains of sub-clade I are typified by 5′CTTG3′ loops, whereas other strains harbor 5′GNA3′ ones. Further, the telomeres of pS1D4-14.1 (112 kb) are almost identical (97%) to those of the 92 kb plasmid of S. dengpaensis. Given that pS1D4-14.1 telomeres also share a strong identity (84%) with those of sub-clade I chromosomes, it is tempting to hypothesize that a chromosome/plasmid replacement of the ancestral telomere (loop 5′GGA3′) at the root of sub-clade I could explain the emergence of this telomere in the population (Fig. 1).
Although terminal recombination appears highly efficient at homogenising the terminal sequences and eliminating hybrid replicons, the presence of such replicons has been reported previously. It has been shown that in S. coelicolor A3(2), both the chromosome (7.2 Mb) and a SCP1′ linear plasmid (1.85 Mb) are chimeric, generated by a single crossover between the wild-type chromosome and SCP142. Similarly, in S. cattleya NRRL 8057, the linear chromosome and a megaplasmid appear to have exchanged telomeres, leading to coexisting hybrid replicons43. Telomere plasticity seems to be common in Borrelia (spirochetes), the other main bacterial group (38) possessing linear replicons44. This may result from telomere exchange as well as from telomere fusion, which may result from reversal of the telomere resolution reaction at the end of the replication process45. At the functional level, telomere recombination triggered by the formation of double strand breaks has also been associated with antigenic variation in Trypanosoma brucei46.
Co-occurrence of telomere and terminal protein genes
Since terminal proteins (TP) interact in a specific manner with the telomere to achieve terminal replication47, the turnover of telomeres should be accompanied by that of the cognate terminal protein machineries. Therefore, we searched in the chromosomes and plasmids of our population for homologues of the archetypal Tap/Tpg genes described in Streptomyces coelicolor A3(2)48, of the atypical GtpB-GtpA of Streptomyces griseus49 as well as of the atypical Tac/Tpc terminal machinery of the linear plasmid SCP1 of S. coelicolor A3(2). No homologues of Tac/Tpc were identified (not shown), but we found that all the strains possessed a chromosomal homologue of the GtpB-GtpA encoding operon (c. 50% of amino acid identity with the S. griseus protein). Among the population, the conservation is high with amino acid identities higher than 98% for both gene products. This operon was likely inherited from the ancestor of the population (Fig. 1).
Using the archetypal Tap/Tpg of S. coelicolor as query sequences, we identified and distinguished two additional sets of genes including a Tpg homologue (called types I and II, Table 1) whose distribution followed the sub-clade phylogenies. Tpgs encoded in type I and type II sets showed amino acid identities of 48% and 59% with the archetypal Tpg, respectively. Type I and II Tpgs showed circa 40% amino acid identity with each other. All homologues exhibited the typical helix-turn-helix DNA binding domain associated with a nuclear localization signal (NLS) present in the archetypal Tpg, although it was predicted at a slightly different location within the polypeptide in the type I Tpg product (Fig. S1). The type I Tpg also shared 78% amino acid identity with the putative Tpg of S. dengpaensis (accession number AVH61776.1), which is much higher than with S. coelicolor Tpg, and shares the same NLS sequence and position. All the Tpg proteins (type I and II) have almost the same size as the archetypal one (i.e. 175 aa).
In addition to the Tpgs, putative Tap proteins were also detected. In the type I gene set, a homologue of S. coelicolor A3(2) archetypal Tap was found with an amino acid identity of 51% (62% of similarity). A DNA binding domain was identified in the N-terminal domain of the Tap polypeptide in all homologues (not shown). Therefore, despite a common functional organization, the terminal complexes encoded by the archetypal and our type I gene set may recognize different telomeres.
In the type II gene set, besides the identified Tpg, we found a truncated version of a Tap gene (92 aa, C-ter, not shown) which appears to be a pseudogene. However, a long coding sequence immediately upstream encoded a polypeptide (648 aa) including an HTH motif in its N-terminal part, just as in Tap proteins. Further, this polypeptide also contains a TPR/MLP domain (pfam07926), which is involved in the process of telomere length regulation in eukaryotes. This feature led us to hypothesise that this gene represents a candidate for the replacement of the original tap gene. We called it ‘Tap-alt’ (alt for alternative), and speculate that this atypical gene pair (Tpg/Tap-alt) may encode a terminal machinery able to handle atypical telomeres such as those found in sub-clade II.
Two of the three linear plasmids, pS1D4-14.1 and pS1D4-20.1, belonging to individuals of sub-clade I also harbour tap-tpg operons. While the Tap and Tpg borne by pS1D4-14.1 strongly resembled those encoded by the chromosomal genes of the same sub-clade (i.e. 82% and 80%, respectively), pS1D4-20.1 encoded distantly related Tap-Tpg proteins (i.e. 38% and 46%, respectively). In addition, these two Tap-Tpg pairs showed weak identities with the archetypal proteins (48% to 61% identity). The presence of this atypical Tap-Tpg operon on the pS1D4-20.1 plasmid co-occurs with the only telomere sequence in our population having 5′GCA3′ loops. It is tempting to suggest that this atypical terminal protein complex may take over the functioning of this unique telomere.
In contrast to the linear plasmids of sub-clade I (pS1D4-14.1 and pS1D4-20.1), pRLB1-9.1, which belongs to strain RLB1-9 (sub-clade II), does not encode any Tap or Tpg homologue and should benefit from host functions (type II Tap-alt/Tpg or GtpA-GtpB).
When the Tap/Tpg gene distribution is considered alongside telomere types, it is possible to hypothesise regarding the potential for co-evolution of telomeres and Tap/Tpg function within natural populations (Fig. 1). The presence of a type I Tap-Tpg locus is associated with the 5′CTTG3′ loop at the top of the stems of the telomere. This locus was identified in sub-clade I and on the plasmid pS1D4-14.1. Considering that the telomere sequences of the chromosome and those of plasmid pS1D4-14.1 share strong identities, it is tempting to speculate that a telomere replacement took place at the origin of this sub-clade and substituted the ancestral telomere (loop 5′GGA3′) with the incoming plasmid-borne one (5′CTTG3′). These non-archetypal and newly acquired telomeres likely require the presence of a specific terminal protein complex encoded by the atypical type I Tap-Tpg locus. Alternatively, these non-archetypal telomeres may be recognized by the S. griseus GtpAB-like proteins, as would be the case in the remaining part of the population (this system is present in all the strains and is the only TP complex in sub-clade III). Alternatively, in sub-clade II, a new atypical complex encoded by the Tpg/Tap-alt cluster could be involved. The first hypothesis raises questions about the specificity of the interaction between the terminal protein complex and its cognate telomere. Since the telomeres are rather different between sub-clades II, IIIa and IIIb, this would imply a high flexibility allowing wide recognition of telomeres. Alternatively, if the specificity of the telomere and of the terminal complex is tight, the Tpg/Tap-alt complex may be an alternative to handle the telomere, and this would strongly select for the simultaneous acquisition of a new telomere with its terminal complex.
This could constitute a powerful selective force for organizing the genes encoding terminal complexes in the proximity of telomeres such that their simultaneous transfer ensures the functional characteristics of the telomere following transfer.
In conclusion, regardless of the terminal complexes supporting a range of telomere types, the inconsistency between the phylogenomic and the telomere-based trees in sub-clade III suggests that terminal DNA exchanges have occurred (Fig. 1). Further, sub-clade I telomeres have undergone a probable replacement during diversification of the population through the exchange of telomeres with a linear plasmid. These events constitute the first report of a rapid turnover of the terminal regions of the chromosomes in a natural population of Streptomyces.
Source: Ecology - nature.com