Basic characteristics of the six chloroplast genomes
The cp genomes of P. cicutariifolia, P. hubeiensis, P. jiugongshanensis, P. merrilliana and P. ranunculoides (GenBank accessions: MT268974, MT268976, MT937162, MT268977, MT268978) are reported here for the first time, and that of P. filchnerae was downloaded from NCBI (MK888698; ref. 21).
The sequencing coverage of our five newly assembled cp genomes ranged from 923× to 6237× (Figure S1). The six cp genomes possessed a typical quadripartite structure consisting of IRa, IRb, LSC and SSC (Table 1). They exhibited the same gene order, with no gene rearrangement or inversion (Figure S2). The physical map of the cp genome of P. hubeiensis is shown in Fig. 1. The GC content was ~37%. The newly sequenced genomes ranged from 150,187 bp to 151,972 bp and harbored 113 genes: four ribosomal RNA genes, 29 tRNA genes and 80 protein-coding genes, of which 14 genes were duplicated in IRa and IRb (Table 1). Owing to the presence of multiple stop codons, the gene infA was pseudogenized in the five newly sequenced species. The open reading frame (ORF) of accD in P. filchnerae (MK888698) was truncated to only 1305 bp, compared with ORFs of 1455 or 1464 bp in the other five species. Lee et al.34 identified five conserved amino acid sequence motifs in the accD gene. Conserved motifs IV and V were absent from accD of P. filchnerae. Therefore, accD was nonfunctional in P. filchnerae.
Physical map of the P. hubeiensis chloroplast genome.
SSRs and repeats
Five categories of SSRs were identified in the six species (Table 2). The number of SSRs was lowest in P. ranunculoides (41) and highest in P. merrilliana (59). Three types of SSRs were detected in P. filchnerae, and four types in each of the remaining species. While mono-, di- and tetranucleotide repeats were present in all six species, tri- and pentanucleotide repeats occurred in three and two species, respectively. Mono- and dinucleotide repeats accounted for the vast majority of SSRs (65.1% for P. cicutariifolia, 87.5% for P. filchnerae, 69.0% for P. hubeiensis, 62.8% for P. jiugongshanensis, 72.9% for P. merrilliana, 73.2% for P. ranunculoides). Most or all mononucleotide repeats were A/T repeats of 10 to 16 nucleotides. The number of repeat units ranged from five to eight for dinucleotide repeats. The tri- and pentanucleotide SSRs consisted of four motifs, and the tetranucleotide SSRs of four to five repeat units.
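As an illustration of how perfect SSRs of the kind tallied above can be located, the following is a minimal Python sketch. It is not the tool or parameter set used in this study: the function name `find_ssrs` and the minimum repeat-unit thresholds (10, 5, 4, 3 and 3 for mono- to pentanucleotide motifs) are assumptions chosen for the example.

```python
# Minimum number of repeat units per motif length.
# ASSUMPTION: illustrative thresholds, not taken from the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return non-overlapping perfect SSRs as (start, motif, n_units).

    start is a 0-based index; shorter motifs are tried first at each position.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k or set(motif) - set("ACGT"):
                continue
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k)):
                continue
            # Count consecutive copies of the motif starting at position i.
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_repeats[k]:
                found = (i, motif, n)
                break
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]  # jump past the reported SSR
        else:
            i += 1
    return hits
```

Applied to a cp genome sequence, the returned tuples can be grouped by motif length to obtain per-category counts comparable to those in Table 2, provided the thresholds match those of the original analysis.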
Excluding the largest repeat in each genome (i.e. the IRs), a total of 183 repeat pairs of three types (forward (F), reverse (R) and palindromic (P) repeats) were detected in the six genomes (Fig. 2), ranging from 30 to 137 bp in length. Palindromic repeats were the most common, accounting for 55.2% (101 of 183), followed by forward repeats (44.3%, 81 of 183). No complement repeats were identified in any species, and a single pair of reverse repeats was found only in P. ranunculoides. Across the six species, 96.7% of the repeats (177 of 183 pairs) were 30–59 bp in length, consistent with the lengths reported in other Primula species20. The longest repeat (137 bp) was found in P. cicutariifolia, which also contained the most repeats (44 pairs), while P. jiugongshanensis had the fewest (24 pairs).
Types and numbers of repeat pairs in the cp genomes of six Primula species (Pc: P. cicutariifolia; Pf: P. filchnerae; Ph: P. hubeiensis; Pj: P. jiugongshanensis; Pm: P. merrilliana; Pr: P. ranunculoides).
IR/SC boundary
The IR/SC boundary regions of the six Primula cp genomes were compared. The junction regions showed slight differences in the lengths of the genes flanking the junctions and in the distances between the junctions and those genes (Fig. 3). The genes spanning or flanking the LSC/IRb, IRb/SSC, SSC/IRa and IRa/LSC junctions were rps19/rpl2, ndhF, ycf1 and rpl2/trnH, respectively. IR expansion and contraction were observed. P. cicutariifolia had the smallest IR but the largest LSC and SSC; P. filchnerae had the largest IR, but neither its LSC nor its SSC was the smallest. The gene trnH was located in LSC, 0–24 bp away from the IRa/LSC border. The largest extensions of ycf1 into both SSC and IRa occurred in P. filchnerae (4566 bp and 1023 bp, respectively), and ycf1 of P. filchnerae was the longest among the six species. The gene ndhF was located entirely in SSC, 108 bp from the IRb/SSC junction, in P. cicutariifolia; among the remaining five species, the ndhF fragment within SSC was largest in P. hubeiensis (2194 bp). In P. cicutariifolia, P. jiugongshanensis and P. merrilliana, rps19 and rpl2 were located upstream and downstream of the LSC/IRb junction, respectively; in P. filchnerae, P. hubeiensis and P. ranunculoides, rps19 spanned the LSC/IRb junction, extending 161, 62 and 56 bp into IRb, respectively.
LSC/IR and SSC/IR border regions of the six Primula cp genomes.
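The junction measurements reported above (the number of base pairs by which a gene extends across a border, or its distance from the border) follow from simple coordinate arithmetic. The sketch below shows one way to compute them. The function names are hypothetical, and the coordinates in the usage note are invented so that the results reproduce two values from the text (a 161 bp extension and a 108 bp distance); they are not annotation coordinates from any of the six genomes.

```python
def split_at_junction(gene_start, gene_end, junction):
    """Split a gene's length at a region border.

    Coordinates are 1-based and inclusive; `junction` is the last position
    of the upstream region. Returns (bp upstream, bp downstream).
    """
    length = gene_end - gene_start + 1
    upstream = min(max(junction - gene_start + 1, 0), length)
    return upstream, length - upstream


def distance_to_junction(gene_start, gene_end, junction):
    """Number of bases between a gene and the border (0 if the gene touches or spans it)."""
    if gene_start > junction:
        return gene_start - junction - 1
    if gene_end < junction:
        return junction - gene_end
    return 0
```

For example, with a hypothetical gene at positions 1000–1278 and a border after position 1117, `split_at_junction(1000, 1278, 1117)` gives 118 bp upstream and 161 bp downstream of the border, while a hypothetical gene at 1226–1300 lies wholly downstream and `distance_to_junction(1226, 1300, 1117)` gives 108 bp.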
Divergent hotspots in the Primula chloroplast genome
Nucleotide variability (Pi) among the 22 Primula species (Table S1) was evaluated with DnaSP 6.12 (ref. 31) using noncoding sequences (introns and intergenic spacers) and protein-coding sequences (CDSs) at least 200 bp long. Pi ranged from 0.00444 to 0.11369 for noncoding sequences and from 0.00094 to 0.05036 for CDSs. Among the CDSs, the highest Pi value was detected for ycf1 (0.05036), followed by matK (0.04878), rpl22 (0.04364), ndhF (0.03975), rps8 (0.03658), ndhD (0.03455), ccsA (0.03292), rpl33 (0.0303), rps15 (0.03022), and rpoC2 (0.02954). These markers had higher Pi than rbcL (0.02149). The gene ycf1 thus exhibited the greatest diversity among the CDSs. The ten most divergent noncoding regions were trnH (GUG)-psbA (0.11369), trnW (CCA)-trnP (UGG) (0.09463), rpl32–trnL (UAG) (0.09337), ndhC–trnV (UAC) (0.09148), ccsA–ndhD (0.08745), ndhG–ndhI (0.08363), trnK (UUU)-rps16 (0.08334), trnM (CAU)-atpE (0.08273), trnS (GGA)-rps4 (0.08028), and trnC (GCA)-petN (0.07971). No intron ranked among the top ten variable noncoding regions.
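To make the Pi statistic concrete, the sketch below computes nucleotide diversity as the average number of pairwise differences per site over all pairs of aligned sequences. It is an illustration only and not a reimplementation of DnaSP: the function name `nucleotide_diversity` is hypothetical, and excluding every alignment column that contains a gap or ambiguous base is an assumption of this example.

```python
from itertools import combinations


def nucleotide_diversity(alignment):
    """Average pairwise differences per site (Pi) for aligned sequences.

    ASSUMPTION: columns with a gap or non-ACGT character in any sequence
    are dropped before counting differences.
    """
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only columns where every sequence has an unambiguous base.
    sites = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not sites:
        raise ValueError("no comparable sites in the alignment")

    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a[i] != b[i] for i in sites) for a, b in pairs)
    return total_diffs / (len(pairs) * len(sites))
```

Running this function separately on the alignment of each CDS or noncoding region would give one Pi value per region, which is the form in which the values above are reported.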
Phylogenetic analysis
The ML tree of 22 Primula species was constructed with RAxML32 (Fig. 4), based on the whole cp genomes. The six pinnate-leaved Primula species did not form a monophyletic group, but separated into two distant clades. P. filchnerae grouped with P. sinensis, whereas the other five species clustered together and constituted the clade Sect. Ranunculoides with 100% bootstrap support. In the ML tree, Sect. Proliferae was monophyletic, while species of Sect. Crystallophlomis separated into different clades.
ML phylogenetic tree of Primula species based on cp genomes. Bootstrap support at all nodes is 100%.
The topology of the ML tree based on ycf1 (Figure S3) was consistent with that based on whole cp genomes (Fig. 4), except that the clade formed by P. veris and P. knuthiana was sister to the clade consisting of Sects. Auganthus, Obconicolisteri, Carolinella and Monocarpicae instead of being sister to the clade of Sects. Proliferae, Ranunculoides and Crystallophlomis.
We also constructed both ML and NJ trees of 71 Primula species based on the concatenation of three common barcoding markers (ITS, matK and rbcL). Only the results of the NJ analysis (Fig. 5) were consistent with those of Yan et al.12 and Liu et al.35 and with the ML analysis based on whole cp genomes (Fig. 4). The six pinnate-leaved Primula species were separated into two distantly related groups. The clade consisting of P. filchnerae and P. sinensis (Sect. Auganthus) was sister to the clade formed by Sects. Carolinella, Obconicolisteri, Monocarpicae, Cortusoides, Malvacea and Pycnoloba. The five pinnatisect-leaved species P. cicutariifolia, P. hubeiensis, P. jiugongshanensis, P. merrilliana and P. ranunculoides (Sect. Ranunculoides) formed a clade with 100% support, which was sister to the group containing Sects. Crystallophlomis, Petiolares, Proliferae and Amethystina. Sects. Carolinella, Crystallophlomis and Malvacea were polyphyletic.
NJ bootstrap consensus tree of Primula based on concatenation of ITS, matK and rbcL.
Source: Ecology - nature.com