Mitogenome-wise codon usage pattern from comparative analysis of the first mitogenome of Blepharipa sp. (Muga uzifly) with other Oestroid flies
Outcome of DNA sequencing, assembly, and validationIn this study, initially total DNA was isolated from the finely chopped, full-grown pupa of Blepharipa sp. The NanoDrop spectrophotometer (1294 ng/μl) and the Qubit fluorometer (732.8 ng/μl) both found that the concentration of total DNA in the sample at an optimum level for mitochondrial DNA enrichment. The Tape Station profile showed that the size of the fragments of the mitogenomic library were in the range of 250 to 550 bp. The complete insert size distribution ranged from 130 to 430 bp, with the combined adapter size being ~ 120 bp with mitogenome fragments. The appropriate distribution of fragments and their concentrations (~ 27.1 ng/μl) were also found to be suitable for sequencing. Sequencing through Illumina NextSeq500 yielded 4,402,752 raw reads, of which around 3,663,404 high-quality reads were retained after post-quality filtering. The final scaffolding and assembly of contigs generated a 15,080 bp single scaffold MtDNA in Blepharipa sp. (N50 = 15,080).The sequencing outcome was validated by performing PCR amplification of one of the protein-coding genes, in this case, nad6. Where PCR amplification resulted in a single band of expected amplicon size (shown in Supplementary Method Online). Sanger sequencing and subsequent alignment of these amplicons showed almost 92% sequence similarity to our assembled Blepharipa sp. nad6 gene (see Supplementary Method Online). This provided strong evidence that our mitogenome assembly is reliable and can be used for general applications of mitochondrial genes, e.g., as a biomarker. The second mitogenomic region, the control region (CR) was suggested by the reviewer. We have discussed that CRs constitute repetitive A + T regions (“AT richness of Control Region and role of sequencing method” and “Impact of repeats on different sequencing technologies and assembly method” section). One or more repetitive regions within the CR identified in certain species (e.g. fish, human) have shown undesirable effects on PCR amplification and sequencing125,126. Many organisms have segmental duplications in CR induced by the appearance of pseudogenes that PCR can co-amplify127,128,129,130,131. Due to these associated problems, researchers generally rely on protein or ribosomal RNA genes for phylogenetics instead of CRs132,133,134. In this case, we also faced problems validating the CR. The PCR and gel electrophoresis using external PCR primers did not show a desirable single band as seen for nad6. As an alternative strategy, we used two pairs of primers, CR int_fwd and CR int_rev, internal primers, with CR15fwd and CR08rev primers, to perform a two-way sequencing of each amplicon, which generated multiple bands (see Supplementary Method Online, Figs. S1, S2). The most prominent bands were subjected to sequencing and yielded two mixed sequences, the best of which exhibited nearly 54% sequence resemblance with the Blepharipa sp. control region (see Method in Supplementary Note). Further mapping of the Illumina reads with the assembly revealed that the depth of coverage across the CR was not as deep as that of protein-coding genes such as cox2, and it was also not inflated only over a repeated section of the CR. The depth over 1–112 varied from 5 to 20×, and that for the 15,025–15,080 bp was around 30×. We did observe that our reads didn’t cover a 10 bp stretch of CR around 15,030–15,040 bp (see Method in Supplementary Note and Figs. S3–S6). We believe that our sequencing and assembly experiment was able to cover the majority of CR successfully with reasonable coverage barring that 10 bp stretch. Our results corroborate with the difficulties of CR sequencing seen with other species, and while this doesn’t reflect on the quality of our whole mitogenome assembly, researchers using mitogenomic CR regions for any kind of phylogenetic inference should proceed with caution.Size and organization of mitogenome
Blepharipa sp. mitogenome organization and structureThe newly sequenced mitochondrial genome of Blepharipa sp. is closed circular and has a size of 15,080 bp, which falls within the typical insect mitogenome size (14 to 20 kb)135,136,137. Similar to other sequenced bilaterian mitogenomes, the Blepharipa sp. mitogenome has conventional gene content, a total of 37 genes (viz. 13 PCGs, 22 tRNAs, 2 rRNAs) and an AT-rich control region (CR) (Fig. 2A)138,139,140,141. Among these, 23 genes are present on the major strand (J strand or +ve strand), while the remaining 14 genes are present in the minor strand (N strand or –ve strand). The intron-less 13 PCGs are also separately encoded by these two strands, 9 PCGs (nad2, cox1, cox2, atp8, atp6, cox3, nad3, nad6, cytb) from the J strand and 4 PCGs (nad5, nad4, nad4l, nad1) from N strand covering 6899 bp and 4300 bp respectively constituting around 74.31% of the entire mitogenome (Fig. 2). The largest PCG present in this organism is nad5 (1716 bp), and the smallest one is the atp8 (165 bp). Excluding stop codons, the J strand has 2237 codons, and the N strand has 1430 codons. Apart from cox1 (TCG) and nad1 (TTG), 11 PCGs follow the canonical “ATN” start codon. Ten PCGs of this mitogenome have “TAA or TAG” as their stop codon except for cox1, cox2, and nad4, where they end with an incomplete stop codon, a single T (Fig. 2)142. A total of 22 tRNAs are interspersed all over the entire mitogenome, ranging from 63 bp (trnT) to 72 bp (trnV) in size. The J and N strands have 14 tRNAs and 8 tRNAs, respectively, with 928 bp and 528 bp of nucleotides. Typical clover-leaf shaped secondary structures of tRNAs have been observed with a few exceptions where trnC, trnF, trnP, and trnN lack a stable TΨC loop see Supplementary Fig. S7 online). Two N-strand rRNAs with nucleotides of 1360 bp and 783 bp are transcribed individually for rrnL and rrnS (Fig. 2B).Figure 2Complete mitochondrial genome structure of Blepharipa sp.; (A) Circular Map (B) Annotation and genome organization of mitogenome. tRNAs are represented as trn followed by the IUPAC-IUB single letter amino acid codes e.g., trnI denote tRNA-Ile.Full size imageThis mitogenome has 10 gene boundaries where genes overlap with adjacent genes, varying from 1 to 8 bp in length, for a total of 35 bp. The longest overlapping sequence of 8 bp is present over the trnW and trnC genes. Likewise, the total length of all intergenic spacer sequences (excluding the control region) is 139 bp, present at 15 gene boundaries. The length of each intergenic spacer varies between 1 and 40 bp, and the longest one is located between the trnE and trnF genes. In this organism, eleven pairs of genes are located discreetly but adjacent to each other and any PCG adjacent to tRNA, ending with an incomplete stop codon (cox1-trnL2, cox2-trnK). The control region’s length of this dipteran fly is 168 bp, and the nature of this region is highly biased towards A + T content (Fig. 2).Size comparison of Oestroidea mitogenome and their genesTo better understand the mitogenome of Blepharipa sp., it has been compared with the flies of the Oestroidea superfamily (blowflies, bot flies, flesh flies, uzi flies, and relatives). Various features have been taken into account for this comparison: mitogenome size, gene sizes, gene content, and how genes are placed in each mitogenome.The mitogenome of eukaryotic organisms shows that there are significant size differences across mammals, fungi, and plants. The typical size of an animal mitogenome is near about 16 kb, a fungal mitogenome is 19–176 kb, and a plant mitogenome is far larger, with a size range of 200 to 2500 kb143. We have shown that the Blepharipa sp. whole mitogenome size (15,080 bp) is 416 bp smaller than the average Oestroidea flies mitogenome. As for the Oestroidea superfamily, D. hominis (human bot fly), an Oestridae fly has the longest mitogenome of all (16,360 bp), and A. grahami, a Calliphoridae fly, has the shortest mitogenome of all (14,903 bp). Tachinid flies have a smaller average mitogenome size (~ 15,076 bp) than the other flies in this superfamily, and the Oestridae flies have a relatively larger mitogenome (~ 16,031 bp). We observed that the size of the total PCGs, tRNAs, and rRNAs are well-maintained across this superfamily, with an average length of 11,145 bp, 1482 bp, and 2113 bp, respectively (Fig. 1A, green, yellow, and blue line, Table 1).The difference in mitogenome size in insects can be attributed to variations in the length of non-coding regions, especially the control region that differs in length as well as the pattern of sequences (Fig. 1B)104,144. In addition, based on mtDNA sequence similarity among all the Oestroidea flies, Blepharipa sp. has high similitude with the Tachinid Fly E. flavipalpis (87.83%), followed by the two hairy maggot blowflies, Chrysomya albiceps (85.51%) and C. rufifacies (85.44%). Another well-studied uzi fly, E. sorbilans has an 84.82% sequence similarity with Blepharipa sp., while Gasterophilus horse botfly has the lowest sequence similarity (~ 77%) with Blepharipa sp. (Supplementary Data 3A).Gene content and arrangementWe found that the Oestroidea mitogenome represents the reserved gene arrangement of Ecdysozoan, for which it can be easily distinguishable from other bilaterians (Lophotrochozoa and Deuterostomia)140. The mitogenome of Blepharipa sp. and other Oestroidea have three core tRNA clusters, including (1) trnI-trnQ-trnM, (2) trnW-trnC-trnY and (3) trnA-trnR-trnN-trnS1-trnE-trnF, as depicted in Figs. 1C and 2. A comparative study revealed that the Oestroidea superfamily has 4 different kinds of mitogenome arrangements (Fig. 1C). The majority of the Oestroidea flies (25 out of 36) in this study have ancestral (A) dipteran type mitogenome sequences (Table 1)145. However, there are some minor inconsistencies exist in the Calliphoridae family (blowflies), such as the insertion of extra tRNAs (trnI in the genus Chrysomya and trnV in D. hominis) or the translocation of tRNA (trnS1 in C. chinghaiensis) (Fig. 1C)21,24. Barring this, all organisms, including Blepharipa sp., follow a standard dipteran gene arrangement and have 37 genes in their respective mitogenomes (insertion of tRNA into the genus Chrysomya and D. hominis raises gene count) (Fig. 1C (i)(ii), Table 1). In the case of dipterans other than the Oestroidea superfamily, species like gall midge (Cecidomyiidae), mosquitos (Culicidae), and crane flies (Tipulidae) exhibit various rearrangements in mitochondrial tRNAs, such as the absence, inversion, translocation, and extreme truncation of certain genes (Supplementary Data 1A)146,147.Non-coding regionsControl region (CR) of Blepharipa sp. and comparison with OestroideaThis region in the metazoan mitogenome is a single sizeable non-coding sequence containing essential regulatory elements for transcription and replication initiation; it is therefore named the control region148,149. Similar to other Diptera, the CR of Blepharipa sp. is also flanked by rrnS and the trnI-trnQ-trnM gene cluster (Fig. 2). Sequence similarity with other Oestroidea superfamily species indicates that this segment is variable due to the lack of coding constraints150. The CR sequence of Blepharipa sp. 75.49% similar to another tachinid fly Elodia flavipalpis, followed by Chrysomya bezziana (71.15%) (Supplementary Data 3B). Despite its overall high variation in nucleotides, this region harbors multiple different types of repeats (e.g., tandem repeats, inverted repeats)42,151 and conserved structures namely Poly-T stretch (15 bp), [TA(A)]n-like, G(A)nT-like stretches, and poly A tail (15 bp)152,153,154(Fig. 3A). Another conserved motif, “ATTGTAAATT” we found in the CR of Blepharipa sp. and E. flavipalpis (Fig. 3A). Such conserved structures are thought to play role in the regulatory process of transcription or replication. After binding with RNA polymerase, they keep the initiating mode of transcription or replication by preventing the transition to elongation mode without affecting its open-complex structure155,156.Figure 3Conserved non-coding regions; (A) AT rich control region Alignment of Blepharipa sp. with other two Tachinidae species. (B) Three alignments of the common overlap region between trnW-trnC, atp8-atp6 and nad4-nad4l. (C) Three alignment of the consensus gap region between trnS2-nad1 (TACTAAAHHHHAWWMH), trnE-trnF (ACTAAHWWWAATTMHHWA), nad5-trnH (WGAYADATWYTTCAY) genes of all 36 Oestroidea mitogenome (where, W = A/T, H = A/T/C, Y = T/C, D = G/T/A, M = A/C).Full size imageThe CR is also known as the AT-rich region for having the maximum proportion of A/T nucleotides (91.4% for Blepharipa sp.) than other regions of the entire mitogenome. We observed that the Tachinidae family has higher A + T content than other groups, with the highest levels in the Mulberry uzi fly, E. sorbillans (98.10%), and AT poor CR regions identified in G. intestinalis (80.80%) and G. pecorum (80.82%) (Oestridae)42 (Supplementary Data 2A). In this study, the CR of thirteen species have above 90% A + T content, and the top 3 are the tachinid flies, led by A. grahami, D. hominis and Blepharipa sp. consecutively. The CR is prone to high mutation, yet the substitution rate is low due to high A + T content and directional mutation pressure144,154. This part of the mitogenome differs significantly in length among insects, ranging from 70 bp to 13 kb, and it accounts for most of the variation in mitogenome size153. We noted that the CR size of 36 Oestroidea flies ranges from 89 to 1750 bp, of which 16, 12, and 8 species can be categorized as large ( > 800 bp), medium (200–800 bp), and small ( 5 to 0.025 to 0.005 to More