
Bioinformatic analysis of chromatin organization and biased expression of duplicated genes between two poplars with a common whole-genome duplication


An improved reference genome of P. alba var. pyramidalis

To identify the major structural variation between the genomes of these two species, we first produced a chromosome-level genome assembly of P. alba var. pyramidalis using single-molecule sequencing and chromosome conformation capture (Hi-C) technologies, and then performed comparative genomic analysis with a recently published genome assembly of P. euphratica37. The resulting assembly of P. alba var. pyramidalis consisted of 131 contigs spanning 408.08 Mb, 94.74% (386.61 Mb) of which were anchored onto 19 chromosomes (Supplementary Fig. S1 and Supplementary Tables S1–S3). A total of 40,215 protein-coding genes were identified in this assembly (Supplementary Table S4). The content of repetitive elements in the genome of P. alba var. pyramidalis (138.17 Mb, 33.86% of the genome) is 188.94 Mb less than that of P. euphratica (327.11 Mb, 56.95% of the genome), which contributes greatly to their differences in genome size (Supplementary Table S5).
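The percentages and the repeat-content gap quoted above follow directly from the reported sizes. A minimal arithmetic check, using only the figures in the text (the variable names are illustrative):

```python
# Figures quoted in the text, in Mb; names are illustrative only.
assembly_mb = 408.08        # total assembly size of P. alba var. pyramidalis
anchored_mb = 386.61        # sequence anchored onto 19 chromosomes
repeats_alba_mb = 138.17    # repetitive elements, P. alba var. pyramidalis
repeats_euph_mb = 327.11    # repetitive elements, P. euphratica

anchored_pct = 100 * anchored_mb / assembly_mb
repeat_pct_alba = 100 * repeats_alba_mb / assembly_mb
repeat_gap_mb = repeats_euph_mb - repeats_alba_mb

print(f"anchored: {anchored_pct:.2f}%")
print(f"repeat fraction (P. alba var. pyramidalis): {repeat_pct_alba:.2f}%")
print(f"repeat gap: {repeat_gap_mb:.2f} Mb")
```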

3D organization of the poplar genomes

To characterize the spatial organization and evolution of poplar 3D genomes at a high resolution, we performed Hi-C experiments using HindIII for P. euphratica and P. alba var. pyramidalis, generating a total of 482.95 million sequencing read pairs. These data were mapped to their respective reference genome sequences. After stringent filtering, 81.72 and 94.61 million usable valid read pairs were obtained in P. euphratica and P. alba var. pyramidalis, respectively, and used for subsequent comparative 3D genome analysis (Supplementary Table S6). In addition, we profiled the DNA methylation and transcriptomes of the same tissue samples to provide a framework for understanding the relationships among epigenetic features and 3D chromatin architecture in poplar.

We first examined genome packing at the chromosomal level with a genome-wide Hi-C map at 50-kb binning resolution for P. euphratica and P. alba var. pyramidalis. As expected, the normalized Hi-C maps from both species showed intense signals on the main diagonal (Fig. 1, and Supplementary Figs. S2 and S3) and a rapid decrease in the frequency of intrachromosomal interactions with increasing genomic distance, indicating frequent interactions between sequences close to each other in the linear genome (Supplementary Fig. S4). Strong intrachromosomal and interchromosomal interactions were also observed on the chromosome arms, implying the presence of chromosome territories in the nucleus, in which each chromosome occupies a limited, exclusive nuclear space16,38.
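The decay of intrachromosomal contact frequency with genomic distance can be illustrated by averaging a contact matrix along its diagonals. The sketch below is not the pipeline used in this study. It assumes a small, symmetric, already normalized matrix given as a list of lists, and the function name `contact_decay` is hypothetical.

```python
def contact_decay(matrix):
    """Mean contact frequency at each bin separation (0 = main diagonal).

    `matrix` is assumed to be a square, symmetric intrachromosomal
    contact matrix that has already been normalized.
    """
    n = len(matrix)
    decay = []
    for d in range(n):
        # All bin pairs (i, i + d) lie on the d-th diagonal.
        values = [matrix[i][i + d] for i in range(n - d)]
        decay.append(sum(values) / len(values))
    return decay


# Toy 4-bin matrix: contacts weaken as bins get further apart.
toy = [
    [10, 5, 2, 1],
    [5, 10, 5, 2],
    [2, 5, 10, 5],
    [1, 2, 5, 10],
]
decay = contact_decay(toy)
print(decay)
```

With 50-kb bins, index `d` in the returned list corresponds to a separation of `d × 50 kb`.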

Fig. 1: Hi-C heatmaps with compartment region analysis results at 50-kb resolution of P. euphratica chromosome 1 (left) and P. alba var. pyramidalis chromosome 1 (right).

The heatmaps at the top are Hi-C contact maps at 50-kb resolution, which show global patterns of chromatin interaction in the chromosome. The chromosome is shown from top to bottom and left to right. The ICE-normalized interaction intensity is shown on the color scale on the right side of the heatmap. The track below the Hi-C heatmap shows the partition of A (red histogram, PC1 > 0) and B (green histogram, PC1 < 0) compartments as characterized by PCA. CG (yellow curve), CHG (gray curve), and CHH (red curve) methylation modification levels along the chromosome are plotted immediately below the compartment partition track. The heatmap at the bottom shows the density of repeat sequences along the chromosome

A common feature of all previous studies of chromatin organization is that regions of each chromosome are organized into “A” and “B” compartments, which correspond primarily to the euchromatic and heterochromatic regions, respectively4,5,6. To examine whether a similar compartment pattern also exists in poplar, we performed principal component analysis (PCA) on the genome-wide interaction matrix and categorized the genomic bins as A or B compartments according to the sign of the first principal component (PC1), with A compartments showing higher gene densities. The results indicated that ~56.72% of the P. alba var. pyramidalis genome belongs to A compartments, a significantly higher percentage than that in P. euphratica (53.09%; P = 2.173 × 10^−6, two-sided Fisher’s exact test, n = 7743 in P. alba var. pyramidalis and n = 11,004 in P. euphratica; Supplementary Table S7). We found that interactions within each compartment were more frequent than those across compartments (Fig. 1 and Supplementary Fig. S3), and that the A compartment regions interacted more frequently with A compartments from different chromosomes than with B compartments in both poplar species (Supplementary Fig. S5). The genes in the A compartments displayed significantly higher transcription levels than those in the B compartments, while the B compartments exhibited significantly higher transposable element densities and higher levels of CG, CHG, and CHH methylation in both P. alba var. pyramidalis and P. euphratica (Fig. 1, and Supplementary Figs. S3 and S5). These results are consistent with patterns reported in other plant and animal species6,16,39.
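Calling compartments from the sign of PC1 can be sketched as follows. This is a simplified illustration, not the code used in this study. It assumes a distance-normalized (observed/expected) matrix as input, takes the leading eigenvector of the row-wise Pearson correlation matrix by power iteration as a stand-in for PC1, and orients the sign so that the gene-rich side is labeled A. All function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rows (0.0 if a row is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def leading_eigenvector(m, iterations=200):
    """Power iteration; the fixed start vector is an arbitrary choice."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def call_compartments(oe, gene_density):
    """Label bins A/B by the sign of PC1, with A the gene-richer side."""
    n = len(oe)
    corr = [[pearson(oe[i], oe[j]) for j in range(n)] for i in range(n)]
    pc1 = leading_eigenvector(corr)
    pos = [g for g, p in zip(gene_density, pc1) if p > 0]
    neg = [g for g, p in zip(gene_density, pc1) if p <= 0]
    # The sign of an eigenvector is arbitrary, so flip it if needed.
    if pos and neg and sum(pos) / len(pos) < sum(neg) / len(neg):
        pc1 = [-p for p in pc1]
    return ["A" if p > 0 else "B" for p in pc1]


# Toy O/E matrix: 2.0 within a compartment, 0.5 across compartments.
state = [1, 1, -1, -1, 1, -1]
oe = [[2.0 if a == b else 0.5 for b in state] for a in state]
gene_density = [8, 9, 2, 1, 7, 3]   # hypothetical genes per bin
labels = call_compartments(oe, gene_density)
print("".join(labels))
```

Orienting PC1 by gene density mirrors the statement above that A compartments show higher gene densities. On real data, the orientation is usually decided per chromosome.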

A TAD is defined as a genomic region in which the interactions of the loci with each other tend to be more frequent than interactions with loci outside the region7,40. TADs are a common and prominent feature of the mammalian genome and have been shown to have profound effects on gene expression4,5. Recent studies have indicated that although few TADs have been identified in Arabidopsis15,17, they are ubiquitous in the genomes of rice, cotton, Brassica, and other crops19,20,21,23. To examine the existence of TADs in poplar, we employed the TopDom method41 on the 10-kb corrected interaction matrix of each individual chromosome. A total of 3175 and 4829 TADs with median sizes of 100 and 80 kb were identified in the genomes of P. alba var. pyramidalis and P. euphratica, and collectively covered ~97.34% and 86.28% of the genome lengths, respectively (Fig. 2a, and Supplementary Tables S8 and S9). As expected, these domains showed enriched interactions within the same domain, but less frequent interactions with loci located in adjacent domains (Supplementary Fig. S6). To understand the role of TADs in poplar genome organization, we further analyzed the available genomic features at the TAD boundaries. The results showed that protein-coding genes are more often localized at TAD boundaries than within TADs. Prominent enrichment of highly expressed genes at the TAD boundaries was observed in both P. alba var. pyramidalis and P. euphratica (Fig. 2b). Consistent with these results, DNA methylation in the CG, CHG, and CHH contexts displayed an obvious decrease around the TAD boundaries (Fig. 2c). All of these results suggest that active transcription and epigenetic modification might contribute to the formation of TADs in poplar, similar to findings in other plant species19,20,21,23.
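The idea behind boundary detection can be illustrated with a mean contact frequency signal computed across each bin junction, where local minima mark candidate boundaries. The sketch below is a simplified illustration in the spirit of TopDom, not the TopDom implementation itself. It omits TopDom's statistical filtering of candidate boundaries, and the function names and window size are assumptions.

```python
def bin_signal(matrix, window):
    """Mean contact frequency between the `window` bins on either side
    of each junction (junction i lies between bin i and bin i + 1)."""
    n = len(matrix)
    signal = []
    for i in range(n - 1):
        upstream = range(max(0, i - window + 1), i + 1)
        downstream = range(i + 1, min(n, i + 1 + window))
        values = [matrix[a][b] for a in upstream for b in downstream]
        signal.append(sum(values) / len(values))
    return signal


def local_minima(signal):
    """Indices where the signal is strictly lower than both neighbours."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]
    ]


# Toy matrix with two 3-bin domains: strong contacts inside a domain (8),
# weak contacts between domains (1), and 10 on the diagonal.
n_bins = 6
toy = [
    [10 if i == j else (8 if i // 3 == j // 3 else 1) for j in range(n_bins)]
    for i in range(n_bins)
]
signal = bin_signal(toy, window=2)
boundaries = local_minima(signal)
print(signal, boundaries)
```

In this toy example the only minimum falls at the junction between bins 2 and 3, where the two domains meet.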

Fig. 2: TAD profiles of P. euphratica and P. alba var. pyramidalis.

a Size distribution of TADs of P. euphratica and P. alba var. pyramidalis at 10-kb resolution. b Density of genes with different expression levels (TPM) around TAD boundaries (±30 kb, window size = 1000 bp) in P. euphratica (left) and P. alba var. pyramidalis (right). Genes were divided into four groups based on the number of transcripts per kilobase of exon model per million mapped reads (TPM), and curves were color-coded based on these groups. c Distribution of DNA methylation levels (CG, CHG, and CHH) around TAD boundaries (±30 kb, window size = 1000 bp) in P. euphratica and P. alba var. pyramidalis. The y-axis indicates the methylation level

Comparison of 3D organization between the two poplar genomes

To study the evolutionary conservation of genome organization during the speciation of these two poplars, we conducted a whole-genome alignment and compared the distribution of compartments and TADs between the syntenic blocks. The results indicated extensive collinearity and similarity between these two genomes, with 298.66 Mb (73.19%) of P. alba var. pyramidalis sequences aligning with 299.69 Mb (52.17%) of P. euphratica sequences. Further analysis revealed that the majority (65.12%) of the unaligned regions resulted from the recent insertion of repetitive elements in the genome of P. euphratica. In total, we identified 19,235 large (>5 kb) structural variants ranging from 5 to 446 kb in length in the alignment of the two genomes, including 719 inversions, 476 translocations, and 7947 and 10,093 unique regions in P. alba var. pyramidalis and P. euphratica, respectively (Supplementary Tables S10 and S11).

To characterize the relationship between structural variation and spatial organization of the poplar genomes, we first analyzed the conservation of A/B compartments between P. alba var. pyramidalis and P. euphratica, using a 50-kb Hi-C matrix. The results showed that 71.52% (145.75 Mb in P. euphratica and 145.63 Mb in P. alba var. pyramidalis) of the total length of the syntenic regions have the same compartment status between the two species, while 43.68 and 43.71 Mb of the genomic regions exhibit A/B compartment switching in P. alba var. pyramidalis and P. euphratica, respectively (Fig. 3a). For the regions with structural variation, we found that 77% of the inversion events between the two genomes had no effect on their compartment status, while 61% of the translocation events occurred within the regions exhibiting compartment switching (Fig. 4a and Supplementary Table S10). Moreover, we also found that only 38.59% and 33.39% of the nonsyntenic regions were identified as A compartments in P. alba var. pyramidalis and P. euphratica, respectively, indicating that large-scale insertions and/or deletions tend to occur in heterochromatic regions (Fig. 4b). We further assessed the conservation of genome organization at the TAD level by examining whether the orthologous genes within the same TAD in one species were also located within a single TAD in the other species19,21,23. The results indicated that only 48.04% of TADs from P. alba var. pyramidalis and 40.95% from P. euphratica were substantially shared between the two species (Figs. 3b, c). Taken together, these results indicated that the 3D genome organization shows surprisingly low conservation across poplar species at both the compartmental and TAD levels.
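The ortholog-based test of TAD conservation can be sketched as follows. This is an illustration only. The text does not state the criterion for "substantially shared", so the threshold `min_shared` (the fraction of a TAD's orthologs that must fall in one TAD of the other species) is a hypothetical parameter, as are the function name, the data structures, and the toy identifiers.

```python
from collections import Counter


def conserved_tads(tad_genes, orthologs, gene_to_tad_other, min_shared=0.8):
    """Return TADs whose orthologs mostly fall in one TAD of the other species.

    tad_genes:          TAD id -> list of gene ids (species 1)
    orthologs:          gene id (species 1) -> gene id (species 2)
    gene_to_tad_other:  gene id (species 2) -> TAD id (species 2)
    min_shared:         hypothetical threshold, not taken from the study
    """
    conserved = []
    for tad, genes in tad_genes.items():
        # TADs in the other species that hold the orthologs of this TAD's genes.
        targets = [
            gene_to_tad_other[orthologs[g]]
            for g in genes
            if g in orthologs and orthologs[g] in gene_to_tad_other
        ]
        if not targets:
            continue
        _, count = Counter(targets).most_common(1)[0]
        if count / len(targets) >= min_shared:
            conserved.append(tad)
    return conserved


# Toy data: T1 maps entirely into E1, while T2 is split across E2 and E3.
tad_genes = {"T1": ["a1", "a2", "a3"], "T2": ["a4", "a5"]}
orthologs = {"a1": "e1", "a2": "e2", "a3": "e3", "a4": "e4", "a5": "e5"}
gene_to_tad_other = {"e1": "E1", "e2": "E1", "e3": "E1", "e4": "E2", "e5": "E3"}

shared = conserved_tads(tad_genes, orthologs, gene_to_tad_other)
print(shared)
```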

Fig. 3: Evolutionary conservation of compartment status and TADs across P. euphratica and P. alba var. pyramidalis.

a Overlap of compartment status between syntenic regions in P. euphratica and P. alba var. pyramidalis. b Overlap of TADs between syntenic regions in P. euphratica and P. alba var. pyramidalis. c Example of conserved TAD structures across a syntenic region between P. euphratica and P. alba var. pyramidalis. The TADs are outlined by black triangles in the heatmaps, and the position of the TAD domains is indicated by alternating blue-green line segments. The mean cf value used to identify the domains is also shown. The orthology tracks of these conserved domains are shown at the bottom

Fig. 4: Relationship between structural variation and spatial organization of the genomes of P. euphratica and P. alba var. pyramidalis.

a Analysis of compartment inversion (left) and translocation (right) across P. euphratica and P. alba var. pyramidalis. b Analysis of compartments of species-specific regions in P. euphratica (left) and P. alba var. pyramidalis (right)

Relationship between chromatin interactions and expression divergence of WGD-derived paralogs

Poplar species have undergone a recent WGD event followed by diploidization, a process of genome fractionation that leads to functional and expression divergence of the duplicated gene pairs27,28,33. Although no biased gene loss or expression dominance was found between the two poplar subgenomes, there is evidence that nearly half of the WGD-derived paralogs have diverged in expression32,33. To explore the potential role of chromatin dynamics in the observed expression patterns of duplicated genes, we examined their differences in chromatin interaction patterns for both species. We first identified a total of 10,438 and 9754 paralogous gene pairs showing interchromosomal interactions in P. euphratica and P. alba var. pyramidalis, respectively. After correlating the frequency of chromatin interactions with their differences in expression, we found that gene pairs with biased expression (more than twofold differences in expression levels) interacted less frequently than gene pairs with similar expression levels in both species (P = 1.71 × 10^−6 and 7.20 × 10^−7 for P. euphratica and P. alba var. pyramidalis, respectively, Mann–Whitney U test; Fig. 5a). We also estimated the interaction score (the average of the distance-normalized interaction frequencies) for bins involved in the paralogous gene pairs and quantified their differences in interaction strength (Supplementary Fig. S7 and Supplementary Table S12)3,23. Our results showed that for gene pairs with biased expression, highly expressed gene copies have stronger interaction strengths than weakly expressed copies (P = 2.10 × 10^−12 and 2.74 × 10^−2 for P. alba var. pyramidalis and P. euphratica, respectively, Mann–Whitney U test), while no significant differences were observed for gene pairs with similar expression levels (Fig. 5b). We further investigated these phenomena at the level of high-order chromatin architecture and found that the gene pairs located in conserved TADs had similar expression levels (P = 2.68 × 10^−3 and 7.86 × 10^−6 for P. euphratica and P. alba var. pyramidalis, respectively, Mann–Whitney U test; Supplementary Fig. S8). Overall, our analyses indicate that the extensive expression divergence between WGD-derived paralogs in Populus is associated with the differences in their chromatin dynamics and 3D genome organization, and suggest that this organization may function as a key regulatory layer underlying expression divergence during diploidization.
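The comparison of interaction frequencies between paralog pairs with biased and similar expression can be sketched in two steps: split the pairs by fold change, then compare the two groups with a Mann–Whitney U test. The sketch below is illustrative and is not the analysis code of this study. It assumes that a pair is discarded if either copy has TPM < 0.5 and that a fold change of exactly 2 counts as similar. The P value uses a normal approximation without tie or continuity correction, and the toy values are invented.

```python
import math


def split_pairs(pairs, min_tpm=0.5, fold=2.0):
    """Split (tpm_a, tpm_b, interaction) tuples into biased / similar groups."""
    biased, similar = [], []
    for tpm_a, tpm_b, interaction in pairs:
        if min(tpm_a, tpm_b) < min_tpm:      # assumed low-expression filter
            continue
        ratio = max(tpm_a, tpm_b) / min(tpm_a, tpm_b)
        (biased if ratio > fold else similar).append(interaction)
    return biased, similar


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for x and a two-sided P value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Invented toy pairs: (TPM of copy 1, TPM of copy 2, interaction frequency).
pairs = [
    (10, 2, 1.0), (8, 1, 2.0), (6, 5, 5.0), (4, 3, 6.0),
    (0.2, 9, 9.0), (3, 10, 3.0), (7, 7, 4.0),
]
biased, similar = split_pairs(pairs)
u_biased, p_value = mann_whitney_u(biased, similar)
print(biased, similar, u_biased, p_value)
```

For real data sets of several thousand pairs, a tested library routine such as `scipy.stats.mannwhitneyu` would be the more sensible choice.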

Fig. 5: Comparison of interaction levels between WGD-derived paralogs with biased/similar expression in P. euphratica and P. alba var. pyramidalis.

a The box plots show the interaction frequency of WGD-derived paralogs with biased (fold change > 2) and similar (fold change < 2) expression in P. euphratica and P. alba var. pyramidalis. Genes with low expression levels (TPM < 0.5) were discarded. The y-axis values indicate the ICE-normalized interaction frequency. b The box plots show the interaction score of WGD-derived paralogs with biased/similar expression in both P. euphratica and P. alba var. pyramidalis. The y-axis shows the interaction scores, which were calculated based on 10 kb O/E matrices

In addition, we identified 849 and 454 spatially colocalized paralogs in P. euphratica and P. alba var. pyramidalis, respectively, which exhibited significantly stronger chromatin interactions than other gene pairs derived from WGD (false discovery rate < 0.05). The number of colocalized paralogs was greater than that obtained from 1000 randomly selected samples, indicating that the spatial organization of the WGD-derived paralogs is not random and that they are more likely to be colocalized in both species (Fig. 6a). Further comparisons showed that these colocalized paralogs exhibited more similar DNA methylation patterns than noncolocalized gene pairs, especially in the “CHH” context (Fig. 6b). We finally examined the evolutionary conservation of the spatial colocalization, and the results showed that 198 of the colocalized gene pairs were orthologous between the two species. These overlapping genes accounted for 11.66% and 21.81% of the total colocalized paralogs in P. euphratica and P. alba var. pyramidalis, respectively, significantly higher proportions than expected by chance (3.89% and 7.38% at random, P < 2.2 × 10^−16, two-sided Fisher’s exact test). These results highlight the conservation of colocalized paralogs and suggest that the spatial constraints of 3D genome organization might have functional significance under selective pressure.
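The comparison against 1000 random samples can be sketched as a simple randomization test. This is a minimal illustration under stated assumptions, not the procedure used in this study. It assumes that each random sample consists of the same number of interchromosomal gene pairs drawn uniformly from all genes, and that colocalization means the two bins appear in a precomputed set of significant contacts. Function names, data structures, and the toy data are hypothetical.

```python
import random


def is_colocalized(pair, gene_bin, contacts):
    """True if the bins of the two genes form a significant contact."""
    a, b = gene_bin[pair[0]], gene_bin[pair[1]]
    return (a, b) in contacts or (b, a) in contacts


def colocalization_null(paralogs, gene_bin, gene_chrom, contacts,
                        n_iter=1000, seed=0):
    """Observed count, null counts, and an empirical one-sided P value."""
    rng = random.Random(seed)
    genes = sorted(gene_bin)
    observed = sum(is_colocalized(p, gene_bin, contacts) for p in paralogs)
    null = []
    for _ in range(n_iter):
        count = drawn = 0
        while drawn < len(paralogs):
            g1, g2 = rng.sample(genes, 2)
            if gene_chrom[g1] == gene_chrom[g2]:
                continue                    # keep interchromosomal pairs only
            drawn += 1
            count += is_colocalized((g1, g2), gene_bin, contacts)
        null.append(count)
    p_value = (sum(c >= observed for c in null) + 1) / (n_iter + 1)
    return observed, null, p_value


# Toy data: eight genes on two chromosomes, one gene per bin.
gene_chrom = {f"g{i}": ("chr1" if i <= 4 else "chr2") for i in range(1, 9)}
gene_bin = {f"g{i}": (gene_chrom[f"g{i}"], (i - 1) % 4) for i in range(1, 9)}
contacts = {(("chr1", 0), ("chr2", 0)), (("chr1", 1), ("chr2", 1))}
paralogs = [("g1", "g5"), ("g2", "g6"), ("g3", "g7")]

observed, null, p_value = colocalization_null(
    paralogs, gene_bin, gene_chrom, contacts)
print(observed, p_value)
```

The observed count plays the role of the dashed line in Fig. 6a, and the list `null` plays the role of the distribution obtained from the randomizations.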

Fig. 6: Colocalization frequency of interchromosomal paralogs retained after WGD and comparison of DNA methylation modification differences between colocalized and noncolocalized paralogs.

a Colocalized interchromosomal paralog pairs in P. euphratica (left) and P. alba var. pyramidalis (right). The orange dashed lines indicate the observed colocalization frequencies for paralogs retained after WGD. The blue curves show the colocalization frequency distributions for 1000 randomizations of the same number of pairs as in the real data. b Box plots of the absolute difference in DNA methylation modification levels (CG, CHG, and CHH) between colocalized paralogs and between noncolocalized paralogs after WGD in P. euphratica and P. alba var. pyramidalis

Source: Ecology - nature.com
