Genotyping control and datasets creations
After filtering the initial raw dataset of 1,071 goats for poor quality genotype and related animals we obtained the dataset used for the haplotype sharing analysis, which included 980 animals and 48,396 SNPs. This dataset was further pruned for linkage disequilibrium (LD; r2 < 0.2).
We first balanced the number of animals in the different populations by reducing the size of the nine largest groups, leaving 42,088 SNPs and 802 individuals. We used this dataset to perform population structure analyses (Multi-Dimensional Scaling (MDS), Admixture, and Reynolds distances).
Upon the removal of 2nd-degree related individuals and animals without geographical coordinates, we retained 41,898 SNPs and 489 individuals for Landscape Genomics analyses. See Table 1 for the detailing of the different datasets.
Population structure
The MDS plot showed a north–south geographic gradient comparable with previous findings on Italian goat population structure6. The first MDS component identified three main groups corresponding to northern Italian, central-southern Italian, and Maltese populations. The second MDS component discriminated the insular Montecristo goat (MNT_I; Fig. 1a) from the other mainland breeds, likely due to the high inbreeding and prolonged geographical isolation (Somenzi et al., in preparation). For this reason, we excluded the two Montecristo populations (MNT_M and MNT_I) from the subsequent population structure and haplotype sharing analysis and to repeat the analyses without them. The new MDS plot without the two MNT populations (Fig. 1b) still separated the three main groups on the first component, a structure further supported by the bootstrapped Reynolds’ distances phylogenetic tree (Supplementary Fig. 2).
(a) MDS plot representing all the populations contained in the dataset, (b) MDS plot without the two Montecristo populations (MNT_M e MNT_I).
Although overlapping a previous investigation of the Italian caprine population structure6, our improved dataset identified a closer relationship between the central and southern Italian population, more in accordance with the recent known history and geography of the Country. Until 1860 Italy was divided in many states with tight connections to other European kingdoms (https://www.150anni.it). The north-western part of the country and Sardinia were part of the Sardinian kingdom, tightly connected with the French empire, whereas the north-east part (the Kingdom of Lombardy—Venetia) was under the political influence of the Austrian Empire. Central Italy was ruled by the Papal state, and southern Italy and Sicily were under the Kingdom of the two Sicilies ruled by the Borbone (Fig. 2)20.
Historical map of Italy pre-unification with pie chart of the ADMIXTURE K = 3 values plotted on the geographical coordinates of the mean sampling location of each breed. The colours reflect the ADMIXTURE component associated with the Maltese (purple), Northern Italy Cluster (yellow), and Central-Southern Cluster (blue). BEZ and MXS were not represented due to the lack of specific geographic coordinates. MNT_I and MNT_M were excluded as extreme outliers. Map was generated in Inkscape v1.0 (https://inkscape.org/); the pie charts were created using R21.
The ADMIXTURE analysis (Supplementary Fig. 3) at K = 2 separates the Maltese populations (purple component) and the Northern Italy breeds (yellow component), and improves the representation of the North–South gradient over previous studies on Italian goat populations6. At K = 3 it resembles the MDS plot distinguishing the central-southern Italian breeds led by the Girgentana (GIR; blue component) and the mean proportion for each breed overlap nicely with the political borders of Italy prior 1860 (Fig. 2). Each K above 3 distinguishes single or groups of breeds, such as Teramana (TER; K = 4) and Valdostana (VAL; K = 5). The lowest cross-validation error was recorded at K = 20 (Supplementary Fig. 4) and showed the similar genetic background of those breeds originated from the same geographical regions (north, central, south and Maltese), and some breeds identified by private clusters, once again confirming the uniqueness of GIR, ORO, VAL, TER and SAM, among others (Supplementary Fig. 3).
The haplotype sharing analysis across populations (Fig. 3) also highlights the three genetic groups corresponding to admixture K = 3 and consistently with the administrative and temporal history of the Italian Peninsula until 186020.
Proportion of the median haplotype shared among the Italian goat populations. The colours reflect the ADMIXTURE components at K = 3, which overlap with the administrative and temporal history of the Italian Peninsula until 1860. The outgroup is highlighted in green (extended name reported in Table 1). The figure was generated using Circos v0.69-8 Software22.
We observe that the Northern-Italian populations (yellow cluster) show no haplotype exchange with the other clusters, with the exception of SAA and TER probably due to a recent introgression event. Within the Northern-Italian cluster there is a more pronounced haplotype sharing among the Lombardy breeds (ORO, NVE, LIV and BIO) than among those from the rest of the Alps. The Val Passiria (VPS) together with the Garfagnana (GAR) are the only two populations that do not exchange haplotypes at all, perhaps suggesting a geographical and/or political isolation. Populations from Central-Southern Italy (blue cluster) show large haplotype sharing within and among different clusters, possibly due to breeding and management practices as well as local geographical conditions, such as breeds from the Lazio region (BIA, GCI, CAP, and FUL) have high haplotype sharing among themselves. Lastly, the populations from the isles and in particular the Maltese (MAL and SAM, purple cluster) and Sarda (SAR and MXS) are those that mostly shared haplotypes with all other southern breeds, probably as a consequence of their high productivity and diffusion over the territory. The green colour represents the outgroup Capra aegagrus that does not exchange haplotypes with any of the other breeds. Importantly, future investigations with dedicated experimental designs aimed to dissect the different effects of selection might aid unfolding the undergoing evolutionary dynamics.
The political subdivision of Italy preceding the unification of the country has probably contributed to maintain the ancient genetic flows from central-north Europe in the north of the country and from Africa and Spain in the south13, with only a minor impact on the population structure of the following 150 years of history of the country.
Landscape genomics
The landscape genomics analyses (LGA) were performed using the climatic variables representing the current climate applying two different approaches: Samβada23 and LFMM24. We observed no direct overlap between the two methods. However, this is not surprising as simulation studies showed that LFMM is overall more conservative than Samβada, and the two methods tend to have marginal overlap on co-selecting the same signals, with the most significant loci detected by Samβada ignored by LFMM23. Samβada identified 252 genotypes belonging to 216 different SNPs significantly associated (FDR < 0.05) with at least one climatic variable (Supplementary Table 1). Among them, 75 SNPs mapped within a gene region annotated in the goat genome (ARS1.2), identifying a total of 62 different genes associated with at least one of the following four representative environmental variables: “Isothermality” (47 genes), “Mean diurnal range” (four genes), “Mean temperature wettest quarter” (three genes) and “Precipitation coldest Quarter” (11 genes) (Supplementary Table 2). Some of these genes had already been identified in other landscape genomics works in relation with different environmental variables, for example ANK3 and BTRC in relation to longitude, and RYR3 with Mean Temperature of the wettest quarter (BIO3)19. The DCLK1 gene, in particular, was found in association with the continental goat group compared to the rest of the world9. Details on correlations among representative and excluded variables are shown in Supplementary Table 3.
Initially, we investigated the role in biological pathways of the 62 genes identified by Samβada (Supplementary Table 2), splitting them according to the associated environmental variable. We identified only one significant pathway (“Circadian rhythm related genes WP3594”; adjusted p-value < 0.0045) for those genes associated with “Mean diurnal range” with two genes linked to the circadian clock regulation (MAPK925) and to hair follicle formation and hair growth in Cashmere goat (NTKR326).
We also analysed the 62 genes individually to better understand their function. Using the information found, we can clump the most interesting genes into four groups based on the phenotype they affect the most: (1) meat- and growth-related genes, (2) circadian rhythm-related genes, (3) fertility-related genes, and (4) inflammatory response genes.
The first group (meat and growth) is the largest and counts 24 genes, including HADC9, which has a role in the feedback inhibition of myogenic differentiation in sheep muscle27, DLG1, that is related to adipogenesis and residual feed intake in cows28, and KLF12, which is related to the formation of preadipocytes in goats29. The second group (circadian rhythm) includes 12 genes, such as MAPK9 and EYA3, both related to melanin production and photoperiod regulation30, and KCNJ1, associated with the production of polyunsaturated fatty acid (PUFA) and feed efficiency in cattle5,31,32. The third group (fertility-related) includes 15 genes such as BTRC, whose mutations can affect spermatogenesis and mammary gland development in mice33, PRKD1, associated to age at puberty in pigs34, and DENND1A, related to anti-Mullerian hormone and superovulation in dairy cows and to polycystic ovarian syndrome in human35. Finally, the fourth group (inflammatory response) includes eight genes such as BTLA, strongly related to rheumatoid arthritis36. This last gene in particular is relevant as a candidate for one of the most relevant infective diseases of goats worldwide, the Caprine Arthritis Encephalitis Virus (CAEV). This virus belongs to the Retroviriade virus family, like the human immunodeficiency virus (HIV), and has rheumatoid arthritis among its principal symptoms37,38. Due to the CAEV importance and the relevance of climatic factors and their change play into pathogens diffusion39, this group of genes becomes a potential candidate for studies on livestock resilience to incoming climate challenges.
LFMM identified four SNPs significantly associated (FDR < 0.05) with three different climatic variables (Mean Diurnal Range, Mean Temperature Wettest Quarter, and SlopeP), two of which intercepting NBEA, a gene located within a region involved with wool production in sheep40 and previously associated with continental goat group in the work of Bertolini et all 20189, and the RHOBTB1 gene that is known to be associated to meat quality in cattle41 (Supplementary Table 4).
Future genotypes prediction
The data collected on the current Koppen-Geiger climate classification showed that 21 Italian breeds live in “Temperate” regions, eight in “Cold” regions (BIO, SAA, VLS, TER, MNT_M, LIV, ORO, VPS), two in “Arid” regions (GAR, MNT_I), and one in a “Polar” region (VAL; Fig. 4a). BEZ and MXS were not considered for the analysis due to lack of georeferenced information. If we compare the current Koppen-Geiger classification of their breeding areas with the future predictions (Fig. 4b), we observe that, in 70 years from now, only 11 breeds will live in regions that will not change their classification. Such a scenario will likely pose new threats to those populations living in colder climates, whereas those breeds coming from the warmer parts of the country might have a chance to expand their range, with direct repercussions on the genetic diversity and survival of these breeds.
Koppen-Geiger climate maps with classification for Italy (a) present day (b) future (extended name reported in Table 1). The Maps were generated using R combining the information available in Rubel et al.42 and Beck et al.14.
Among them, nine (ASP, BIA, CAP, GRF, MES, NIC, RCC, RME, SAR) populate “Temperate dry hot summer (Csa)” areas, one (GAR) is present in an “Arid step cold (Bsk)” area, and one (NVE) in a “Temperate without dry season hot summer (Cfa)” region. The remaining 21 breeds populate regions with a warmer and drier climate in the future (Table 2).
A one-way ANOVA analysis applied on the groups based on the Koppen-Geiger classification identified 27 SNPs that significantly differentiate the groups DRY/NOTDRY (seven within a gene region) and 11 that differentiate the groups HOT/NOTHOT (two within a gene region) (Supplementary Table 5). The linear regression model, applied to verify the variation of the genotype frequencies over time based on the value of their related variables, allowed us to identify five significant SNPs out of nine, intercepting the genes CHD2, ARL13B, KLF12, and PAK5 for the DRY/NOTDRY group and RACGAP1 for the HOT/NOTHOT group (Supplementary Table 6). Then, we calculated the expected future variation of allelic and genotypic frequencies of the significant SNPs in these groups. For instance, the SNP “snp32991-scaffold385-133908” intercepts the ARL13B gene and is associated to “Isothermality” with the genotype GG. At present, the frequency of the G allele of this SNP is 0.4296 in the DRY group and 0.6109 in the NOTDRY group and the delta of the variable “Isothermality” for the two groups is respectively − 0.1253 for the DRY group and − 0.0935 for the NOTDRY group. Using the regressor of the linear regression model (b = 0.3278), we predicted the future G allele frequency for this SNP in both groups (0.3885 and 0.5802 for the DRY and NOTDRY group, respectively) and consequently the expected GG genotype frequency (respectively 0.1509 for the DRY group and 0.3366 in the NOTDRY group). This simplified model suggests a future reduction of the genotype currently associated with the reference variable (“Isothermality”) in both groups. Interestingly, the gene intercepted by ARL13B interacts with RABGEF1, related to the reduction of the circadian cycle in humans according to the GenomeRNAi human phenotypes database (http://www.genomernai.org). In general, the prediction analysis identified SNPs that might go to stabilization of the frequencies or fixation (see “snp44855-scaffold611-263638” and “snp40739-scaffold521-1667886”, respectively; Table 3).
Source: Ecology - nature.com