State of ex situ conservation of landrace groups of 25 major crops

Crops and their landrace study areas

Food crops whose genetic resources are researched and conserved by CGIAR international agricultural research centres or by the CePaCT of the SPC were included in this study. Crop landrace distributions were modelled and conservation analyses conducted within recognized primary and, for some crops, secondary regions of diversity, where these crops were domesticated and/or have been cultivated for very long periods, and where they are, thus, expected to feature high genetic diversity and adaptation to local environmental and cultural factors (Supplementary Tables 1 and 2)9,13. These regions were identified through literature review (Supplementary Information) and confirmed by crop experts.

Occurrence data

Our crop landrace group distribution modelling and conservation gap analysis rely on occurrence data, including coordinates of locations where landraces were previously collected for ex situ conservation and reference sightings. For ex situ conservation records, occurrences marked as landraces were retrieved from two major online databases: the Genesys Plant Genetic Resources portal33 and the World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture (WIEWS) of the Food and Agriculture Organization of the United Nations34. Occurrences were also obtained directly from individual international genebank information systems: AfricaRice, the International Transit Centre and Musa Germplasm Information System of Bioversity International35, CePaCT, International Center for Tropical Agriculture (CIAT), International Maize and Wheat Improvement Center (CIMMYT), International Potato Center (CIP), International Center for Agricultural Research in the Dry Areas (ICARDA), International Crops Research Institute for the Semi-arid Tropics (ICRISAT), International Institute of Tropical Agriculture (IITA) and International Rice Research Institute (IRRI), as well as from the United States Department of Agriculture (USDA) Genetic Resources Information Network (GRIN)–Global36 and the Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO)37. Occurrences were also compiled from the Global Biodiversity Information Facility (GBIF), with ‘living specimen’ records classified as ex situ conservation records and the remainder serving as reference sightings for use in distribution modelling. Reference occurrences were also drawn from published literature (Supplementary Information). Duplicated observations within or between data sources were eliminated, with preference given to the most original data source.
Coordinates were corrected or removed when latitude and longitude were equal to zero or inverted, when points fell in water bodies or in the wrong country, or when coordinates had poor resolution (<2 decimal places). Occurrences were clipped to the study areas for each crop. The complete occurrence dataset is available in Supplementary Dataset 2.
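
As an illustration only, the coordinate screening described above could be sketched as follows. The function names, the reading of "equal to zero" as both values being zero, the rule used to detect inverted values and the text-based count of decimal places are assumptions made for this sketch, not the authors' implementation. The water-body and country checks need spatial layers and are omitted.

```python
from typing import Optional, Tuple


def decimal_places(text: str) -> int:
    """Number of digits after the decimal point in a coordinate string."""
    return len(text.strip().partition(".")[2])


def screen_coordinate(
    lat_text: str, lon_text: str, min_decimals: int = 2
) -> Optional[Tuple[float, float]]:
    """Return a cleaned (lat, lon) pair, or None if the record is dropped.

    Hypothetical helper: thresholds and rules are assumptions for this sketch.
    """
    lat, lon = float(lat_text), float(lon_text)

    # Remove records located at (0, 0).
    if lat == 0.0 and lon == 0.0:
        return None

    # Correct values that look inverted: a "latitude" outside +/-90 degrees
    # paired with a "longitude" that would be a valid latitude.
    if abs(lat) > 90 and abs(lon) <= 90:
        lat, lon = lon, lat

    # Remove values that are still outside the valid ranges.
    if abs(lat) > 90 or abs(lon) > 180:
        return None

    # Remove low-resolution coordinates (fewer than min_decimals decimals).
    if min(decimal_places(lat_text), decimal_places(lon_text)) < min_decimals:
        return None

    return lat, lon
```

For example, `screen_coordinate("120.98", "14.60")` swaps the two values, whereas `screen_coordinate("4.6", "-74.08")` is dropped for poor resolution.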

Spatial predictors

We compiled and calculated spatially explicit gridded information for 50 potential environmental and cultural predictors of landrace distributions, including climatic, topographic, evolutionary history and socioeconomic variables (Supplementary Table 3)13. For climate data, we gathered or derived 39 variables from WorldClim version 2 (ref. 38) and Environmental Rasters for Ecological Modeling (ENVIREM)39. We included elevation from the Shuttle Radar Topography Mission (SRTM) dataset of the CGIAR–Consortium on Geospatial Information portal40,41. Two crop evolutionary history proxies were included: distance to human settlements before the year AD 1500 (ref. 42) and distance to known wild progenitor populations13. The eight socioeconomic variables included population density43, distance to navigable rivers44, percentage of the area under irrigation45, population accessibility46,47, geographic distributions of ethnic or cultural groups48 and crop harvested area, production quantity and yield49. All predictor data were scaled to 2.5-arc-minute resolution with World Geodetic System (WGS) 84 as a datum. Extended descriptions of the sources and their justification for inclusion are provided in Ramirez-Villegas et al.13. The complete spatial predictor dataset is available in Supplementary Dataset 3.

Crop landrace group classifications

Crop landraces are cultivated plant populations managed by Indigenous or traditional farmers through cultivation, selection and diffusion1. They are typically genetically heterogeneous, although some types, such as clonally propagated populations, may be relatively homogeneous. They have recognizable characteristics, identities and geographic origins and are in an ongoing process of adaptation to their local environments and societal conditions1,2,31. For most crops, landraces number in the thousands, with major global staple cereals such as rice and wheat potentially represented by hundreds of thousands of landraces50,51, although precise numbers and consensus regarding differentiations among landraces within crops have not been established. Given the diversity of landraces and the complexity of environmental and cultural drivers differentiating them, our method seeks a compromise between, on the one hand, acknowledgement of this diversity and, on the other, the feasibility and performance of distribution modelling and conservation gap analysis.

For each crop, we, therefore, conducted an extensive literature review to identify recognized infraspecific groups with distinct morphological, physiological, chemical, genetic, nomenclatural or other characteristics that could be tested for environmental and cultural associations (Supplementary Table 1 and Supplementary Information). The nature of these groups varied by crop and included genepools, races, genetic clusters and geographic or environmental groupings. Crops often had more than one proposed grouping or classification.

We then built and tested classification models to determine how well the proposed groups could be predicted and distinguished based on spatial predictors, drawing from the occurrence database and training datasets compiled from the literature review. A random forest52, a support vector machine53, the K-nearest neighbour (KNN) algorithm54 and artificial neural networks55 were used to determine classification performance. The response variable was the group to which a given occurrence was assigned, whereas the explanatory variables were the spatial predictors. Models were combined into an ensemble using the mode—that is, the most frequent predicted value among the models—and tested using 15-fold cross-validation with 80% training and 20% testing. We accepted a given classification if each of its components was predicted with an average cross-validated accuracy of at least 80%. In the case of multiple classification proposals per crop, we selected the one with the best overall performance. Finally, we used the trained models to predict the corresponding group for occurrences missing such information. All landrace groups for all crops are provided in Supplementary Table 2, with the best-performing groups identified.
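
The ensemble and acceptance rule described above can be illustrated with a short sketch. It assumes that each of the four models has already produced a predicted group per occurrence, that ties are broken in favour of the first model listed, and that the accuracy of a group is the share of its occurrences assigned to it correctly. The source states none of these details, so they are labelled here as assumptions, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Sequence


def ensemble_mode(predictions: Sequence[str]) -> str:
    """Most frequent predicted group among the models for one occurrence.

    Ties go to the group predicted first (an assumption of this sketch).
    """
    return Counter(predictions).most_common(1)[0][0]


def per_group_accuracy(
    true_groups: Sequence[str], predicted_groups: Sequence[str]
) -> Dict[str, float]:
    """Share of occurrences of each true group that were predicted correctly."""
    totals = Counter(true_groups)
    hits = Counter(t for t, p in zip(true_groups, predicted_groups) if t == p)
    return {group: hits[group] / totals[group] for group in totals}


def classification_accepted(
    fold_accuracies: List[Dict[str, float]], minimum: float = 0.80
) -> bool:
    """Accept a classification only if every group averages >= minimum."""
    groups = fold_accuracies[0].keys()
    return all(
        mean(fold[group] for fold in fold_accuracies) >= minimum
        for group in groups
    )
```

In this sketch, `fold_accuracies` would hold one dictionary of per-group accuracies for each cross-validation run.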

Crop landrace group distribution modelling

To predict the probability of geographic occurrence for each landrace group within each crop, we generated MaxEnt models56,57 using the ‘maxnet’ R package58. Group-specific spatial predictors were selected using a combination of the variance inflation factor (VIF) and a principal component analysis (PCA) to control for excessive model complexity and variable collinearity59. We removed variables that did not contribute significantly to the variance in the PCA, defined as contributing less than 15% to the first component, and we further discarded variables with a VIF > 10 (ref. 60). The predictors and whether they were selected for the modelling of each landrace group are presented in Supplementary Table 4.

We generated a random sample of pseudo-absences as background points in areas that (1) were within the same ecological land units61 as the occurrence points, (2) were deemed potentially suitable according to a support vector machine classifier that uses all occurrences and predictor variables and (3) were farther than 5 km from any occurrence62. The number of pseudo-absences generated per crop group was ten times its number of unique occurrences.
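
A minimal sketch of the third criterion and the tenfold ratio is given below. It assumes that candidate background locations already satisfy criteria (1) and (2), uses great-circle (haversine) distance on a spherical Earth, and caps the sample at the number of eligible candidates. These choices and all names are illustrative assumptions rather than the published workflow.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0     # mean Earth radius, spherical approximation


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def sample_pseudo_absences(
    candidates: Sequence[Point],
    occurrences: Sequence[Point],
    min_distance_km: float = 5.0,
    ratio: int = 10,
    seed: int = 0,
) -> List[Point]:
    """Randomly sample background points far enough from every occurrence."""
    unique_occurrences = set(occurrences)
    # Keep only candidates farther than min_distance_km from all occurrences.
    eligible = [
        c
        for c in candidates
        if all(haversine_km(c, o) > min_distance_km for o in unique_occurrences)
    ]
    # Ten pseudo-absences per unique occurrence, limited by availability.
    wanted = ratio * len(unique_occurrences)
    return random.Random(seed).sample(eligible, min(wanted, len(eligible)))
```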

MaxEnt models were fitted through five-fold (K = 5) cross-validation with 80% training and 20% testing. For each fold, we calculated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and Cohen’s kappa as measures of model performance. To create a single prediction that represents the probability of occurrence for the landrace group, we computed the median across the K models. Pixels with probability values above the threshold that maximizes the sum of sensitivity and specificity were treated as the final area of predicted presence13.
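
The two post-processing steps, a pixel-wise median across the fold models and a threshold chosen to maximize sensitivity plus specificity, can be sketched as below. The sketch assumes predictions are held as plain lists, searches only the observed probability values as candidate thresholds and counts a value equal to the threshold as a presence. All of this is assumed for illustration, and it is not the 'maxnet' workflow itself.

```python
from statistics import median
from typing import List, Sequence


def median_prediction(fold_predictions: Sequence[Sequence[float]]) -> List[float]:
    """Pixel-wise median across the K fold models.

    fold_predictions[k][i] is the probability from model k at pixel i.
    """
    return [median(values) for values in zip(*fold_predictions)]


def max_sss_threshold(probabilities: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold maximizing sensitivity + specificity.

    labels are 1 for occurrences and 0 for pseudo-absences; both classes
    must be present. Ties keep the lowest threshold (an assumption).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_score = 0.0, -1.0
    for threshold in sorted(set(probabilities)):
        true_pos = sum(
            1 for p, y in zip(probabilities, labels) if y == 1 and p >= threshold
        )
        true_neg = sum(
            1 for p, y in zip(probabilities, labels) if y == 0 and p < threshold
        )
        score = true_pos / positives + true_neg / negatives
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold
```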

Ex situ conservation status and gaps

Three separate but complementary metrics were developed to compare the geographic and environmental diversity in current ex situ conservation collections to the total geographic and environmental variation across the crop landrace group distribution model and, thus, to identify and quantify ex situ conservation gaps13.

A connectivity gap score (SCON) was calculated for each 2.5-arc-minute pixel within the distribution model by drawing a triangle63,64 around each pixel using the three closest genebank accession occurrence locations as vertices and then deriving normalized values for the pixel based on distance to the triangle centroid and vertices13. The SCON of a pixel is high—closer to 1 on a scale of 0–1—when its corresponding triangle is large, when the pixel is close to the centroid of the triangle or when the distance to the vertices is large. A high SCON represents a greater probability of the pixel location being a gap in existing ex situ collections.

An accessibility gap score (SACC) was calculated for each 2.5-arc-minute pixel in the distribution model by computing travel time from each pixel to its nearest genebank accession occurrence location based both on distance and the speed of travel, defined by a friction surface13,45. Travel time scores were normalized by dividing pixel values by the longest travel time within the distribution model, with the final score ranging from 0 to 1. A high SACC value for a pixel reflects long travel times from existing genebank collection occurrences and, thus, represents a higher probability of the pixel location being a gap in existing ex situ collections.
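
The normalization step can be shown in a few lines. The sketch assumes travel times have already been computed for every pixel in the distribution model. The function name and the handling of an all-zero input are assumptions.

```python
from typing import List, Sequence


def normalize_travel_times(travel_times: Sequence[float]) -> List[float]:
    """Divide each pixel's travel time by the longest travel time in the model."""
    longest = max(travel_times)
    if longest == 0:
        # Every pixel coincides with an accession location: no accessibility gap.
        return [0.0 for _ in travel_times]
    return [t / longest for t in travel_times]
```

The pixel with the longest travel time receives a score of 1, and pixels at an accession location receive 0.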

An environmental gap score (SENV) was calculated for each 2.5-arc-minute pixel in the distribution model by conducting a hierarchical clustering analysis using Ward’s method with all the predictor variables from the distribution modelling. The Mahalanobis distance between each pixel and the environmentally closest genebank accession occurrence location was then computed13. Environmental distance scores were normalized between 0 and 1. A high SENV value for a pixel reflects a large distance to areas with similar environments where landraces have previously been collected for genebank conservation and, thus, represents a higher probability of the pixel location being a gap in existing ex situ collections.

Spatial ex situ conservation gaps were determined from the conservation gap scores using a cross-validation procedure to derive a threshold for each score. We created synthetic gaps by removing existing genebank occurrences in five randomly chosen circular areas with a 100 km radius within the distribution model. We then tested whether these artificial gaps could be predicted by our gap analysis, identifying the threshold value of each score that would maximize the prediction of these synthetic gaps. Performance for each of the five gap areas was assessed using AUC, sensitivity and specificity. The average cross-area threshold value was calculated for each score to discern pixels with a high likelihood of finding ex situ conservation gaps and that, thus, were higher priority for further field sampling. These were pixels with combined gap scores above the threshold, assigned a value of 1, as opposed to the relatively well-conserved areas below the threshold, which were assigned a value of 0.

The three binary conservation gap scores were then mapped in combination, resulting in pixels across the distribution model with gap values ranging from 0 to 3. Pixels with a value of 0 display no connectivity, accessibility or environmental gaps and are considered well represented ex situ. Pixels with a value of 1 indicate a conservation gap in connectivity, accessibility or the environment; we consider these ‘low-confidence’ gaps. Pixels with a value of 2 indicate gaps in two metrics or ‘medium-confidence’ gaps, and values of 3 indicate gaps across all metrics or ‘high-confidence’ gaps. High-confidence gap areas are displayed on crop-conservation-gap maps (Fig. 2b and Supplementary Information) and conservation hotspot maps across crops (Fig. 4 and Extended Data Figs. 5–8).
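
The combination of the three binary maps can be sketched as a pixel-wise sum. The sketch assumes each map is a flat list of 0/1 values in the same pixel order. The function names and the label dictionary are hypothetical conveniences.

```python
from typing import Dict, List, Sequence

CONFIDENCE_LABELS: Dict[int, str] = {
    0: "well represented",
    1: "low-confidence gap",
    2: "medium-confidence gap",
    3: "high-confidence gap",
}


def binarize(scores: Sequence[float], threshold: float) -> List[int]:
    """Assign 1 to pixels whose gap score is above the threshold, else 0."""
    return [1 if score > threshold else 0 for score in scores]


def combine_gap_maps(
    connectivity: Sequence[int],
    accessibility: Sequence[int],
    environmental: Sequence[int],
) -> List[int]:
    """Sum the three binary gap maps, giving a value from 0 to 3 per pixel."""
    return [c + a + e for c, a, e in zip(connectivity, accessibility, environmental)]
```

A pixel flagged by all three metrics sums to 3 and maps to `CONFIDENCE_LABELS[3]`, the category shown on the gap maps.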

The representation of crop landrace groups in current ex situ conservation collections was calculated based on the final 1–3 value conservation-gap maps. The complement of the proportion of the modelled distribution considered as a potential conservation gap by any single gap score represents the minimum estimate of current representation; the complement of the proportion considered by all three scores as a gap, which is to say high-confidence gap areas, represents the maximum estimate (Supplementary Tables 1 and 2).
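
The minimum and maximum estimates follow directly from the combined map, as in the sketch below. It treats every pixel as contributing equally to the modelled distribution, which is an assumption of this sketch because the source does not say whether pixels were area-weighted.

```python
from typing import Sequence, Tuple


def representation_range(gap_values: Sequence[int]) -> Tuple[float, float]:
    """Return (minimum, maximum) estimated representation as proportions.

    gap_values holds the combined 0-3 gap value of each pixel in the
    modelled distribution.
    """
    total = len(gap_values)
    flagged_by_any = sum(1 for value in gap_values if value >= 1)
    flagged_by_all = sum(1 for value in gap_values if value == 3)
    minimum = 1 - flagged_by_any / total  # complement of any-score gaps
    maximum = 1 - flagged_by_all / total  # complement of high-confidence gaps
    return minimum, maximum
```

For a four-pixel model with values `[3, 0, 2, 1]`, three pixels are flagged by at least one score and one by all three, so the estimated representation ranges from 25% to 75%.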

While distribution modelling and conservation gap analyses were conducted at the crop landrace group level and results are presented in full in the Supplementary Information, for ease of comparison of results across crops, and to avoid bias towards crops with many landrace groups, we also calculated summary results at the crop level. Crops that had been assessed with geographic differentiations, including maize in Africa and Latin America and yams in the New World and the Old World, were also combined. For spatial results, the pixels in crop landrace group models were summed—that is, constituent landrace group models were combined. The minimum and maximum current conservation representation estimations at the crop level were then calculated based on combined spatial models.

GBIF occurrence downloads

The following occurrence downloads from the Global Biodiversity Information Facility (GBIF; https://www.gbif.org/, 2017−2021) were used: 10.15468/dl.rrntfr, 10.15468/dl.2f2v4h, 10.15468/dl.2ywlb7, 10.15468/dl.lnfelh, 10.15468/dl.ryrmfj, 10.15468/dl.8adf61, 10.15468/dl.nff5ys, 10.15468/dl.erxs6e, 10.15468/dl.vbfgho, 10.15468/dl.mjjk3x, 10.15468/dl.uppz1n, 10.15468/dl.938bgm, 10.15468/dl.hr87hm, 10.15468/dl.k1va80, 10.15468/dl.coqpu2, 10.15468/dl.lkoo9u, 10.15468/dl.e998mp, 10.15468/dl.vfbmm7, 10.15468/dl.tnp478, 10.15468/dl.6zxsea, 10.15468/dl.0lray8, 10.15468/dl.5sjgsw, 10.15468/dl.wkju6h, 10.15468/dl.7xzfvc, 10.15468/dl.autlf5, 10.15468/dl.fe2amw, 10.15468/dl.2zblvz, 10.15468/dl.ddplkj, 10.15468/dl.jbzejg, 10.15468/dl.ej5bha, 10.15468/dl.905pxd, 10.15468/dl.pim1vs, 10.15468/dl.vdridc, 10.15468/dl.b43gyv, 10.15468/dl.nnw3z7, 10.15468/dl.bnt9jc, 10.15468/dl.f5x2cg, 10.15468/dl.ub7zbg, 10.15468/dl.sggf2v, 10.15468/dl.ath5ve, 10.15468/dl.23k3ug, 10.15468/dl.cym376, 10.15468/dl.53bwzk, 10.15468/dl.fsad7h and 10.15468/dl.fm6p7z.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

