Our results suggest that certain plant functional traits can be retrieved from simple RGB photographs. The key to this trait retrieval is deep learning-based Convolutional Neural Networks (CNNs) in concert with Big Data derived from open and citizen science projects. Although these models are subject to some noise, there is a wealth of applications for this approach, such as global trait maps, monitoring of trait shifts over time and the identification of large-scale ecological gradients. In this way, the problem of limited data that still prevents us from picturing global gradients7 could be alleviated by harnessing the exponentially growing iNaturalist database16. The performance of the CNN models varied strongly across traits, but revealed a clear trend: as expected, the more a trait referred to morphological features, the more accurate the predictions were. The models of the Baseline setup explained a substantial amount of the variance for LA and GH, whereas traits that are only partly related to morphological features, SLA and SM, showed moderate R^2 values. The predictions of LNC and SSD explained almost none of the variance, suggesting that tissue constituents are not directly expressed in, or related to, visible features. It also indicates that the strong covariance among these traits13 does not suffice to support their prediction from photographs. If the RGB images do not contain relevant information, the model will minimise the prediction errors through the regression-to-the-mean bias seen in Fig. 3 (especially the lower panels).
The value of informing the model about known trait variability through an augmentation of the target values (Plasticity setup) depended on the trait's performance in the Baseline setup: the better the predictive performance of the Baseline setup, the more the trait profited from the Plasticity setup, rendering it ineffective for SSD, LNC and SM (Fig. 3). Considering within-species trait variation instead of clinging to species mean values has been applied before using conventional methods26 but, to our knowledge, has never been tried in CNN models. We expected that providing a distribution of trait values rather than a single mean for each species conveys to the CNN that different trait realisations can be expected from the same species. Obviously, this idea can only work if a distribution rather than a single value is available for each species. The SM dataset, for instance, contained only one image per species (Table 1). In this case, the Plasticity setup reduced the predictive performance compared to the Baseline setup, possibly by increasing the discrepancy to the true trait value. Since the traits with more accurate predictions profited most from the Plasticity setup, we assume that it supports the model in learning to predict the trait expressions themselves rather than extracting them indirectly through taxon-specific morphological features. Given that we restricted the number of images per species to a maximum of 8, while successful deep learning-based plant species identification usually requires thousands of images12,22, it seems very unlikely that the models inferred traits from species-specific plant features visible in the imagery. This is underpinned by our finding that the predictions of most traits are void of phylogenetic autocorrelation (Supplementary Information 1 and Supplementary Table 5), indicating that taxonomic relationships were insignificant for the trait predictions.
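The target-augmentation idea behind the Plasticity setup can be illustrated with a minimal sketch: instead of always training on the species mean, each training image is paired with a value drawn from the species' known trait variability. The log-normal choice and the function below are our own illustrative assumptions, not the exact implementation used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment_target(trait_mean, trait_sd, rng=rng):
    """Draw a plausible trait value from the species' known variability.

    A log-normal keeps the draw positive, as most traits are; its
    parameters are moment-matched to the species mean and sd.
    """
    if trait_sd == 0:  # single measurement: no variability known
        return trait_mean
    sigma2 = np.log(1.0 + (trait_sd / trait_mean) ** 2)
    mu = np.log(trait_mean) - 0.5 * sigma2
    return float(rng.lognormal(mu, np.sqrt(sigma2)))
```

In training, each image of a species would receive a fresh draw per epoch, so the network sees a distribution of targets rather than a single constant.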
The absence of phylogenetic autocorrelation of the prediction errors underlines that the models did not learn species-specific features for most traits, as this would imply similar trait predictions for related species.
In contrast, the SSD model predictions express a phylogenetic signal (Supplementary Information 1 and Supplementary Table 5). Trait expressions are generally clustered under similar climatic conditions29,30,31. Simultaneously, climatic conditions constrain the geographic distribution of species and growth forms29,30,31,32. The SSD dataset is biased towards woody species (Table 1), which confines it to a smaller taxonomic range. Hence, the phylogenetic signal of SSD might result from its phylogenetic clustering and predominant dependence on bioclimatic information rather than on RGB imagery (Fig. 2).
Nevertheless, the benefit of including climate information on temperature, precipitation and their seasonality8,20,21,26 for predicting trait expressions was confirmed for all traits in this study, which underlines the value of contextual constraints in CNN models10 (see below for a discussion of the relevance of climate vs. image data). This also highlights the general flexibility of deep learning frameworks in adapting to variable input data from different scales and sensors10, which makes them a promising tool for ecological research. Our results revealed this effect particularly for SSD, SLA and LA, whereas it was smaller for GH, LNC and SM (Fig. 2). For the latter traits, other physical constraints such as disturbance33,34, seasonal variation35,36 and soil conditions6,26,28 come into consideration. As the focus of the Worldclim setup was to show that contextual cues can improve trait retrieval from photographs rather than to identify the best set of auxiliary data, we confined the analysis to the most promising20,26 data source (WorldClim37).
In the Worldclim setup, a single model accumulated knowledge about the trait learning task. Combined predictions of different CNN models, however, have been shown to surpass the predictive performance of a single CNN, e.g. in plant species identification tasks22. Each CNN model tends to literally 'look' at different aspects of the learning task by focusing on different image features. Previous research also showed increased performance in a trait prediction task for ensembles of regression and machine learning models26. Accordingly, and as demonstrated by our results, an ensemble approach seems promising to further enhance the predictive performance of CNN models for trait prediction.
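The ensemble idea amounts to averaging the outputs of several independently trained models; a minimal sketch, in which the member models are toy stand-ins for trained CNNs:

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the trait predictions of several independently trained
    models; members that focus on different image features make
    partially independent errors, which the mean tends to cancel."""
    preds = np.stack([m(x) for m in models])  # shape: (n_models, n_samples)
    return preds.mean(axis=0)

# toy stand-ins for trained CNNs with opposite biases
members = [lambda x: x + 1.0, lambda x: x - 1.0, lambda x: x]
```

With the two biased members cancelling each other, the ensemble mean recovers the unbiased prediction.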
The predictive performance of these Ensembles proved reproducible with different sets of training images (cf. Figs. 2, 3, Supplementary Fig. 2). In our heterogeneous dataset, model performance was not affected by different growth forms, image qualities and image-target distances (Fig. 4). Different growth forms and plant functional types show their own characteristic trait spectrum13. Contextual cues within an image might thus have supported the CNN in inferring the plant functional type of a species, e.g. a long-distance image being indicative of a tree species. Yet, since the majority of the images show only single plant organs in close-up photographs (Fig. 4), we assume that the trait predictions are not confounded by the identification of growth forms. Furthermore, the absence of a phylogenetic signal in the prediction errors for most traits highlights the models' ability to generalise by extracting trait information independently of taxonomic relationships, meaning that the models (except for SSD) did not learn species-specific mean trait expressions (see Supplementary Information 1 and Supplementary Table 5).
Additionally, we demonstrated the high generalisability of these results by investigating the datasets' underlying distributions both spatially (Supplementary Fig. 3) and across biomes (Supplementary Fig. 4). Although some regions such as Central Europe and North America show higher data coverage, the datasets used for this study contain data from all biomes and regions on Earth. Despite this clustering, we therefore expect the models to be applicable to all biomes around the globe. This was corroborated by an additional analysis showing that the predictive performance of the models is reasonably constant across biomes (Supplementary Fig. 5). As suggested by refs.38,39, we tolerated a certain spatial bias in favor of larger datasets. Although the SSD dataset predominantly contained woody species, none of the six datasets excluded any growth form (Table 1, Fig. 4).
The application of our models to global trait gradients revealed that our GTDM indeed capture macroecological patterns and trends known from other publications: the latitudinal distributions could roughly be confirmed for GH26, LNC26,27 and SM6,8 (Fig. 5). Predicted trends for maximum leaf size hint at the applicability of our GTDM of LA40. The trait gradients for North America were confirmed for SLA6,8,26,27,28, SM6,8, LNC27 and SSD6 alike. Although based on different input data and modelling methods, the major global latitudinal gradients found in previous studies could be reproduced by our GTDM, which indicates the plausibility of the latter6,8,26,27.
We further validated the GTDM quantitatively by means of correlations with other GTDM. Regarding SSD, the detected high correlations might be due to method similarity, as our GTDM product of SSD primarily builds upon climate data (see above), just as refs.6,26. For GH, SLA and SM, however, the high correlations are unlikely to result from climate data alone, as the variance explained by the RGB imagery (R^2 of the Plasticity setup) exceeds the additional contribution of the Worldclim setup (approx. 94%, 70% and 79% share of imagery in the total explained variance, respectively; Supplementary Table 2). We decoupled the GTDM products from bioclimatic information in an additional analysis (Supplementary Fig. 6). Remarkably, the macroecological patterns could roughly be reproduced when the GTDM were based exclusively on RGB imagery, which shows that for most traits the bioclimatic information merely serves to smooth the macroecological trait patterns.
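Such a quantitative comparison of trait maps boils down to a pixel-wise Pearson correlation over grid cells where both maps carry values; a minimal sketch (the NaN-masking convention is our assumption):

```python
import numpy as np

def map_correlation(gtdm_a, gtdm_b):
    """Pearson correlation between two trait maps, restricted to grid
    cells where both maps hold valid (non-NaN) values."""
    a = np.asarray(gtdm_a, dtype=float).ravel()
    b = np.asarray(gtdm_b, dtype=float).ravel()
    valid = ~np.isnan(a) & ~np.isnan(b)
    return float(np.corrcoef(a[valid], b[valid])[0, 1])
```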
Despite all GTDM being at least partly built upon climate data and using trait data from the same source (the TRY database), some GTDM of SLA and all GTDM of LNC vary strongly in their correlations (Supplementary Fig. 1). On the one hand, this might indicate that LNC varies at a different scale, e.g. on account of its seasonal and within-species variation35,36. On the other hand, other GTDM products are based on mean trait values weighted by the abundances of plant functional types27,28 rather than on single trait predictions, which might also explain negative or non-significant correlations.
Hence, a potential pitfall of the presented approach is that it is prone to an observation bias, e.g. citizen scientists taking pictures of only the most striking species. The sampling design underlying the GTDM does not account for plant community composition, meaning that we cannot tell whether plant photographs at a certain location represent the actual community structure. Since many images contain more than one individual plant and different species, however, the CNN model predictions might be based on more than one species, thereby partly resembling trait expressions of the community. The representativeness of trait data for plant communities, though, remains a ubiquitous problem of global trait maps, including those fully based on trait data from the TRY database7, since every available dataset is far from representing the actual plant community composition7. Hence, at present our GTDM have to be considered a plausibility check of the model predictions rather than an application-ready trait map product, not least because the sampling of images might not be representative of the respective plant community.
Nevertheless, our results indicate that a Big Data approach is viable for revealing macroecological trait patterns, perhaps because the most striking species of an ecosystem are likely to suffice in describing its functional footprint5. Since the strong growth of the iNaturalist database leads to a steadily increasing geographic coverage, the representativeness of these data is likely to grow as well. A recent study investigating the records of FloraIncognita12, a citizen-science and deep learning-based application for identifying plant species from photographs, suggested that such crowd-sourced data can reproduce primary dimensions of plant community composition41. This underlines the future potential of harnessing citizen science databases for identifying these patterns. Here, we demonstrated the practical value and applicability of the CNN models by producing GTDM that reproduce known macroecological trait patterns, thereby displaying one anticipated application of this method. Additionally, these GTDM bypass the issue of spatial error analysis that is challenging for most GTDM products26, because the strongly increasing number of observations in iNaturalist provides a potentially arbitrary number of observations, almost rendering extrapolation obsolete. Our GTDM are based on individual trait measurements rather than being estimated from a small set of covariates, as is typical for climate-based GTDM26. Since plant traits vary strongly within species17,18,19, these measurements express a high practical relevance. As the iNaturalist plant photograph database is witnessing an exponential growth of data inputs, the potential of exploiting this data source for plant trait predictions is growing rapidly.
It is worth mentioning that this approach also led to the first published GTDM of mean LA (available for download at https://doi.org/10.6084/m9.figshare.13312040), since former publications were limited to modelling upper limits of LA based on climatic constraints40.
Future studies building on our work, benefiting from the ever-growing data accumulation of both the iNaturalist and the TRY database, might not face the restrictions of dataset size we did. This might allow for more representative samples, e.g. enabling the stratification of training data by species while simultaneously balancing the trait distribution. Such sampling might reduce the regression-to-the-mean bias seen in all of our results (Fig. 3) by avoiding the overrepresentation of common trait expressions. Another possible approach would be to select only species with particularly low trait variability for model training, as this decreases the chance of incorporating images of plants whose extreme trait expressions differ strongly from the species mean values chosen from TRY. Thereby, we might be able to derive more reliable and accurate predictions in the context of weakly supervised learning by reducing noise in the training data.
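Selecting training species by low within-species variability could be sketched as follows; the filter and the coefficient-of-variation threshold are hypothetical choices for illustration:

```python
from statistics import mean, stdev

def low_variability_species(records, cv_max=0.3):
    """records maps species name -> list of trait measurements.

    Keep species whose coefficient of variation (sd / mean) stays
    below the threshold, so the species mean used as a training
    target carries less label noise."""
    keep = []
    for species, values in records.items():
        cv = (stdev(values) / mean(values)) if len(values) > 1 else 0.0
        if cv <= cv_max:
            keep.append(species)
    return keep
```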
Although weakly supervised learning approaches have generally been shown to be an effective way of compensating for a shortage of individually labelled data42,43, an image dataset including in-field trait measurements under natural conditions and representing the global trait spectrum would be necessary for a conclusive validation. In our study, it even remains unclear to what extent the trait values actually refer to the individual plant shown in an image, particularly as the images sometimes show more than one individual plant and more than one species (Supplementary Fig. 7). This may hinder the model from predicting a trait value corresponding to the dominant species in the image (but might also partly resemble the community composition, see above). Although we attempted to compensate for the lack of a dataset enabling a conclusive validation by ruling out possible biases concerning image settings (Fig. 4), growth forms (Fig. 4), phylogenetic autocorrelation (Supplementary Information 1, Supplementary Table 5), predictions based only on climate data (Supplementary Fig. 6), predictive performance across biomes (Supplementary Fig. 5), limited geographic or climatic coverage of the training dataset (Supplementary Figs. 3, 4) and effects of a specific set of training data (Supplementary Fig. 2), we cannot conclusively prove that the model predictions are based on causal relationships. Our model results suggest that the trait predictions reflect the feature space of natural trait expressions (Fig. 3), but an in-depth analysis of the image features the models learned for inferring the respective traits will be necessary to rule out any remaining biases in future studies. Such an analysis might investigate which plant organs are relevant for the trait predictions by means of feature attribution techniques, and could ultimately provide clear evidence.
This may not only help build trust in such artificial intelligence (AI) models, but also generate new knowledge from them and thereby deepen our understanding of plant morphology and trait covariance.
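One common feature-attribution technique of the kind envisaged above is occlusion sensitivity; a minimal sketch, in which the model is any callable mapping an image to a scalar trait prediction and the patch size is an arbitrary choice:

```python
import numpy as np

def occlusion_map(model, image, patch=8, baseline=0.0):
    """Slide a blank patch over the image and record how much the
    predicted trait changes; large changes mark regions (e.g. plant
    organs) that the model relies on for its prediction."""
    h, w = image.shape[:2]
    ref = model(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            heat[i // patch, j // patch] = abs(model(occluded) - ref)
    return heat
```

The resulting heat map can be overlaid on the photograph to reveal whether, for instance, leaves or stems drive the trait prediction.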
Nevertheless, this study can only be considered pioneering work testing the feasibility of the approach, as application-ready models require a conclusive and explicit validation. A dataset enabling this has to incorporate image-trait pairs measured and photographed on the same individuals. One possible solution would be to generate a database of plant traits including corresponding photographs, which could then serve as a benchmark for future studies.
Source: Ecology - nature.com