A multilevel carbon and water footprint dataset of food commodities

To give stakeholders a useful tool to explore, assess and use information on the carbon footprint (CF) and water footprint (WF) of food commodities, we implemented a multi-step methodological framework to create an easy-to-use CF and WF repository of food items. The repository can be expanded or modified for tailored requirements using a science-based approach at each step of its creation (Fig. 1).

The overall methodological procedure consists of three steps. Step 1 covers CF and WF data collection from the literature, eligibility checks and harmonization, and creates the base level of the database (level 1). Step 2 creates three further informative layers with a higher level of data aggregation, which might be the data of most direct interest to stakeholders of food systems. A rigorous statistical approach is proposed to evaluate the quality of the analysed data, and criteria for the correct use of the data, based on statistical evidence, are set and applied. In Step 3 the complex set of statistical evaluations done for each informative level is summarized into an easy-to-use dataset reporting CF and WF values of food items. Thanks to its multilevel approach, the database provides a flexible tool for different purposes and levels of expertise. Each step is based on transparent procedures that allow users to replicate, extend and modify each level of the database.

The three steps are described in detail in the following paragraphs.

Step 1 – CF and WF data collection, harmonization and compilation of level 1 of SEL database

The first step was to review the published CF and WF data of food commodities. We reviewed literature data published up to January 2020, including peer-reviewed papers, conference proceedings, public reports or studies in which the methods of data collection and handling were described, and Environmental Product Declarations (EPDs).

For the collection of CF data, a significant input came from the systematic review of Clune et al.¹¹, who reviewed 369 published studies covering the period 2000–2015 and provided 1718 data entries for 168 varieties of fresh food products. An additional source of studies reporting both CF and WF was the Double Pyramid database 2016, built on the previous 2014 version¹⁴ (BCFN2016 https://www.barillacfn.com/en/publications/double-pyramid-2016/), which reports 1202 CF values from 468 sources covering 240 food items and 309 WF values from 136 data sources covering 152 food items (reference period 1998–2016). Part of the CF data in this latter dataset, up to the year 2014, had already been reviewed and included in the Clune et al.¹¹ study. To avoid double counting, entries from the two sources were matched by authorship and their reported CF values were compared; where the values disagreed, the original data were checked in the source paper. Data reported in the Double Pyramid database 2016 but not present in Clune et al.¹¹, mostly referring to processed food, were checked for eligibility by applying the exclusion criteria reported in Table 2 and, if eligible, were included in the present database.

Table 2 Exclusion criteria to be applied to CF and WF data collected from literature to create SEL database level 1.

A new literature search, concluded in January 2020, was carried out to integrate data not covered by the previous reviews, using three online bibliographic sources: SCOPUS (https://www.scopus.com/home.uri), Google Scholar (https://scholar.google.com/) and the Google search engine (https://www.google.com/). To search these sources, we used combinations of two sets of words. The first set referred to “impacts” and included the following words: carbon footprint, water footprint, virtual water, greenhouse gases, environmental impact, life cycle, LCA, LCI, EPD. The second set referred to “products” and included words like food, beverages, fish, shellfish, crops, vegetables, fruit, meat, eggs, dairy. EPDs were updated based on data reported in the International EPD System database (www.environdec.com). The added studies were evaluated against the exclusion criteria (Table 2).
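The pairing of the two word sets can be illustrated with a short Python sketch. The quoting and the `AND` operator are assumptions made for illustration only: the text does not state the exact query syntax used in each search tool, and the “products” set is given as examples rather than as a closed list.

```python
from itertools import product

# "Impacts" word set, as listed in the text.
IMPACT_TERMS = [
    "carbon footprint", "water footprint", "virtual water",
    "greenhouse gases", "environmental impact", "life cycle",
    "LCA", "LCI", "EPD",
]
# "Products" word set; the text presents these as examples.
PRODUCT_TERMS = [
    "food", "beverages", "fish", "shellfish", "crops",
    "vegetables", "fruit", "meat", "eggs", "dairy",
]


def build_queries(impacts, products):
    """Pair every impact term with every product term.

    The '"a" AND "b"' syntax is hypothetical, not the authors' documented query.
    """
    return [f'"{i}" AND "{p}"' for i, p in product(impacts, products)]


queries = build_queries(IMPACT_TERMS, PRODUCT_TERMS)
print(len(queries), "candidate queries, e.g.", queries[0])
```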

The final list of data from single studies reported in the SEL database was distributed as follows: 3349 CF data, including 1397 data of fresh food commodities already reported in Clune et al.¹¹, 803 CF data originally reported in the Double Pyramid 2016 database, which were checked for eligibility and harmonized, and 701 CF data added with this study; 938 WF data, including 288 WF data originally reported in the Double Pyramid 2016 database and 650 WF data added with this study.

All the CF and WF values extracted from the collected studies were assigned a group, a typology, a sub-typology where applicable, and an item name (Table 1), and were recorded in an Excel sheet together with the following additional information: type of bibliographic source, full reference, publication year, system boundary at distribution, country of production, region of production, relevant notes, and presence of the same value in other data collections (i.e. Clune et al.¹¹ or Double Pyramid 2016).

After data collection, CF data were further analysed and handled to harmonize the system boundary, following the approach reported in Clune et al.¹¹. The system boundary considered in the SEL database is the distribution centre to consumers located in the country of origin. It hence excludes post-market phases such as cooking. The system boundaries at distribution have a wide range of specifications in the published papers. We accepted regional distribution centre (RDC), international distribution centre (IDC), European distribution centre (EDC), country ports of final destination, warehouses, wholesalers and city markets, up to retailers. For the specific case of international transport, which also includes the emissions for shipping to regional distribution centres of the hosting country, rather than excluding the studies we created a dedicated typology, “imported”, which however includes very few studies. Imported commodities are indicated in the SEL database by a capital letter “I”.

If CF values collected from the literature referred to the system boundary “farm gate” or “slaughterhouse”, additional post-farm-gate GHG emissions were added as proposed by Clune et al.¹¹. These additional emissions also included packaging if it was not reported in the publication. We adopted the median values for distribution to RDC (0.09 kg CO₂/kg or kg CO₂/L) and packaging (0.05 kg CO₂/kg or kg CO₂/L) used by Clune et al.¹¹. Data referring to slaughterhouse emissions were also taken from the same publication.
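The boundary harmonization amounts to a simple addition, sketched below in Python. The function and parameter names are hypothetical, and the slaughterhouse term is left out because its value is not given in the text; only the two median add-ons quoted above are used.

```python
# Median add-on values quoted in the text (kg CO2 per kg or per L of product).
DISTRIBUTION_TO_RDC = 0.09
PACKAGING = 0.05


def harmonize_cf(cf_farm_gate, packaging_included=False):
    """Extend a farm-gate CF value to the regional distribution centre.

    Hypothetical helper: packaging is added only when the source study
    did not already account for it.
    """
    extra = DISTRIBUTION_TO_RDC
    if not packaging_included:
        extra += PACKAGING
    return cf_farm_gate + extra


# Illustrative value, not taken from the database.
print(round(harmonize_cf(1.00), 2))        # farm gate + transport + packaging
print(round(harmonize_cf(1.00, True), 2))  # farm gate + transport only
```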

To address the share of WF due to packaging and transportation to the market, we analysed 256 EPDs. No significant increase of WF was found in the downstream stages associated with packaging and distribution. Thus we included in the analysis all system boundaries with the exception of ‘cooking’, human excretion and waste disposal.

To transform CF values from carcass or live weight to bone-free meat, the ratios reported in Clune et al.¹¹ were used, while the ratio of carcass weight to bone-free meat for buffalo meat (1:0.684) was estimated from the studies of Gerber et al.¹⁵, Gurunathan et al.¹⁶ and Li et al.¹⁷.
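As a worked illustration of the conversion, the Python sketch below re-expresses a CF per kilogram of carcass as a CF per kilogram of bone-free meat. It assumes that all emissions are allocated to the bone-free meat, which is a simplification made here and not a statement of the exact allocation used in the source; the function name and the input value are hypothetical.

```python
# Fraction of carcass weight that is bone-free meat for buffalo (1:0.684 in the text).
BUFFALO_BONE_FREE_FRACTION = 0.684


def cf_per_kg_bone_free(cf_per_kg_carcass, bone_free_fraction):
    """Convert kg CO2 per kg carcass into kg CO2 per kg bone-free meat.

    Assumes the whole footprint is carried by the bone-free meat.
    """
    if not 0 < bone_free_fraction <= 1:
        raise ValueError("bone_free_fraction must be in (0, 1]")
    return cf_per_kg_carcass / bone_free_fraction


# Illustrative input of 10 kg CO2 per kg carcass (not a database value).
print(round(cf_per_kg_bone_free(10.0, BUFFALO_BONE_FREE_FRACTION), 2))
```

Because less than one kilogram of meat is obtained from each kilogram of carcass, the converted footprint is always at least as large as the carcass-based value.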

The final version of the CF and WF data, after data handling, was recorded in a sheet where, in addition to the information mentioned above for each study, we also reported additional post-farm-gate emissions (transport T, slaughtering S, packaging P) or meat conversion factors (cf) where applicable. This complete dataset represents the level 1 information sheet of the SEL database (Fig. 1).

A change in the 100-year global warming potential (GWP) factors provided by the Intergovernmental Panel on Climate Change reports AR3 (2001), AR4 (2007) and AR5 (2013) might have introduced additional variability in the LCA studies on which the level 1 CF data are based. The extent of such variability is difficult to quantify, as it depends on the relative weight of each GHG in the total CF of the item. However, the analysis of some item groups (tomato, rice, beef meat, chicken meat), used as a test sample, did not show any clear trend of reduction or increase in average CF over the years (1998–2020), suggesting that differences among production processes and conditions were the dominant source of CF variability.

Step 2 – Creation of derived CF and WF datasets with higher aggregation level (2, 3 and 4)

This step provides footprints of food commodities at a higher level of aggregation, corresponding to food items, typologies and sub-typologies (Table 1), which might be of particular interest for different kinds of stakeholders. The item represents the highest level of detail of aggregated footprint data for a food commodity and is often the most desirable information for food impact analysis and dietary assessments. We propose here a methodological framework to evaluate the uncertainty associated with the data used to represent food items. The framework supports users in choosing the optimal value to represent a food item on the basis of the data available in the database. It also allows food item values to be easily expanded and updated.

Level 2, SEL CF ITEM & SEL WF ITEM datasets

These two datasets (CF and WF) report a comprehensive set of descriptive statistics for the list of food items present in the database. The population of data used to attribute a value and an uncertainty to a food item consists of all the CF or WF values classified with that “item entry name” in the level 1 dataset of the SEL database.

The item data population is described in level 2 by the following set of information.

Size: number of studies used for the analysis of item population (n).

Location and central-tendency measures: in terms of mean, median, first quartile (Q1) and third quartile (Q3), including also the minimum (Min) and maximum (Max) observed values.

Variability measures: Standard Deviation (SD) and Coefficient of Variation (CV) as absolute and relative dispersion indexes, respectively, and the Interquartile Range (IQR) and the Median Absolute Deviation (MAD) as more robust indexes of variability.

Shape measures: Skewness (SK), kurtosis (KU) indexes and Shapiro-Wilk normality test (SW test).
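Most of the measures listed above can be reproduced with the Python standard library, as in the sketch below. Several choices are assumptions, since the text does not specify them: the quartile interpolation rule (`method="inclusive"`), the unscaled MAD, and the moment-based (biased) skewness and excess kurtosis. The Shapiro-Wilk test is omitted because it needs a statistics package outside the standard library.

```python
import math
import statistics as st


def describe(values):
    """Descriptive statistics for one item data population (illustrative sketch)."""
    n = len(values)
    mean = st.mean(values)
    median = st.median(values)
    # Quartile convention is an assumption; with a single value all quartiles coincide.
    q1, _, q3 = st.quantiles(values, n=4, method="inclusive") if n > 1 else (median,) * 3
    sd = st.stdev(values) if n > 1 else math.nan
    # Central moments used for the moment-based shape indexes.
    m2, m3, m4 = (sum((v - mean) ** k for v in values) / n for k in (2, 3, 4))
    return {
        "n": n, "mean": mean, "median": median,
        "Q1": q1, "Q3": q3, "min": min(values), "max": max(values),
        "SD": sd,
        "CV": sd / mean if mean else math.nan,
        "IQR": q3 - q1,
        "MAD": st.median(abs(v - median) for v in values),  # unscaled MAD
        "SK": m3 / m2 ** 1.5 if m2 else math.nan,
        "KU": m4 / m2 ** 2 - 3 if m2 else math.nan,  # excess kurtosis
    }


# Illustrative values only, not taken from the database.
stats = describe([1, 2, 3, 4, 5])
print(stats["median"], stats["IQR"], stats["MAD"])
```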

The median of the item data population was chosen as the central-tendency value that represents the item. Unlike the mean, the median is not influenced by outliers, which can distort the mean and make it a less meaningful measure. The median is the location estimator with the highest breakdown point (equal to 0.5), the breakdown point being the maximum proportion of observations that can be contaminated (i.e., set to infinity) without forcing the estimator to result in a “false” and non-representative value¹⁸,¹⁹. With these properties, the median is also the most appropriate measure of central tendency for describing both positively and negatively skewed distributions²⁰.

To describe the uncertainty associated with the position value (median) we used descriptive statistics on the dispersion and shape of the item data distribution. In particular, we used the skewness and kurtosis indexes, which indicate whether a distribution is symmetric or skewed and how ‘peaked’ it is relative to the weight of its tails²¹. This enabled us to evaluate, for each distribution, the importance of extreme values over the entire set of data and the related level of dispersion (platykurtic versus leptokurtic distributions). We completed the shape analysis by carrying out the Shapiro-Wilk test²²,²³ (4 ≤ n ≤ 2000).

To define the uncertainty of the item value we created an assignment method based on a combination of three quality flags (Fig. 2).

Fig. 2 Method for attribution of a CF (or WF) value to a food item based on data quality flags. The scheme shows the procedure applied to evaluate the level of uncertainty associated with the CF or WF value of a food item and how this information is used to decide the best value to represent the item. Three quality flags, each related to a statistical aspect of the data population, are calculated to attribute the level of uncertainty. Each flag has different levels of quality, red being the worst and green the best. Flags are then combined and expert judgement is used to associate a suggestion for data use with each flag combination. If the item median value is characterized by high uncertainty, it poorly represents the item and caution is needed in using it to represent the food commodity; the user is therefore redirected to a higher level of aggregation, such as the sub-typology or the typology that includes the analysed item.

Flag 1, evaluation of the ‘size’ (n) of the “item data population”

Red if n < 4, as three is the minimum number of observations needed for distinguishing the median from the mean and for evaluating the approximation of the empirical distributions to known parametric distributions, in accordance with the minimum requirements specified by Royston²³.
Yellow if 4 ≤ n ≤ 8, the minimum level of observations needed for jointly evaluating the kurtosis and skewness of a distribution²⁴.
Green if n > 8, the number of observations required to identify cases in which location (as well as variability and shape) measures can be properly computed and evaluated.
Green with asterisk (*): applies to WF values that are global, regional or national average estimates; green is assigned independently of n. In this case flags 2 and 3 do not apply to the evaluation of the WF item value.
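The size thresholds can be written as a small Python function. This is a sketch under stated assumptions: the function name and the `is_average_estimate` switch are hypothetical, and n = 4 is treated as yellow, which is a reading of the boundary between the red and yellow classes rather than something the text spells out.

```python
def flag1(n, is_average_estimate=False):
    """Return the flag 1 colour for an item data population of size n.

    is_average_estimate marks WF values that are global, regional or
    national averages, which are green regardless of n.
    """
    if is_average_estimate:
        return "green*"
    if n < 4:
        return "red"
    if n <= 8:  # assumption: n = 4 falls in the yellow class
        return "yellow"
    return "green"


# The lobster example in the text has n = 5 and a yellow flag 1.
print(flag1(5))
```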

Flag 2, evaluation of outlier position of items characterized by RED or YELLOW Flag 1

This flag tests whether items with population size n ≤ 8 are outliers with respect to their typology population. If they are outliers, they cannot be represented by the typology value instead, as their value is markedly higher or lower than the item data population representing the reference typology. An example in the SEL database is the CF value of the item “lobster”, which has a yellow flag 1 (n = 5). The item lobster is an outlier for the typology “shellfish”. In this case, even if there is a certain level of uncertainty in the “lobster” item value, it is not advisable to substitute it with the “shellfish” typology value.

To attribute the flag 2 output, Tukey’s rule²⁵ was used. Outlier identification is based on the quartiles of the data distribution: the first quartile Q1 is the value greater than or equal to 1/4 of the data, the second quartile Q2, or median, is the value greater than or equal to 1/2 of the data, and the third quartile Q3 is the value greater than or equal to 3/4 of the data. The interquartile range, IQR, is Q3 − Q1. The data used to estimate the quartile values are the medians of the items composing the typology population.

Red if the median value of the analysed item (x) is an outlier, i.e. following Tukey’s rule it lies more than 1.5 times the interquartile range beyond the quartiles: either x < Q1 − 1.5 IQR or x > Q3 + 1.5 IQR.
Green if the median value of the analysed item (x) is not an outlier, i.e. it lies within 1.5 times the interquartile range of the quartiles: Q1 − 1.5 IQR ≤ x ≤ Q3 + 1.5 IQR.
NA, no flag: Tukey’s rule was not applied because the item coincides with the typology, i.e. the typology currently consists of this item alone.
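Tukey’s rule as applied to flag 2 can be sketched in Python as follows. The function name is hypothetical and the quartile interpolation (`method="inclusive"`) is an assumption, since several quartile conventions exist and the text does not say which one was used; the numbers in the example are illustrative, not database values.

```python
import statistics as st


def flag2(item_median, typology_item_medians):
    """Flag an item median as a Tukey outlier within its typology.

    typology_item_medians holds the medians of all items in the typology.
    """
    if len(typology_item_medians) < 2:
        return "NA"  # the typology coincides with a single item
    q1, _, q3 = st.quantiles(typology_item_medians, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    if item_median < lower or item_median > upper:
        return "red"
    return "green"


# Hypothetical typology made of six item medians.
medians = [1, 2, 3, 4, 5, 30]
print(flag2(30, medians), flag2(3, medians))
```

A red result means the typology value is not a suitable stand-in for the item, even when the item itself rests on few studies.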

Flag 3, adherence to the normal distribution

This flag evaluates the level of dispersion and clustering of the observed data points around the centre. To test the adherence of the item data population distribution to the normal distribution, the Shapiro-Wilk test was carried out. The following three colours were assigned:

Red: assigned to i) items whose size was lower than 4, which prevents the evaluation of the approximation to the normal distribution, as detailed above; ii) items for which we rejected the null hypothesis of a bell-shaped distribution at the 1% level of significance (p-value < 0.01), indicating a substantially asymmetric and/or heavy-tailed distribution whose level of clustering (low or high) is not adequate for describing the findings with a synthetic measure computed at the item level.
Yellow: assigned to items whose empirical distribution departs less strongly from the normal distribution, so that the null hypothesis is rejected only at the 5% level of significance (0.01 ≤ p-value < 0.05).
Green: assigned to items for which we did not reject the null hypothesis, therefore confirming the validity of central tendency measures (at the item level) for summary description.
NA, no flag: the Shapiro-Wilk test could not be run due to an insufficient number of data in the population (n < 3).
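The colour assignment can be expressed as a mapping from the population size and the Shapiro-Wilk p-value, as in the Python sketch below. The function name is hypothetical, and the handling of small populations (NA for n < 3, red for n = 3) is one reading of the rules above. The p-value is taken as an input; it could be obtained, for example, from `scipy.stats.shapiro`, although the software used by the authors is not stated.

```python
def flag3(n, p_value=None):
    """Return the flag 3 colour from the size n and the Shapiro-Wilk p-value."""
    if n < 3:
        return "NA"  # test cannot be run
    if n < 4:
        return "red"  # too few data to evaluate normality
    if p_value is None:
        raise ValueError("a Shapiro-Wilk p-value is required when n >= 4")
    if p_value < 0.01:
        return "red"
    if p_value < 0.05:
        return "yellow"
    return "green"


# Illustrative p-values, not taken from the database.
print(flag3(12, 0.003), flag3(12, 0.03), flag3(12, 0.40))
```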

The outputs of the three flags were considered together to evaluate the uncertainty related to the median CF or WF value of food items and, based on the level of uncertainty, indications for optimal data use were provided (Table 3).

Table 3 Flag output table.

A brief rationale for each data handling indication is as follows:

1. item: the item statistics are sufficiently robust. The user can use the item median value to represent that food commodity.
2. item or typology: although the population used to derive the median of the item is small and its distribution does not optimally fit a normal distribution, the median of the item is not an outlier for the typology population, i.e. the value of the item is not exceptionally high or low compared with the other items in the reference typology. The user can use either the item median value or the typology median value to assign a footprint value to the chosen item.
3. item matching with typology: the typology coincides with one single item, so the two objects, item and typology, represent the same food commodity. The user will find the same median value in the Item and Typology tables, and the choice is hence unambiguous. In this case the level of uncertainty can be estimated from flag 1 and flag 3, because flag 2 cannot be calculated (NA).
4. typology better than item: different statistical combinations can lead to this option. The uncertainty associated with the item value is high enough to prefer the typology value to represent the food commodity, although the item value need not be discarded.
5. typology (or sub-typology): the high uncertainty suggests caution in using the item value to represent the food commodity; a higher level of aggregation can be used instead. When the indication suggests either typology or sub-typology, it is because there is no statistical difference between the two values and the user can choose which one to use.

Level 3, SEL CF Typologies & SEL WF Typologies

In this informative level, descriptive statistics are reported for the CF and WF data of food typologies. The CF or WF data population of a typology is composed of the CF or WF median values of each item included in the typology. The value ‘n’ hence represents the number of items in the typology. The same statistical parameters reported for the items (size, location and central-tendency measures, and variability measures) are reported for typologies. The flag approach is not used to test typologies because, regardless of the uncertainty, there is no other meaningful footprint value at a higher hierarchical level that can be attributed to a food commodity without losing its specificity. Users can choose to create typologies different from the ones proposed, using the CF and WF data provided at level 2 (items).
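The two-stage aggregation (item medians first, then the median of those medians) can be sketched in Python as follows. The record layout, the function name and all values are hypothetical and serve only to show that the typology median is computed over items, not over the pooled level 1 entries.

```python
import statistics as st
from collections import defaultdict


def typology_medians(records):
    """Aggregate level 1 records into typology-level medians.

    records: iterable of (typology, item, value) tuples (hypothetical layout).
    Returns {typology: {"n": number of items, "median": median of item medians}}.
    """
    by_item = defaultdict(list)
    for typology, item, value in records:
        by_item[(typology, item)].append(value)

    item_medians = defaultdict(list)
    for (typology, _item), values in by_item.items():
        item_medians[typology].append(st.median(values))

    return {t: {"n": len(m), "median": st.median(m)} for t, m in item_medians.items()}


# Hypothetical records: typology "T" with items a (3 studies), b (1) and c (2).
records = [
    ("T", "a", 1.0), ("T", "a", 3.0), ("T", "a", 2.0),
    ("T", "b", 10.0),
    ("T", "c", 4.0), ("T", "c", 6.0),
]
result = typology_medians(records)
print(result["T"])
```

In this example the item medians are 2, 10 and 5, so the typology median is 5 with n = 3, whereas pooling all six entries would give 3.5. Taking the median of item medians keeps heavily studied items from dominating the typology value.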

Level 4, SEL CF sub-Typologies & SEL WF sub-Typologies

Sub-typologies are subgroups of typologies, used when a typology covers a wide range of food items that could have very different CF and WF values depending on some commodity characteristic.

An example is represented by fresh crop products where the yield per hectare strongly influences the CF value (Fig. 3).

Fig. 3 CF value of vegetables vs. their yield. Carbon footprint value of food items included in the typology “vegetables outdoor” is plotted versus their average yield value as reported in FAOSTAT (data EU-28, year 2017).

In the SEL database only 3 typologies have been further divided into sub-typologies, as they include very different items in terms of their potential LCA outputs. These are vegetables outdoor, fruit outdoor, and shellfish. The population of data used to evaluate the descriptive statistics for each sub-typology is composed of the medians of the items included in the sub-typology. Additional statistical information at this informative level 4 is the output of the Kruskal-Wallis ANOVA test on ranks²⁶ (all pairwise multiple comparison procedures based on Dunn’s Method²⁷), used to determine whether the median values of the sub-typologies within one typology were significantly different from each other, while the Mann-Whitney Rank Sum Test²⁸ was used to determine whether a sub-typology footprint value was significantly different from its reference typology data population.

Step 3 – Creation of summary sheet for easy and quick consultation of CF and WF values of food commodities (SEL CF or WF data for users)

The two summary sheets (CF and WF data) can be considered the most interesting and innovative output of the database, as they translate the complex series of data reported in the 4 informative levels of the database into a list of food footprint values based on statistically robust analysis. Footprint values of the food items are presented with additional information about their robustness and, where the uncertainty is high, alternative values at higher levels of aggregation are proposed. The summary sheets are meant to provide a scientifically robust and easy-to-use tool for experts and non-experts who want to analyse the impact of food commodities and dietary plans. Users are free to accept the expert-based suggestions or to make their own considerations.

Data summary

At present, the SEL database contains 3349 carbon footprint values extracted from 841 publications (1998–2019) and 937 water footprint values extracted from 88 publications (2005–2018). The CF data are summarized into a total of 85 typologies, 11 sub-typologies and 323 items. The WF data are summarized into a total of 72 typologies, 9 sub-typologies and 320 items. A detailed breakdown of the CF and WF data into the four food commodity groups (agricultural processed, animal husbandry, crop and fishing) is reported in Table 4.

Table 4 Number of CF and WF data of food commodities reported in the database informative levels as items (level 2), typologies (level 3) and sub-typologies (level 4).

In terms of geographical distribution, the source data of CF have a Eurocentric prevalence while the WF data are more evenly distributed among America, Asia and Europe, the relative contribution depending on the commodity group (Table 5).

Table 5 Geographical distribution of CF and WF data sources as reported in level 1 of the database.

Potential applications of the database

The SEL database was created out of the need to estimate the CF and WF values of food recipes for meals served in canteens during a set of experiments run in the framework of the EU SU-EATABLE LIFE project, which aims at engaging citizens in healthy and sustainable diets to reduce greenhouse gas emissions and water use in the EU. During the experiments, researchers and canteen managers lacked a quick tool to calculate food-related environmental impacts based on reliable, science-based estimates. The SEL database was created for this purpose, and the summary tables have been used for quick decision making on sustainable recipes and for data management. Following this experience, the SEL database could contribute significantly to bridging the gap between the knowledge provided by the scientific community and the actors of the food system. Food providers and food consumers are currently paying growing attention to improving the sustainability of food systems. Given the complexity and challenges of climate mitigation policies aiming to reduce greenhouse gas emissions, a dietary shift to more sustainable patterns may be an additional useful element in achieving the policy targets set out by the Paris Agreement.

Many data related to the impact of food commodities are available as public or private repositories or publications, ranging from complex scientific studies³,⁶,¹¹ to anonymous online data. The complexity of extracting food commodity data from scientific publications, and the need to evaluate the robustness and scientific value of the data available to users, require an extra effort from the scientific community to provide both meaningful data and a scientific methodology that lets stakeholders from the technical, scientific and policy sectors easily update and expand the database with a verified and scientifically sound approach. With its different layers of information, the SEL database could hence be a valid tool to support caterers, chefs, restaurants, nutritionists, municipalities and policy makers in analysing different management options related to food and dietary planning.

Database flexibility and directions for database improvement

The SEL database is not an exhaustive collection of CF and WF data for all possible food items, although the bibliographic search covers a significant part of the accessible data sources. The way the database is structured allows for further development:

1) Missing items can be included following the proposed methodological framework.
2) Item values and uncertainty can be updated by adding new studies to level 1 and re-assessing the uncertainty.
3) Level 1 data could be used for ex novo evaluations.
4) Technical and scientific users might wish to make different assumptions to further evaluate the uncertainty of specific items. In this case, the clear and transparent description of the steps used to assign uncertainty labels makes it easy to accept or reject any of the assumptions made in this paper and to update the overall scheme with new attributions and decisions about the use of the items.
5) The role of the geographic distribution of data could be evaluated by selecting and re-analysing level 1 data by regional groups, which might be relevant for indexes like the WF that are driven by climatic factors. The uncertainty analysis might help establish whether the CF and WF data attributed to an item are still statistically robust enough to extract scientifically sound information. Regionalization of WF values might be relevant when combined with water scarcity studies for specific geographic areas of production.
6) The flexibility of the database allows new indexes or footprints to be introduced by adding a level 1 collection of published data and including the related levels 2, 3 and 4, calculated using the statistical recommendations provided in this paper. The same aggregation criteria proposed for the CF and WF (groups, typologies, sub-typologies, items) might be used for combined analyses of several indexes.

Source: Ecology - nature.com
