A multilevel carbon and water footprint dataset of food commodities
With the aim of providing stakeholders with a useful tool to explore, assess and use information on the CF and WF of food commodities, we implemented a multi-step methodological framework to create an easy-to-use repository of CF and WF data for food items. Each step of its creation follows a science-based approach, and the repository can be expanded or modified to meet tailored requirements (Fig. 1). The overall methodological procedure consists of three steps. Step 1 covers the collection of CF and WF data from the literature, the eligibility check and the harmonization of the data, which together create the base level of the database (level 1). Step 2 creates three further informative layers with higher levels of data aggregation, which may be of direct interest to stakeholders of the food system. A rigorous statistical approach is proposed to evaluate the quality of the analysed data, and criteria for the correct use of the data, based on statistical evidence, are defined and applied. In Step 3, the complex set of statistical evaluations performed for each informative level is summarized into an easy-to-use dataset reporting the CF and WF values of food items. Thanks to its multilevel approach, the database provides a flexible tool for different purposes and levels of expertise. Each step is based on transparent procedures that allow users to replicate, implement and modify each level of the database. The three steps are described in detail in the following paragraphs.

Step 1 – CF and WF data collection, harmonization and compilation of level 1 of SEL database

The first step was to review the published CF and WF data of food commodities.
We reviewed literature data published up to January 2020, including peer-reviewed papers, conference proceedings, public reports or studies describing their methods of data collection and handling, and Environmental Product Declarations (EPDs).

For the collection of CF data, a significant input came from the systematic review of Clune et al.11, who reviewed 369 published studies covering the period 2000–2015, providing 1718 data entries for 168 varieties of fresh food products. An additional source of studies reporting both CF and WF was the Double Pyramid database 2016, built on its previous version (2014)14 (BCFN2016 https://www.barillacfn.com/en/publications/double-pyramid-2016/), which reports 1202 CF values from 468 sources covering 240 food items and 309 WF values from 136 data sources covering 152 food items (reference period 1998–2016). Part of the CF data in this latter dataset, up to 2014, had already been reviewed and included in the study by Clune et al.11. To avoid double counting, the data from the two sources were checked for authorship and the reported CF values were compared; where they disagreed, the value was checked against the original paper. Data reported in the Double Pyramid database 2016 but not in Clune et al.11, mostly referring to processed food, were checked for eligibility by applying the exclusion criteria reported in Table 2 and, if considered eligible, were included in the present database.

Table 2 Exclusion criteria to be applied to CF and WF data collected from literature to create SEL database level 1.

A new literature search, concluded in January 2020, was carried out to integrate data not covered by the previous reviews, using three online bibliographic sources: SCOPUS (https://www.scopus.com/home.uri), Google Scholar (https://scholar.google.com/) and the Google search engine (https://www.google.com/). To search the bibliographic sources, we used combinations of two sets of words.
The first set referred to “impacts” and included the following words: carbon footprint, water footprint, virtual water, greenhouse gases, environmental impact, life cycle, LCA, LCI, EPD. The second set referred to “products” and included words such as food, beverages, fish, shellfish, crops, vegetables, fruit, meat, eggs, dairy. EPDs were updated based on the data reported in the International EPD System database (www.environdec.com). The added studies were evaluated against the exclusion criteria (Table 2).

The final list of data from single studies reported in the SEL database was distributed as follows: 3349 CF values, including 1397 values for fresh food commodities already reported in Clune et al.11, 803 CF values originally reported in the Double Pyramid 2016 database, which were checked for eligibility and harmonized, and 701 CF values added in this study; and 938 WF values, including 288 WF values originally reported in Double Pyramid 2016 and 650 WF values added in this study.

Each CF and WF value extracted from the collected studies was assigned a group, a typology, a sub-typology (where applicable) and an item name (Table 1), and was recorded in an Excel sheet together with the following additional information: type of bibliographic source, full reference, publication year, system boundary at distribution, country of production, region of production, relevant notes, and presence of the same value in other data collections (i.e. Clune et al.11 or Double Pyramid 2016).

After data collection, the CF data were further analysed and handled to harmonize the system boundary, following the approach reported in Clune et al.11. The system boundary considered in the SEL database extends up to the distribution centre supplying consumers located in the country of origin. It therefore excludes post-market phases such as cooking. The system boundaries at distribution are specified in a wide variety of ways in the published papers.
We accepted regional distribution centres (RDC), international distribution centres (IDC), European distribution centres (EDC), ports of the country of final destination, warehouses, wholesalers, city markets, and retailers. For the specific case of international transport, which also includes the emissions for shipping to regional distribution centres of the host country, rather than excluding such studies we created a dedicated typology, “imported”, which however includes very few studies. Imported commodities are indicated in the SEL database by a capital letter “I”.

If CF values collected from literature referred to the system boundary “farm gate” or “slaughterhouse”, additional post-farm-gate GHG emissions were added as proposed by Clune et al.11. These additional emissions also included packaging if it was not reported in the publication. We adopted the median values used by Clune et al.11 for distribution to the RDC (0.09 kg CO2/kg or kg CO2/L) and for packaging (0.05 kg CO2/kg or kg CO2/L). Data on slaughterhouse emissions were also taken from the same publication.

To address the share of WF attributable to packaging and transportation to the market, we analysed 256 EPDs. No significant increase of WF in the downstream stages associated with packaging and distribution was found. We therefore included in the analysis all system boundaries with the exception of ‘cooking’, human excretion and waste disposal.

To convert CF values from carcass or live weight to bone-free meat, the ratios reported in Clune et al.11 were used, while the carcass-weight to bone-free-meat ratio for buffalo meat (1:0.684) was estimated from the studies of Gerber et al.15, Gurunathan et al.16 and Li et al.17.

The final version of the CF and WF data after handling was recorded in a sheet where, in addition to the information listed above for each study, we also reported the additional post-farm-gate emissions (transport T, slaughtering S, packaging P) or the meat conversion factors (cf), where applicable.
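The two harmonization adjustments described above can be illustrated with a minimal Python sketch. The constants are the values quoted in the text; the function names, the boundary labels and the choice of attributing all emissions to the bone-free meat are assumptions made for illustration. Slaughtering emissions are left out because their values are not given here.

```python
# Medians adopted from Clune et al. (values quoted in the text),
# in kg CO2 per kg (or per L) of product.
TRANSPORT_TO_RDC = 0.09
PACKAGING = 0.05
# Carcass weight : bone-free meat ratio for buffalo meat (1 : 0.684), from the text.
BUFFALO_BONE_FREE_RATIO = 0.684


def add_post_farm_gate(cf, boundary, packaging_reported=False):
    """Extend a 'farm gate' or 'slaughterhouse' CF to the distribution boundary.

    Adds transport to the RDC and, when the study did not report it,
    packaging. Slaughtering emissions for farm-gate meat values are not
    modelled in this sketch (hypothetical helper).
    """
    if boundary not in ("farm gate", "slaughterhouse"):
        return cf  # already at an accepted distribution boundary
    extra = TRANSPORT_TO_RDC + (0.0 if packaging_reported else PACKAGING)
    return cf + extra


def carcass_to_bone_free(cf_per_kg_carcass, ratio=BUFFALO_BONE_FREE_RATIO):
    """Re-express a CF per kg of carcass weight per kg of bone-free meat.

    One kg of carcass yields `ratio` kg of bone-free meat; attributing all
    emissions to the meat gives CF_bone_free = CF_carcass / ratio.
    """
    return cf_per_kg_carcass / ratio


print(round(add_post_farm_gate(1.00, "farm gate"), 2))  # 1.14
print(round(carcass_to_bone_free(10.0), 2))             # 14.62
```

Dividing by the ratio reflects that the same emissions are spread over a smaller mass of edible product, so the bone-free value is always higher than the carcass-weight value.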
The complete, harmonized dataset represents the level 1 information sheet of the SEL database (Fig. 1).

Changes in the 100-year global warming potential (GWP) factors provided by the Intergovernmental Panel on Climate Change reports AR3 (2001), AR4 (2007) and AR5 (2013) might have introduced additional variability into the LCA studies on which the level 1 CF data are based. The extent of this variability is difficult to quantify, as it depends on the relative weight of each GHG in the total CF of the item. However, the analysis of some item groups used as a test sample (tomato, rice, beef meat, chicken meat) did not show any clear trend of increasing or decreasing average CF over the years (1998–2020), suggesting that differences among production processes and conditions were the dominant source of CF variability.

Step 2 – Creation of derived CF and WF datasets with higher aggregation level (2, 3 and 4)

This step provides footprints of food commodities at higher levels of aggregation, corresponding to food items, typologies and sub-typologies (Table 1), which might be of particular interest to different kinds of stakeholders. The item represents the finest level of detail among the aggregated footprint data of a food commodity and is often the most desirable information for food impact analyses and dietary assessments. We propose here a methodological framework to evaluate the uncertainty associated with the data used to represent food items. The framework supports users in choosing the optimal value to represent a food item on the basis of the data available in the database. It would also make it easy to expand and implement the food item values.

Level 2, SEL CF ITEM & SEL WF ITEM datasets

These two datasets (CF and WF) report a comprehensive set of descriptive statistics for the list of food items in the database.
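Step 2 builds progressively coarser aggregation levels from the same level 1 values. A minimal sketch of that idea follows, assuming that each level pools all level 1 values sharing the same item, sub-typology or typology label and summarizes them with the median. The row layout, the function name `medians_by_level` and the example labels are hypothetical, and the published datasets may be built differently.

```python
from collections import defaultdict
from statistics import median


def medians_by_level(rows):
    """Median footprint per item, sub-typology and typology.

    `rows` holds hypothetical level 1 records as
    (typology, sub_typology, item, value) tuples; sub_typology may be an
    empty string when it does not apply. Every level pools the raw
    level 1 values (not medians of medians), an assumption of this sketch.
    """
    pools = {"item": defaultdict(list),
             "sub_typology": defaultdict(list),
             "typology": defaultdict(list)}
    for typology, sub_typology, item, value in rows:
        pools["item"][item].append(value)
        if sub_typology:  # sub-typologies apply only to some items
            pools["sub_typology"][sub_typology].append(value)
        pools["typology"][typology].append(value)
    return {level: {label: median(vals) for label, vals in groups.items()}
            for level, groups in pools.items()}


# Hypothetical labels and values, for illustration only.
rows = [("Typology X", "Sub-typology X1", "Item a", 1.0),
        ("Typology X", "Sub-typology X1", "Item a", 3.0),
        ("Typology X", "Sub-typology X1", "Item b", 5.0),
        ("Typology X", "", "Item c", 7.0)]
print(medians_by_level(rows)["typology"])  # {'Typology X': 4.0}
```

Item c has no sub-typology, so it contributes to its typology but to no sub-typology pool; a coarser level therefore always rests on at least as many values as the items it contains.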
The population of data used to attribute a value and an uncertainty to a food item consists of all the CF or WF values classified under that “item entry name” in the level 1 dataset of the SEL database. In level 2, the item data population is described by the following set of information:

Size: number of studies used for the analysis of the item population (n).
Location and central-tendency measures: mean, median, first quartile (Q1) and third quartile (Q3), together with the minimum (Min) and maximum (Max) observed values.
Variability measures: Standard Deviation (SD) and Coefficient of Variation (CV) as absolute and relative dispersion indexes, and the Interquartile Range (IQR) and Median Absolute Deviation (MAD) as more robust indexes of variability.
Shape measures: Skewness (SK) and kurtosis (KU) indexes and the Shapiro-Wilk normality test (SW test).

The median of the item data population was chosen as the measure of central tendency representing the item. Unlike the mean, whose value can be distorted by outliers, making it a less meaningful measure, the median is not influenced by their presence. The median is in fact the location estimator with the highest breakdown point (equal to 0.5), the breakdown point being the maximum proportion of observations that can be contaminated (i.e., set to infinity) without forcing the estimator to yield a “false”, non-representative value18,19. With these properties, the median is also the most appropriate measure of central tendency to describe both positively and negatively skewed distributions20.

To describe the uncertainty associated with the location value (median), we used descriptive statistics on the dispersion and shape of the item data distribution.
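The level 2 descriptors listed above can be computed for a single item population with the Python standard library, as in the sketch below. The estimator conventions (sample SD, 'inclusive' quartiles, unscaled MAD, bias-adjusted skewness and excess kurtosis) are assumptions, since the text does not specify them, and `describe_item` is a hypothetical helper. The Shapiro-Wilk test is omitted to keep the example dependency-free; `scipy.stats.shapiro` is one available implementation.

```python
import statistics as st


def describe_item(values):
    """Descriptive statistics of one item data population (level-2 style).

    Assumed conventions (not stated in the text): sample SD with n - 1,
    'inclusive' quartiles, unscaled MAD, and bias-adjusted skewness and
    excess kurtosis (the G1/G2 estimators used by spreadsheet SKEW/KURT).
    """
    x = sorted(values)
    n = len(x)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean, med, sd = st.mean(x), st.median(x), st.stdev(x)
    q1, _, q3 = st.quantiles(x, n=4, method="inclusive")
    sk = ku = None  # shape indexes left undefined for very small or constant samples
    if n >= 4 and sd > 0:
        z = [(v - mean) / sd for v in x]
        sk = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
        ku = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
              - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "n": n,
        "mean": mean, "median": med, "Q1": q1, "Q3": q3, "Min": x[0], "Max": x[-1],
        "SD": sd, "CV": sd / mean if mean else None,  # CV as a ratio, not a percentage
        "IQR": q3 - q1, "MAD": st.median([abs(v - med) for v in x]),
        "SK": sk, "KU": ku,
    }


stats = describe_item([1, 2, 3, 4, 10])  # illustrative values only
print(stats["median"], stats["IQR"], stats["MAD"])  # 3 2.0 1
```

Replacing the extreme value 10 with 5 in this example would leave the median, IQR and MAD unchanged while lowering SD, CV, SK and KU, which is the robustness argument given above for representing the item by its median.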
For the shape of the distribution, we used the skewness and kurtosis indexes, which indicate whether a distribution is symmetric or skewed and describe its ‘peakedness’, measured relative to the weight of its tails21. This enabled us to evaluate, for each distribution, the importance of extreme values over the entire set of data and the related level of dispersion (platykurtic versus leptokurtic distributions). We completed the shape analysis with the Shapiro-Wilk test22,23 (4 ≤ n ≤ 2000).

To define the uncertainty of the item value, we created an assignment method based on a combination of three quality flags (Fig. 2).

Fig. 2 Method for attribution of CF (or WF) value to a food item based on data quality flags. The scheme shows the procedure applied to evaluate the level of uncertainty associated with the CF or WF value of a food item and how this information is used to decide the best value to represent the item. Three quality flags, each related to a statistical aspect of the data population, are calculated to attribute the level of uncertainty. Each flag has different quality levels, red being the worst and green the best. The flags are then combined, and expert judgement is used to associate a suggestion for data use with each flag combination. If the item median value is characterized by high uncertainty, it poorly represents the item and caution is needed when using it to represent the food commodity; the user is therefore redirected to a higher level of aggregation, such as the sub-typology or the typology that includes the analysed item.

Flag 1, evaluation of the ‘size’ (n) of the “item data population”
Red if n More