MesopTroph, a database of trophic parameters to study interactions in mesopelagic food webs
Data sources

Data for the trophic parameters and data categories listed in Tables 1 and 2 were gathered from peer-reviewed scientific publications, grey literature (e.g., agency reports, theses, and dissertations), and unpublished data held by the authors of this paper. Data compilation on stomach contents, stable isotopes, FATM, and trophic positions focussed on mesopelagic organisms and their potential prey and predators. For major and trace elements, energy density, and estimates of diet proportions, our search concentrated on mesopelagic taxa. Nevertheless, we also gathered information on small or intermediate-sized epi-, bathy-, or benthopelagic species found in the compiled data sources. These species were included because they play key roles in most marine ecosystems, both as important consumers of phytoplankton and zooplankton and as prey for many top predators, and can represent alternative energy pathways to mesopelagic organisms. However, we stress that the data coverage for these species in the current version of the database is very incomplete. Our main interest was in data from the central and eastern North Atlantic and the Mediterranean, corresponding to the study regions of the SUMMER project. When we could not find suitable data within this region, we extended the geographic scope of our literature search to the western North Atlantic. We did not search for datasets in open access repositories since those data can be easily accessed and extracted. However, some of the data provided by the authors of this paper have been previously deposited in PANGAEA.

DNA sequencing-based methods, such as metabarcoding and direct shotgun sequencing, are emerging as promising tools in dietary analyses because they identify many prey simultaneously at high taxonomic resolution and have the potential to provide quantitative diet estimates from relative read abundance [29].
However, recent studies have shown that various methodological and biological factors can break the correlation between the number and abundance of ingested prey and the prey DNA present in the sample, leading to biased estimates of the taxonomic diversity and composition of the diet [29,30]. Given the uncertainties remaining in the interpretation of DNA sequencing-based diet data, we decided not to include these data in MesopTroph until additional research demonstrates that these techniques can be confidently applied for quantitative diet assessment.

We identified available data sources in the literature through systematic searches on Web of Science, Google Scholar, ResearchGate, and the Google search engine. We used multiple combinations of terms related to specific data categories (Table 3), in conjunction with the common or scientific taxon names (from genus to order) and the ocean basin. For example, the search for stomach content data of fishes belonging to the family Myctophidae was undertaken using the following terms: “stomach content” OR “gut content” OR “prey composition” OR “diet composition”, AND “mesopelagic fish” OR “myctophid” OR “Myctophiformes” OR “Myctophidae”, AND “Atlantic” OR “Mediterranean”. For the mesopelagic and predator species known to be numerically abundant in the SUMMER study regions, we performed a second literature search using the common or scientific name of the species, along with the terms “diet”, “feeding habits”, “trophic ecology”, “trophic markers”, or “food web”.
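The combination of terms in the Myctophidae example can be sketched as a small query builder. This is only an illustration: the function names `or_group` and `build_query` are hypothetical, and wrapping each OR-group in parentheses is an assumption about how the operators were meant to be grouped. The exact syntax accepted by each search engine differs and is not specified in the text.

```python
def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Combine several OR-groups with AND (assumed grouping, see lead-in)."""
    return " AND ".join(or_group(g) for g in groups)


# Terms taken from the Myctophidae stomach-content example in the text.
category_terms = ["stomach content", "gut content",
                  "prey composition", "diet composition"]
taxon_terms = ["mesopelagic fish", "myctophid",
               "Myctophiformes", "Myctophidae"]
basin_terms = ["Atlantic", "Mediterranean"]

query = build_query(category_terms, taxon_terms, basin_terms)
print(query)
```

Keeping the three term lists separate makes it straightforward to swap in the terms for another data category or taxon while reusing the same basin terms.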
We also examined the literature cited within each collected publication to locate additional data sources.

Table 3. Terms used in the literature search for each data category.

We next screened the full text of the compiled studies and retained data sources that: (1) contained data collected within the region of interest, (2) reported quantitative data for the trophic parameters of interest, (3) reported the number of samples for pooled or aggregated data, and (4) provided sufficient details on the methodology to enable a quality check. In the case of stable isotope data, we only included data sources reporting both δ13C and δ15N measurements.

Data extraction, cleaning, and formatting

We created a template table for each data category in Microsoft Excel to assemble all datasets into a single file and to facilitate cleaning and standardization of data records. We added a large number of metadata fields to the tables to annotate details about the sampling (e.g., location, date, methods), sampled specimen(s) (e.g., taxonomy, number and size of individuals, number of replicates, tissue analysed), and data source (e.g., full reference, DOI) for every record.

Data contributors formatted and incorporated their datasets directly into the tables. For published sources, the data and associated metadata were extracted manually or digitized from the article text, tables, or supplementary material into the tables. Extraneous or hidden characters, and values such as “NA” (Not Available) or “ND” (Not Determined), were deleted from the parameter and metadata fields. Measurements of trophic parameters were standardized to the same units (see Tables 1 and 2). Parameter values that were clearly incorrect (e.g., δ15N > 20, or the frequency of occurrence of a prey higher than the number of stomachs sampled) were corrected by checking the value against the data source.
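The two plausibility checks given as examples above could be automated along the following lines. This is a minimal sketch, not the procedure used to build the database (the text describes the work as done in Excel templates). The field names `d15N`, `prey_occurrences`, and `stomachs_sampled`, and the function `flag_implausible`, are hypothetical.

```python
def flag_implausible(record, d15n_max=20.0):
    """Return the reasons a record should be checked against its data source."""
    problems = []

    # Check 1: delta-15N above the example threshold given in the text.
    d15n = record.get("d15N")
    if d15n is not None and d15n > d15n_max:
        problems.append("d15N above threshold")

    # Check 2: a prey cannot occur in more stomachs than were sampled.
    occurrences = record.get("prey_occurrences")
    stomachs = record.get("stomachs_sampled")
    if occurrences is not None and stomachs is not None and occurrences > stomachs:
        problems.append("prey occurrences exceed stomachs sampled")

    return problems


# Made-up records for illustration only.
records = [
    {"id": "r1", "d15N": 10.4},
    {"id": "r2", "d15N": 104.0},
    {"id": "r3", "prey_occurrences": 12, "stomachs_sampled": 10},
]

flagged = {}
for rec in records:
    reasons = flag_implausible(rec)
    if reasons:
        flagged[rec["id"]] = reasons

print(flagged)
```

The sketch only flags records; deciding whether a flagged value can be corrected still requires going back to the original publication.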
When values could not be corrected, we deleted that data record.

When available, we extracted information at the individual level. However, most studies reported data obtained from pooled samples of the same species. In some cases (e.g., small specimens such as planktonic organisms), a minimum and maximum number of individuals in the sample was provided instead of the actual number of individuals sampled. We therefore added two columns to the tables giving the minimum and maximum number of individuals in the sample. By filtering the column “Ind No (maximum per sample)” for values > 1, users can easily identify records with aggregated data and differentiate them from records where information was drawn from a single individual (i.e., where “Ind No (maximum per sample)” = 1). In addition, the tables Stomach contents and Estimates of diet proportions include a field “Sample ID” with a unique identifier of the sample. If data are reported at the individual level (i.e., “Ind No (maximum per sample)” = 1), then Sample ID is the individual animal ID. If the data are from a group of individuals (i.e., “Ind No (maximum per sample)” > 1), then Sample ID identifies that group.

We standardized the taxonomic classification and nomenclature of fishes and elasmobranchs following Eschmeyer’s Catalog of Fishes (http://researcharchive.calacademy.org/research/ichthyology/catalog/fishcatmain.asp) [31,32]. For the remaining taxa, we used the World Register of Marine Species (http://www.marinespecies.org/) [33]. Unaccepted or alternate taxon names were replaced by the most up-to-date valid name. When the identification of a taxon was uncertain, the taxonomic resolution was reduced to a level at which the identification was reliable.
For example, prey reported as “Cephalopods” were changed to “Cephalopoda”, “Sepiolids” to “Sepiolidae”, and “Myctophum punctatum?” to the genus “Myctophum”.

Stomach contents

Stomach contents analysis is a standard dietary assessment method that potentially enables quantifying diet components with high taxonomic resolution [34]. Three parameters are typically used to describe diet composition from stomach contents: the number of individuals of a prey type as a proportion of the total number of prey items (%N), the proportion of a prey item by weight or volume (%W), and the proportion of stomachs containing a particular prey item (i.e., percent frequency of occurrence, %F) [35]. When available, we collected data on the three parameters, as well as on the absolute number, weight, and frequency of occurrence of each prey type in the stomachs of each sampled individual or group of individuals. If stated in the data source, we indicate if prey weights were directly measured or reconstructed from hard remains (fish otoliths and vertebrae, cephalopod beaks), and if they represent dry or wet weight. Some datasets contained records of prey items without corresponding weights or numbers. As a result, the cumulative percent of all prey items did not sum to 100%. This occurred in 11 data records for the cumulative %W, and nine for the cumulative %N. While we checked the accuracy of percentage values and adjusted rounding errors, we did not attempt to fill in missing values nor did we remove records with missing values. When prey values were reported by an upper bound (e.g., “ More