Site description
We define “kelp forest” as rocky-reef habitat within the 5–20 m depth range that supports dense stands of giant kelp, Macrocystis pyrifera. For this study, we considered the Santa Barbara Channel (SBC) to include the mainland region between Point Conception (−120.476° longitude, 34.455° latitude) and Point Mugu (−119.065° longitude, 34.079° latitude), as well as the northern and southern sides of the four northern Channel Islands (Fig. 1). Although the SBC is a subset of the Southern California Bight, its strong west-east gradient in cold to warm temperature means the study system includes many of the kelp-forest species throughout California31. This means the SBC kelp-forest food web is a large “metaweb”, characterizing kelp forest meta-communities, rather than a site-specific web. In other words, the system includes cold water and warm water species that might not necessarily co-occur at a single site. However, there are site-specific food webs embedded in the metaweb at particular locations where a subset of species occur.
Data sources
Our goal was to assemble the food web using both published and novel empirical observations. To this end, we first used published data sets and species’ range boundaries to create free-living species lists. The initial list of fishes, algae, and free-living invertebrates was assembled from the Channel Islands National Park Kelp Forest Monitoring program (CINP KFM, annual reports available at https://irma.nps.gov/DataStore/SavedSearch/Profile/1508, accessed March 6, 2017, or visit https://www.nps.gov/im/medn/kelp-forest-communities.htm to contact David Kushner or Joshua Sprague) and the SBC Long Term Ecological Research program’s ongoing kelp-forest community timeseries (SBC LTER, https://sbclter.msi.ucsb.edu/data/catalog/, accessed March 12, 2017). We added to these lists using primary literature, technical reports (e.g., NOAA, USFW), direct observations, expert opinion, crowd-sourced observations (e.g., eBird.org), guidebooks, and grey literature. We sampled the local kelp forest zooplankton and the algae-associated small-invertebrate community, because these organisms were not well represented in surveys (see below).
We created initial lists of parasite species using published literature and host-parasite databases. A systematic review was conducted to collect parasite records for each free-living species. We searched the Natural History Museum (NHM) of London host-parasite database (https://www.nhm.ac.uk/research-curation/scientific-resources/taxonomy-systematics/host-parasites/database/search.jsp), the FishPest database32, WoRMS (http://www.marinespecies.org/aphia.php?p=search), BIOSIS citation index (http://webofscience.com), and Google Scholar™ (https://scholar.google.com/) (Genus + species + parasit*, expanded to Genus + parasit* if no records were found). For each host species, we recorded the number of records found in BIOSIS and NHM as an estimate of study effort. Although parasites are often reported at the host and parasite species level, we were often able to infer parasite and host life stages based on knowledge about life cycles. We added to these lists by sampling local fish and invertebrates, with a focus on hosts that were common in the system and not well-studied (see below). As for any food-web study, we were most interested in including common or important parasites, rather than rarities.
Published diet observations (including in grey literature), direct observations, and inference were used to determine trophic links (see below).
Free-living species sampling methods
Certain groups of free-living species were under-represented in published survey data, so we conducted sampling to assess species diversity in the following areas.
Zooplankton tows
We conducted vertical zooplankton tows within kelp forests at two island locations (on the same date) and two mainland locations (repeated tows, four dates at one site, three of those dates at a second site, including one nighttime sampling date), for eight site by date samples30. While the vessel was at anchor within a kelp forest, a 30 cm diameter, 200 micron plankton net was dropped to the bottom and pulled to the surface at a rate of 0.33 m per second. Care was taken not to scrape the net against kelp plants. The collection jar attached to the net was kept vertical with a small lead weight to ensure that the net did not collect organisms on the way down to the bottom. The depth and time of collection were recorded30. We held collection jars on ice while in the field, then preserved specimens in 95% ethanol when we returned to the lab (within a few hours of collection). All organisms were counted and identified to species when possible, but some groups were identified to Order or Family, and then cross-checked with lists of known local species. If this was not possible, specimens were assigned to morphospecies, indicating they appeared to be a unique species based on morphology. Representative specimens from each species or morphospecies were photographed and measured.
Giant kelp holdfasts
Giant kelp holdfasts were sampled for free-living invertebrates. In the field, holdfast circumference and two slant height measures were taken, as well as basal stipe circumference. A subsample of approximately 25% of the holdfast was collected by SCUBA in a large plastic zip bag, and frozen until processing (n = 7). The samples were processed for organisms > 500 microns, and holdfast tissue was weighed after organisms and debris were removed. Organisms were counted, identified to species or morphospecies when possible, and measured30. Some groups were identified to Family, and then matched to lists of known local species.
Taxon-specific methods: gastropods
Small gastropods are a diverse but overlooked group that lives in benthic turf algae. Algal clumps were collected haphazardly by either laying down a 7 × 7 cm quadrat and collecting all algae within the quadrat, or by collecting clumps of a particular alga and weighing at the lab. All gastropods were removed by hand under a stereomicroscope, counted, identified to species or morphospecies, measured, and photographed30.
Parasitological collections
We collected fish and invertebrates and dissected them for parasites, with the goal of identifying the most common parasites in the food web. We targeted host groups that are known to transmit trophically-transmitted parasites in other systems. We collected most organisms from mainland sites, and sampled opportunistically at sites on Anacapa, Santa Cruz, and Santa Rosa islands30 (Fig. 2). A list of all species dissected and sample sizes is provided30.
Fish collections
We prioritized collecting the most common and abundant fish species based on survey data from 2000–2014 (SBC LTER), as well as personal observation, expert opinion, and amount of parasite data in the literature. Other species (lower abundance or higher past study effort) were collected opportunistically. Fish were collected primarily by spear on SCUBA. Specific size classes were not targeted and the spear tips used were appropriate for the focal species. Small benthic fish were collected using dip nets. All fish were collected under UCSB IACUC protocol 549.2. Fish were either stored on ice and processed within 24 hours of collection or frozen until processing.
Invertebrate collections
Invertebrates are necessary intermediate hosts in many parasite life cycles, but relatively few parasite life cycles have been described in marine environments. We targeted invertebrate species that were abundant and potentially important as intermediate hosts for parasites. We did not collect sessile colonial taxa, such as hydroids, gorgonians, sponges, and tunicates, as they were not expected to be hosts for trophically transmitted parasites (but these hosts do merit further study). Most sampled invertebrates were gastropods and small crustaceans, as they host trophically-transmitted parasites in other food webs. Bivalves, large crustaceans, echinoderms, and polychaetes were also dissected. Large invertebrates were collected by hand or using a rock chisel and scraper when appropriate. Small invertebrates were sampled by collecting benthic substrates in plastic or fine mesh bags and removing organisms in the lab. Invertebrates were held live in flow-through seawater until the time of dissection or frozen until processing.
Parasitological assessment
For each host dissection, the exterior and all internal soft tissues were examined for parasite life stages. For larger species, entire host organs were usually searched by pressing soft tissues thin between two glass plates (“squashed”) and examining with a stereomicroscope. However, to increase sample size, bilaterally symmetric organs (e.g. gills) were examined from one randomly determined side, and large organs (e.g. muscle, liver) were subsampled in larger fishes. Small crustaceans and soft-bodied invertebrates were squashed whole. We identified gut contents where feasible to improve host diet data and inform parasite life cycles. We recorded host mass, length (or other species-appropriate measurement), collection method, and host condition at time of dissection (e.g. frozen, fresh). We counted and identified all parasites to the lowest possible taxonomic level and assigned a morphospecies code when species-level identification was not possible. Only a few putative parasites were excluded from additional analysis because they had no identifying features. Dissection data30 includes species not included in the full food web (see below for discussion of justifications for node inclusion).
Node list assembly
Nodes in the web included free-living species that used the water column and benthic zones within kelp forests as feeding habitat (including transient kelp-forest visitors but excluding rare and vagrant species) and parasites of those free-living species. Species was the preferred taxonomic unit, and life stages were included as separate nodes if that life stage was present in the system and had distinct trophic interactions from the adult stage. The fully-resolved free-living food web was constructed with life stage (e.g., larva, adult) nested within species (or morpho-species) (excepting benthic diatoms, planktonic diatoms, dinoflagellates, foraminifera, free-living nematodes, bacteria, free-living ciliates, copepod nauplii, filamentous algae, and invertebrate eggs, which were aggregate nodes). As various forms of detritus are important to energy flow in kelp forests, detritus was broken into four categories based on the typical feeding modes of detritivores and main sources of detritus: carrion, drift macroalgae, small mixed origin (such as would be consumed by a deposit or suspension feeder, with the recognition that this alone is a complex system deserving further resolution) and dissolved organic material. The “drift macroalgae” component was especially important to distinguish, as certain herbivores (sea urchins) are known to prefer drift algae as food but will turn to feeding on live algae when drift algae are sparse. This is a very distinct type of interaction from suspension feeders, which consume small particles of detritus that may be largely bacteria. “Parasites” are consumers which fit the seven types of parasitism defined by Lafferty and Kuris33. Commensal organisms were also recorded. We limited the parasite species list to metazoan species that use kelp-forest species as hosts for at least one stage in their life cycle. Bacterial, viral, fungal, and protozoan pathogens that are important in kelp-forest food webs merit inclusion in further work.
We assigned each node a justification code (see below), confidence level, literature reference, and locality of the reference. Additional node metadata includes site on host (ecto- vs. endoparasite), taxonomic information, and life cycle information30 (see below). The node list contains columns with a species ID and a species-by-stage ID. To work with the life-stage resolution, select the species-by-stage ID as the node identifier in analyses. To work with the species version, select the species ID as the node identifier in analyses. This collapses all interactions to the species level, so every trophic interaction is preserved and linked to the species node. Network analysis packages in R (such as Cheddar34) will automatically remove duplicate links if they are generated in this process.
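The collapse from life-stage nodes to species nodes can be sketched in a few lines. This is a minimal illustration only: the function name, the ID format, and the example links are hypothetical and are not taken from the published node or link lists.

```python
def collapse_to_species(stage_links, stage_to_species):
    """Map (consumer_stage, resource_stage) links to species-level links.

    Duplicate links created by the mapping are dropped by using a set.
    """
    collapsed = set()
    for consumer_stage, resource_stage in stage_links:
        collapsed.add(
            (stage_to_species[consumer_stage], stage_to_species[resource_stage])
        )
    return sorted(collapsed)


# Hypothetical IDs: stages "1.1" (juvenile) and "1.2" (adult) belong to species 1.
stage_to_species = {"1.1": 1, "1.2": 1, "2.1": 2, "3.1": 3}

# Hypothetical stage-level links written as (consumer, resource).
stage_links = [("2.1", "1.1"), ("3.1", "1.2"), ("3.1", "1.1")]

species_links = collapse_to_species(stage_links, stage_to_species)
print(species_links)  # [(2, 1), (3, 1)]
```

The two links from consumer 3 to the juvenile and adult stages of species 1 become a single species-level link, which is the de-duplication step described above.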
Life stages as nodes
Species were partitioned into life-stage nodes (e.g., larva, juvenile, adult) if a species changed its trophic position from one stage to the other and multiple stages were present in the system. Whether or not a distinct life stage resided in the kelp forest was indicated by various data sources (e.g. dissections, published records), or inferred from species life history or trophic interactions. For example, amphipods brood offspring and have crawl-away juveniles. These juveniles remain in the kelp forest (rather than having a pelagic phase), and due to their small size are subject to different predators than adults (e.g. adults are eaten by fishes, while juveniles are eaten by hydroids). This was justification for juvenile amphipods being a distinct node from adult amphipods. On the other hand, many species have planktonic larvae that develop outside of the kelp forest, so only the adult stages were included at the species level. Larval stages of parasites were included if there was no feasible alternative for the focal host to become infected. We assumed that kelp-forest resident hosts became infected through life-cycle stages found within the kelp-forest food web, but that transient hosts could have acquired some parasites outside the kelp forest (e.g., if intermediate hosts were not known from the kelp forest). Likewise, presence of larval parasites in dissections was evidence for including adult stages. For some species, there was insufficient data on life history to infer additional stages. Metadata in the node list indicates whether parasites have additional life stages inside the kelp forest, outside, or unknown. When comparing this food web with others (which rarely separate species into life stages), using our data it is easy to collapse life-stage nodes into species nodes.
Justifications for node inclusion
We used multiple lines of evidence to justify whether or not to include a node in the food web. Free-living species were included if they were known from the SBC (see site description above) and were indicated by the data sources described above (e.g. reports, surveys, published papers, guidebooks, expert opinion, etc.). Species lists from regional guidebooks included non-kelp-forest species, so these lists were compared with species lists from long-term monitoring surveys. Following the methods of Lafferty et al. 2006, we excluded most rare species (<1% frequency of detection in surveys, or those described as “rare” qualitatively). However, we included species that seemed rare because they were cryptic or not looked for, if the species’ ecological role exceeded its abundance (top predators), or if the presence of a final host was inferable based on presence of parasites that require it to complete their life cycle. For instance, a cryptic fish species listed in a guidebook may appear rare in monitoring program surveys, but inclusion might be warranted based on personal observations. For top predators, larval parasites in prey species were evidence for the presence of final-host species (e.g. finding shark tapeworm larvae in a fish indicates a shark is likely present in the system). We also included a few locally extinct or rare species of special conservation or fisheries interest that had a larger historical role (e.g. the sea otter, Enhydra lutris)35 or potential expanded role with global warming. These species are indicated in the node list so they can be excluded or included based on research questions. The justifications for including a node in the food web were included as metadata, as well as the localities of the species observation and references, and then used to determine a categorical confidence score.
Parasites are not as well studied as free-living species, so we used parasite-host records from San Luis Obispo, California to Punta San Hipolito, Baja California, Mexico, corresponding to the dominant biotic province of the SBC. We excluded parasites from outside this range or those known to have freshwater life cycles, as well as ectoparasites of birds. We made exceptions for parasites with additional evidence of presence (such as a larval stage found locally, or a local occurrence in another host species), and for those with transient and wide-ranging hosts30. For example, if an adult trematode was observed in pelicans in Florida, but larval stages of this worm had been observed in the Carpinteria (CA) Salt Marsh adjacent to our system, the worm was included. We extended the northern range of acceptable parasite records to San Francisco Bay, California for hosts that were known to migrate between northern and southern California regularly (several species of elasmobranchs, birds, and mammals). This also helped account for the relatively low study effort for these hosts in southern California.
Assignment of node confidence
Depending on the evidence for including a node, we rated confidence from 1–4, with 1 being the most confident. Nodes that were observed by monitoring surveys or this study were assigned a confidence value of 1 (61.2% of free-living nodes, 35.5% of parasite nodes). Nodes that were known from the SBC through other sources (e.g. guide books, published literature), but that were not reported in surveys were included with a confidence value of 2 (28% of free-living nodes, 37.7% of parasite nodes). For example, gammarid amphipods were not monitored at the species level in monitoring surveys, but other studies in the region provide lists of local species. Species known from the broader Southern California Bight and with reported ranges north to Point Conception or beyond were included with a confidence value of 3 if they were from a taxonomic group that may not have been sampled effectively by methods utilized in the SBC (6.7% of free-living nodes, 14.4% of parasite nodes). This included several sponge species that were not monitored at the species level by monitoring programs. Transient species indicated by expert opinion and crowd-sourced observations, as well as some life stages that were inferred to be present (e.g. juvenile gammarid amphipod species) were also assigned confidence values of 3. Some parasite life stages that were inferred to be present, but were observed north of Point Conception or outside the greater southern California region were included with a confidence value of 4 (4% of free-living nodes, 12.4% of parasite nodes). We also assigned a confidence level of 4 to parasite nodes whose presence in the kelp forest was less certain due to host transience (large mobile predators that forage across multiple different habitats, not exclusive to kelp forests). Parasites are sometimes mis-identified in published records, so, to avoid false positives, we excluded some parasites on the basis of questionable identifications. 
These were typically parasites that were only known from one host specimen in one local study but were known from an entirely different group of host organisms in a distant locality. Readers can use confidence scores to filter their own node list.
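Filtering a node list by confidence score can be sketched as follows. The record layout (`node_id`, `confidence`) and the example values are hypothetical; only the 1–4 coding, with 1 as the most confident, comes from the text.

```python
def filter_by_confidence(records, max_code):
    """Keep records whose confidence code is at or below max_code (1 = most confident)."""
    if max_code not in (1, 2, 3, 4):
        raise ValueError("confidence codes run from 1 to 4")
    return [r for r in records if r["confidence"] <= max_code]


# Hypothetical node records for illustration only.
nodes = [
    {"node_id": "A", "confidence": 1},
    {"node_id": "B", "confidence": 2},
    {"node_id": "C", "confidence": 4},
]

kept = filter_by_confidence(nodes, max_code=2)
print([n["node_id"] for n in kept])  # ['A', 'B']
```

Because the codes are categories rather than a continuous scale, a threshold filter of this kind is a safer use than averaging them. Any link that touches a removed node would also need to be dropped from the link list.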
Additional node metadata
Additional metadata for each node includes species functional group (e.g. predator, herbivore, detritivore, omnivore, autotroph, filter-feeder, ectoparasite, etc.), taxonomic information (phylum, class, order, family), habitat association (e.g. holdfast, water column, rock surface, host), small-scale habitat association (e.g. rock, water-column, macroalgae, etc.), geographic range, thermal association, consumer trophic type (Table 1), and consumer strategy (e.g. autotroph, omnivore, detritivore, filter-feeder, carnivore)30.
Link assignment
Because links in previously published kelp-forest food webs contained errors, we constructed links from scratch using primary sources where possible. Given N nodes in the node list, there are N² potential trophic links (including cannibalism). Many of these potential feeding interactions are easy to exclude based on logic (e.g., giant kelp doesn’t eat animals) and species life history. For example, a subset of free-living species are possible hosts for each life stage and taxonomic group of parasites (e.g. adult tapeworms in the order Trypanorhyncha only infect elasmobranchs). Parasite-host records in the literature are incomplete lists, so we inferred additional links using species life histories and logic. Parasites can also be killed by free-living species when their hosts are eaten (concomitant predation). We used free-living trophic interactions to infer these feeding links between free-living consumer and parasite. Where possible, this food web reports links at the stage level, but these links could be aggregated to the species level, or even the group level for comparison with other food webs. Each link was assigned a literature reference, locality of the observation, justification code, and confidence level30.
Justifications for link inclusion
Links were assigned using several data sources and logic. A systematic literature review was conducted in Google Scholar™ to collect diet records for each free-living species (including synonyms) using standardized search terms (“Genus species” [diet* OR feed* OR prey]). If these search terms did not yield results, the search was expanded to records of the species (“Genus species”). We also used direct observations from gut contents. In many cases, diet information was not available at the species level, creating the possibility of false negative links (e.g., failing to report a diet item due to lack of direct observation). To reduce the probability of false negative links, the search was expanded to the next higher taxonomic level where information was available, under the assumption that diets are often taxonomically conserved. Such links were inferred by assessing both the compatibility of the interaction (e.g., body size ratios, diet generality), as well as the probability of encounter between the species. For example, if two species were known to encounter each other through shared habitat and behaviors, and general feeding habits of the consumer were compatible with the resource species, a link was inferred. Certain trophic links may only be present seasonally or may vary through time. Temporal data sets provided by the SBC LTER and CINP KFM programs provide abundances over time for many key species in the food web. These data sets could be used to assess temporal stability of links in future studies.
Parasite presence was also used to infer links between free-living consumers and resources when life cycles of parasites were known. The presence of a trophically transmitted parasite in a host indicates that the intermediate host of the parasite was ingested by that host, so a link between those two hosts would be inferred. For some understudied species, expert opinion was used to inform trophic links, so these experts are cited in the links list as the reference for that link. Many experts have unpublished data or observations on feeding interactions and parasite-host interactions, and so were a valuable source of information. We report the strongest justification type for each link in the food web as a justification code, along with all relevant references. For example, if we observed a link directly that was also reported by literature studies, we indicate we used direct observation to justify the link. However, the references for that link indicate it was observed directly and also list the relevant literature, so the reader would know that it was both indicated in the literature and observed directly. For inferences, we list all references that provide the logical basis for an inference (e.g. descriptions of foraging behavior, diet of related species). Justification codes and the numbers of references should not be used as a measure of link weight (e.g. diet proportion), as these often relate more to study effort on the species rather than the importance of the links.
Links between parasites and hosts were assigned using several data sources, as in the free-living web. Direct observations of parasite-host interactions through our sampling or published studies (as detected through systematic review, see “Data Sources” section above) were assigned. However, direct observation of all possible interactions was unfeasible and sampling effort varied among hosts, so parasite-host interactions are often under-sampled. To account for this, links between parasites and hosts were added in stages using the free-living web, host life history, and parasite life history. First, parasite life cycles were inferred based off of known hosts and host trophic interactions. Trophic interactions among free-living species were then used to infer either transmission of parasites to additional hosts or concomitant predation (predator-parasite links) if parasites were not ingested by suitable hosts. Each link is identified by a code that indicates whether it was observed directly (and the source), or whether it was inferred (and the method of inference, described below). Users of the food web can choose to filter links by link justification to suit their needs.
Life cycle inference
We used several data sources and considered parasite life histories to assign links with likely hosts. If the life cycle was known for the parasite in another system, we inferred links with analogous hosts in the system (a kelp forest species in the same genus or family). For trophically transmitted parasites, we assessed parasite compatibility with potential hosts, and used free-living trophic interactions to determine whether a parasite would encounter a suitable host. For species with unknown life histories, we considered the life history of the next lowest taxonomic grouping and assumed generalism within that level. For example, the trematode Podocotyle californica has an unknown life cycle, but Podocotyle enophrysi is known to infect the snail Lacuna marmorata as its first intermediate host36. Trematodes are host-specific at this stage, and Lacuna unifasciata was the only analogous host species in kelp-forest food web, so it was assigned as the most-likely intermediate host for Podocotyle californica. On the other hand, marine acanthocephalans are thought to be generalists at the ordinal level in the first intermediate host (D. Marcogliese, pers. comm.) and are trophically transmitted. Although a second intermediate host is not necessarily required for development, acanthocephalans of top predators often use fishes as paratenic hosts. In dissections, fishes were often infected with larval acanthocephalans of birds and mammals, so we assigned amphipod species eaten by infected fish as possible first intermediate hosts. For the 15% of the nodes where a parasite from the dissections could not be identified to family, those without a clear possible host in the kelp forest, or those where nothing was known of the parasite’s life history, we did not make any inferences based on life cycle. Such parasites appear as specialists in the data (but see the false-negative assessment below).
Parasite-host inference
The number of parasite species detected is often a function of study effort37,38,39. Because study effort varied among hosts, and was sometimes low, we assigned additional parasite-host links based on potential for encounter with infectious stages of parasites and expected host compatibility. Encounter with trophically transmitted parasites occurs through host diet (i.e. through intermediate hosts eaten as prey) and was informed using the free-living food web and life-cycle inferences as described above. Encounter with directly transmitted parasites occurs through shared habitat or contact with other hosts and was informed by other parasite-host records. We based compatibility on the host-specificity, known hosts in the system, as well as the life stage of the parasite (e.g. adult tapeworms do not survive if their host is eaten, whereas juvenile tapeworms can infect repeated paratenic hosts and remain viable). For example, if a monogene was reported from 15 rockfish species in British Columbia and observed in two species locally, it was assumed to infect other rockfish species present in the SBC kelp-forest food web.
Predator-parasite interactions
Host death by predation is a major source of parasite mortality and may strongly influence parasite-host dynamics. We inferred these predator-parasite interactions using trophic interactions between free-living species. For each free-living consumer interaction, we assessed whether the parasites of the prey host would be killed or transmitted to the predator. If the predator was not a compatible host (see discussion above), we assigned a consumptive link between the free-living consumer and parasite. Users should consider that although predator-parasite links influence parasite vulnerability, they rarely constitute a significant flow of energy from parasites to predators. Food-web metrics that imply energy transfer (e.g., robustness and other bottom-up effects) should therefore not include predator-parasite links.
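The decision rule for each predator-prey pair can be sketched as a simple loop. This is an illustration under stated assumptions, not the procedure used to build the published link list: the function name, the data structures, and all example taxa and compatibilities are hypothetical, and host compatibility is reduced to a lookup table.

```python
def classify_parasite_fates(predation_links, parasites_of, compatible_hosts):
    """Split parasite fates into transmission links and predator-parasite links.

    predation_links: iterable of (predator, prey) pairs.
    parasites_of: dict mapping a prey host to the parasites it carries.
    compatible_hosts: dict mapping a parasite to the set of hosts it can infect.
    """
    transmission = set()   # (parasite, new_host): parasite survives ingestion
    concomitant = set()    # (predator, parasite): parasite dies with its host
    for predator, prey in predation_links:
        for parasite in parasites_of.get(prey, ()):
            if predator in compatible_hosts.get(parasite, ()):
                transmission.add((parasite, predator))
            else:
                concomitant.add((predator, parasite))
    return transmission, concomitant


# Hypothetical example data.
predation_links = [("shark", "rockfish"), ("seal", "rockfish")]
parasites_of = {"rockfish": ["larval tapeworm", "adult trematode"]}
compatible_hosts = {"larval tapeworm": {"shark"}}

transmission, concomitant = classify_parasite_fates(
    predation_links, parasites_of, compatible_hosts
)
print(sorted(transmission))
print(sorted(concomitant))
```

Keeping the two outputs separate makes it straightforward to exclude predator-parasite links from metrics that imply energy transfer.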
Assignment of link confidence
Although inferring links from logic reduces the frequency of false negative links, it also increases the possibility of reporting false positive links (reporting links that do not in fact occur). To help indicate confidence, links were assigned a code from 1–4 based on the strength of the justification for the link, with 1 being the most confident and 4 being the least. Links from the literature were assigned a confidence code based on the proximity between SBC and the region where the interaction was observed. Any links indicated by direct observations, or other studies conducted within the SBC, were assigned a confidence value of 1. Links indicated by literature conducted within the greater southern California region were assigned a confidence level of 2, if the links were species-specific. Species-specific links in the literature that were from outside southern CA were assigned a confidence value of 3. Some non-species-specific links from within the SBC or southern CA were also assigned a confidence value of 3 if there was evidence that the species involved matched those in this web. Links that were inferred from only a single line of indirect evidence, or those that lacked locality or reference information, were assigned a confidence level of 4. When inferred host-parasite links were based on information from inferred predator-prey links, the confidence value was set to that of the least confident piece of information (i.e., the highest numeric code) that led to the inference. For example, if an adult trematode infected kelp rockfish with confidence level 3, and leopard sharks ate kelp rockfish with confidence level 2, a concomitant mortality link (predator-parasite) was assigned between the leopard shark and the trematode with confidence level 3. Therefore, confidence should correlate inversely with the probability that a proposed link is a false positive (links coded 4 are the most likely to be false positives), and the codes indicate where more study is needed.
A caveat is that the confidence values are qualitative as they represent categories, rather than a quantitative spectrum of certainty, so some uses of the confidence values (such as assigning averages or variances) would not necessarily represent a sound use of the data. Confidence values should also not be used to indicate strength or weight of a link, as they only indicate confidence in the presence of the link.
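The rule for links inferred from other links amounts to taking the least confident supporting code. A minimal sketch, with a hypothetical function name, is:

```python
def inferred_link_confidence(*supporting_codes):
    """Return the least confident (numerically largest) of the supporting codes.

    Codes run from 1 (most confident) to 4 (least confident).
    """
    if not supporting_codes:
        raise ValueError("at least one supporting code is required")
    for code in supporting_codes:
        if code not in (1, 2, 3, 4):
            raise ValueError(f"invalid confidence code: {code}")
    return max(supporting_codes)


# Example from the text: host-parasite link coded 3, predator-prey link coded 2.
print(inferred_link_confidence(3, 2))  # 3
```

Taking the maximum treats the codes as ordered categories only, which is consistent with the caveat that they should not be averaged or used as link weights.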
False negative estimation for host-parasite links
Even though many unobserved host-parasite links were inferred to occur based on logic, under-sampling leads to the potential for other false negative links. Such links are particularly likely for generalist parasites that have low prevalence in under-sampled hosts. For instance, if a metacercaria species infects any rockfish species at 5% prevalence, and we sample ten individuals from each of ten rockfish species, we can expect by chance to observe the parasite in only about four of the ten species (the probability of detecting it at least once among ten hosts is 1 − 0.95^10 ≈ 0.40). The remaining six rockfish species might appear to be uninfectable by the parasite, but assigning 0s in the bipartite host-parasite network would result in false negative links. False negative links make parasites look more like specialists than they actually are, thereby underestimating their importance in food-web measures such as generality, vulnerability, linkage density, and connectance (the proportion of realized links relative to the number of possible links). We estimated false-negative probabilities for unobserved links at the species level and individual host level (we assumed the probability of a false positive observation was low enough to be ignored unless noted). We applied this approach separately to the following bipartite networks: trophically transmitted parasite-fish, directly transmitted parasite-fish, parasite-shark, parasite-bird, and parasite-mammal.
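The detection arithmetic behind the rockfish example can be checked with a short calculation. This sketch assumes that infections are independent among sampled hosts (simple binomial sampling) and that an infected host is always recognized; the function name is hypothetical.

```python
def detection_probability(prevalence, n_hosts):
    """Probability of finding at least one infected host among n_hosts sampled."""
    return 1.0 - (1.0 - prevalence) ** n_hosts


n_species = 10
p_detect = detection_probability(prevalence=0.05, n_hosts=10)
expected_species_detected = n_species * p_detect

print(round(p_detect, 3))                   # 0.401
print(round(expected_species_detected, 1))  # 4.0
```

Under these assumptions the parasite is detected in a given host species with probability of about 0.40, so most truly infected species would be recorded as uninfected at this sample size.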
The first step in estimating a false negative probability is to calculate a prior statistical expectation that a parasite group infects a host group based on previously reported host-parasite links in the literature. At the node level, we used a generalized linear model with observed or inferred link (0,1) as the dependent variable and taxonomic information (host order, host family, parasite order, parasite family, parasite species), host trophic level (calculated from the free-living web), host habitat association, and proportion of the host diet that may contain infective stages as independent variables (JMP Pro V14, ref. 40). Because false negatives arising from under-sampling are common in the parasitological literature39, we included a square-root transformed sampling effort term (the number of parasite studies on the host in the literature). Model selection was based on the Akaike information criterion (AIC)41, and we found that host and parasite taxonomy and traits helped predict links (see Table 2 for model results of each network). The interaction between host order and parasite family was important in all bipartite networks, indicating parasite specialization at higher taxonomic levels. Study effort was less important in subnetworks with higher sampling effort across hosts. From the best-fitting model, we generated predicted probabilities for each link between species i and j at existing effort \(\widehat{\psi}_{ij}\). We then assumed that with increasing effort, the probability that a link was observed \(\widehat{\psi}_{ij}\) approached the probability that the link exists \(\varPsi_{ij}\). Then, by parameterizing the prediction equation with a hypothetical “high” effort (see Table 2 for values for each bipartite network), we projected the probability that a link exists \(\widehat{\varPsi}_{ij}\). According to Bayes’ Theorem, the probability of a false negative \(F_{ij}\) is:
$$\mathbb{P}(\varPsi_{ij}=1\,\&\,\psi_{ij}=0)/\mathbb{P}(\psi_{ij}=0)$$
Which translates to:
$$F_{ij}=(\widehat{\varPsi}_{ij}-\widehat{\psi}_{ij})/(1-\widehat{\psi}_{ij})$$
This is a first approximation for the probability of a false negative link based on species-level data. Namely, the more likely a link is to occur based on taxonomy and traits, and the less likely it is to be sampled with existing effort, the more likely an unobserved link is a false negative due to insufficient sampling effort. We therefore estimated \(\widehat{\varPsi}_{ij}\) (and its standard error) and \(\widehat{F}_{ij}\) from data at the species level.
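As a minimal sketch of the species-level calculation, the function below evaluates the false-negative expression from two model predictions. The function and argument names are hypothetical, and the input values are invented for illustration rather than taken from the fitted models. The sketch assumes, as the text implies, that an observed link necessarily exists, so the probability that a link exists cannot be smaller than the probability that it is observed.

```python
def false_negative_species_level(p_link_observed: float,
                                 p_link_exists: float) -> float:
    """Probability that an unobserved link is a false negative.

    p_link_observed: predicted probability of a link at existing effort.
    p_link_exists:   predicted probability of a link at "high" effort.
    """
    if not 0.0 <= p_link_observed < 1.0:
        raise ValueError("p_link_observed must be in [0, 1)")
    if not p_link_observed <= p_link_exists <= 1.0:
        raise ValueError("p_link_exists must lie in [p_link_observed, 1]")
    # Numerator: probability the link exists but was not observed.
    # Denominator: probability the link was not observed.
    return (p_link_exists - p_link_observed) / (1.0 - p_link_observed)


# Invented example: a link predicted with probability 0.2 at existing
# effort and 0.6 at high effort.
f_ij = false_negative_species_level(0.2, 0.6)
print(round(f_ij, 3))  # 0.5
```

When the two predictions coincide the function returns 0, reflecting that additional effort is not expected to reveal anything new.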
We also had individual-level data for many potential links, making it possible to refine the estimate for \(\widehat{F}_{ij}\) based on dissections. Now, Bayes’ Theorem translates to:
$$\widehat{F}_{ij}=\widehat{\varPsi}_{ij}(1-\widehat{d}_{ij})/(1-\widehat{d}_{ij}\,\widehat{\varPsi}_{ij})$$
Where \(\widehat{\varPsi}_{ij}\) is estimated as above from the prior species-level data and \(\widehat{d}_{ij}\) is link detectability from dissections (the probability of detecting a link in a sample if that link occurs). \(\widehat{d}_{ij}\) can be estimated from individual-level data (e.g., several dissected host individuals). In a host species j that is known to be infected by a parasite species i, the probability \(d_{ij}\) of finding an infected individual after dissecting K hosts can be treated as the outcome of a series of K independent Bernoulli trials, each with a probability of detecting a parasite in a host equal to the parasite’s prevalence in the host population, \(p_{ij}\):
$$\widehat{d}_{ij}=1-(1-p_{ij})^{\mathrm{K}_{j}}$$
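The two dissection-based expressions can be combined in a short sketch. The function names are hypothetical and the numbers (10% prevalence, 20 dissected hosts, a prior link probability of 0.5) are invented for illustration; they are not values from the study.

```python
def detectability(prevalence: float, n_dissected: int) -> float:
    """Probability of finding at least one infected host among
    n_dissected independent dissections at the given prevalence."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if n_dissected < 0:
        raise ValueError("n_dissected must be non-negative")
    return 1.0 - (1.0 - prevalence) ** n_dissected


def false_negative_after_dissection(p_link_exists: float, d: float) -> float:
    """Probability that a link exists given that dissections missed it."""
    denominator = 1.0 - d * p_link_exists  # probability of no detection
    if denominator <= 0.0:
        raise ValueError("a certain, fully detectable link cannot be missed")
    return p_link_exists * (1.0 - d) / denominator


d_ij = detectability(prevalence=0.10, n_dissected=20)
f_ij_dissection = false_negative_after_dissection(p_link_exists=0.5, d=d_ij)
print(round(d_ij, 3))             # about 0.878
print(round(f_ij_dissection, 3))  # about 0.108
```

In this invented case, twenty clean dissections reduce a prior link probability of 0.5 to roughly 0.11. With no dissections, detectability is 0 and the estimate stays at the prior.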
In the case of a host species where a parasite species i has never been detected, the parasite’s detectability in dissections is also akin to a series of K independent Bernoulli trials, but the parasite’s prevalence in the host population must be estimated from infectable hosts. The simplest assumption is that infectable species do not differ in prevalence, so that \(\widehat{p}_{ij}\) is just the number of individual (h) parasitized hosts \(\left(\sum_{(h=1,j=1)}^{(h=K,j=m)}\psi_{hij}\right)\) found in combined samples from those m host species that are infectable by parasite species i, divided by the total number of hosts dissected from those species. That is,
$$p_{ij}=\frac{\varPsi_{ij}\sum_{(h=1,j=1)}^{(h=K,j=m)}\psi_{hij}}{\sum_{j=1}^{m}K_{ij}}$$
Which we estimated as
$$\widehat{p}_{ij}=\frac{\psi_{ij}\sum_{(h=1,j=1)}^{(h=K,j=m)}\psi_{hij}}{\sum_{j=1}^{m}K_{ij}}$$
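One way to read the pooled estimate is sketched below, under the stated simplest assumption that infectable host species share a common prevalence. The function name and the dissection counts are hypothetical. The sketch takes as input only host species in which the parasite has been observed, which is our interpretation of the indicator \(\psi_{ij}\) in the numerator.

```python
from typing import Sequence


def pooled_prevalence(infected_counts: Sequence[int],
                      dissected_counts: Sequence[int]) -> float:
    """Pooled prevalence of one parasite across infectable host species.

    infected_counts[j]:  infected individuals found in host species j.
    dissected_counts[j]: individuals dissected from host species j.
    Only host species with an observed link should be passed in.
    """
    if len(infected_counts) != len(dissected_counts):
        raise ValueError("the two sequences must have the same length")
    total_dissected = sum(dissected_counts)
    if total_dissected == 0:
        raise ValueError("no dissected hosts to estimate prevalence from")
    return sum(infected_counts) / total_dissected


# Invented counts for three infectable host species.
p_hat = pooled_prevalence(infected_counts=[2, 0, 1],
                          dissected_counts=[20, 15, 25])
print(round(p_hat, 3))  # 0.05
```

The pooled value can then stand in for the prevalence when calculating detectability in a host species where the parasite has not yet been found.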
Although there are more complicated ways to estimate prevalence that take into account individual host traits, and biases from excluding infectable hosts where infections have not been detected, the simple method was sufficient to distinguish between likely and unlikely false negatives. Thus, to recap, we estimated \(\widehat{\varPsi}_{ij}\) using species-level data as above, then further refined the estimate of \(\widehat{F}_{ij}\) from dissection data. We used error propagation to report 95% confidence limits30.
With information about \(\widehat{F}_{ij}\), we estimated unseen parasite-host links as probabilities, rather than as 0s (observed links were set to 1, and unobserved links were set to \(\widehat{F}_{ij}\)). Doing so identified some likely parasite links that were missed. In this case, when the probability of a false negative was >0.5, we assumed that an unobserved link actually occurred unless otherwise contradicted by species life history. We also then noted the probability of a false positive link (\(1-\widehat{F}_{\mathrm{ij}}\)). We further identified those few host and parasite species that generated substantial error in the network. To keep the overall error rate to <4%, we therefore removed error-prone species from the network30. These species were typically rare generalists that were easily missed in dissections. To that extent, the decision to remove them was consistent with our decision to remove rare free-living species from the network. We report these removed species and their known links30 as potentially useful information for other purposes. Finally, we used the false-negative estimates to correct for biases in network and species-level measures like generality, connectance, and linkage density.
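The replacement of 0s with probabilities, and the 0.5 rule, can be illustrated with a small sketch. The function name, data structures, and species labels are hypothetical, and the false-negative values are invented. The life-history check described above is a judgment step and is not modelled here.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (parasite, host)


def fill_unobserved_links(observed: Dict[Link, bool],
                          false_negative: Dict[Link, float],
                          threshold: float = 0.5):
    """Return link probabilities and the unobserved links assumed to occur.

    Observed links are set to 1; unobserved links are set to their
    estimated false-negative probability. Unobserved links whose
    probability exceeds the threshold are flagged as assumed links.
    """
    probabilities: Dict[Link, float] = {}
    assumed: Set[Link] = set()
    for link, was_observed in observed.items():
        if was_observed:
            probabilities[link] = 1.0
        else:
            probabilities[link] = false_negative[link]
            if false_negative[link] > threshold:
                assumed.add(link)
    return probabilities, assumed


# Invented example with one parasite and three hosts.
observed = {("parasite_A", "host_1"): True,
            ("parasite_A", "host_2"): False,
            ("parasite_A", "host_3"): False}
false_negative = {("parasite_A", "host_2"): 0.72,
                  ("parasite_A", "host_3"): 0.08}
probabilities, assumed = fill_unobserved_links(observed, false_negative)
print(sorted(assumed))  # [('parasite_A', 'host_2')]
```

For each assumed link, one minus its false-negative probability (0.28 for `host_2` here) corresponds to the false-positive probability noted in the text.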
Additional link metadata
In addition to metadata on locality, literature source, justification, and confidence, we categorized links based on different types of trophic interactions. We specified the interaction type for each consumer-resource link following the framework of Lafferty and Kuris33 (Table 1). For instance, links where a consumer kills the resource were coded as predator-prey interactions, while links where a consumer eats a small portion of a resource individual without killing it (e.g. herbivory) were coded as micropredator/grazer interactions. Thus, the free-living web contained predation and micropredation/grazing links. Some organisms often referred to as “parasites” fit the definition of micropredators (e.g. gnathiid isopods). Several more types of interactions are possible between symbiotic organisms and their hosts, depending on transmission strategy (trophic transmission or direct transmission), effects on host fitness, and reproduction method (within the host or in the environment). Metadata in the node list (such as site of infection) allows investigators to simplify these link types according to research questions of interest.
Source: Ecology - nature.com