2017 honey sampling
Beekeepers were invited to provide honey for analysis via a nationwide campaign publicised on the gardening programme, BBC Gardener’s World (broadcast July 2017). Participating beekeepers were asked to supply ~30 ml of honey from any date in 2017, reporting the date of sample collection and the location of the apiary, using a grid reference or postcode. In total 441 honey samples were processed from beekeepers.
Honey DNA extraction
Any wax was removed using sterile forceps and DNA was extracted from 10 g of honey using a modified version of the DNeasy Plant Mini extraction kit (Qiagen). Firstly, the 10 g of honey was made up to 30 ml with molecular grade water and incubated in a water bath at 65 °C for 30 min. Samples were then centrifuged (Sorvall RC-5B) for 30 min at 15,000 rpm, the supernatant was discarded, and the pellet resuspended in 400 μL of a buffer made from a mix of 400 μL AP1 from the DNeasy Plant Mini Kit (Qiagen), 80 μL proteinase K (1 mg/ml) (Sigma) and 1 μL RNase A (Qiagen). This was incubated again for 60 min at 65 °C in a water bath and then disrupted using a TissueLyser II (Qiagen) for 4 min at 30 Hz with 3 mm tungsten carbide beads. The remaining steps were carried out according to the manufacturer’s protocol, excluding the use of the QIAshredder and the second wash stage. The extracted DNA was purified using the OneStep PCR Inhibitor Removal Kit (Zymo Research) and diluted 1 in 10.
PCR and library preparation
Illumina MiSeq paired-end indexed amplicon libraries were created via a two-step PCR protocol. Two libraries were prepared for the DNA barcode regions, rbcL and ITS2. Initial amplification used the template specific primers rbcLaf and rbcLr50637, and ITS2F and ITS3R, with universal tails designed to attach custom indices in the second-round PCR. To improve clustering on the Illumina MiSeq, a 6N sequence was also added between the forward template specific primer and the universal tail.
Forward universal tail, 6N sequence and rbcLaf: [ACACTCTTTCCCTACACGACGCTCTTCCGATCT]NNNNNN[ATGTCACCACAAACAGAGACTAAAGC]
Reverse universal tail and rbcLr506: [GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT][AGGGGACGACCATACTTGTTCA]
Forward universal tail, 6N sequence and ITS2F: [ACACTCTTTCCCTACACGACGCTCTTCCGATCT]NNNNNN[ATGCGATACTTGGTGTGAAT]
Reverse universal tail and ITS3R: [GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT][GACGCTTCTCCAGACTACAAT]
This first PCR used a final volume of 20 μl: 2 μl template DNA, 10 μl of 2× Phusion Hot Start II High-Fidelity Mastermix (New England Biolabs UK), 0.4 μl (2.5 µM) forward and reverse primers, and 7.2 μl of PCR grade water. Thermal cycling conditions for rbcL were: 98 °C for 30 s, 95 °C for 2 min; 95 °C for 30 s, 50 °C for 30 s, 72 °C for 40 s (40 cycles); 72 °C for 5 min, 30 °C for 10 s. Thermal cycling conditions for the first ITS2 PCR were: 98 °C for 30 s 94 °C for 5 min; 94 °C for 30 s, 56 °C for 30 s, 72 °C for 40 s (40 cycles); 72 °C for 10 min, 30 °C for 1 min. The initial PCR was carried out three times and pooled.
The pooled products from the first PCR were purified following Illumina’s 16S Metagenomic Sequencing Library Preparation protocol using Agencourt AMPure XP beads (Beckman Coulter). The purified PCR product from round one was followed by a second round of amplification to anneal custom unique and identical i5 and i7 indices to each sample (Ultramer, Integrated DNA Technologies).
This index PCR stage used a final volume of 25 μl reaction (12.5 μl of 2× Phusion Hot Start II High-Fidelity Mastermix, 1 μl of i7 Index Primer and i5 Index Primer, 6.5 μl of PCR grade water, and 5 μl of purified first-round PCR product). Thermal cycling conditions were: 98 °C for 30 s; 95 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s (8 cycles); 72 °C for 5 min, 4 °C for 10 min. Following the index PCR, a 1% gel was run to verify its success. The index PCR product was then purified following the PCR clean-up two sections of the Illumina protocol. The purified products of the index PCR were quantified using a Qubit 3.0 fluorescence spectrophotometer (Thermo Fisher Scientific) and pooled at equal concentrations to produce the final library. Positive and negative controls were amplified and sequenced alongside honey samples. The positive control was made from a mixture of five tropical tree species that were not present in the survey site. The species Baccaurea stipulata, Colona serratifolia., Dillenia excelsa, Kleinhovia hospita, and Pterospermum macrocarpum were used, taking 5 μl from each separate DNA extraction and mixing, before following the protocol as with the honey samples. All five species were detected within the sequencing results.
Bioinformatic analysis
Sequence data were processed using a modified data analysis pipeline14,38. Raw reads were trimmed to remove low-quality regions (Trimmomatic v. 0.33), paired, and then merged (FLASH v. 1.2.11), with merged reads shorter than 450 bp discarded. Identical reads were dereplicated within samples and then clustered at 100% identity across all samples (vsearch v. 2.3.2), with singletons (sequence reads that occurred only once across all samples) discarded.
The Barcode Wales and Barcode UK projects provide 98% coverage for the native flowering plants and conifers of the UK37. This reference library was supplemented with a curated library of the non-native and horticultural species, downloaded from GenBank. This UK species list was generated using the list of native species of the UK from Stace (2010)39, 505 naturalised alien species (BSBI), and horticultural species from the IRIS BG database at the National Botanic Garden of Wales.
The sequence data from the honey samples were compared against the reference database using blastn, using the script vsearch-pipe.py. The top BLAST hits were then summarised using the script vsearch_blast_summary.py. Sequences with bit scores below the 1st percentile were excluded. If the top bit scores of a sequence matched to a single species, then the sequence was identified to that species. If the top bit scores matched to different species within the same genus, then the result was attributed to the genus level. If the top bit score belonged to multiple genera within the same family then a family level designation was made. Sequences that returned families from different clades were excluded. These automated identifications were then checked manually for botanical veracity. To check identified plant species against their availability across the UK, species records from the BSBI (Botanical Society of Britain and Ireland) were used for native species, while commercial availability for horticultural species was verified with the RHS Plant Finder40. Within each sample, the number of sequences returned from rbcL and ITS2 for each plant taxon was summed to combine the results of each marker.
The proportion of sequences was used in the analysis, which has been shown to be an appropriate method to control for differences in read number41. Alternatively, the sequencing data can be rarefied, but this has been criticised as a statistical technique, due to requiring the removal of valid data41. To investigate the impact of rarefying on the conclusions drawn from the data, all analyses were rerun with rarefied data (Supplementary Results).
1952 Honey sampling
In 1952, 855 honey samples were characterised from 66 counties across the UK and Ireland using melissopalynology15,16. The methods reported for the research conducted in 1952 are described here fully for comparison. Samples were obtained via a general appeal and were all collected during the honey season of 1952. For each honey sample, ~200 pollen grains were identified using the morphology of the pollen under the microscope, following a standardised protocol42. To extract the pollen, 10 g of honey was dissolved in 20 ml of distilled water, from which 10 ml was taken and centrifuged at ~2000 rpm for one minute. The supernatant was discarded, and the sediment retained, and then the process was repeated for the remaining liquid. From the sediment, a drop was transferred to a glass slide and spread out over an area of 1 cm2, before being stained with fuchsin and dried. Euparal vert was used as a final mounting medium. Pollen was identified by comparison with a reference library of pollen preparations and available pollen morphological data43,44. Each plant taxon found in the sampled honey was reported according to the proportion of pollen grains found and classed into predominant (>45% of pollen grains), secondary (15–45% of pollen grains) and important minor (1–15% of pollen grains). The location data for the honey samples were restricted to the county level, and summary data tables were presented for each UK county that returned honey.
Comparing the 1952 and 2017 honey samples
The plants detected using DNA metabarcoding and melissopalynology have been compared in previous studies with concordance found between the two methods45,46,47,48. Both methods detect the same major taxa, but rarer species in a sample are less likely to be found consistently, both when comparing methods and also during replicates of the same method45,46,47. DNA metabarcoding is often able to detect more taxa when compared to melissopalynology, by identifying rarer species in the sample and by achieving higher taxonomic resolution in certain cases. While melissopalynology uses counts of pollen grains to provide a starting point for quantitative analysis, DNA metabarcoding as a process is semi-quantitative, with biases associated with the process of DNA extraction, PCR and sequencing33,45. To allow for these considerations we placed the proportion of DNA sequence reads and pollen counts into four broad abundance classes matching the classifications used in melissopalynology (predominant, secondary, important minor and minor) and focus our analyses and conclusions on changes in the frequency of occurrence of the major taxa, classed as predominant and secondary. Both methods capture information on both nectar and pollen plants within the honey, however, certain species can be over or under represented in pollen analysis compared to their relative nectar contribution49. Both pollen and nectar plants are required to meet the foraging requirements of pollinators.
Statistics and reproducibility
Statistical analysis of DNA metabarcoding data
To understand how the plant taxa composition within the honey sample was structured in space and time, the effect of time (measured as the calendar month number in 2017), latitude and longitude of sampling location were included in a single, two-tailed generalized linear model using the ‘manyglm’ function in the package ‘mvabund’50. Honey samples with missing metadata were excluded, giving a sample size of 428. An abundance table of taxa (number of sequence reads) found in each sample was set as the multivariate response variable and a common set of predictor variables (month, latitude and longitude) were fit using a negative binomial distribution. The number of sequence reads per sample was included as an “offset” in the model in order to control for differences in the number of sequence reads between samples. Monte Carlo resampling was used to test for significant community-level responses to our predictors. The strong mean-variance relationship in the data (Supplementary Fig. 6) and the distribution of the count data (Supplementary Figs. 7, 8) support the use of a negative binomial distribution in the model. The appropriateness of the models was checked by visual inspection of the residuals against predicted values from the models (Supplementary Figs. 9–11).
We completed a spatial eigenfunction analysis using distance-based Moran’s eigenvectors. Moran’s Eigenvector Maps were computed using the ‘mem’ function from the adespatial package. Moran’s I was computed for each taxa using the ‘moran.randtest’, with Bonferroni correction for multiple testing. The direction of autocorrelation (positive and negative) was tested using the ‘moranNP.randtest’ function, using the adespatial package in R.
Statistical analysis of the 1952 and 2017 honey samples
Abundance classes were assigned based on the percentage of reads returned for the two DNA regions rbcL and ITS2, matching the classifications used in melissopalynology. Plant taxa represented by over 45% of reads were designated predominant for that sample; between 15 and 45% were secondary; between 1 and 15% were important minor taxa, and <1% of reads were classed as minor taxa. The number of times each taxon occurred at each level of abundance was then calculated, with the sum of this giving the frequency of occurrence across all the samples.
The results of the 2017 analysis were then compared with 855 honey samples characterised in 1952, from across the UK and Ireland using melissopalynology15,16. The relationship between the frequency of occurrence for the matched plant taxa between 1952 and 2017 was assessed using Kendall’s rank correlation.
To compare the major taxa (classed as predominant and secondary) between 2017 and 1952, the effect of sample location (UK county name) and sample year (2017 or 1952) were included in a two-tailed generalized linear model using the ‘manyglm’ function in the package ‘mvabund’50. In the absence of latitude and longitude for honey samples collected in 1952, UK ceremonial county names were used as a proxy for the location for both 2017 and 1952 honey samples. Honey samples from 2017 were assigned their ceremonial county based on latitude and longitude and matched to the counties listed in 1952. Using a binomial distribution, the effect of county location and year (1952 and 2017) were included as explanatory variables in the model and the presence or absence of each taxa was set as the response variable. The appropriateness of the models was checked by visual inspection of the residuals against predicted values from the models (Supplementary Fig. 11).
The change in proportion of major (predominant and secondary) taxa between 1952 and 2017 was examined for the plant taxa that occurred as major taxa in more than 1% of samples for both honey surveys. Chi-squared contingency tests were used to assess differences, with Bonferroni correction for multiple testing. All statistical analyses were carried out using R (v. 3.5.2).
Countryside Survey vegetation plot data frequency changes
Changes in the local frequency of the major plant forage species found in both 1952 and 2017 were assessed using the Countryside Survey data from 1978 to 200751,52,53. In 1978, the survey looked at 256 1 km squares within which fixed plots were established, representing fields and unenclosed land (200 m2) as well as linear features including hedgerows, streams and roadsides (10 m2). In each plot, a list of all vascular plants was recorded. Where possible, squares and plots were then revisited in 2007, representing 236 1 km squares containing 1577 plots. For these revisited plots, the percentage change in plot frequency was calculated54.
Landscape data
The Land Cover 2017 map was used to characterise habitat in a 2 km radius of the hives55 while the 2017 CEH Land Cover Plus: Crops map was used to assess the presence and absence of crop species, Brassica napus (oilseed rape) and Vicia faba (field beans), within a 2 km radius of each hive. A chi-squared contingency test was used to analyse the differences between the presence of the crop species in the honey and the presence and absence of the crop within the landscape. Non-metric multidimensional scaling (NMDS) ordination was used to visualise differences in the composition of the honey relating to the dominant habitat type in a 2 km radius, based on the proportion of reads returned for each taxon. Ordinations were carried out using the metaMDS function in the vegan package56 in R using Bray-Curtis dissimilarity indices. The differences in plant community composition and surrounding dominant habitat type were tested using the adonis function from vegan, with 999 permutations. Analyses and maps were generated in R (v. 3.5.2).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Source: Ecology - nature.com