
Thousands of reptile species threatened by under-regulated global trade

Website sampling

We used five different search terms, the English phrase ‘reptiles for sale’ and its translations (reptile à vendre, reptilien zu verkaufen, 爬虫類販売, reptil para la venta), on the appropriately localized versions of two search engines (Google: https://www.google.com/, https://www.google.fr/, https://www.google.de/, https://www.google.jp/, https://www.google.es/ and Bing: https://www.bing.com/?cc=en, https://www.bing.com/?cc=fr, https://www.bing.com/?cc=de, https://www.bing.com/?cc=jp, https://www.bing.com/?cc=es) to retrieve a list of reptile-selling websites, extracting URLs using the XML v.3.99.0.339, assertthat v.0.2.140 and stringr v.1.4.0 packages41 in R v.3.5.342 and RStudio v.1.2.133543 (Supplementary Code 1). We completed searches in Firefox44 while signed out of search engine accounts and in a private window, to minimise the influence of browsing history on the results.
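A minimal sketch of the URL-extraction step, assuming a search-results page has already been saved locally; the file name and the domain filter are hypothetical placeholders, not the published Supplementary Code 1.

```r
# Sketch: extract candidate seller URLs from a saved search-results page.
library(XML)      # the original analysis used XML v.3.99.0.3
library(stringr)  # and stringr v.1.4.0

results_page <- "google_results_page1.html"          # hypothetical saved results page
links <- getHTMLLinks(results_page)                  # pull every href on the page

# keep outbound links only, drop search-engine internal links
links <- links[str_detect(links, "^https?://")]
links <- links[!str_detect(links, "google\\.|bing\\.")]
unique(links)
```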

Once we had generated a list of search-result URLs, we manually reviewed each URL’s content: the 679 websites returned led to 151 searchable reptile-selling websites. Our review had three goals: to ensure the website was selling reptiles, to check that the website’s terms and conditions did not explicitly forbid automated data collection, and to identify the most appropriate method of searching the content of the website (see Supplementary Data 1 for an example review datasheet).

We employed a hierarchy of five search methods depending on the structure of the website and how stocklists were displayed (Supplementary Code 3). The hierarchical approach minimised both the analysis of irrelevant pages and the load placed on servers.

(1) Single html or pdf page. Where a seller supplied a full stocklist on a single page, we could review all animals sold by downloading only that page with the downloader v.0.4 package45. Occasionally animals were listed on large pages that also displayed a store’s full non-animal stock. For stocklists supplied as a pdf, we downloaded them manually and extracted the text using the pdftools v.2.2 package46.

(2) Systematic cycling through search results. Forty-nine websites with adequate search functions allowed us to request all reptiles for sale and then examine the pages of search results one by one (a minimal sketch of this approach appears after the method descriptions). We employed this method when a website’s search results contained the complete details of the reptiles for sale on the results page itself. We ceased cycling through search pages when a URL returned a 404 error, or when 100 pages had been cycled through. The 100-page cap prevented endless cycling back onto initial pages and errors from misinterpreting the number of search pages returned, while still exceeding the number of pages on most sites. We performed a post hoc review of ten sites searched using the cycling method to check whether species ordering could have introduced systematic biases, for example favouring names near the beginning of the alphabet or particular price ranges. For four websites we could not determine how species were ordered, for six websites listings were ordered by date, and for one website species were ordered by popularity. Thus, even for sites with more pages, we expect the results to be unaffected by such biases given the inconsistent ordering of entries across sites. The 100-page limit may have led to missing species on large websites, but this undercounting likely affected only a small portion of the websites searched via cycling, and overlap between websites’ species lists mitigates suboptimal sampling of any particular website (see species-accumulation curves, Supplementary Figs. 1 and 2).

(3) Systematic cycling followed by level 1 crawl. We employed this method when sites had adequate search functionality but the details or full names of species for sale were buried one level deeper than the search results. In these instances, we ran a level 1 crawl on every search-results page (Rcrawler v.0.1.9.1 package47). We followed the same stopping criteria as single-page systematic cycling: a 404 error or 100 search-result pages.

(4) Basic level 1 crawl. Some sites had a full species list, but split between clades or categories. In this case we supplied the page containing links to all the different clade lists to the crawler and completed a level 1 crawl40.

(5) Basic level 2 crawl. We required a level 2 crawl47 when the subsection listing reptiles for sale was more specific. For example, detecting a Boa constrictor on a site that divided its stock multiple times would require moving from the ‘snake’ section through to the ‘Boa’ section, where the details of the stock were listed.

We employed these five search methods hierarchically, 1–5, and included 20 s delays between crawled requests to minimise the load on reptile-selling websites’ servers. For search method 3, there was a significant chance of duplicate pages being returned; we removed duplicated pages prior to keyword searching. A few sites required multiple methods to extract complete stocklists. For search methods 3, 4 and 5, we limited the crawl further by selecting (where possible) keywords that had to appear within a page’s URL for that page to be searched. For example, a website may list animals only on pages whose URLs include the pattern ‘/category=reptiles/’, thereby limiting the searching of irrelevant non-stock pages.
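A minimal sketch of the systematic cycling approach (method 2), assuming a hypothetical site that paginates search results as ?page=N; the URL is a placeholder, and the stop criteria (404 error or 100 pages) and 20 s delay follow the description above.

```r
# Sketch: cycle through paginated search results with courtesy delays.
library(httr)

base_url <- "https://www.example-reptile-shop.com/search?q=reptile&page="  # hypothetical
pages <- character(0)

for (i in seq_len(100)) {                      # stop criterion: 100-page cap
  resp <- GET(paste0(base_url, i))
  if (status_code(resp) == 404) break          # stop criterion: missing page
  pages[i] <- content(resp, as = "text", encoding = "UTF-8")
  Sys.sleep(20)                                # 20 s delay to limit server load
}
```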

We augmented our 2019 snapshot sampling by exploring the archived web pages stored on the Internet Archive48. For the most species-rich site (from the 2019 snapshot) we retrieved all archived web pages using the Internet Archive’s Wayback Machine API49, adapting code from the wayback v.0.4.0 package50, with functions from the httr v.1.4.151, jsonlite v.1.6.152, downloader v.0.445, lubridate v.1.7.453 and tibble v.2.1.3 packages54. We limited the search to pages directly pertaining to sales.
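A minimal sketch of retrieving a list of archived pages, assuming the Internet Archive's CDX API rather than the adapted wayback package code used in the published analysis; the domain and the '/for-sale/' path filter are hypothetical.

```r
# Sketch: list archived captures of a site's sales pages via the CDX API.
library(httr)
library(jsonlite)

cdx <- GET("http://web.archive.org/cdx/search/cdx",
           query = list(url = "example-reptile-shop.com/for-sale/*",  # hypothetical
                        output = "json",
                        collapse = "digest"))   # drop unchanged duplicate captures
records <- fromJSON(content(cdx, as = "text", encoding = "UTF-8"))
colnames(records) <- records[1, ]               # first row holds the field names
records <- as.data.frame(records[-1, , drop = FALSE])

# archived snapshots can then be fetched from URLs of the form
# paste0("http://web.archive.org/web/", records$timestamp, "/", records$original)
```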

Though our online search analysis provided the number of mentions per species per page, we do not report these numbers because sellers may list multiple individuals in a single advertisement, may post the same advertisement numerous times, and advertisements can be repeated on different pages within the same website. Numbers derived from the online analysis therefore did not provide a reliable estimate of the number of individuals per species for sale, and we restricted analysis to binary species appearances.

Keyword generation

We used the complete list of 11,050 reptiles created by Reptile Database16, updated 14 August 2019, as our naming standard. We downloaded the complete list from http://www.reptile-database.org/data/, then fed the list of species into code designed to query Reptile Database and extract all common names, historic scientific names and locality information for each species. The extraction code made use of functions from the stringr v.1.4.041, XML v.3.99.0.339, xml2 v.1.2.255 and rvest v.0.3.5 packages56. We combined the resulting list with names, both common and scientific, supplied by CITES (http://checklist.cites.org/#/en [accessed 6 September 2019]) using the dplyr v.0.8.4 package57 (Supplementary Code 2). Five CITES-listed species had no matching counterpart in Reptile Database; we determined that this was caused by minor spelling mistakes and included both spellings in our complete list of species keywords. Overall, our species keyword list comprised all scientific and common names from both Reptile Database and CITES (Supplementary Data 2), with an average of 5.82 ± 0.06 (s.e., standard error) keywords per species and a grand total of 64,342 terms (s.e. calculated using the pracma v.2.2.5 package58). Common names were predominantly English, French, German or Spanish, but occasionally included local names. We compared the number of species detected via scientific names to the number detected via a combination of scientific and common names, because previous work highlighted that, while correlated, the two can produce different search results59.
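A minimal sketch of assembling the keyword table with dplyr, assuming two data frames already built from Reptile Database and the CITES checklist with hypothetical columns 'species' (accepted Reptile Database name) and 'keyword' (a scientific, historic or common name).

```r
# Sketch: combine the name sources and summarise keywords per species.
library(dplyr)

keywords <- bind_rows(reptile_db_names, cites_names) %>%
  mutate(keyword = trimws(keyword)) %>%
  distinct(species, keyword)                 # one row per unique species-keyword pair

# average number of keywords per species (reported as 5.82 +/- 0.06 s.e.)
keywords %>%
  count(species, name = "n_keywords") %>%
  summarise(mean_kw = mean(n_keywords),
            se_kw   = sd(n_keywords) / sqrt(n()))
```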

Keyword searching and species comparison

On a site-by-site basis, we cleaned each page’s html code of extraneous punctuation, numbers and spacing, replacing them with single spaces. That way, two-word keywords split by line breaks, punctuation or double spacing appeared the same as those split only by single spaces.

After cleaning the html code, we searched each page for keyword matches using the stringr v.1.4.0 package41 (Supplementary Code 4, 5). Because of the large number of keywords and the high computational cost of collation-based string matching, we used fixed string matching set to be case insensitive. Fixed string matching has the disadvantage of being sensitive to whether diacritics or ligatures are encoded as single or multiple characters. Our keyword searches returned the website, page number (an index relating to the total number of pages retrieved from a given website), the keywords detected, and the corresponding Reptile Database name (Supplementary Data 3).
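A minimal sketch of the cleaning and case-insensitive fixed matching, assuming 'page_html' holds one page's raw html and 'keywords' is the keyword table sketched above (both names are hypothetical).

```r
# Sketch: clean a page's text and detect keyword matches.
library(stringr)

# replace punctuation, digits and runs of whitespace with single spaces so that
# multi-word keywords split by line breaks or double spaces still match
page_text <- str_replace_all(page_html, "[[:punct:][:digit:]\\s]+", " ")

# case-insensitive fixed matching of every keyword against the cleaned text
hits <- keywords[str_detect(page_text,
                            fixed(keywords$keyword, ignore_case = TRUE)), ]
# 'hits' retains the matched keywords and their corresponding
# Reptile Database names for this page
```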

We searched the pages obtained from the Internet Archive48 using the same list of species keywords (Supplementary Code 4, 7). Because the number of pages available likely influences the number of species detected, we regressed the number of species detected in a year against the number of pages searched (n = 15, intercept = 483.72, gradient = 1.65). We excluded 2002, 2003 and 2019 from this regression because they had considerably fewer pages than all other years (mean of 3.7 ± 1.2 pages, compared to a mean of 296.6 ± 48.8 pages). We plotted the residuals from the regression alongside counts of unique species per year and the number of species included in the CITES appendices. To show the sensitivity to the keywords used, we counted the number of unique species in two ways: (1) counting all species detected using either scientific or common name keywords, and (2) counting species detected using scientific name keywords only. The two keyword groups produce slightly different yearly species lists, thereby changing the number of unique species per year and the yearly residuals.
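A minimal sketch of the pages-versus-species regression on the archive data, assuming a data frame 'yearly' with hypothetical columns 'year', 'n_pages' and 'n_species'.

```r
# Sketch: regress species detected per year on pages searched, keep residuals.
library(dplyr)

fit_data <- yearly %>%
  filter(!year %in% c(2002, 2003, 2019))     # years with very few archived pages

fit <- lm(n_species ~ n_pages, data = fit_data)
coef(fit)                                    # reported: intercept ~483.72, slope ~1.65

fit_data$residual <- resid(fit)              # residuals plotted alongside yearly counts
```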

We compared the list of species names generated by our keyword searches to those listed in CITES. Because species names have changed over time, we first converted the CITES scientific names to the most recent name used by Reptile Database. However, due to species synonymisations, splits and name changes, comparisons between the list of traded species and the CITES species list contain some ambiguity. This ambiguity is visible as variation in the number of traded species covered by CITES when the CITES list is compared against only the accepted Reptile Database name versus against any historically used name of a traded species. For general reporting we used the more generous matching against any historic name, which increases the estimated CITES coverage. For the examination of counts of CITES-covered species traded over time (Fig. 2d) we used the more stringent single-name matching, because of the added complexity of a changing list of CITES species and the assumption that new CITES listings would use the most recently accepted name.
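A minimal sketch of the two matching strategies, assuming a synonym table 'rdb_synonyms' with hypothetical columns 'accepted_name' and 'synonym', a character vector 'cites_names', and a vector 'traded_accepted' of accepted names detected in trade.

```r
# Sketch: generous (any historic name) vs stringent (accepted name only) matching.
library(dplyr)

# generous matching: a CITES name counts if it matches any historic synonym
generous <- rdb_synonyms %>%
  filter(synonym %in% cites_names) %>%
  distinct(accepted_name)

# stringent matching: a CITES name counts only if it equals the accepted name
stringent <- intersect(cites_names, traded_accepted)

sum(traded_accepted %in% generous$accepted_name)   # coverage under generous matching
length(stringent)                                  # coverage under stringent matching
```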

Data exploration and display

We used forcats v.0.4.060 and dplyr v.0.8.457 to manipulate data, and ggplot2 v.3.2.161, scico v.1.1.062, ggpubr v.0.263 and ggforce v.0.3.164 to generate the plots. We undertook keyword searching in R v.3.5.342 and RStudio v.1.2.133543. Silhouette images were obtained from http://phylopic.org/; where the images were not in the public domain or free of attribution requirements, they were produced by Aline M. Ghilardi (CC BY-NC 3.0) and Roberto Díaz Sibaja (CC BY 3.0).

We explored the completeness of our samples, the 2019 snapshot and the temporal sample, in two ways. The first was applied only to the snapshot data: we built an accumulation curve illustrating the relationship between the number of sites sampled and the number of species detected, by randomly resampling a subset of websites and increasing the subsample by one website until all were included. We repeated the resampling process 100 times and plotted the results with a loess-smoothed curve. The second method was applied to both snapshot and temporal data. Using the iNEXT v.2.0.19 package65,66 and treating our data as raw incidences, we calculated both sample-size- and coverage-based rarefaction and extrapolation metrics, providing estimates of total species richness and sample completeness. For the snapshot data we used ‘website’ (n = 151) as the resampling unit; for the temporal data we used ‘year’ (2002–2019, n = 18).
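A minimal sketch of the coverage estimation, assuming 'incidence_matrix' is a species-by-website 0/1 matrix built from the snapshot keyword hits (a hypothetical object name).

```r
# Sketch: rarefaction/extrapolation and sample coverage with iNEXT.
library(iNEXT)

out <- iNEXT(list(snapshot = incidence_matrix),
             q = 0,                          # species richness
             datatype = "incidence_raw")     # raw incidence data, as in the text

out$AsyEst                                   # asymptotic richness estimates
out$DataInfo$SC                              # estimated sample coverage
ggiNEXT(out, type = 1)                       # sample-size-based rarefaction/extrapolation
```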

We compared our data to two international trade databases (the compiled species list is available in Supplementary Data 5, and code for the review of data sources is available in Supplementary Code 9): CITES and the United States Fish and Wildlife Service’s Law Enforcement Management Information System (LEMIS). Following the online web scraping, the same types of analysis and cleaning were applied to all three databases. CITES data were retrieved from https://trade.cites.org/# (on 13 May 2020) using the comparative tabulations for all ‘Reptilia’ and the appropriate years (the 2019 snapshot, and 2004–2019) to download all reptile species traded over this time. We retrieved LEMIS data67,68 (v.1.1.0) via R using the lemis package69 (Supplementary Code 6). LEMIS data include records of imports to the USA alongside information on purpose, quantity, origin and date, among other metadata; quantitative data on imports for each species, or broken down by origin and source, could therefore be calculated in R using dplyr v.0.8.457. As with the CITES species lists, the unstandardised LEMIS names were matched to those present in Reptile Database (operating as our backbone nomenclature), leading to both synonymisations and splits. A LEMIS name was converted to a Reptile Database name if it matched any current, common, or historically used name; names failed to match if misspelled. By LEMIS naming, there were 639 instances of genus-level listing, which were matched to 510 Reptile Database names. Of the 510 converted names, 442 appeared in other sources, suggesting genus-level listings in LEMIS did not inflate species counts. Outside of genus-level listings, 83 full names could not be matched. We compared the 83 names to the traded list from other sources, looking for names differing by fewer than 5 characters (using the similiars v.0.1.0 package70); 56 species were found to be present in other sources by this metric. Names that failed to be converted were not included in total species counts, as final counts were based entirely on explicit Reptile Database species names. Final species counts from all data sources are based on unique Reptile Database names and do not include any remaining generic identifiers after this synonymisation/split process.
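A minimal sketch of the fuzzy-name check for the 83 unmatched LEMIS names, using base R's adist() (Levenshtein distance) in place of the similiars package; 'unmatched_lemis' and 'other_source_names' are hypothetical character vectors.

```r
# Sketch: find unmatched LEMIS names within 5 character edits of known traded names.
dists <- adist(unmatched_lemis, other_source_names)   # edit-distance matrix

# for each unmatched name, the distance to its closest name in the other sources
closest <- apply(dists, 1, min)

# names within fewer than 5 character changes of a known traded species
likely_present <- unmatched_lemis[closest < 5]
length(likely_present)                                 # 56 in the published analysis
```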

Though the research focused on the percentages of species vulnerable to trade based on various forms of IUCN and CITES categorisation, we also quantified the proportions of items with different statuses within CITES and LEMIS, using a number of different approaches. Online assessments were not directly quantified because the same individuals may be listed multiple times and batches may mix specimens in variable numbers. For CITES, we used the summary statistics tool in ArcMap 10.3 to quantify the means and totals of the numbers exported and imported (both are reported wherever they differ), and the range for each species or endangerment status is provided in the text (or a single number where they were the same). RedList status was associated with the data by joining the two databases on the scientific name field. Using the same approach, we summed the 2004–2019 data from the CITES trade portal across sources, purposes and endangerment statuses. ‘Terms’ (e.g., skins) were also explored by recategorising the 57 standard terms used for reptiles into nine groups (fashion, live, food, decorative, medicinal, specimen, egg, body, other uses), then summing the total number of items imported and exported and determining the percentages. In addition, we attempted to quantify the trade in wild-captured individuals within CITES. To represent individuals, terms from the CITES trade database were filtered to include only bodies, carapaces, eggs, live, shells, skeletons, skins, specimens and trophies, as most of these are mutually exclusive; the large quantities of reptile leather and meat could not be converted to representative individuals, and skins or bodies listed by weight were also removed. From this, the percentages of individuals imported and exported from wild or captive-bred sources within CITES could be calculated; these percentages were very similar to the summed total values, showing the results are consistent.

To investigate the extent of wild capture in the LEMIS data, we restricted our summaries to items that represent individuals (whole dead bodies, live eggs, dead specimens, live individuals, full specimens, substantially whole skins, and full animal trophies), filtering out 75.6% of other reptile items (79,812,310/105,536,941) and leaving 25,724,631 items for which we reviewed source and purpose. The filter terms are close to those used in other recent publications that also quantified elements of trade (“live”, “bodies”, “skins”, “gall bladder”, “skulls”, “heads”, “tails”, “trophies” and “skeletons”)71, but we additionally excluded body parts that may have come from the same individual (e.g., skin and skull), which might otherwise inflate numbers (79,812,310 items including skulls and skeletons; 79,796,472 excluding skulls and skeletons). Filtering to individuals made a negligible difference to the summaries of origin (wild or captive): 58.17% wild-sourced without the filter versus 58.05% with the filter (61,390,757/105,536,941 without; 14,933,888/25,724,631 with), and 41.32% captive-sourced without the filter versus 41.23% with the filter (43,611,039/105,536,941 without; 10,605,330/25,724,631 with). Our quantification of non-commercial trade was based on the number of individual animal items listed as Scientific, Reintroduction, or Biomedical research; our quantification of captive-sourced trade was based on the number of individual animal items listed as bred/born in captivity, commercially bred, or from ranching operations. We excluded all instances of NA in either the purpose or the source filters (127,881 reptile items had a missing source, purpose, or description). We additionally include a clade-based analysis of source, as some taxa (e.g., Crocodylia) may be more heavily affected by the fashion trade and are imported in greater numbers. For clade-based summaries of wild capture, we summarised the quantity of traded items by genus, and further simplified the genus summary to clade using Reptile Database genus and family information. For genera missing from Reptile Database (e.g., where the genus field contained a family such as Varanidae), we manually assigned the clade.
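A minimal sketch of the LEMIS wild-capture summary, assuming 'lemis_reptiles' is the cleaned reptile subset of LEMIS with its standard 'description', 'source', 'purpose' and 'quantity' columns; the vector of individual-representing description codes shown here is an illustrative assumption, not the exact published filter.

```r
# Sketch: restrict LEMIS to individual-representing items and summarise by source.
library(dplyr)

individual_codes <- c("BOD", "EGL", "SPE", "LIV", "SKI", "TRO")  # assumed subset of codes

individuals <- lemis_reptiles %>%
  filter(description %in% individual_codes,
         !is.na(source), !is.na(purpose), !is.na(description))   # drop NA records

individuals %>%
  group_by(source) %>%                       # e.g. "W" wild; "C"/"F"/"R" captive/ranched
  summarise(items = sum(quantity, na.rm = TRUE)) %>%
  mutate(pct = 100 * items / sum(items))     # percentage of individual items per source
```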

Maps were created using the Global Assessment of Reptile Distributions (GARD) database72 combined with each species list as appropriate, using ‘join field’ in ArcMap 10.3 to connect records by scientific name based on the corrected lists. Join by field was also used to connect species to their RedList status (downloaded from https://www.iucnredlist.org/search) and CITES appendix (from http://checklist.cites.org/#/en). To create hotspot vulnerability maps, we extracted each group of species with a given IUCN class and then used ‘count overlapping polygons’ to count the number of species with each status in any given area. This was then repeated separately for the species listed within each of the three data sources, to map the species listed as traded within each source separately in addition to the total number in trade.

To obtain the overall number of species and the percentage of species in trade, we separated the species polygons for species in trade and for all species using QGIS, then converted them to rasters with a resolution of ~1 km using ArcCatalog. ‘Mosaic to new raster’ was then applied in groups of 200 species, all mosaics were summed to determine overall reptile richness and the richness of reptiles in trade, and the percentage of reptiles in trade was determined using the raster calculator ((traded_species/all_species)*100); a minimal R analogue is sketched below. Other trends, e.g., the percentage of species coming from different sources or with different statuses, were calculated in Excel by counting listings with different attributes (e.g., seized, wild, commercial and personal use) and computing the percentage with that status within CITES based on the number of exports and imports. For more extensive analysis of multiple factors, summary statistics were used in ArcMap after joining fields to connect species data from traded specimens of the three data sources with RedList assessments. This provided simple statistics to further elucidate patterns, as detailed in the text; because CITES data lack the detail of some other data sources, they were largely used to understand which species were in trade relative to existing regulations and threat.
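A minimal R analogue of the final raster-calculator step, using the terra package instead of the ArcMap workflow actually used; the file names are hypothetical and assume the richness rasters have already been mosaicked onto the same ~1 km grid.

```r
# Sketch: percentage of reptile richness in trade, per grid cell.
library(terra)

all_species    <- rast("richness_all_reptiles.tif")      # hypothetical summed richness raster
traded_species <- rast("richness_traded_reptiles.tif")   # same grid, traded species only

pct_traded <- (traded_species / all_species) * 100        # (traded_species/all_species)*100
writeRaster(pct_traded, "percent_reptiles_in_trade.tif", overwrite = TRUE)
```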

To determine trends on a country basis, we joined the CITES appendix and RedList status to the GARD layer. We used QGIS to separate a global country layer (http://thematicmapping.org/downloads/world_borders.php) into constituent countries, then clipped the GARD layer to each country, with trade status noted. ISO2 codes were added to each country layer, and the country layers were then merged again to list each species and country, providing a species list for every country within the GARD database. The number of species in and out of trade, and with and without a CITES appendix, was then calculated for each country using summary statistics, and this was repeated for each RedList status (Supplementary Data 6). For species listed in Reptile Database but lacking a GARD layer, countries were listed separately based on the locality information on the website and the process repeated; the totals were then combined with those from the GARD layer to map country richness and the number of species with each trade status and endangerment level, providing an understanding of the potential threat to the reptile faunas of different regions based on the trade, threat and CITES appendix of the species listed in those countries.
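A minimal sketch of the per-country summary, assuming a long-format data frame 'country_species' with hypothetical columns 'iso2', 'species', 'traded' (logical), 'cites_appendix' and 'redlist_status'.

```r
# Sketch: count species per country by trade status, CITES listing and threat.
library(dplyr)

country_summary <- country_species %>%
  group_by(iso2) %>%
  summarise(n_species    = n_distinct(species),
            n_traded     = n_distinct(species[traded]),
            n_cites      = n_distinct(species[!is.na(cites_appendix)]),
            n_threatened = n_distinct(species[redlist_status %in% c("VU", "EN", "CR")]))
```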

For exploration of the time lag between species descriptions and their detection in the trade, we relied on the date of description from the author details supplied by Reptile Database. We extracted the earliest description date for each species using the stringr v.1.4.0 package41 (Supplementary Code 8), and compared this to the year reported alongside the archived pages or the trade date in LEMIS. For species detected only in the snapshot data, we used 2019 as their date of appearance in the trade and did not include them in the calculation of mean lag time. We included only species detected directly via the scientific name of the new description; subsequent name changes and common names were ignored for this analysis. We also excluded species listed as being traded only for LEMIS non-commercial purposes from this part of the analysis.
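A minimal sketch of extracting the earliest description year from the Reptile Database author string, assuming a data frame 'rdb' with hypothetical columns 'species' and 'author' (e.g. "LINNAEUS 1758") and a hypothetical lookup 'first_trade_year' with columns 'species' and 'trade_year' built from the archive/LEMIS data.

```r
# Sketch: earliest description year per species, then lag to first trade detection.
library(dplyr)
library(stringr)

description_years <- rdb %>%
  mutate(years     = str_extract_all(author, "\\b(1[5-9]|20)[0-9]{2}\\b"),  # all 4-digit years
         described = sapply(years, function(y) min(as.integer(y)))) %>%     # earliest year
  select(species, described)

description_years %>%
  inner_join(first_trade_year, by = "species") %>%
  mutate(lag = trade_year - described) %>%
  summarise(mean_lag = mean(lag))
```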

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

