Deep learning increases the availability of organism photographs taken by citizens in citizen science programs

Citizen science program “Hanamaru-maruhana national census”

We asked citizens to photograph bees and send the photographs by e-mail through the citizen science program “Hanamaru-Maruhana national census” (Bumble bee national census in English) (http://hanamaruproject.s1009.xrea.com/hanamaru_project/index_E.html)⁸. On our homepage and flyers, we notified citizens in advance that their photographs would be used for scientific studies and other non-profit activities. From 2013 to 2016, we collected roughly 5000 photographs taken by citizens. Citizens sent photographs of various bee species, but most were of bumble bees and honey bees. These bees show interspecific similarity and intraspecific variation, which makes species identification difficult for non-experts. Because species identification was not required of participants, most citizens sent bee photographs without identifying the species. The bees were identified by one of the authors, J. Yokoyama. They are relatively easy for experts to identify because only two honey bee species and 16 bumble bee species inhabit the Japanese archipelago, excluding the Kurile Islands. In our test using 100 bumble bee photographs, the consistency of species identification by J. Yokoyama was 95% for 15 bumble bee species and 97.7% for the six major bumble bee species⁸.

Bee photographs used for deep learning

From the bee species observed in the citizen science program “Hanamaru-maruhana national census” (Bumble bee national census in English), we selected two honey bee species and 10 bumble bee species that show interspecific similarity and intraspecific variation. The two honey bee species were Apis cerana Fabricius and A. mellifera Linnaeus. The 10 bumble bee species were Bombus consobrinus Dahlbom, B. diversus Smith, B. ussurensis Radoszkowski, B. pseudobaicalensis Vogt, B. honshuensis Tkalcu, B. ardens Smith, B. beaticola Tkalcu, B. hypocrita Perez, B. ignitus Smith, and B. terrestris Linnaeus. To increase the training data for B. pseudobaicalensis, we added photographs of B. deuteronymus Schulz to those of B. pseudobaicalensis, because the two species can rarely be distinguished from photographic images alone (see http://hanamaruproject.s1009.xrea.com/hanamaru_project/identification_E.html for the details of their color patterns). We primarily used photographs taken by citizens from 2013 to 2015 in the citizen science program, but also used photographs taken in 2016 when the number of photographs for a class was small.

We cropped the region containing the bee from each photograph as a rectangular image to reduce background effects. We increased the number of photographs by data augmentation (Fig. S1 in Appendix S1 in Supplementary information). We assigned 70, 10, and 20% of the total data to the training, validation, and test datasets, respectively. Please see Appendix S1 in Supplementary information for the details of “Data augmentation” and “Data split and training parameters”.
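The 70/10/20 assignment can be illustrated with a minimal sketch. This is not the authors' procedure, which is described in Appendix S1; the function name `split_dataset`, the fixed random seed, and the file names are assumptions made for illustration, and the sketch does not stratify by class.

```python
import random


def split_dataset(items, fractions=(0.7, 0.1, 0.2), seed=0):
    """Shuffle items and split them into training, validation, and test lists.

    The fractions default to the 70/10/20 proportions stated in the text.
    The seed and the lack of per-class stratification are assumptions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file names, used only to show the call.
photos = [f"photo_{i:04d}.jpg" for i in range(100)]
train, val, test = split_dataset(photos)
print(len(train), len(val), len(test))
```

In practice one would probably split within each class so that rare classes appear in all three subsets, but the text here does not say whether that was done.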

Deep convolutional neural network (DCNN)

In this study, we chose the deep convolutional neural network Xception because it provides a good balance between model accuracy and network size. We adopted transfer learning²¹,²² and data augmentation²³ to compensate for the shortage of photographs. The Xception network has a depth of 126 layers (including activation layers, normalization layers, etc.), of which 36 are convolution layers. We employed the pretrained Xception V1 model provided on the Keras homepage. Please see Appendix S1 in Supplementary information for the details of “Xception” and “Transfer learning”. For training, we chose a learning rate of 0.0001 and a momentum of 0.9.

Species identification by biologists

Using a questionnaire form, we asked 50 biologists to identify the species present in nine photographs selected randomly from the photograph dataset. Their positions were fourth-year undergraduate student (16%), Master’s student (14%), Ph.D. student (12%), postdoctoral fellow (26%), assistant professor (6%), associate professor (12%), professor (6%), and others (8%). Their research organisms were honey bees (6%), bumble bees (14%), bees (6%), insects (12%), plants and insects (12%), plants (22%), and others such as fishes, reptiles, and mammals (28%). Although 14% of the biologists studied bumble bees, their research did not require them to identify all bumble bee species because only a few species inhabit their study areas. We allowed the biologists to consult field guides, illustrated books, and websites. To simulate species identification in actual citizen science programs as closely as possible, we did not limit the methods or time used for identification, except that the biologists were not allowed to ask experts. The experiment was approved by the Ethics Committee of Tohoku University and carried out in accordance with its regulations. Informed consent was obtained from the biologists.

Species identification in species class experiment by Xception

We conducted the species class experiment by categorizing photographs into classes according to species. A total of 3779 original photographs were used in the species class experiment (Table S1 in Appendix S1 in Supplementary information). These photographs were classified into 12 classes according to species. We input the test dataset to Xception and recorded the predicted classes.

Species identification in color class experiment by Xception

We conducted the color class experiment by categorizing photographs into classes according to intraspecific color differences. Photographs of B. ardens were classified into the following four classes: female B. ardens ardens, B. ardens sakagamii, B. ardens tsushimanus, and male B. ardens (Table S1 in Appendix S1 in Supplementary information). Photographs of B. honshuensis, B. beaticola, B. hypocrita, and B. ignitus were classified into female and male classes. In trial experiments, we found that Xception could not learn a minor class when the class contained fewer than 40 original photographs: no photographs in such a class were predicted correctly, and no photographs from other classes were predicted as that class. Therefore, in the color class experiment, we did not use the photographs of the minor classes (the subspecies B. ardens sakagamii and B. ardens tsushimanus, male B. honshuensis, and male B. beaticola). As a result, a total of 3681 original photographs were used in the color class experiment (Table S1 in Appendix S1 in Supplementary information). They were classified into 15 classes according to intraspecific color differences in addition to species. We input the test dataset to Xception and recorded the predicted classes. To compare the total accuracy of the color class experiment by Xception with those of the other experiments, we normalized it using the number of test data including those of the minor classes, assuming that all test data of the minor classes were misidentified.
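The normalization described above amounts to adding the excluded minor-class test data to the denominator while leaving the number of correct predictions unchanged. A minimal sketch follows; the function name and the numbers in the usage lines are hypothetical and are not results from the study.

```python
def normalized_total_accuracy(n_correct, n_test_used, n_test_minor):
    """Total accuracy when excluded minor-class test data count as errors.

    n_correct:    correct predictions among the test data that were used
    n_test_used:  number of test data given to the classifier
    n_test_minor: number of minor-class test data that were excluded and
                  are assumed to be misidentified
    """
    denominator = n_test_used + n_test_minor
    if denominator == 0:
        raise ValueError("no test data")
    return n_correct / denominator


# Hypothetical counts, for illustration only.
raw = normalized_total_accuracy(90, 100, 0)         # no minor classes excluded
normalized = normalized_total_accuracy(90, 100, 5)  # 5 excluded test data
print(raw, normalized)
```

Because the excluded data only enlarge the denominator, the normalized value can never exceed the unnormalized one.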

The accuracy of species identification

We calculated total accuracy, as well as precision, recall, and F-score for each class. Total accuracy is the total number of correct predictions divided by the total number of test data. Note that the total accuracy of the color class experiment by Xception was normalized using the number of test data including those of the minor classes. This normalization reduces the total accuracy of the color class experiment by Xception and allows it to be compared directly with the accuracies of the biologists and of the species class experiment by Xception. Precision is the number of correct predictions of a given class divided by the number of all predictions of that class made by the biologists or Xception. Recall, which is equivalent to sensitivity, is the number of correct predictions of a given class divided by the number of test data belonging to that class. F-score is the harmonic mean of precision and recall, (2 × precision × recall)/(precision + recall).
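These definitions can be written out directly. The sketch below is an illustration rather than the authors' code; the function names and class labels are hypothetical, and returning 0.0 when a denominator is zero (for example, a class that is never predicted) is an assumed convention that the text does not specify.

```python
def total_accuracy(true_labels, predicted_labels):
    """Number of correct predictions divided by the number of test data."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def class_metrics(true_labels, predicted_labels, target):
    """Return (precision, recall, F-score) for one class.

    A zero denominator yields 0.0; this convention is an assumption.
    """
    pairs = list(zip(true_labels, predicted_labels))
    n_correct = sum(t == target and p == target for t, p in pairs)
    n_predicted = sum(p == target for _, p in pairs)  # predictions of the class
    n_true = sum(t == target for t, _ in pairs)       # test data in the class
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_true if n_true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical labels, for illustration only.
true = ["ardens", "ardens", "ignitus", "ignitus", "hypocrita"]
pred = ["ardens", "ignitus", "ignitus", "ignitus", "hypocrita"]
print(total_accuracy(true, pred))
print(class_metrics(true, pred, "ignitus"))
```

In this toy example the class "ignitus" is predicted three times, two of them correctly, and both true "ignitus" photographs are recovered, so precision is 2/3 and recall is 1.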

To show the effect of interspecific similarity on the accuracy of species identification, we used a confusion matrix, which represents the relationship between true and predicted classes. Each row indicates the proportion of each predicted class within a true class. All correct predictions are located on the diagonal of the matrix, and wrong predictions are located off the diagonal. In species identification by biologists, the “Others” class represents cases in which the biologists wrote no species name, or a species name other than the two honey bee species and 10 bumble bee species, in the answer column.
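A row-normalized confusion matrix of this kind can be built as in the following sketch. The function name, the nested-dictionary layout, and the labels are assumptions chosen for illustration; “Others” appears only as a predicted class, as in the biologists' answers.

```python
def confusion_matrix(true_labels, predicted_labels, true_classes, predicted_classes):
    """Return matrix[true][predicted] as proportions within each true class."""
    counts = {t: {p: 0 for p in predicted_classes} for t in true_classes}
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    matrix = {}
    for t in true_classes:
        row_total = sum(counts[t].values())
        # A true class with no test data gets an all-zero row (assumed convention).
        matrix[t] = {
            p: (counts[t][p] / row_total if row_total else 0.0)
            for p in predicted_classes
        }
    return matrix


# Hypothetical answers, for illustration only.
true = ["ardens", "ardens", "ignitus", "ignitus"]
pred = ["ardens", "Others", "ignitus", "ardens"]
matrix = confusion_matrix(
    true, pred,
    true_classes=["ardens", "ignitus"],
    predicted_classes=["ardens", "ignitus", "Others"],
)
for true_class, row in matrix.items():
    print(true_class, row)
```

Each row sums to 1 when the true class has at least one test datum, so the diagonal entry of a row equals the recall of that class.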

Source: Ecology - nature.com
