Abstract
Automatic recognition of insect sounds could help us understand changing biodiversity trends around the world, but insect sounds are challenging to recognize even for deep learning, due to their broad frequency ranges and the limited amount of training data available. We present a new dataset comprising 26,298 audio files (226.6 hours) covering 459 species of Orthoptera (310 species) and Cicadidae (149 species). InsectSet459 is the first large-scale dataset of insect sound that is readily usable for developing novel deep-learning methods. Its recordings were made with a variety of audio recorders at varying sample rates, capturing the extremely broad range of frequencies that insects produce. We benchmark performance with two state-of-the-art deep-learning classifiers, demonstrating good performance but also significant room for improvement in acoustic insect classification. This dataset can serve as a realistic test case for implementing insect-monitoring workflows, and as a challenging basis for developing audio representation methods that can handle highly variable frequencies and/or sample rates.
Data availability
The dataset can be downloaded from Zenodo under the Creative Commons Attribution 4.0 (CC BY 4.0) license (https://doi.org/10.5281/zenodo.14056457).
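Programmatic access can go through Zenodo's public REST API, whose record ID is the numeric suffix of the dataset DOI. The following is a minimal sketch (not the authors' tooling); the `list_files` helper is a hypothetical convenience for illustration, assuming the standard `/api/records` response layout:

```python
# Sketch: locate the InsectSet459 files via Zenodo's public REST API.
# The record ID is the numeric suffix of the dataset DOI.
import json
import urllib.request

DOI = "10.5281/zenodo.14056457"

def zenodo_record_url(doi: str) -> str:
    """Derive the Zenodo API URL from a zenodo.* DOI."""
    record_id = doi.rsplit("zenodo.", 1)[-1]
    return f"https://zenodo.org/api/records/{record_id}"

url = zenodo_record_url(DOI)

def list_files(api_url: str) -> list[str]:
    """Fetch record metadata and return the download link of each file.

    Assumes the standard Zenodo record JSON, where every entry of
    'files' carries its own URL under links -> self.
    """
    with urllib.request.urlopen(api_url) as resp:
        record = json.load(resp)
    return [f["links"]["self"] for f in record.get("files", [])]
```

The metadata request is lightweight; the archive itself is large (226.6 hours of audio), so downloading the returned file links individually, with resumption, is advisable.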
Code availability
The code used to download the source data from iNaturalist and xeno-canto, and to curate and process the data as described in the Methods section, is published on GitHub (https://github.com/mariusfaiss/InsectSet459). All code for benchmarking the dataset with the InsectEffNet (https://github.com/danstowell/insect_classifier_GDSC23_insecteffnet) and PaSST (https://github.com/kkoutini/PaSST) models is also available on GitHub.
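The xeno-canto side of such a download workflow can be sketched against the public API v2 endpoint cited in the references (the "grasshoppers" group query). This is a simplified illustration under the assumption that the API returns paged JSON with `numPages` and a `recordings` list, not the project's actual download script:

```python
# Sketch: page through xeno-canto API v2 results for the "grasshoppers"
# group query used to assemble the source recordings.
import json
import urllib.parse
import urllib.request

API = "https://xeno-canto.org/api/2/recordings"

def query_url(query: str, page: int = 1) -> str:
    """Build a xeno-canto API request URL for a query string and page."""
    params = urllib.parse.urlencode({"query": query, "page": page})
    return f"{API}?{params}"

first_page = query_url("grp:grasshoppers")

def iter_recordings(query: str):
    """Yield recording-metadata dicts across all result pages."""
    page, num_pages = 1, 1
    while page <= num_pages:
        with urllib.request.urlopen(query_url(query, page)) as resp:
            data = json.load(resp)
        num_pages = int(data["numPages"])  # total pages reported by the API
        yield from data["recordings"]      # each entry includes a file URL
        page += 1
```

When adapting this, respect the per-recording licenses recorded in the API metadata, since xeno-canto recordings carry a variety of Creative Commons terms.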
References
Wagner, D. L., Grames, E. M., Forister, M. L., Berenbaum, M. R. & Stopak, D. Insect decline in the Anthropocene: Death by a thousand cuts. Proceedings of the National Academy of Sciences 118, https://doi.org/10.1073/pnas.2023989118 (2021).
Montgomery, G. A. et al. Is the insect apocalypse upon us? How to find out. Biological Conservation 241, 108327, https://doi.org/10.1016/j.biocon.2019.108327 (2020).
van Klink, R. et al. Emerging technologies revolutionise insect ecology and monitoring. Trends in Ecology & Evolution https://doi.org/10.1016/j.tree.2022.06 (2022).
van Klink, R. et al. Towards a toolkit for global insect biodiversity monitoring. Philosophical Transactions of the Royal Society B 379, 20230101, https://doi.org/10.1098/rstb.2023.0101 (2024).
Kohlberg, A. B., Myers, C. R. & Figueroa, L. L. From buzzes to bytes: A systematic review of automated bioacoustics models used to detect, classify and monitor insects. Journal of Applied Ecology https://doi.org/10.1111/1365-2664.14630 (2024).
Riede, K. & Balakrishnan, R. Acoustic monitoring for tropical insect conservation. bioRxiv https://www.biorxiv.org/content/early/2024/07/05/2024.07.03.601657 (2024).
Riede, K. Acoustic profiling of Orthoptera: Present state and future needs. Journal of Orthoptera Research 27, 203–215 (2018).
Bennett, D., Nissen, H., Maschke, M. A., Reck, H. & Diekötter, T. Recent technological developments allow for passive acoustic monitoring of Orthoptera (grasshoppers and crickets) in research and conservation across a broad range of temporal and spatial scales. Basic and Applied Ecology 84, 147–157, https://doi.org/10.1016/j.baae.2025.03.004 (2025).
Pérez-Granados, C. BirdNET: applications, performance, pitfalls and future opportunities. Ibis (2023).
Sethi, S. S. et al. Large-scale avian vocalization detection delivers reliable global biodiversity insights. Proceedings of the National Academy of Sciences 121, https://doi.org/10.1073/pnas.2315933121 (2024).
Faiß, M. InsectSet47 & 66: Expanded datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae) https://zenodo.org/record/7828438 (2023).
Faiß, M. & Stowell, D. Adaptive representations of sound for automatic insect recognition. PLOS Computational Biology 19, e1011541, https://doi.org/10.1371/journal.pcbi.1011541 (2023).
Vellinga, W.-P. & Planque, R. The Xeno-canto collection and its relation to sound recognition and classification https://ceur-ws.org/Vol-1391/166-CR.pdf (2015).
iNaturalist. Available from https://www.inaturalist.org (accessed 9 August 2024).
Chasmai, M., Shepard, A., Maji, S. & Horn, G. V. The iNaturalist sounds dataset. In Proceedings of the NeurIPS 2024 conference (Datasets and Benchmarks Track) https://doi.org/10.52202/079017-4213 (2024).
Faiß, M. & Stowell, D. InsectSet459: A large dataset for automatic acoustic identification of insects (Orthoptera and Cicadidae). Zenodo https://doi.org/10.5281/zenodo.14056457 (2025).
Xeno-canto Foundation for Nature Sounds. Xeno-canto API query: Recordings of species in the "grasshoppers" group https://xeno-canto.org/api/2/recordings?query=grp:grasshoppers (2024).
GBIF.Org User. Occurrence download https://www.gbif.org/occurrence/download/0058185-240626123714530 (2024).
GBIF.Org User. Occurrence download https://www.gbif.org/occurrence/download/0058173-240626123714530 (2024).
Baker, E., Price, B. W., Rycroft, S. D., Hill, J. & Smith, V. S. BioAcoustica: a free and open repository and analysis platform for bioacoustics. Database 2015, bav054, https://doi.org/10.1093/database/bav054 (2015).
Faiß, M. InsectSet32: Dataset for automatic acoustic identification of insects (Orthoptera and Cicadidae) https://zenodo.org/record/7072196 (2022).
Döring, M., Jeppesen, T. & Bánki, O. Introducing checklistbank: An index and repository for taxonomic data. Biodiversity Information Science and Standards https://doi.org/10.3897/biss.6.93938 (2022).
Borowiec, M. L. et al. Deep learning as a tool for ecology and evolution. Methods in Ecology and Evolution 13, 1640–1660, https://doi.org/10.1111/2041-210X.13901 (2022).
Odé, B. et al. A list of sound producing European grasshoppers and the availability of their sounds in Xeno-canto 95–130, https://doi.org/10.62323/a100191 (2025).
SINA. Singing Insects of North America. https://orthsoc.org/sina/index.htm (2023).
Stowell, D. Computational bioacoustics with deep learning: a review and roadmap. PeerJ 10, e13152, https://doi.org/10.7717/peerj.13152 (2022).
Ghani, B. et al. Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics https://arxiv.org/abs/2409.15383 (2024).
Tan, M. & Le, Q. EfficientNetV2: Smaller models and faster training. In International Conference on Machine Learning, 10096–10106, https://arxiv.org/abs/2104.00298 (PMLR, 2021).
Kahl, S., Wood, C. M., Eibl, M. & Klinck, H. BirdNET: A deep learning solution for avian diversity monitoring. Ecological Informatics, 61, 101236, https://doi.org/10.1016/j.ecoinf.2021.101236 (2021).
Dolbear, A. E. The Cricket as a Thermometer. The American Naturalist 31, 970–971, https://www.journals.uchicago.edu/doi/10.1086/276739 (1897).
Greenfield, M. D. Acoustic Communication in Orthoptera (CAB INTERNATIONAL, 1997).
Acknowledgements
We thank the many contributors to the xeno-canto and iNaturalist collections for sharing their recordings; without them this work would not have been possible. We highlight the following users, who each contributed over 500 recordings: Baudewijn Odé (also for project advice), Christie (iNaturalist username), Joel Poyitt, K.-G. Heller, S. Ingrisch and Cedric Mroczko. The InsectEffNet classifier was developed as part of a Capgemini "Global Data Science Challenge", supported by Amazon Web Services. The classifier code was contributed by a Capgemini team and authored by Raffaela Heily, Lukas Kemetinger, Dominik Lemm and Lucas Unterberger; it is used here with permission. MF was supported by the EU MSCA Doctoral Network Bioacoustic AI (BioacAI, 101071532). BG was supported by the EU-funded Horizon projects MAMBO (101060639), GUARDEN (101060693) and TETTRIs (101081903).
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Contributions
M.F. and D.S. conceived the data selection process, wrote code for the data processing and figure creation, wrote and edited the draft; M.F. wrote code for downloading, processing and annotating the data; D.S. wrote the code for benchmarking the InsectEffNet model; B.G. developed the PaSST-based model; all authors reviewed the final draft.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary information: Classifier performance (PDF)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Reprints and permissions
About this article
Cite this article
Faiß, M., Ghani, B. & Stowell, D. A dataset of insect sounds from 459 species for bioacoustic machine learning.
Sci Data (2026). https://doi.org/10.1038/s41597-026-07123-4

