
Efficient and scalable training set generation for automated pollen monitoring with Hirst-type samplers


Abstract

Automated pollen detection is essential for ecological monitoring, allergy forecasting, and biodiversity research. However, existing methods rely heavily on manual or semi-automated annotations, limiting scalability and broader applicability. We introduce a highly automated training dataset generation pipeline that combines one-shot detection with systematic refinement, producing tens of thousands of high-quality annotations from bright-field microscopy while significantly reducing manual effort and annotation costs. Using multi-regional datasets from France, Hungary, and Sweden, we trained object detection models on seven pollen taxa and evaluated their performance on external pure- and mixed-species slides as well as on real-world airborne samples. We assessed the reusability of pretrained vision models for pollen detection, aiming to reduce the need for extensive retraining. Using linear probing, we identified foundational Vision Transformers (ViTs) as the most effective feature extractors and integrated them into Faster R-CNN detection models. We benchmarked these models against ResNet50, a widely adopted backbone in biological imaging. On held-out regions of the training datasets, our models achieved high performance in both classification and detection tasks. On independent reference slides from other datasets, ViTs continued to outperform ResNet50 in classification. However, in full object detection and under real deployment conditions, ResNet50-based models remained competitive and achieved the highest accuracy for detecting Ambrosia, a major allergen of public health significance. Cross-dataset generalization remains a challenge, underscoring the need for domain adaptation techniques such as stain normalization and data augmentation. This study establishes a scalable framework for AI-assisted pollen monitoring, supporting large-scale slide digitization and enabling applications in long-term ecological research, allergen surveillance, and automated biodiversity assessment.
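
As a concrete illustration of the two modelling steps summarized above, the sketch below shows how frozen ViT embeddings can be linearly probed to compare feature extractors and how a torchvision Faster R-CNN can be adapted to a small set of pollen classes. The model names, seven-taxon class count, and placeholder tensors are illustrative assumptions for this example, not the authors' exact configuration or code.

```python
# Minimal sketch (assumed configuration, not the paper's published pipeline):
# 1) linear probing on frozen ViT embeddings to compare backbones,
# 2) a torchvision Faster R-CNN whose box predictor is resized to 7 taxa + background.
import torch
import timm
from sklearn.linear_model import LogisticRegression
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Linear probing: a frozen, pretrained ViT as feature extractor.
vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
vit.eval().to(device)

@torch.no_grad()
def embed(crops: torch.Tensor) -> torch.Tensor:
    """Return pooled ViT embeddings for a batch of normalized pollen-grain crops."""
    return vit(crops.to(device)).cpu()

# Random tensors stand in for annotated pollen crops and their taxon labels.
crops = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 7, (64,))  # seven pollen taxa

features = embed(crops).numpy()
probe = LogisticRegression(max_iter=1000).fit(features, labels.numpy())
print("linear-probe training accuracy:", probe.score(features, labels.numpy()))

# 2) Detection baseline: Faster R-CNN with a ResNet50-FPN backbone,
#    its classification head replaced to match the pollen taxa.
detector = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = detector.roi_heads.box_predictor.cls_score.in_features
detector.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=8)  # 7 taxa + background
detector.eval()

# Inference on one dummy slide tile; real use would iterate over digitized slide tiles.
with torch.no_grad():
    predictions = detector([torch.rand(3, 1024, 1024)])
print(predictions[0]["boxes"].shape, predictions[0]["labels"].shape)
```

In practice the detector head would be fine-tuned on the automatically generated annotations before evaluation, with the ViT-backbone variant built analogously from the frozen features that scored best under linear probing.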


Data availability

All datasets used in this study, including digitized slides and corresponding hand annotations, are available upon request. For additional information or specific requests, please contact the corresponding author.

Code availability

The source code utilized in this study is available from our GitHub repository at https://github.com/abiricz/pollen-auto-annot-init-paper. This repository includes all scripts and comprehensive documentation required to replicate the experiments and evaluations described in this work.


Acknowledgements

This work was primarily supported by Information in Images Ltd. Special thanks to Michael Broderick, the director of the company, whose support was instrumental in restarting this research. We also acknowledge Zsolt Bedőházi for his contributions to the initial software development and preliminary prototyping. We are grateful to the teams at the National Public Health Center, the Swedish Museum of Natural History, and the Réseau National de Surveillance Aérobiologique in Lyon for their efforts in preparing the data and providing reference samples. A special acknowledgment is extended to János Fillinger and his team for providing access to their facility for scanning the samples. Their expertise in pathology brought a valuable external perspective beyond the field of pollen monitoring, further enriching this study. We thank Viktor Varga for his valuable input in the final refinement of the manuscript, including suggestions for minor corrections and additional evaluations that improved the clarity and completeness of the work. The authors thank the Wigner Scientific Computing Laboratory (WSCLAB) for providing computational resources that enabled large-scale evaluations and experiments for this publication. All code development was conducted independently prior to these computations, ensuring the integrity of proprietary research and potential industrial applications.

Funding

This work was further supported by the National Research, Development, and Innovation Office of Hungary within the framework of the MILAB Artificial Intelligence National Laboratory (RRF-2.3.1-21-2022-00004) (I.C.), the Data-Driven Health Division of the National Laboratory for Health Security (RRF-2.3.1-21-2022-00006) (P.P.), and grant No. 2020-1.1.2-PIACI-KFI-2021-00298 (A.B.). Finally, we sincerely thank Semmelweis University for generously covering the publication fee for this paper.

Author information

Contributions

All authors read and approved the final version of the manuscript. András Biricz: conceptualization, data curation, formal analysis, investigation, methodology, project administration, software, validation, writing – original draft. Donát Magyar: resources, project administration, validation, writing – review & editing. Björn Gedda: resources, project administration, validation, writing – review & editing. Antonio Spanu: resources, validation, writing – review & editing. János Fillinger: data curation, resources, project administration, validation. Adrián Pesti: data curation, resources, validation. István Csabai: conceptualization, funding acquisition, project administration, supervision, writing – review & editing. Péter Pollner: conceptualization, funding acquisition, project administration, supervision, writing – review & editing.

Corresponding authors

Correspondence to András Biricz or Péter Pollner.

Ethics declarations

Competing interests

András Biricz reports contractual work with Information in Images Ltd., directed by Michael Broderick, which supported this study and is engaged in the commercial sale of microscopy devices. The company may potentially benefit from findings related to digital microscopy and dataset generation. All other authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Biricz, A., Magyar, D., Gedda, B. et al. Efficient and scalable training set generation for automated pollen monitoring with Hirst-type samplers. Sci Rep (2025). https://doi.org/10.1038/s41598-025-31646-2


Keywords

  • Airborne allergen analysis
  • Automated pollen detection
  • Deep learning
  • Hirst-type sampler
  • Open-vocabulary object detection
  • Pollen monitoring
  • Vision Transformer

