Improving rare-class detection in deep-sea imagery via generative augmentation with Stable Diffusion


Abstract

Megabenthos play a critical role in maintaining deep-sea ecosystem stability, making their accurate detection important for deep-sea conservation. However, the high cost of deep-sea exploration and the long-tailed distribution of available datasets lead to severe data scarcity for rare species, limiting the performance of deep-sea benthos detection. To address this challenge, we propose a data augmentation framework based on Stable Diffusion (SD) and ControlNet. Specifically, we fine-tune a pretrained SD model using Low-Rank Adaptation (LoRA) to synthesize images of rare benthos, and leverage ControlNet to composite the generated targets into deep-sea backgrounds with controllable layouts and automatic bounding-box annotation. We constructed two megabenthos datasets collected using an optically tethered underwater vehicle (OTV) and an autonomous underwater vehicle (AUV), covering 16 biological categories; data augmentation was applied to the 7 rare species with the fewest samples. The generated images achieved a Fréchet Inception Distance (FID) of 117.11 and an Inception Score (IS) of 4.97. When combined with real data for RT-DETR training, the augmentation strategy increased AP50-95 and AP50 on the OTV dataset to 45.2% and 75.2%, improvements of 3.7% and 6.1% over the baseline. Similarly, on the AUV dataset, it increased AP50-95 and AP50 to 36.8% and 64.7%, gains of 2.2% and 4.2% over the baseline. Improvements were especially pronounced for tail classes, with AP50-95 increasing by 23.6% and 21.9% for Octopus and Bryozoa on the OTV dataset, and by 15.1% and 14.6% for Bryozoa and Hydrozoa on the AUV dataset. Moreover, the proposed approach outperforms traditional augmentation methods by 1.6% and 0.8% in AP50-95 on the OTV and AUV datasets, respectively, indicating its utility for improving detection in deep-sea megabenthic surveys.
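
For illustration, the generation step described above can be sketched with the Hugging Face diffusers library. This is a minimal example under stated assumptions: the model identifiers, LoRA weight path, layout image, and prompt are placeholders, not the authors' released configuration (the actual implementation is linked under Data availability).

    import torch
    from PIL import Image
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Load a ControlNet (here: Canny-edge conditioning, an assumed variant)
    # and attach it to a pretrained Stable Diffusion backbone.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # Attach LoRA weights fine-tuned on a rare benthos class (placeholder path).
    pipe.load_lora_weights("weights/benthos_octopus_lora.safetensors")

    # The layout image places the target at a known position, so a bounding box
    # can be written out with the generated image without manual labeling.
    layout = Image.open("layouts/octopus_on_sediment.png")

    result = pipe(
        prompt="a deep-sea octopus resting on abyssal sediment, ROV photograph",
        image=layout,
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]
    result.save("augmented/octopus_0001.png")

The fidelity metrics reported above (FID and IS) can likewise be computed with off-the-shelf tooling; the sketch below uses torchmetrics, which is an assumption rather than the paper's stated toolchain.

    import torch
    from torchmetrics.image.fid import FrechetInceptionDistance
    from torchmetrics.image.inception import InceptionScore

    # Requires torchmetrics[image]. Both metrics expect uint8 image tensors of
    # shape (N, 3, H, W); random tensors stand in for real/generated batches.
    real = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
    fake = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

    fid = FrechetInceptionDistance(feature=2048)
    fid.update(real, real=True)
    fid.update(fake, real=False)
    print(f"FID: {fid.compute():.2f}")

    inception = InceptionScore()
    inception.update(fake)
    is_mean, is_std = inception.compute()
    print(f"IS: {is_mean:.2f} ± {is_std:.2f}")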

Data availability

The source code and trained model weights are publicly available on GitHub at https://github.com/dengjunlan/deepsea-benthos-SD-Augmentation. The raw datasets and metadata supporting this study are available from the corresponding author upon reasonable request, subject to the data protection regulations of the Beijing Pioneer Hi-Tech Development Corporation.


Acknowledgements

This work was supported by the High-performance Computing Platform of China University of Geosciences Beijing.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 52394252) and the National Key R&D Program of China (Grant Nos. 2022YFC2804003 and 2023YFC2811405-4).

Author information

Contributions

J.D. conducted the literature search and conceived the experiments. M.D. collected the data and performed statistical analysis. D.W. prepared the figures. X.H. validated the dataset annotations. W.S. and J.X. led the preparation of the final version and provided critical feedback. All authors reviewed the manuscript.

Corresponding author

Correspondence to Jianxin Xia.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information (DOCX).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Deng, J., Duan, M., Wei, D. et al. Improving rare-class detection in deep-sea imagery via generative augmentation with Stable Diffusion. Sci. Rep. (2026). https://doi.org/10.1038/s41598-026-45732-6
