Abstract
Megabenthos play a critical role in maintaining deep-sea ecosystem stability, making their accurate detection important for deep-sea conservation. However, the high cost of deep-sea exploration and the long-tailed distribution of available datasets lead to severe data scarcity for rare species, limiting deep-sea benthos detection. To address this challenge, we propose a data augmentation framework based on Stable Diffusion (SD) and ControlNet. Specifically, we fine-tune a pretrained SD model using Low-Rank Adaptation (LoRA) to synthesize images of rare benthos, and leverage ControlNet to composite the generated targets into deep-sea backgrounds with controllable layouts and automatic bounding-box annotation. We constructed two megabenthos datasets, collected using an optically tethered underwater vehicle (OTV) and an autonomous underwater vehicle (AUV) and covering 16 biological categories; data augmentation was applied to the 7 rare species with the fewest samples. The generated images achieved a Fréchet Inception Distance (FID) of 117.11 and an Inception Score (IS) of 4.97. When combined with real data for RT-DETR training, the augmentation strategy increased AP50-95 and AP50 on the OTV dataset to 45.2% and 75.2%, improvements of 3.7% and 6.1% over the baseline; on the AUV dataset, it increased AP50-95 and AP50 to 36.8% and 64.7%, gains of 2.2% and 4.2% over the baseline. Improvements were especially pronounced for tail classes, with AP50-95 increasing by 23.6% and 21.9% for Octopus and Bryozoa on the OTV dataset, and by 15.1% and 14.6% for Bryozoa and Hydrozoa on the AUV dataset. Moreover, the proposed approach outperforms traditional augmentation methods by 1.6% and 0.8% in AP50-95 on the OTV and AUV datasets, respectively, indicating its utility for improving detection in deep-sea megabenthic surveys.
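The FID reported above compares the statistics of Inception-v3 features extracted from real and generated images. As a rough illustration only (not the paper's evaluation code, and using random features as stand-ins for Inception activations), the underlying Fréchet distance between two fitted Gaussians can be sketched as:

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1), N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))


# Toy example: feature matrices would normally be Inception-v3 activations
# of the real and generated image sets; random vectors stand in here.
rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 8))
mu, sigma = feats.mean(axis=0), np.cov(feats, rowvar=False)

# Identical statistics give a distance near zero.
print(frechet_distance(mu, sigma, mu, sigma))
```

In practice the means and covariances come from 2048-dimensional Inception-v3 pool features, so reported FID values (such as the 117.11 above) are not comparable to this toy computation.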
Data availability
The source code and trained model weights are publicly available on GitHub at https://github.com/dengjunlan/deepsea-benthos-SD-Augmentation. The raw datasets and metadata supporting this study are available from the corresponding author upon reasonable request, subject to the data protection regulations of the Beijing Pioneer Hi-Tech Development Corporation.
References
Shen, C. et al. Dissimilarity of megabenthic community structure between deep-water seamounts with cobalt-rich crusts: Case study in the northwestern Pacific Ocean. Sci. Total Environ. 945, 173914 (2024).
Gallego, R. et al. North Atlantic deep-sea benthic biodiversity unveiled through sponge natural sampler DNA. Commun. Biol. 7, 1015 (2024).
Jones, D. O. B. et al. Long-term impact and biological recovery in a deep-sea mining track. Nature 642, 112–118 (2025).
Stewart, E. C. D. et al. Impacts of an industrial deep-sea mining trial on macrofaunal biodiversity. Nat. Ecol. Evol. https://doi.org/10.1038/s41559-025-02911-4 (2025).
Liu, A. et al. DeepSeaNet: A bio-detection network enabling species identification in the deep sea imagery. IEEE Trans. Geosci. Remote Sens. 62, 1–13 (2024).
Gazis, I.-Z. et al. Monitoring benthic plumes, sediment redeposition and seafloor imprints caused by deep-sea polymetallic nodule mining. Nat. Commun. 16, 1229 (2025).
Tao, J. et al. Diffusion-enhanced underwater debris detection via improved YOLOv12n framework. Remote Sens. 17, 3910 (2025).
Liu, Z. et al. Water-aware real-time detection of floating plastic debris via an enhanced YOLOv13 framework for aquatic pollution monitoring. Expert Syst. Appl. 313, 131552 (2026).
Zhao, F. et al. Mamba-based super-resolution and semi-supervised YOLOv10 for freshwater mussel detection using acoustic video camera: A case study at Lake Izunuma, Japan. Ecol. Inform. 90, 103324 (2025).
Zhao, F. et al. A novel underwater Holothurians monitoring system using consumer-grade amphibious UAV with Mamba-based Super-Resolution Reconstruction and enhanced YOLOv10. Mar. Environ. Res. 212, 107510 (2025).
Zhu, Z. et al. Seafloor classification by fusing AUV acoustic and magnetic data: Toward complex deep-sea environments. IEEE Trans. Geosci. Remote Sens. 63, 1–15 (2025).
Zhivkoplias, E., Jouffray, J.-B., Dunshirn, P., Pranindita, A. & Blasiak, R. Growing prominence of deep-sea life in marine bioprospecting. Nat. Sustain. 7, 1027–1037 (2024).
Mbani, B., Buck, V. & Greinert, J. An automated image-based workflow for detecting megabenthic fauna in optical images with examples from the Clarion–Clipperton Zone. Sci. Rep. 13, 8350 (2023).
Katija, K. et al. FathomNet: A global image database for enabling artificial intelligence in the ocean. Sci. Rep. 12, 15914 (2022).
Lowe, S. C. et al. BenthicNet: A global compilation of seafloor images for deep learning applications. Sci. Data 12, 230 (2025).
Johnson, J. M. & Khoshgoftaar, T. M. Survey on deep learning with class imbalance. J. Big Data 6, 27 (2019).
Durden, J. M., Hosking, B., Bett, B. J., Cline, D. & Ruhl, H. A. Automated classification of fauna in seabed photographs: The impact of training and validation dataset size, with considerations for the class imbalance. Prog. Oceanogr. 196, 102612 (2021).
Zang, Y., Huang, C. & Change Loy, C. FASA: Feature augmentation and sampling adaptation for long-tailed instance segmentation. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 3437–3446 (IEEE, 2021). https://doi.org/10.1109/ICCV48922.2021.00344.
Li, J., Wang, Q., Ma, J. & Guo, J. Multi‐defect segmentation from façade images using balanced copy–paste method. Comput. Aided Civ. Infrastruct. Eng. 37, 1434–1449 (2022).
Wang, T. et al. The devil is in classification: A simple framework for long-tail object detection and instance segmentation. In Computer Vision – ECCV 2020, Proceedings, Part XIV 728–744 (Springer International Publishing, 2020).
Cui, Y., Jia, M., Lin, T.-Y., Song, Y. & Belongie, S. Class-balanced loss based on effective number of samples. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 9268–9277 (IEEE, 2019).
Sharma, T., Cline, D. E. & Edgington, D. Making use of unlabeled data: Comparing strategies for marine animal detection in long-tailed datasets using self-supervised and semi-supervised pre-training. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 1224–1233 (IEEE, 2024). https://doi.org/10.1109/CVPRW63382.2024.00129.
Tan, J. et al. Equalization loss for long-tailed object recognition. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 11659–11668 (IEEE, 2020). https://doi.org/10.1109/CVPR42600.2020.01168.
Zhao, Y. et al. DETRs beat YOLOs on real-time object detection. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 16965–16974 (IEEE, 2024). https://doi.org/10.1109/CVPR52733.2024.01605.
Cui, Y., Song, Y., Sun, C., Howard, A. & Belongie, S. Large scale fine-grained categorization and domain-specific transfer learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 4109–4118 (IEEE, 2018). https://doi.org/10.1109/CVPR.2018.00432.
Chen, H., Wang, Y., Wang, G. & Qiao, Y. LSTD: A low-shot transfer detector for object detection. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 32 (2018).
Guo, C. & Huang, H. Enhancing camouflaged object detection through contrastive learning and data augmentation techniques. Eng. Appl. Artif. Intell. 141, 109703 (2025).
Takahashi, R., Matsubara, T. & Uehara, K. Data augmentation using random image cropping and patching for deep CNNs. IEEE Trans. Circuits Syst. Video Technol. 30, 2917–2931 (2020).
Zhong, Z., Zheng, L., Kang, G., Li, S. & Yang, Y. Random erasing data augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence Vol. 34 13001–13008 (2020).
Wang, K. et al. Perspective transformation data augmentation for object detection. IEEE Access 8, 4935–4943 (2020).
Yun, S. et al. CutMix: Regularization strategy to train strong classifiers with localizable features. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 6022–6031 (IEEE, 2019). https://doi.org/10.1109/ICCV.2019.00612.
Kaur, P., Khehra, B. S. & Mavi, Er. B. S. Data augmentation for object detection: A review. In 2021 IEEE International Midwest Symposium on Circuits and Systems (MWSCAS) 537–543 (IEEE, 2021). https://doi.org/10.1109/MWSCAS47672.2021.9531849.
Salimans, T. et al. Improved techniques for training GANs. In Advances in Neural Information Processing Systems Vol. 29 (Curran Associates, Inc., 2016).
Sandfort, V., Yan, K., Pickhardt, P. J. & Summers, R. M. Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks. Sci. Rep. 9, 16884 (2019).
Li, X., Lu, J., Han, K. & Prisacariu, V. A. SD4Match: Learning to prompt stable diffusion model for semantic matching. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 27548–27558 (IEEE, 2024). https://doi.org/10.1109/CVPR52733.2024.02602.
Deng, B. & Lu, Y. Weed image augmentation by ControlNet-added Stable Diffusion for multi-class weed detection. Comput. Electron. Agric. 232, 110123 (2025).
Abdulghani, A. M., Abdulghani, M. M., Walters, W. L. & Abed, K. H. Multiple data augmentation strategy for enhancing the performance of YOLOv7 object detection algorithm. J. Artif. Intell. 5, 15–30 (2023).
Hong, S., Choi, B., Ham, Y., Jeon, J. & Kim, H. Massive-scale construction dataset synthesis through Stable Diffusion for machine learning training. Adv. Eng. Inform. 62, 102866 (2024).
Moreno, H., Gómez, A., Altares-López, S., Ribeiro, A. & Andújar, D. Analysis of Stable Diffusion-derived fake weeds performance for training convolutional neural networks. Comput. Electron. Agric. 214, 108324 (2023).
Liang, Y. et al. A Stable Diffusion enhanced YOLOV5 model for metal stamped part defect detection based on improved network structure. J. Manuf. Process. 111, 21–31 (2024).
Ghahfarokhi, S. S. et al. Deep learning for automated detection of breast cancer in deep ultraviolet fluorescence images with diffusion probabilistic model. In 2024 IEEE International Symposium on Biomedical Imaging (ISBI) 1–5 (IEEE, 2024). https://doi.org/10.1109/ISBI56570.2024.10635349.
Ruiz-Ponce, P., Ortiz-Perez, D., Garcia-Rodriguez, J. & Kiefer, B. POSEIDON: A data augmentation tool for small object detection datasets in maritime environments. Sensors 23, 3691 (2023).
Zhang, T. et al. Advancing controllable diffusion model for few-shot object detection in optical remote sensing imagery. In IGARSS 2024 – 2024 IEEE International Geoscience and Remote Sensing Symposium 7600–7603 (IEEE, 2024). https://doi.org/10.1109/IGARSS53475.2024.10642625.
Liu, B., Su, S. & Wei, J. The effect of data augmentation methods on pedestrian object detection. Electronics 11, 3185 (2022).
Lee, H., Kang, S. & Chung, K. Robust data augmentation generative adversarial network for object detection. Sensors 23, 157 (2022).
Yigit, M. & Can, A. B. GISD: Generation of corner cases in infrared autonomous driving dataset with stable diffusion. In Applications of Machine Learning 2024 (eds Narayanan, B., Zelinski, M. E., Taha, T. M., Awwal, A. A. & Iftekharuddin, K. M.) (SPIE, 2024). https://doi.org/10.1117/12.3027776.
Xu, Y., Zhang, Y., Wang, H. & Liu, X. Underwater image classification using deep convolutional neural networks and data augmentation. In 2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC) 1–5. https://doi.org/10.1109/ICSPCC.2017.8242527 (2017).
Yang, Z., Zhao, J., Yu, Y. & Huang, C. A sample augmentation method for side-scan sonar full-class images that can be used for detection and segmentation. IEEE Trans. Geosci. Remote Sens. 62, 1–11 (2024).
Huang, C., Zhao, J., Zhang, H. & Yu, Y. Seg2Sonar: A full-class sample synthesis method applied to underwater sonar image target detection, recognition, and segmentation tasks. IEEE Trans. Geosci. Remote Sens. 62, 1–19 (2024).
Walker, J., Yamada, T., Prugel-Bennett, A. & Thornton, B. The effect of physics-based corrections and data augmentation on transfer learning for segmentation of benthic imagery. In 2019 IEEE Underwater Technology (UT) 1–8 (IEEE, 2019). https://doi.org/10.1109/UT.2019.8734463.
Cheng, C., Hou, X., Wen, X., Liu, W. & Zhang, F. Small-sample underwater target detection: A joint approach utilizing diffusion and YOLOv7 model. Remote Sens. 15, 4772 (2023).
Noh, J.-M., Jang, G.-R., Ha, K.-N. & Park, J.-H. Data augmentation method for object detection in underwater environments. In 2019 19th International Conference on Control, Automation and Systems (ICCAS) 324–328. https://doi.org/10.23919/ICCAS47443.2019.8971728 (2019).
Huang, H. et al. Faster R-CNN for marine organisms detection and recognition using data augmentation. Neurocomputing 337, 372–384 (2019).
Peng, Y.-T., Lin, Y.-C., Peng, W.-Y. & Liu, C.-Y. Blurriness-guided underwater salient object detection and data augmentation. IEEE J. Ocean Eng. 49, 1089–1103 (2024).
Teng, S. et al. Unsupervised learning method for underwater concrete crack image enhancement and augmentation based on cross domain translation strategy. Eng. Appl. Artif. Intell. 136, 108884 (2024).
Dubrovinskaya, E. & Tuhtan, J. A. This fish does not exist: Fish species image augmentation using stable diffusion. In OCEANS 2023 – Limerick 1–6. https://doi.org/10.1109/OCEANSLimerick52467.2023.10244720 (2023).
Zhang, H., Yao, F., Gong, Y. & Zhang, Q. Anemone image generation based on Diffusion-Stylegan2. IEEE Access 12, 37310–37325 (2024).
Liu, C. et al. A new dataset, Poisson GAN and AquaNet for underwater object grabbing. IEEE Trans. Circuits Syst. Video Technol. 32, 2831–2844 (2022).
Prakljačić, S., Grbić, R., Vranješ, M. & Herceg, M. Tool for image annotation in context of modern object detection. In 2024 Zooming Innovation in Consumer Technologies Conference (ZINC) 48–53. https://doi.org/10.1109/ZINC61849.2024.10579415 (2024).
Amon, D. et al. Megafauna of the UKSRL exploration contract area and eastern Clarion-Clipperton Zone in the Pacific Ocean: Annelida, Arthropoda, Bryozoa, Chordata, Ctenophora, Mollusca. Biodivers. Data J. 5, e14598 (2017).
Simon-Lledó, E. et al. Carbonate compensation depth drives abyssal biogeography in the northeast Pacific. Nat. Ecol. Evol. 7, 1388–1397 (2023).
Wang, S., Yu, L. & Li, J. LoRA-GA: Low-rank adaptation with gradient approximation. In Advances in Neural Information Processing Systems Vol. 37 (Curran Associates, Inc., 2024).
Zhang, L., Rao, A. & Agrawala, M. Adding conditional control to text-to-image diffusion models. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 3813–3824 (IEEE, 2023). https://doi.org/10.1109/ICCV51070.2023.00355.
Wallace, B. et al. Diffusion model alignment using direct preference optimization. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 8228–8238 (IEEE, 2024). https://doi.org/10.1109/CVPR52733.2024.00786.
Lee, Y., Park, K., Cho, Y., Lee, Y.-J. & Hwang, S. J. KOALA: Empirical lessons toward memory-efficient and fast diffusion models for text-to-image synthesis. Adv. Neural Inf. Process. Syst. 37, 51597–51633 (2024).
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B. & Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems Vol. 30 (Curran Associates, Inc., 2017).
Arbel, M., Sutherland, D. J., Bińkowski, M. & Gretton, A. On gradient regularizers for MMD GANs. In Advances in Neural Information Processing Systems Vol. 31 (Curran Associates, Inc., 2018).
Chong, M. J. & Forsyth, D. Effectively unbiased FID and inception score and where to find them. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 6069–6078 (IEEE, 2020). https://doi.org/10.1109/CVPR42600.2020.00611.
Hessel, J., Holtzman, A., Forbes, M., Le Bras, R. & Choi, Y. CLIPScore: A reference-free evaluation metric for image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 7514–7528 (Association for Computational Linguistics, 2021). https://doi.org/10.18653/v1/2021.emnlp-main.595.
Jayasumana, S. et al. Rethinking FID: Towards a better evaluation metric for image generation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 9307–9315 (IEEE, 2024). https://doi.org/10.1109/CVPR52733.2024.00889.
Horváth, D., Erdős, G., Istenes, Z., Horváth, T. & Földi, S. Object detection using Sim2Real domain randomization for robotic applications. IEEE Trans. Robot. 39, 1225–1243 (2023).
Wilson, S., Fischer, T., Dayoub, F., Miller, D. & Sünderhauf, N. SAFE: Sensitivity-aware features for out-of-distribution object detection. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 23508–23519 (IEEE, 2023). https://doi.org/10.1109/ICCV51070.2023.02154.
Chen, S., Sun, P., Song, Y. & Luo, P. DiffusionDet: Diffusion model for object detection. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 19773–19786 (IEEE, 2023). https://doi.org/10.1109/ICCV51070.2023.01816.
Bolya, D., Foley, S., Hays, J. & Hoffman, J. TIDE: A general toolbox for identifying object detection errors. https://doi.org/10.48550/arXiv.2008.08115 (2020).
Cai, Z., Fan, Q., Feris, R. S. & Vasconcelos, N. A unified multi-scale deep convolutional neural network for fast object detection. In Computer Vision—ECCV 2016 Vol. 9908 (eds Leibe, B. et al.) 354–370 (Springer International Publishing, 2016).
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K. & Yuille, A. L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40, 834–848 (2017).
Akyon, F. C., Onur Altinuc, S. & Temizel, A. Slicing aided hyper inference and fine-tuning for small object detection. In 2022 IEEE International Conference on Image Processing (ICIP) 966–970. https://doi.org/10.1109/ICIP46576.2022.9897990 (2022).
Lin, T.-Y. et al. Feature pyramid networks for object detection. In 2017 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2117–2125 (2017).
Li, Y., Chen, Y., Wang, N. & Zhang, Z.-X. Scale-aware trident networks for object detection. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 6053–6062 (IEEE, 2019). https://doi.org/10.1109/ICCV.2019.00615.
Hanser, T. et al. Data-driven federated learning in drug discovery with knowledge distillation. Nat. Mach. Intell. 7, 423–436 (2025).
Ma, J. et al. A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01488-4 (2025).
Acknowledgements
This work was supported by the High-performance Computing Platform of China University of Geosciences Beijing.
Funding
This work was supported by the National Natural Science Foundation of China (Grant No.52394252), and the National Key R&D Program of China (Grant No.2022YFC2804003 and Grant No.2023YFC2811405-4).
Author information
Contributions
J.D. conducted the literature search and conceived the experiments. M.D. collected the data and performed statistical analysis. D.W. prepared the figures. X.H. validated the dataset annotations. W.S. and J.X. led the preparation of the final version and provided critical feedback. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Supplementary Information.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Reprints and permissions
About this article
Cite this article
Deng, J., Duan, M., Wei, D. et al. Improving rare-class detection in deep-sea imagery via generative augmentation with stable diffusion.
Sci Rep (2026). https://doi.org/10.1038/s41598-026-45732-6
