
Marine fishery resource dynamic prediction based on CNN-XGBoost fusion model


Abstract

Marine fishery resource prediction is crucial for sustainable fishery management and ecosystem conservation, yet traditional statistical methods struggle to capture the complex non-linear relationships and multi-scale temporal dependencies inherent in marine environmental systems. This study proposes a CNN-XGBoost fusion model that combines the temporal pattern recognition capabilities of convolutional neural networks (CNNs) with the ensemble learning strengths of extreme gradient boosting (XGBoost) to improve marine fishery resource forecasting. The fusion architecture uses a hierarchical two-stage framework: CNN components extract high-level temporal features from multi-source marine environmental data, and XGBoost modules then process both the extracted features and engineered variables to generate the final predictions. Comprehensive experiments show that the proposed fusion model outperforms standalone CNN, XGBoost, and traditional ARIMA approaches, with a 19.1% reduction in RMSE and statistically significant improvements across all evaluation metrics. The optimal fusion weight analysis assigns weights of 40% to the CNN-extracted features and 60% to the XGBoost-processed features in the final prediction fusion, yielding an RMSE of 2.847, an MAE of 2.184, and an R² of 0.846. These percentages describe the allocation of fusion weights, not prediction accuracy. Time series analysis confirms robust performance across seasonal variations and a strong capability for predicting the extreme abundance events that are critical for adaptive fishery management. The results provide insights for sustainable marine resource management and offer practical tools for fishery policymakers and resource managers.
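The abstract reports 40%/60% fusion weights together with RMSE, MAE, and R². The standard-library Python sketch below illustrates how a convex weighting of two prediction series can be applied, how the three metrics are computed, and how a weight might be chosen by grid search on held-out data. It is not the author's implementation: the function names, the 0.1-step weight grid, and selection by validation RMSE are assumptions, and the architecture is simplified to a weighted average of two final prediction series. The `relative_rmse_reduction` helper assumes that a percentage "reduction in RMSE" means (RMSE_baseline - RMSE_fusion) / RMSE_baseline; the abstract does not state which baseline the 19.1% refers to.

```python
import math


def fuse(cnn_pred, xgb_pred, w_cnn=0.4):
    """Convex combination of two prediction series; the XGBoost weight is 1 - w_cnn."""
    if len(cnn_pred) != len(xgb_pred):
        raise ValueError("prediction series must have equal length")
    if not 0.0 <= w_cnn <= 1.0:
        raise ValueError("w_cnn must lie in [0, 1]")
    return [w_cnn * c + (1.0 - w_cnn) * x for c, x in zip(cnn_pred, xgb_pred)]


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def relative_rmse_reduction(rmse_baseline, rmse_model):
    """Fractional RMSE reduction versus a baseline (0.191 would mean 19.1%)."""
    return (rmse_baseline - rmse_model) / rmse_baseline


def best_weight(y_val, cnn_val, xgb_val, steps=10):
    """Hypothetical procedure: grid-search w_cnn in {0, 1/steps, ..., 1} by validation RMSE."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda w: rmse(y_val, fuse(cnn_val, xgb_val, w)))


# Toy data (invented for illustration): the CNN stream overestimates by 3,
# the XGBoost stream underestimates by 2, so 0.4 * 3 - 0.6 * 2 = 0.
y = [10.0, 20.0, 30.0, 40.0]
cnn = [13.0, 23.0, 33.0, 43.0]
xgb = [8.0, 18.0, 28.0, 38.0]
w = best_weight(y, cnn, xgb)  # 0.4 on this toy data
fused = fuse(cnn, xgb, w)
print(w, rmse(y, fused), mae(y, fused), r2(y, fused))
```

On the toy series the errors of the two streams have opposite signs, so the grid search lands on the weight that cancels them; on real data the chosen weight should be validated on a period not used for training, since picking it on the test set would inflate the reported metrics.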

Data availability

The marine fishery resource datasets used in this study were obtained from multiple sources; access information is provided below to facilitate reproducibility and follow-up research. Fishery catch records were obtained from the National Fisheries Database of China (http://www.cnfm.gov.cn/), operated by the Ministry of Agriculture and Rural Affairs, and are available upon reasonable request under data sharing agreements that comply with commercial confidentiality requirements. Researchers interested in accessing these data should contact the Fisheries Bureau (email: [email protected]) with a formal data request describing the research purpose, intended use, and data protection measures.

Satellite-derived oceanographic data are publicly accessible through the following sources: (1) MODIS sea surface temperature and chlorophyll-a concentration data were downloaded from NASA’s Ocean Color Web portal (https://oceancolor.gsfc.nasa.gov/), specifically the MODIS Aqua Level-3 mapped products (dataset identifiers: AQUA_MODIS.20080101_20231231.L3m.MO.SST.sst.4km and AQUA_MODIS.20080101_20231231.L3m.8D.CHL.chlor_a.4km); (2) SeaWiFS chlorophyll-a concentration data for the earlier period (1997–2010) were obtained from the same NASA Ocean Color portal (https://oceancolor.gsfc.nasa.gov/data/seawifs/). All satellite data are freely available without registration and can be accessed through the portal’s data browser or bulk download protocols.

Meteorological data, including wind speed, wind direction, and precipitation measurements, were provided by the China Meteorological Administration (CMA) through its National Meteorological Information Center data portal (http://data.cma.cn/). Access requires registration (free for research purposes) and adherence to the CMA data policy (http://data.cma.cn/en/site/index.html). Station-level daily observations can be requested through the portal’s data ordering system; historical data requests typically take 3–5 business days to process.

Ocean current velocity data were obtained from the China High-Frequency Radar Ocean Observation Network, operated by the State Oceanic Administration, and are available through collaborative research agreements. Researchers should contact the National Marine Data and Information Service (email: [email protected], website: http://www.nmdis.org.cn/) to inquire about data access procedures.

The processed datasets supporting the conclusions of this article, including the preprocessed and harmonized multi-source data, engineered features, and model predictions, are available from the corresponding author (Mingqi Zhang, email: [email protected]) upon reasonable request, subject to the privacy and confidentiality restrictions imposed by the original data providers. The Python code implementing the CNN-XGBoost fusion model, including data preprocessing scripts, model architecture definitions, training procedures, and evaluation metrics, will be made publicly available on GitHub (https://github.com/[username]/CNN-XGBoost-Marine-Fishery) upon acceptance for publication, under the MIT License, to facilitate reproducibility and encourage further methodological development by the research community.
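The MODIS products listed above differ in temporal resolution (MO, monthly SST versus 8D, 8-day chlorophyll-a), so they have to be put on a common time step before they can be combined. The statement mentions harmonized multi-source data but does not describe the procedure. The standard-library Python sketch below shows one possible approach, an overlap-weighted aggregation of 8-day composites to monthly means. The helper name `monthly_from_8day` and its input layout are hypothetical. It assumes each composite covers eight days from its start date and is truncated at the end of the calendar year, which matches the usual day-of-year 1, 9, 17, … scheme of 8-day ocean color products but should be checked against the product documentation.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_from_8day(composites, period_days=8):
    """Overlap-weighted monthly means from 8-day composite values (hypothetical helper).

    composites: iterable of (start_date, value) pairs, one per composite period.
    Each period is assumed to cover `period_days` consecutive days from its start
    date, truncated at the end of the start date's calendar year. A value of None
    marks a missing (e.g. cloud-masked) composite and is skipped.
    Returns {(year, month): mean}, where each composite contributes in
    proportion to the number of its days that fall inside that month.
    """
    weighted_sum = defaultdict(float)
    day_count = defaultdict(int)
    for start, value in composites:
        if value is None:
            continue
        for offset in range(period_days):
            day = start + timedelta(days=offset)
            if day.year != start.year:  # the year-end composite is shorter
                break
            key = (day.year, day.month)
            weighted_sum[key] += value
            day_count[key] += 1
    return {key: weighted_sum[key] / day_count[key] for key in sorted(weighted_sum)}


# Toy example (invented values): five 8-day composites starting 1 January 2010.
starts = [date(2010, 1, 1) + timedelta(days=8 * i) for i in range(5)]
monthly = monthly_from_8day(zip(starts, [1.0, 2.0, 3.0, 4.0, 5.0]))
print(monthly)
```

In the toy example the fourth composite (25 January to 1 February) contributes seven days to January and one day to February, so January's mean is 76/31 rather than a plain average of four values. Weighting by overlapping days avoids assigning a straddling composite entirely to one month.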

Author information

Contributions

Mingqi Zhang conceived and designed the study, developed the CNN-XGBoost fusion model architecture, collected and preprocessed the marine fishery resource datasets, implemented the computational algorithms, conducted the experimental analysis, performed statistical evaluations, interpreted the results, and wrote the manuscript. The author read and approved the final manuscript.

Corresponding author

Correspondence to Mingqi Zhang.

Ethics declarations

Competing interests

The author declares no competing interests.

Ethics approval

This study used marine fishery resource data and satellite-derived oceanographic measurements and involved no human subjects or animal experimentation; it therefore meets the ethical requirements for environmental data analysis. All data sources were accessed through legitimate channels with appropriate authorization and in compliance with data provider policies. Fishery catch records from the National Fisheries Database of China were used under formal data sharing agreements that stipulate data confidentiality, restrictions on use, and acknowledgment requirements. The research team adhered to all terms of these agreements, including anonymization of vessel-specific information and aggregation of data to temporal resolutions that protect commercially sensitive information. Satellite-derived oceanographic data from NASA’s Ocean Color Web portal are publicly available without restrictions and were used in accordance with NASA’s Earth Science Data and Information Policy. Meteorological data from the China Meteorological Administration were accessed after registration and in compliance with the stated data use policies. The research was conducted in accordance with the institutional guidelines for environmental data analysis at Hebei University and adheres to international standards for the responsible conduct of research. Ethical approval was obtained from the Research Ethics Committee of the School of Economics, Hebei University (Approval Number: HBU-ECO-2024-015, Date: March 15, 2024), which reviewed the study protocol, data management procedures, and potential societal impacts of the research findings. The ethics review confirmed that the study poses no risks to human subjects, animal welfare, or environmental integrity, and that the research objectives align with the principles of sustainable marine resource management and ecosystem conservation. All research activities were performed in compliance with the relevant guidelines and regulations governing environmental research in China.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, M. Marine fishery resource dynamic prediction based on CNN-XGBoost fusion model. Sci Rep (2025). https://doi.org/10.1038/s41598-025-33175-4

  • DOI: https://doi.org/10.1038/s41598-025-33175-4

Keywords

  • Marine fishery resources
  • CNN-XGBoost fusion
  • Resource prediction
  • Deep learning
  • Ensemble learning
  • Time series forecasting


Source: Ecology - nature.com
