Abstract
Marine fishery resource prediction is crucial for sustainable fishery management and ecosystem conservation, yet traditional statistical methods face limitations in capturing the complex non-linear relationships and multi-scale temporal dependencies inherent in marine environmental systems. This study proposes a novel CNN-XGBoost fusion model that integrates convolutional neural networks’ temporal pattern recognition capabilities with extreme gradient boosting’s ensemble learning strengths for enhanced marine fishery resource forecasting. The fusion architecture employs a hierarchical two-stage framework where CNN components extract high-level temporal features from multi-source marine environmental data, while XGBoost modules process both extracted features and engineered variables to generate final predictions. Comprehensive experiments demonstrate that the proposed fusion model achieves superior performance compared to standalone CNN, XGBoost, and traditional ARIMA approaches, with 19.1% improvement in RMSE and statistically significant enhancements across all evaluation metrics. The optimal fusion weight analysis assigns weights of 40% to CNN-extracted features and 60% to XGBoost-processed features in the final prediction fusion; these percentages are fusion weight allocations, not prediction accuracy values. With these weights, the model achieves RMSE of 2.847, MAE of 2.184, and R² of 0.846. Time series analysis confirms robust performance across seasonal variations and exceptional capability in predicting extreme abundance events critical for adaptive fishery management. The results provide valuable insights for sustainable marine resource management and offer practical tools for fishery policymakers and resource managers.
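To make the fusion weights and evaluation metrics named in the abstract concrete, the following is a minimal sketch, not the author's implementation. It assumes the final prediction is a simple linear blend of two prediction vectors with weights 0.4 and 0.6; the abstract does not state the exact fusion formula, so that form is an assumption. The function names (fuse_predictions, rmse, mae, r2) and the toy arrays are hypothetical and do not come from the study.

```python
import numpy as np


def fuse_predictions(cnn_pred, xgb_pred, w_cnn=0.4, w_xgb=0.6):
    """Blend two prediction vectors with fixed weights (assumed linear form)."""
    cnn_pred = np.asarray(cnn_pred, dtype=float)
    xgb_pred = np.asarray(xgb_pred, dtype=float)
    if cnn_pred.shape != xgb_pred.shape:
        raise ValueError("prediction arrays must have the same shape")
    if not np.isclose(w_cnn + w_xgb, 1.0):
        raise ValueError("fusion weights must sum to 1")
    return w_cnn * cnn_pred + w_xgb * xgb_pred


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    observed = np.array([12.0, 15.0, 9.0, 20.0])
    cnn_out = np.array([11.0, 16.0, 10.0, 18.0])
    xgb_out = np.array([13.0, 14.0, 8.5, 21.0])

    fused = fuse_predictions(cnn_out, xgb_out)
    print("RMSE:", rmse(observed, fused))
    print("MAE: ", mae(observed, fused))
    print("R2:  ", r2(observed, fused))
```

Under this assumed form, the weights only control how much each component contributes to the blended prediction, which is why they should not be read as accuracy figures.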
Data availability
The marine fishery resource datasets used in this study were obtained from multiple sources, with access information provided below to facilitate reproducibility and follow-up research. Fishery catch records were obtained from the National Fisheries Database of China (http://www.cnfm.gov.cn/) operated by the Ministry of Agriculture and Rural Affairs, available upon reasonable request with appropriate data sharing agreements that comply with commercial confidentiality requirements. Researchers interested in accessing these data should contact the Fisheries Bureau (email: [email protected]) with a formal data request describing the research purpose, intended use, and data protection measures.
Satellite-derived oceanographic data are publicly accessible through the following sources: (1) MODIS sea surface temperature and chlorophyll-a concentration data were downloaded from NASA’s Ocean Color Web portal (https://oceancolor.gsfc.nasa.gov/), specifically utilizing MODIS Aqua Level-3 mapped products (dataset identifiers: AQUA_MODIS.20080101_20231231.L3m.MO.SST.sst.4 km and AQUA_MODIS.20080101_20231231.L3m.8D.CHL.chlor_a.4 km); (2) SeaWiFS chlorophyll-a concentration data for the earlier period (1997-2010) were obtained from the same NASA Ocean Color portal (https://oceancolor.gsfc.nasa.gov/data/seawifs/). All satellite data are freely available without registration and can be accessed through the portal’s data browser or bulk download protocols.
Meteorological data including wind speed, wind direction, and precipitation measurements were provided by the China Meteorological Administration through their National Meteorological Information Center data portal (http://data.cma.cn/). Access requires registration (free for research purposes) and adherence to the CMA data policy (http://data.cma.cn/en/site/index.html). Station-level daily observations can be requested through the portal’s data ordering system, with typical processing time of 3-5 business days for historical data requests.
Ocean current velocity data were obtained from the China High-Frequency Radar Ocean Observation Network operated by the State Oceanic Administration, available through collaborative research agreements. Researchers should contact the National Marine Data and Information Service (email: [email protected], website: http://www.nmdis.org.cn/) to inquire about data access procedures.
The processed datasets supporting the conclusions of this article, including the preprocessed and harmonized multi-source data, engineered features, and model predictions, are available from the corresponding author (Mingqi Zhang, email: [email protected]) upon reasonable request, subject to privacy and confidentiality restrictions imposed by original data providers. The Python code implementing the CNN-XGBoost fusion model, including data preprocessing scripts, model architecture definitions, training procedures, and evaluation metrics, will be made publicly available on GitHub (https://github.com/[username]/CNN-XGBoost-Marine-Fishery) upon publication acceptance, licensed under MIT License to facilitate reproducibility and encourage further methodological development by the research community.
Author information
Contributions
Mingqi Zhang conceived and designed the study, developed the CNN-XGBoost fusion model architecture, collected and preprocessed the marine fishery resource datasets, implemented the computational algorithms, conducted the experimental analysis, performed statistical evaluations, interpreted the results, and wrote the manuscript. The author read and approved the final manuscript.
Ethics declarations
Competing interests
The author declares no competing interests.
Ethics approval
This study utilized publicly available marine fishery resource data and satellite-derived oceanographic measurements that do not involve human subjects or animal experimentation, thereby satisfying ethical requirements for environmental data analysis. All data sources employed in this research were accessed through legitimate channels with appropriate authorization and compliance with data provider policies. Fishery catch records obtained from the National Fisheries Database of China were used under formal data sharing agreements that stipulate data confidentiality, appropriate use restrictions, and acknowledgment requirements. The research team adhered to all terms of these agreements, including anonymization of vessel-specific information and aggregation of data to temporal resolutions that protect commercially sensitive information. Satellite-derived oceanographic data from NASA’s Ocean Color Web portal are publicly available without restrictions and were used in accordance with NASA’s Earth Science Data and Information Policy. Meteorological data from the China Meteorological Administration were accessed following registration procedures and compliance with stated data use policies.
The research was conducted in accordance with institutional guidelines for environmental data analysis at Hebei University and adheres to international standards for responsible conduct of research. Ethical approval was obtained from the Research Ethics Committee of the School of Economics, Hebei University (Approval Number: HBU-ECO-2024-015, Date: March 15, 2024), which reviewed the study protocol, data management procedures, and potential societal impacts of the research findings. The ethics review confirmed that the study poses no risks to human subjects, animal welfare, or environmental integrity, and that the research objectives align with principles of sustainable marine resource management and ecosystem conservation. All research activities were performed in compliance with relevant guidelines and regulations governing environmental research in China.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, M. Marine fishery resource dynamic prediction based on CNN-XGBoost fusion model. Sci Rep (2025). https://doi.org/10.1038/s41598-025-33175-4
DOI: https://doi.org/10.1038/s41598-025-33175-4
Keywords
- Marine fishery resources
- CNN-XGBoost fusion
- Resource prediction
- Deep learning
- Ensemble learning
- Time series forecasting
Source: Ecology - nature.com
