Abstract
Agricultural research increasingly relies on data-driven approaches to crop yield prediction, including machine learning techniques, that complement more established crop growth models. However, these approaches require large training datasets. Here, we present the Crop Yields, Climate, Soils, and Satellites (CYCleSS) dataset, a large-scale crop yield dataset derived from precision yield data for 934 fields across England on which a variety of crops are grown. The dataset also contains satellite-derived remote sensing data, weather data, and data on soil type, all aligned at a grid resolution of 1 km. Weather data is available at a daily temporal resolution and satellite data at 5-day resolution, while crop yield data is available at yearly resolution. This effort has been made possible through careful anonymisation of the yield data while preserving the alignment with remote sensing, weather, and soil data. This data will be useful both to train machine learning models of yield prediction and to parameterize mechanistic crop growth models. Furthermore, the anonymisation procedure itself will be of interest to the research community, as it represents a solution to a common problem at the interface of agricultural research and farming practice.
Data availability
All data comprising the final CYCleSS dataset is available through the Figshare repository (https://doi.org/10.6084/m9.figshare.27225807)48. See the Data Records section for a detailed breakdown of the contents of this repository. Researchers who are further interested in the underlying data should contact the authors affiliated with UKCEH.
Code availability
R code used to merge and align available UK climate, soil, and Sentinel-1 synthetic aperture radar data to the same 1 km² grid is provided in the following GitHub repository: https://github.com/alan-turing-institute/CYCLeSS-dataset-code. Dummy data and code needed to replicate the final process of merging climate, soil, and satellite data with UKCEH precision yield data and anonymising field locations are contained within the ‘CLYCESS_anonymisation.zip’ folder shared as part of this repository. R version 4.2.3 was used for the creation of this dataset.
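For readers who want a feel for what aligning records to a common 1 km grid involves, the following Python sketch may help. It is illustrative only: the released R code in the repository above is the authoritative implementation, and this sketch does not reproduce it or the anonymisation step. The function names, the tuple layouts, and the assumption that coordinates are projected eastings and northings in metres are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

CELL_SIZE_M = 1000  # 1 km grid spacing; coordinates assumed to be in metres


def grid_cell(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the lower-left corner of the grid cell containing a point."""
    return (
        int(easting_m // cell_size_m) * cell_size_m,
        int(northing_m // cell_size_m) * cell_size_m,
    )


def yearly_mean_by_cell(daily_records):
    """Average a daily variable per (grid cell, year).

    daily_records: iterable of (easting_m, northing_m, year, value) tuples
    (a hypothetical layout, not the dataset's actual schema).
    """
    groups = defaultdict(list)
    for easting_m, northing_m, year, value in daily_records:
        groups[(grid_cell(easting_m, northing_m), year)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def join_yield(yield_records, yearly_weather):
    """Attach the yearly weather summary to each yearly yield record.

    yield_records: iterable of (easting_m, northing_m, year, yield_value).
    Records whose cell and year have no weather summary get None.
    """
    joined = []
    for easting_m, northing_m, year, yield_value in yield_records:
        cell = grid_cell(easting_m, northing_m)
        joined.append((cell, year, yield_value, yearly_weather.get((cell, year))))
    return joined
```

In this sketch, daily values are first reduced to one summary per cell and year so that they match the yearly resolution of the yield records before the two are joined on the shared grid cell.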
References
Fischer, R. A., Byerlee, D. & Edmeades, G. O. Crop yields and global food security: will yield increase continue to feed the world? ACIAR Monograph No. 158. (Australian Centre for International Agricultural Research: Canberra, 2014).
Hu, T. et al. Climate change impacts on crop yields: A review of empirical findings, statistical crop models, and machine learning methods. Environ Model Softw 179, 106119, https://doi.org/10.1016/j.envsoft.2024.106119 (2024).
Maestrini, B. et al. Mixing process-based and data-driven approaches in yield prediction. Eur J Agron 139, 126569, https://doi.org/10.1016/j.eja.2022.126569 (2022).
Silva, J. V. & Giller, K. E. Grand challenges for the 21st century: What crop models can and can’t (yet) do. J Agric Sci 158, 794–805, https://doi.org/10.1017/S0021859621000150 (2021).
van Klompenburg, T., Kassahun, A. & Catal, C. Crop yield prediction using machine learning: A systematic literature review. Comput Electron Agric 177, 105709, https://doi.org/10.1016/j.compag.2020.105709 (2020).
Corcoran, E. et al. Current data and modeling bottlenecks for predicting crop yields in the United Kingdom. Front. Sustain. Food Syst 7, https://doi.org/10.3389/fsufs.2023.1023169 (2023).
Lischeid, G., Webber, H., Sommer, M., Nendel, C. & Ewert, F. Machine learning in crop yield modelling: A powerful tool, but no surrogate for science. Agric For Meteorol 312, 108698, https://doi.org/10.1016/j.agrformet.2021.108698 (2022).
Paudel, D. et al. Machine learning for large-scale crop yield forecasting. Agric. Syst 187, 103016, https://doi.org/10.1016/j.agsy.2020.103016 (2021).
Li, L. et al. Integrating machine learning and environmental variables to constrain uncertainty in crop yield change projections under climate change. Eur J Agron 149, 126917, https://www.sciencedirect.com/science/article/pii/S1161030123001855 (2023).
Morales, A. & Villalobos, F. J. Using machine learning for crop yield prediction in the past or the future. Front. Plant Sci. 14, 1128388, https://doi.org/10.3389/fpls.2023.1128388 (2023).
Willcock, S. et al. Machine learning for ecosystem services. Ecosyst. Serv 33, 165–174, https://doi.org/10.1016/j.ecoser.2018.04.004 (2018).
Pantazi, X. E., Moshou, D., Alexandridis, T., Whetton, R. L. & Mouazen, A. M. Wheat yield prediction using machine learning and advanced sensing techniques. Comput Electron Agric 121, 57–65, https://doi.org/10.1016/j.compag.2015.11.018 (2016).
Jiang, C., Guan, K., Huang, Y. & Jong, M. A vehicle imaging approach to acquire ground truth data for upscaling to satellite data: A case study for estimating harvesting dates. Remote Sens Environ 300, 113894, https://doi.org/10.1016/j.rse.2023.113894 (2024).
Cai, Y. et al. Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. Agric For Meteorol 274, 144–159, https://doi.org/10.1016/j.agrformet.2019.03.010 (2019).
Bongiovanni, R. & Lowenberg-Deboer, J. Precision agriculture and sustainability. Precis. Agric 5, 359–387 (2004).
McBratney, A. & Whelan, B. Future directions of precision agriculture. Precis. Agric 6, 7–23, https://doi.org/10.1007/s11119-005-0681-8 (2005).
Gebbers, R. & Adamchuk, V. I. Precision agriculture and food security. Science 327, 828–831, https://doi.org/10.1126/science.1183899 (2010).
Al-Gaadi, K. A. et al. Prediction of potato crop yield using precision agriculture techniques. PLOS One 11, e0162219, https://doi.org/10.1371/journal.pone.0162219 (2016).
Hunt, M. L., Blackburn, G. A., Carrasco, L., Redhead, J. W. & Rowland, C. S. High resolution wheat yield mapping using Sentinel-2. Remote Sens Environ 233, 111410, https://doi.org/10.1016/j.rse.2019.111410 (2019).
Mancini, F. et al. Chapter three – Detecting landscape scale consequences of insecticide use on invertebrate communities. Adv. Ecol. Res. 63, 93–126, https://doi.org/10.1016/bs.aecr.2020.07.001 (2020).
Fincham, W. N., Redhead, J. W., Woodcock, B. A. & Pywell, R. F. Exploring drivers of within-field crop yield variation using a national precision yield network. J Appl Ecol 60, 319–329, https://doi.org/10.1111/1365-2664.14323 (2022).
Wilkinson, M. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018, https://doi.org/10.1038/sdata.2016.18 (2016).
Kang, Y. et al. Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest. Environ. Res. Lett 15, 064005, https://doi.org/10.1088/1748-9326/ab7df9 (2020).
Khaki, S. & Wang, L. Crop Yield Prediction Using Deep Neural Networks. Front. Plant Sci. 10, https://doi.org/10.3389/fpls.2019.00621 (2019).
Yang, Q., Shi, L., Han, J., Zha, Y. & Zhu, P. Deep convolutional neural networks for rice grain yield estimation at the ripening stage using UAV-based remotely sensed images. Field Crops Res 235, 142–153, https://doi.org/10.1016/j.fcr.2019.02.022 (2019).
Halder, M. et al. A Systematic Review on Crop Yield Prediction Using Machine Learning. In: Nguyen, T. D. L., Verdú, E., Le, A. N. & Ganzha, M. (eds) Intelligent Systems and Networks. ICISN 2023, Lecture Notes in Networks and Systems, vol 752, https://doi.org/10.1007/978-981-99-4725-6_77 (Springer, Singapore, 2023).
Holzworth, D. P. et al. APSIM – Evolution towards a new generation of agricultural systems simulation. Environ Model Softw 62, 327–350, https://doi.org/10.1016/j.envsoft.2014.07.009 (2014).
Jamieson, P., Semenov, M. A., Brooking, I. R. & Francis, G. Sirius: A mechanistic model of wheat response to environmental variation. Eur J Agron 8, 161–179, https://doi.org/10.1016/S1161-0301(98)00020-3 (1998).
Manivasagam, V. S. & Rozenstein, O. Practices for upscaling crop simulation models from field scale to large regions. Comput Electron Agric 175, 105554, https://doi.org/10.1016/j.compag.2020.105554 (2020).
Peng, B. et al. Towards a multiscale crop modelling framework for climate change adaptation assessment. Nat. Plants 6, 338–348, https://doi.org/10.1038/s41477-020-0625-3 (2020).
Addy, J. W. G., Ellis, R. H., Macdonald, A. J., Semenov, M. A. & Mead, A. Investigating the effects of inter-annual weather variation (1968–2016) on the functional response of cereal grain yield to applied nitrogen, using data from the Rothamsted Long-Term Experiments. Agric For Meteorol 284, 107898, https://doi.org/10.1016/j.agrformet.2019.107898 (2020).
Jackson, R. D. Remote sensing of vegetation characteristics for farm management. Remote Sensing: Critical Review of Technology 475, 81–97, https://doi.org/10.1117/12.966243 (SPIE, 1984).
Bauer, M. E. Spectral inputs to crop identification and condition assessment. Proceedings of the IEEE 73, 1071–1085, https://doi.org/10.1109/PROC.1985.13238 (1985).
Ma, T., Duan, Z., Li, R. & Song, X. Enhancing SWAT with remotely sensed LAI for improved modelling of ecohydrological process in subtropics. J. Hydrol 570, 802–815, https://doi.org/10.1016/j.jhydrol.2019.01.024 (2019).
Hayman, G. et al. A framework for improved predictions of the climate impacts on potential yields of UK winter wheat and its applicability to other UK crops. Clim. Serv 34, 100479, https://doi.org/10.1016/j.cliser.2024.100479 (2024).
Novelli, F., Spiegel, H., Sanden, T. & Vuolo, F. Assimilation of Sentinel-2 Leaf Area Index Data into a Physically-Based Crop Growth Model for Yield Estimation. Agronomy 9, 255, https://doi.org/10.3390/agronomy9050255 (2019).
Pan, H., Chen, Z., de Wit, A. & Ren, J. Joint Assimilation of Leaf Area Index and Soil Moisture from Sentinel-1 and Sentinel-2 Data into the WOFOST Model for Winter Wheat Yield Estimation. Sensors 19, 3161, https://doi.org/10.3390/s19143161 (2019).
Zhuo, W. et al. Assimilating soil moisture retrieved from Sentinel-1 and Sentinel-2 data into WOFOST model to improve winter wheat yield estimation. Remote Sens 11, 1618, https://doi.org/10.3390/rs11131618 (2019).
Robinson, E. L. et al. Climate hydrology and ecology research support system potential evapotranspiration dataset for Great Britain (1961–2015) [CHESS-PE]. NERC Environmental Information Data Centre https://doi.org/10.5285/8baf805d-39ce-4dac-b224-c926ada353b7 (2016).
Robinson, E. L. et al. Climate hydrology and ecology research support system meteorology dataset for Great Britain (1961–2015) [CHESS-met] v1.2. NERC Environmental Information Data Centre https://doi.org/10.5285/b745e7b1-626c-4ccc-ac27-56582e77b900 (2017).
European Commission, Joint Research Centre (JRC) Maps of indicators of soil hydraulic properties for Europe. European Commission, Joint Research Centre (JRC) Dataset PID.: http://data.europa.eu/89h/jrc-esdac-39 (2016).
Ballabio, C., Panagos, P. & Montanarella, L. Mapping topsoil physical properties at European scale using the LUCAS database. Geoderma 261, 110–123 (2016).
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. (2024).
Bivand, R., Keitt, T. & Rowlingson, B. rgdal: Bindings for the ‘Geospatial’ Data Abstraction Library. http://rgdal.r-forge.r-project.org (2023).
Wickham, H., François, R., Henry, L., Müller, K. & Vaughan, D. dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://github.com/tidyverse/dplyr (2023).
d’Andrimont, R. et al. EUCROPMAP 2018. European Commission, Joint Research Centre (JRC) PID: http://data.europa.eu/89h/15f86c84-eae1-4723-8e00-c1b35c8f56b9 (2021).
QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation Project. http://qgis.osgeo.org (2024).
Corcoran, E. et al. CYCleSS: the Crop Yields, Climate, Soils, and Satellites Dataset. figshare. https://doi.org/10.6084/m9.figshare.27225807 (2024).
Department of Environment Food and Rural Affairs. Agricultural facts summary. Available at: https://www.gov.uk/government/statistics/agricultural-facts-england-regional-profiles/agricultural-facts-summary (Department of Environment Food and Rural Affairs 2024).
SEPAL Development Team. SEPAL – System for Earth Observation, Data Access, Processing and Analysis for Land Monitoring. https://sepal.io/ (2024).
Mandal, D. et al. Dual polarimetric radar vegetation index for crop growth monitoring using sentinel-1 SAR data. Remote Sens Environ 247, 111954, https://doi.org/10.1016/j.rse.2020.111954 (2020).
Singha, C. & Swain, K. C. Rice crop growth monitoring with sentinel 1 SAR data using machine learning models in google earth engine cloud. RSASE 32, 101029, https://doi.org/10.1016/j.rsase.2023.101029 (2024).
Arias, M., Campo-Bescos, M. A. & Alvarez-Mozos, J. Crop classification based on temporal signatures of Sentinel-1 observations over Navarre province, Spain. Remote Sens. 12, https://www.mdpi.com/2072-4292/12/2/278 (2020).
Mandal, D. et al. Sen4Rice: A processing chain for differentiating early and late transplanted rice using time-series Sentinel-1 SAR data with Google Earth engine. IEEE Geosci. Remote. Sens. Lett 15, 1947–1951, https://doi.org/10.1109/LGRS.2018.2865816 (2018).
Van Tricht, K., Gobin, A., Gilliams, S. & Piccard, I. Synergistic use of radar Sentinel-1 and optical Sentinel-2 imagery for crop mapping: a case study for Belgium. Remote Sens. 10, 1642, https://doi.org/10.3390/rs10101642 (2018).
Whelen, T. & Siqueira, P. Time-series classification of Sentinel-1 agricultural data over North Dakota. Remote Sens Lett 9, 411–420, https://doi.org/10.1080/2150704X.2018.1430393 (2018).
Fikriyah, V. N., Darvishzadeh, R., Laborte, A., Khan, N. I. & Nelson, A. Discriminating transplanted and direct seeded rice using Sentinel-1 intensity data. Int. J. Appl. Earth Obs. Geoinf. 76, 143–153, https://doi.org/10.1016/j.jag.2018.11.007 (2019).
Singha, M., Dong, J., Zhang, G. & Xiao, X. High resolution paddy rice maps in cloud-prone Bangladesh and Northeast India using Sentinel-1 data. Scientific Data 6, 26, https://doi.org/10.1038/s41597-019-0036-3 (2019).
Hollis, D., McCarthy, M., Kendon, M., Legg, T. & Simpson, I. HadUK-Grid – A new UK dataset of gridded climate observations. Geosci. Data J. 6, 151–159, https://doi.org/10.1002/gdj3.78 (2019).
Serrano-Notivoli, R., Longares, L. A. & Camara, R. bioclim: An R package for bioclimatic classifications via adaptive water balance. Ecol Inform 71, 101810, https://doi.org/10.1016/j.ecoinf.2022.101810 (2022).
Muhammed, S., Milne, A., Marchant, B., Griffin, S. & Whitemore, A. Exploiting yield maps and soil management zones. AHDB (2016).
Nyeki, A. & Nemenyi, M. Crop yield prediction in precision agriculture. Agronomy 12, 2460, https://doi.org/10.3390/agronomy12102460 (2022).
Visser, O., Sippel, S. R. & Thiemann, L. Imprecision farming? Examining the (in)accuracy and risks of digital agriculture. J Rural Stud 86, 623–632, https://doi.org/10.1016/j.jrurstud.2021.07.024 (2021).
Parida, B. R., Kumar, A. & Ranjan, A. K. Crop types discrimination and yield prediction using Sentinel-2 data and AquaCrop model in Hazaribagh District, Jharkhand. Cartogr Geogr Inf Sci 73, 77–89, https://doi.org/10.1007/s42489-021-00073-4 (2021).
Perich, G. et al. Pixel-based yield mapping and prediction from Sentinel-2 using spectral indices and neural networks. Field Crops Research 292, 108824, https://doi.org/10.1016/j.fcr.2023.108824 (2023).
Acknowledgements
This work was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC grant EP/W006022/1, particularly the “Environment and Sustainability” theme within that grant and The Alan Turing Institute. The contributions by JWR and RFP were funded by the Natural Environment Research Council (NERC) under research programme NE/W005050/1 AgZero+: Towards sustainable, climate-neutral farming. AgZero+ is an initiative jointly supported by NERC and the Biotechnology and Biological Sciences Research Council (BBSRC). We thank the European Space Agency (ESA) for providing the Sentinel-1 (S-1) synthetic aperture radar (SAR) images. Climate hydrology and ecology research support system meteorology dataset for Great Britain (1961–2012) [CHESS-met] data licensed from NERC – Centre for Ecology & Hydrology.
Author information
Contributions
E.C. led creation and development of the CYCleSS dataset, and the writing of the manuscript. J.W.R. facilitated merging of the aligned remote sensing, climate, soil, and land use data with precision yield data so that farms from which precision yield data was collected could remain sufficiently anonymous, and contributed to writing, reviewing, and editing the manuscript. All other authors contributed to the conceptual design of the CYCleSS dataset, advised on its development, contributed to writing, reviewing, and editing the manuscript, and approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Corcoran, E., Bebber, D.P., Curceac, S. et al. A comprehensive UK crop yield dataset incorporating satellite, weather, and soil type information.
Sci Data (2026). https://doi.org/10.1038/s41597-025-06528-x
Source: Ecology - nature.com
