A comprehensive UK crop yield dataset incorporating satellite, weather, and soil type information


Abstract

Agricultural research increasingly relies on data-driven approaches to crop yield prediction, including machine learning techniques, that complement more established crop growth models. However, these approaches require large training datasets. Here, we present the Crop Yields, Climate, Soils, and Satellites (CYCleSS) dataset, a large-scale crop yield dataset derived from precision yield data for 934 fields across England on which a variety of crops are grown. The dataset also contains satellite-derived remote sensing data, weather data, and data on soil type, all aligned at a grid resolution of 1 km. Weather data is available at daily temporal resolution, satellite data at 5-day resolution, and crop yield data at yearly resolution. This effort was made possible through careful anonymisation of the yield data that preserves the alignment with remote sensing, weather, and soil data. The dataset will be useful both for training machine learning models of yield prediction and for parameterizing mechanistic crop growth models. Furthermore, the anonymisation procedure itself will be of interest to the research community, as it represents a solution to a common problem at the interface of agricultural research and farming practice.
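As a rough illustration of how data at different temporal resolutions can be brought together, the following Python sketch summarises daily weather records to one row per grid cell and year and joins the result to yearly yield records. The field names (`cell_id`, `tmean_c`, `precip_mm`, `yield_t_ha`) and all values are hypothetical and are not taken from the CYCleSS files; the Data Records section describes the actual variables.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily weather: (cell_id, day, mean temperature in deg C, precipitation in mm)
daily_weather = [
    ("cell_A", date(2019, 6, 1), 14.0, 0.0),
    ("cell_A", date(2019, 6, 2), 16.0, 3.0),
    ("cell_A", date(2020, 6, 1), 15.0, 1.5),
    ("cell_B", date(2019, 6, 1), 13.0, 2.0),
]

# Hypothetical yearly yield: (cell_id, harvest year, yield in t/ha)
yearly_yield = [
    ("cell_A", 2019, 8.2),
    ("cell_A", 2020, 7.9),
    ("cell_B", 2019, 9.1),
]


def summarise_weather(records):
    """Group daily records by (cell, calendar year) and compute simple summaries."""
    groups = defaultdict(list)
    for cell_id, day, tmean_c, precip_mm in records:
        groups[(cell_id, day.year)].append((tmean_c, precip_mm))
    return {
        key: {
            "tmean_c": mean(t for t, _ in values),
            "precip_mm": sum(p for _, p in values),
        }
        for key, values in groups.items()
    }


def join_yield_with_weather(yields, weather_summary):
    """Attach the weather summary to each yield record with a matching cell and year."""
    rows = []
    for cell_id, year, yield_t_ha in yields:
        summary = weather_summary.get((cell_id, year))
        if summary is not None:
            rows.append({"cell_id": cell_id, "year": year,
                         "yield_t_ha": yield_t_ha, **summary})
    return rows


rows = join_yield_with_weather(yearly_yield, summarise_weather(daily_weather))
```

Grouping by calendar year is a simplification made for brevity; an analysis of real data would more likely summarise weather and satellite observations over the growing season of each crop.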

Data availability

All data comprising the final CYCleSS dataset is available through the Figshare repository (https://doi.org/10.6084/m9.figshare.27225807)48. See the Data Records section for a detailed breakdown of the contents of this repository. Researchers who are further interested in the underlying data should contact the authors affiliated with UKCEH.

Code availability

R code used to merge and align available UK climate, soil, and Sentinel-1 synthetic aperture radar data to the same 1 km² grid is provided in the following GitHub repository: https://github.com/alan-turing-institute/CYCLeSS-dataset-code. Dummy data and the code needed to replicate the final steps, namely merging climate, soil, and satellite data with UKCEH precision yield data and anonymising field locations, are contained within the ‘CLYCESS_anonymisation.zip’ folder shared as part of this repository. R version 4.2.3 was used to create this dataset.
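The alignment and anonymisation steps themselves are implemented in R in the repository above. Purely as an illustration of the general idea, and not as a description of the authors' procedure, the Python sketch below snaps projected coordinates (assumed here to be eastings and northings in metres) to the south-west corner of a 1 km cell and then replaces each cell with an opaque random identifier, so that records sharing a cell stay aligned while the coordinates are dropped. All function names, field names, and values are hypothetical.

```python
import secrets

CELL_SIZE_M = 1000  # 1 km grid; coordinates are assumed to be in metres


def snap_to_cell(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the south-west corner of the grid cell that contains the point."""
    return (
        int(easting_m // cell_size_m) * cell_size_m,
        int(northing_m // cell_size_m) * cell_size_m,
    )


def pseudonymise(records):
    """Replace coordinates with a random per-cell ID.

    The cell-to-ID lookup table is deliberately not returned, so the output
    alone does not reveal where a cell is located.
    """
    cell_ids = {}
    output = []
    for record in records:
        cell = snap_to_cell(record["easting_m"], record["northing_m"])
        if cell not in cell_ids:
            # 16 hex characters; collisions are extremely unlikely at this scale
            cell_ids[cell] = secrets.token_hex(8)
        output.append({
            "cell_id": cell_ids[cell],
            "year": record["year"],
            "yield_t_ha": record["yield_t_ha"],
        })
    return output


# Hypothetical field records (made-up coordinates and yields)
records = [
    {"easting_m": 451234.5, "northing_m": 234999.9, "year": 2019, "yield_t_ha": 8.2},
    {"easting_m": 451800.0, "northing_m": 234100.0, "year": 2020, "yield_t_ha": 7.9},
    {"easting_m": 452050.0, "northing_m": 234100.0, "year": 2019, "yield_t_ha": 9.1},
]
anonymised = pseudonymise(records)
```

The first two records fall in the same cell and therefore share an identifier, which is what keeps them joinable to gridded weather, soil, and satellite data. Whether a scheme of this kind gives sufficient protection depends on which other variables are released alongside the identifier.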

References

  1. Fischer, R. A., Byerlee, D. & Edmeades, G. O. Crop yields and global food security: will yield increase continue to feed the world? ACIAR Monograph No. 158. (Australian Centre for International Agricultural Research: Canberra, 2014).

  2. Hu, T. et al. Climate change impacts on crop yields: A review of empirical findings, statistical crop models, and machine learning methods. Environ Model Softw 179, 106119, https://doi.org/10.1016/j.envsoft.2024.106119 (2024).

  3. Maestrini, B. et al. Mixing process-based and data-driven approaches in yield prediction. Eur J Agron 139, 126569, https://doi.org/10.1016/j.eja.2022.126569 (2022).

  4. Silva, J. V. & Giller, K. E. Grand challenges for the 21st century: What crop models can and can’t (yet) do. J Agric Sci 158, 794–805, https://doi.org/10.1017/S0021859621000150 (2021).

  5. van Klompenburg, T., Kassahun, A. & Catal, C. Crop yield prediction using machine learning: A systematic literature review. Comput Electron Agric 177, 105709, https://doi.org/10.1016/j.compag.2020.105709 (2020).

  6. Corcoran, E. et al. Current data and modeling bottlenecks for predicting crop yields in the United Kingdom. Front. Sustain. Food Syst 7, https://doi.org/10.3389/fsufs.2023.1023169 (2023).

  7. Lischeid, G., Webber, H., Sommer, M., Nendel, C. & Ewert, F. Machine learning in crop yield modelling: A powerful tool, but no surrogate for science. Agric For Meteorol 312, 108698, https://doi.org/10.1016/j.agrformet.2021.108698 (2022).

  8. Paudel, D. et al. Machine learning for large-scale crop yield forecasting. Agric. Syst 187, 103016, https://doi.org/10.1016/j.agsy.2020.103016 (2021).

  9. Li, L. et al. Integrating machine learning and environmental variables to constrain uncertainty in crop yield change projections under climate change. Eur J Agron 149, 126917, https://www.sciencedirect.com/science/article/pii/S1161030123001855 (2023).

  10. Morales, A. & Villalobos, F. J. Using machine learning for crop yield prediction in the past or the future. Front. Plant Sci. 14, 1128388, https://doi.org/10.3389/fpls.2023.1128388 (2023).

  11. Willcock, S. et al. Machine learning for ecosystem services. Ecosyst. Serv 33, 165–174, https://doi.org/10.1016/j.ecoser.2018.04.004 (2018).

  12. Pantazi, X. E., Moshou, D., Alexandridis, T., Whetton, R. L. & Mouazen, A. M. Wheat yield prediction using machine learning and advanced sensing techniques. Comput Electron Agric 121, 57–65, https://doi.org/10.1016/j.compag.2015.11.018 (2016).

  13. Jiang, C., Guan, K., Huang, Y. & Jong, M. A vehicle imaging approach to acquire ground truth data for upscaling to satellite data: A case study for estimating harvesting dates. Remote Sens Environ 300, 113894, https://doi.org/10.1016/j.rse.2023.113894 (2024).

  14. Cai, Y. et al. Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. Agric For Meteorol 274, 144–159, https://doi.org/10.1016/j.agrformet.2019.03.010 (2019).

  15. Bongiovanni, R. & Lowenberg-Deboer, J. Precision agriculture and sustainability. Precis. Agric 5, 359–387 (2004).

  16. McBratney, A. & Whelan, B. Future directions of precision agriculture. Precis. Agric 6, 7–23, https://doi.org/10.1007/s11119-005-0681-8 (2005).

  17. Gebbers, R. & Adamchuk, V. I. Precision agriculture and food security. Science 327, 828–831, https://doi.org/10.1126/science.1183899 (2010).

  18. Al-Gaadi, K. A. et al. Prediction of potato crop yield using precision agriculture techniques. PLOS One 11, e0162219, https://doi.org/10.1371/journal.pone.0162219 (2016).

  19. Hunt, M. L., Blackburn, G. A., Carrasco, L., Redhead, J. W. & Rowland, C. S. High resolution wheat yield mapping using Sentinel-2. Remote Sens Environ 233, 111410, https://doi.org/10.1016/j.rse.2019.111410 (2019).

  20. Mancini, F. et al. Chapter three – Detecting landscape scale consequences of insecticide use on invertebrate communities. Adv. Ecol. Res. 63, 93–126, https://doi.org/10.1016/bs.aecr.2020.07.001 (2020).

  21. Fincham, W. N., Redhead, J. W., Woodcock, B. A. & Pywell, R. F. Exploring drivers of within-field crop yield variation using a national precision yield network. J Appl Ecol 60, 319–329, https://doi.org/10.1111/1365-2664.14323 (2022).

  22. Wilkinson, M. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018, https://doi.org/10.1038/sdata.2016.18 (2016).

  23. Kang, Y. et al. Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest. Environ. Res. Lett 15, 064005, https://doi.org/10.1088/1748-9326/ab7df9 (2020).

  24. Khaki, S. & Wang, L. Crop Yield Prediction Using Deep Neural Networks. Front. Plant Sci. 10, https://doi.org/10.3389/fpls.2019.00621 (2019).

  25. Yang, Q., Shi, L., Han, J., Zha, Y. & Zhu, P. Deep convolutional neural networks for rice grain yield estimation at the ripening stage using UAV-based remotely sensed images. Field Crops Res 235, 142–153, https://doi.org/10.1016/j.fcr.2019.02.022 (2019).

  26. Halder, M. et al. A Systematic Review on Crop Yield Prediction Using Machine Learning. In: Nguyen, T. D. L., Verdú, E., Le, A. N., Ganzha, M. (eds) Intelligent Systems and Networks. (ICISN 2023, Lecture Notes in Networks and Systems, vol 752, Springer, Singapore. https://doi.org/10.1007/978-981-99-4725-6_77 2023).

  27. Holzworth, D. P. et al. APSIM – Evolution towards a new generation of agricultural systems simulation. Environ Model Softw 62, 327–350, https://doi.org/10.1016/j.envsoft.2014.07.009 (2014).

  28. Jamieson, P., Semenov, M. A., Brooking, I. R. & Francis, G. Sirius: A mechanistic model of wheat response to environmental variation. Eur J Agron 8, 161–179, https://doi.org/10.1016/S1161-0301(98)00020-3 (1998).

  29. Manivasagam, V. S. & Rozenstein, O. Practices for upscaling crop simulation models from field scale to large regions. Comput Electron Agric 175, 105554, https://doi.org/10.1016/j.compag.2020.105554 (2020).

  30. Peng, B. et al. Towards a multiscale crop modelling framework for climate change adaptation assessment. Nat. Plants 6, 338–348, https://doi.org/10.1038/s41477-020-0625-3 (2020).

  31. Addy, J. W. G., Ellis, R. H., Macdonald, A. J., Semenov, M. A. & Mead, A. Investigating the effects of inter-annual weather variation (1968–2016) on the functional response of cereal grain yield to applied nitrogen, using data from the Rothamsted Long-Term Experiments. Agric For Meteorol 284, 107898, https://doi.org/10.1016/j.agrformet.2019.107898 (2020).

  32. Jackson, R. D. Remote sensing of vegetation characteristics for farm management. (Remote Sensing: Critical Review of Technology, 475, 81–97, SPIE. https://doi.org/10.1117/12.966243 1984).

  33. Bauer, M. E. Spectral inputs to crop identification and condition assessment. Proceedings of the IEEE 73, 1071–1085, https://doi.org/10.1109/PROC.1985.13238 (1985).

  34. Ma, T., Duan, Z., Li, R. & Song, X. Enhancing SWAT with remotely sensed LAI for improved modelling of ecohydrological process in subtropics. J. Hydrol 570, 802–815, https://doi.org/10.1016/j.jhydrol.2019.01.024 (2019).

  35. Hayman, G. et al. A framework for improved predictions of the climate impacts on potential yields of UK winter wheat and its applicability to other UK crops. Clim. Serv 34, 100479, https://doi.org/10.1016/j.cliser.2024.100479 (2024).

  36. Novelli, F., Spiegel, H., Sanden, T. & Vuolo, F. Assimilation of Sentinel-2 Leaf Area Index Data into a Physically-Based Crop Growth Model for Yield Estimation. Agronomy 9, 255, https://doi.org/10.3390/agronomy9050255 (2019).

  37. Pan, H., Chen, Z., de Wit, A. & Ren, J. Joint Assimilation of Leaf Area Index and Soil Moisture from Sentinel-1 and Sentinel-2 Data into the WOFOST Model for Winter Wheat Yield Estimation. Sensors 19, 3161, https://doi.org/10.3390/s19143161 (2019).

  38. Zhuo, W. et al. Assimilating soil moisture retrieved from Sentinel-1 and Sentinel-2 data into WOFOST model to improve winter wheat yield estimation. Remote Sens 11, 1618, https://doi.org/10.3390/rs11131618 (2019).

  39. Robinson, E.L. et al. Climate hydrology and ecology research support system potential evapotranspiration dataset for Great Britain (1961-2015) [CHESS-PE]. NERC Environmental Information Data Centre https://doi.org/10.5285/8baf805d-39ce-4dac-b224-c926ada353b7 (2016).

  40. Robinson, E. L. et al. Climate hydrology and ecology research support system meteorology dataset for Great Britain (1961–2015) [CHESS-met] v1.2. NERC Environmental Information Data Centre https://doi.org/10.5285/b745e7b1-626c-4ccc-ac27-56582e77b900 (2017).

  41. European Commission, Joint Research Centre (JRC) Maps of indicators of soil hydraulic properties for Europe. European Commission, Joint Research Centre (JRC) Dataset PID.: http://data.europa.eu/89h/jrc-esdac-39 (2016).

  42. Ballabio, C., Panagos, P. & Montanarella, L. Mapping topsoil physical properties at European scale using the LUCAS database. Geoderma 261, 110–123 (2016).

  43. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org. (2024).

  44. Bivand, R., Keitt, T., & Rowlingson, B. rgdal: Bindings for the ‘Geospatial’ Data Abstraction Library. http://rgdal.r-forge.r-project.org (2023).

  45. Wickham, H., François, R., Henry, L., Müller, K. & Vaughan, D. dplyr: A Grammar of Data Manipulation. R package version 1.1.4, https://github.com/tidyverse/dplyr (2023).

  46. d’Andrimont, R. et al. EUCROPMAP 2018. European Commission, Joint Research Centre (JRC) PID: http://data.europa.eu/89h/15f86c84-eae1-4723-8e00-c1b35c8f56b9 (2021).

  47. QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation Project. http://qgis.osgeo.org (2024).

  48. Corcoran, E. et al. CYCleSS: the Crop Yields, Climate, Soils, and Satellites Dataset. figshare. https://doi.org/10.6084/m9.figshare.27225807 (2024).

  49. Department of Environment Food and Rural Affairs. Agricultural facts summary. Available at: https://www.gov.uk/government/statistics/agricultural-facts-england-regional-profiles/agricultural-facts-summary (Department of Environment Food and Rural Affairs 2024).

  50. SEPAL Development Team. SEPAL – System for Earth Observation, Data Access, Processing and Analysis for Land Monitoring. https://sepal.io/ (2024).

  51. Mandal, D. et al. Dual polarimetric radar vegetation index for crop growth monitoring using sentinel-1 SAR data. Remote Sens Environ 247, 111954, https://doi.org/10.1016/j.rse.2020.111954 (2020).

  52. Singha, C. & Swain, K. C. Rice crop growth monitoring with sentinel 1 SAR data using machine learning models in google earth engine cloud. RSASE 32, 101029, https://doi.org/10.1016/j.rsase.2023.101029 (2024).

  53. Arias, M., Campo-Bescos, M. A. & Alvarez-Mozos, J. Crop classification based on temporal signatures of Sentinel-1 observations over Navarre province, Spain. Remote Sens. 12, https://www.mdpi.com/2072-4292/12/2/278 (2020).

  54. Mandal, D. et al. Sen4Rice: A processing chain for differentiating early and late transplanted rice using time-series Sentinel-1 SAR data with Google Earth engine. IEEE Geosci. Remote. Sens. Lett 15, 1947–1951, https://doi.org/10.1109/LGRS.2018.2865816 (2018).

  55. Van Tricht, K., Gobin, A., Gilliams, S. & Piccard, I. Synergistic use of radar Sentinel-1 and optical Sentinel-2 imagery for crop mapping: a case study for Belgium. Remote Sens. 10, 1642, https://doi.org/10.3390/rs10101642 (2018).

  56. Whelen, T. & Siqueira, P. Time-series classification of Sentinel-1 agricultural data over North Dakota. Remote Sens Lett 9, 411–420, https://doi.org/10.1080/2150704X.2018.1430393 (2018).

  57. Fikriyah, V. N., Darvishzadeh, R., Laborte, A., Khan, N. I. & Nelson, A. Discriminating transplanted and direct seeded rice using Sentinel-1 intensity data. Int. J. Appl. Earth Obs. Geoinf. 76, 143–153, https://doi.org/10.1016/j.jag.2018.11.007 (2019).

  58. Singha, M., Dong, J., Zhang, G. & Xiao, X. High resolution paddy rice maps in cloud-prone Bangladesh and Northeast India using Sentinel-1 data. Scientific Data 6, 26, https://doi.org/10.1038/s41597-019-0036-3 (2019).

  59. Hollis, D., McCarthy, M., Kendon, M., Legg, T. & Simpson, I. HadUK-Grid – A new UK dataset of gridded climate observations. Geosci. Data J. 6, 151–159, https://doi.org/10.1002/gdj3.78 (2019).

  60. Serrano-Notivoli, R., Longares, L. A. & Camara, R. bioclim: An R package for bioclimatic classifications via adaptive water balance. Ecol Inform 71, 101810, https://doi.org/10.1016/j.ecoinf.2022.101810 (2022).

  61. Muhammed, S., Milne, A., Marchant, B., Griffin, S. & Whitmore, A. Exploiting yield maps and soil management zones. AHDB (2016).

  62. Nyeki, A. & Nemenyi, M. Crop yield prediction in precision agriculture. Agronomy 12, 2460, https://doi.org/10.3390/agronomy12102460 (2022).

  63. Visser, O., Sippel, S. R. & Thiemann, L. Imprecision farming? Examining the (in)accuracy and risks of digital agriculture. J Rural Stud 86, 623–632, https://doi.org/10.1016/j.jrurstud.2021.07.024 (2021).

  64. Parida, B. R., Kumar, A. & Ranjan, A. K. Crop types discrimination and yield prediction using Sentinel-2 data and AquaCrop model in Hazaribagh District, Jharkhand. Cartogr Geogr Inf Sci 73, 77–89, https://doi.org/10.1007/s42489-021-00073-4 (2021).

  65. Perich, G. et al. Pixel-based yield mapping and prediction from Sentinel-2 using spectral indices and neural networks. Field Crops Research 292, 108824, https://doi.org/10.1016/j.fcr.2023.108824 (2023).

Acknowledgements

This work was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC grant EP/W006022/1, particularly the “Environment and Sustainability” theme within that grant and The Alan Turing Institute. The contributions by JWR and RFP were funded by the Natural Environment Research Council (NERC) under research programme NE/W005050/1 AgZero+: Towards sustainable, climate-neutral farming. AgZero+ is an initiative jointly supported by NERC and the Biotechnology and Biological Sciences Research Council (BBSRC). We thank the European Space Agency (ESA) for providing the Sentinel-1 (S-1) synthetic aperture radar (SAR) images. Climate hydrology and ecology research support system meteorology dataset for Great Britain (1961–2015) [CHESS-met] data licensed from NERC – Centre for Ecology & Hydrology.

Author information

Contributions

E.C. led the creation and development of the CYCleSS dataset and the writing of the manuscript. J.W.R. facilitated the merging of the aligned remote sensing, climate, soil, and land use data with precision yield data so that the farms from which precision yield data was collected could remain sufficiently anonymous, and contributed to writing, reviewing, and editing the manuscript. All other authors contributed to the conceptual design of the CYCleSS dataset, advised on its development, contributed to writing, reviewing, and editing the manuscript, and approved the submitted version.

Corresponding author

Correspondence to Sebastian E. Ahnert.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Corcoran, E., Bebber, D.P., Curceac, S. et al. A comprehensive UK crop yield dataset incorporating satellite, weather, and soil type information.
Sci Data (2026). https://doi.org/10.1038/s41597-025-06528-x

  • DOI: https://doi.org/10.1038/s41597-025-06528-x


Source: Ecology - nature.com
