Integrating transformer-based learning and Sentinel-2 bare soil composites for soil organic carbon mapping in the black soil region of Northeast China


Abstract

Accurate assessment of soil organic carbon (SOC) is essential for sustainable cropland management and carbon sequestration monitoring. However, high-resolution SOC mapping remains challenging because of two persistent limitations: (1) the difficulty of extracting true bare-soil reflectance, especially from single-date imagery whose spectral signal is still influenced by vegetation, crop residue, and soil moisture; and (2) reliance on models that require large training datasets and may underperform in the small-sample settings typical of soil surveys. To address these challenges, we developed an approach that integrates multi-temporal Sentinel-2 bare-soil composites with a transformer-based foundation model, the Tabular Prior-data Fitted Network (TabPFN), for SOC prediction in the black soil region of Northeast China. Bare-soil pixels were extracted using a Normalized Difference Vegetation Index (NDVI) range of 0.1–0.4, and two compositing strategies were compared: the 50th percentile (P50) and the 90th percentile (P90). We systematically evaluated three algorithms: TabPFN, a convolutional neural network (CNN), and Extreme Gradient Boosting (XGBoost). The TabPFN model coupled with P50 composites achieved the highest prediction accuracy (R² = 0.78, RMSE = 1.90 g kg⁻¹), outperforming CNN and XGBoost by 4–6%. TabPFN’s main advantage lies in its design as a prior-data fitted transformer, which enables robust generalization from a limited sample (N = 174) without extensive hyperparameter tuning, addressing the “small data” challenge pervasive in digital soil mapping. SHapley Additive exPlanations (SHAP) analysis showed that the shortwave infrared band (B12) and precipitation had the greatest effect on model output, reflecting the joint roles of soil spectral response and climate variability.
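The bare-soil compositing step summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' processing chain: it assumes a cloud-masked Sentinel-2 surface-reflectance stack already loaded as a NumPy array of shape (dates, bands, rows, cols), uses B4 (red) and B8 (NIR) for NDVI, treats the 0.1–0.4 range as inclusive, and the function names `ndvi` and `bare_soil_composite` are hypothetical. Observations outside the NDVI range are masked, and each band is then reduced to its per-pixel 50th (P50) or 90th (P90) percentile over the remaining dates.

```python
import warnings

import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (nir - red) / denom
    return np.where(denom == 0, np.nan, out)


def bare_soil_composite(stack, red_idx, nir_idx, ndvi_range=(0.1, 0.4), q=50):
    """Percentile composite of bare-soil observations (hypothetical helper).

    stack: reflectance array shaped (dates, bands, rows, cols), assumed to be
    cloud-masked already. Returns the per-band q-th percentile over the dates
    whose NDVI lies inside ndvi_range (inclusive), plus the count of such dates.
    """
    stack = np.asarray(stack, dtype=float)
    v = ndvi(stack[:, red_idx], stack[:, nir_idx])          # (dates, rows, cols)
    bare = (v >= ndvi_range[0]) & (v <= ndvi_range[1])      # NaN NDVI -> False
    masked = np.where(bare[:, None, :, :], stack, np.nan)   # broadcast over bands
    with warnings.catch_warnings():
        # Pixels with no bare date produce an all-NaN slice; keep them as NaN.
        warnings.simplefilter("ignore", RuntimeWarning)
        composite = np.nanpercentile(masked, q, axis=0)     # (bands, rows, cols)
    return composite, bare.sum(axis=0)
```

Pixels that never fall inside the NDVI range stay NaN, and the returned count of bare dates can flag composites built from very few observations. Raising `q` from 50 to 90 shifts the composite toward brighter observations, which is often argued to reduce the darkening effect of soil moisture; the abstract reports that P50 worked better in this study.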
This is one of the first studies to apply the TabPFN architecture to SOC estimation, offering a novel, interpretable, and scalable workflow that bridges the gap between data scarcity and model complexity. The proposed framework provides a reliable tool for high-resolution SOC mapping in heterogeneous croplands, supporting precision agriculture and long-term carbon accounting initiatives.
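The SHAP attributions behind the interpretability claim are Shapley values: each feature's weighted average marginal contribution to a prediction over all coalitions of the other features. The sketch below computes them exactly by enumeration for a toy model, filling absent features from a single baseline point. This is a simplification (the `shap` library averages over background data and relies on model-specific or sampling approximations), the names `baseline_shapley` and `toy_model` are hypothetical, and the cost grows as 2^n, so it only suits a handful of features.

```python
from itertools import combinations
from math import factorial


def baseline_shapley(f, x, baseline):
    """Exact Shapley values of model f at point x (hypothetical helper).

    Features outside a coalition take their value from `baseline`, a single
    reference point. Requires O(n * 2^(n-1)) model calls, so keep n small.
    """
    n = len(x)

    def value(coalition):
        # Evaluate f with coalition members at x and all others at baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1, never containing i
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                s = set(members)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Two additive effects plus one interaction between features 0 and 2.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = baseline_shapley(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print([round(v, 6) for v in phi])  # prints [3.5, 6.0, 1.5]
```

The interaction term z₀z₂ contributes 3 at this point and is split equally between features 0 and 2, and the three attributions sum to f(x) − f(baseline) = 11, which is the efficiency property that makes SHAP values additive.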

Data availability

The datasets analyzed during the current study are not publicly available due to existing agreements and data-use restrictions but are available from the corresponding author on reasonable request.

Funding

This research was funded by the Natural Science Foundation of Jilin Province, China, through the project “Study on the Retrogressive Erosion Mechanism of Gully Heads with Different Parent Materials in Black Soil Regions of Low Mountains and Hills” (20250102200JC).

Author information

Contributions

Conceptualization, Song Wu, Na Chen, and Zhikang Wei; methodology, Na Chen and Xuancheng Jin; software, Nan Lin; validation, Nan Lin; formal analysis, Zhikang Wei; resources, Ling Zhao and Na Chen; data curation, Xuancheng Jin; writing—original draft preparation, Song Wu, Zhikang Wei and Na Chen; writing—review and editing, Xuancheng Jin, Na Chen and Song Wu; visualization, Song Wu and Xuancheng Jin; supervision, Nan Lin; project administration, Song Wu; funding acquisition, Fan Yang. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Song Wu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chen, N., Wei, Z., Jin, X. et al. Integrating transformer-based learning and Sentinel-2 bare soil composites for soil organic carbon mapping in the black soil region of Northeast China. Sci Rep (2026). https://doi.org/10.1038/s41598-025-33682-4

  • DOI: https://doi.org/10.1038/s41598-025-33682-4

Keywords

  • Soil organic carbon
  • Sentinel-2
  • Digital soil mapping
  • Bare soil composite
  • TabPFN
  • SHAP


Source: Ecology - nature.com
