Abstract
Accurate assessment of soil organic carbon (SOC) is essential for sustainable cropland management and for monitoring carbon sequestration. However, high-resolution SOC mapping remains challenging because of two persistent limitations. The first is the difficulty of extracting true bare-soil reflectance, especially when single-date imagery is used and the spectral signal is still influenced by vegetation, crop residue, and soil moisture. The second is reliance on models that require large training datasets and may underperform in the small-sample settings typical of soil surveys. To address these challenges, we developed an approach that integrates multi-temporal Sentinel-2 bare-soil composites with a transformer-based foundation model, the Tabular Prior-data Fitted Network (TabPFN), for SOC prediction in the black soil region of Northeast China. Bare-soil pixels were extracted using a Normalized Difference Vegetation Index (NDVI) threshold (0.1–0.4), and two compositing strategies were compared: the 50th percentile (P50) and the 90th percentile (P90). We systematically evaluated three algorithms: TabPFN, a convolutional neural network (CNN), and Extreme Gradient Boosting (XGBoost). The TabPFN model coupled with P50 composites achieved the highest prediction accuracy (R² = 0.78, RMSE = 1.90 g kg⁻¹), outperforming CNN and XGBoost by 4–6%. TabPFN’s distinct advantage lies in its design as a prior-data fitted transformer, which enables robust generalization from limited samples (N = 174) without extensive hyperparameter tuning, effectively addressing the “small data” challenge pervasive in digital soil mapping. SHapley Additive exPlanations (SHAP) analysis showed that the shortwave infrared band (B12) and precipitation had the greatest effect on model output, pointing to the joint role of soil spectral response and climate variability.
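The bare-soil extraction and percentile compositing steps summarized above can be sketched per pixel as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a cloud-masked Sentinel-2 time series already loaded as NumPy arrays, computes NDVI from B4 (red) and B8 (NIR), and reads the quoted 0.1–0.4 as the window of NDVI values accepted as bare soil (if the range instead denotes candidate thresholds, adjust `ndvi_min` and `ndvi_max`). The function names, array layout, and toy values are hypothetical.

```python
import warnings

import numpy as np


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (nir - red) / denom
    return np.where(denom == 0, np.nan, out)


def bare_soil_composite(bands, red, nir, ndvi_min=0.1, ndvi_max=0.4, q=50):
    """Per-pixel percentile composite over dates flagged as bare soil.

    bands: reflectance array shaped (time, n_bands, rows, cols).
    red, nir: arrays shaped (time, rows, cols) used only for the NDVI mask.
    q: percentile (50 for a P50 composite, 90 for a P90 composite).
    Returns (composite shaped (n_bands, rows, cols), bare-date count per pixel).
    """
    v = ndvi(red, nir)
    # Assumed rule: keep observations whose NDVI falls inside the window.
    # NaN NDVI compares False, so those observations are excluded too.
    bare = (v >= ndvi_min) & (v <= ndvi_max)
    # Broadcast the mask across the band axis; hide non-bare dates as NaN.
    masked = np.where(bare[:, None, :, :], bands, np.nan)
    with warnings.catch_warnings():
        # Pixels with no bare date give an all-NaN slice; leave them as NaN.
        warnings.simplefilter("ignore", RuntimeWarning)
        composite = np.nanpercentile(masked, q, axis=0)
    return composite, bare.sum(axis=0)


if __name__ == "__main__":
    # Toy stack: 3 dates, 2 bands, 1 x 2 pixels (synthetic values).
    red = np.array([[[0.2, 0.05]], [[0.1, 0.05]], [[0.2, 0.05]]])
    nir = np.array([[[0.3, 0.5]], [[0.5, 0.5]], [[0.4, 0.5]]])
    bands = np.ones((3, 2, 1, 2))
    bands[:, 0, 0, 0] = [0.1, 0.5, 0.2]
    bands[:, 1, 0, 0] = [0.3, 0.9, 0.4]
    p50, n_bare = bare_soil_composite(bands, red, nir, q=50)
    p90, _ = bare_soil_composite(bands, red, nir, q=90)
    # Pixel 0 is bare on dates 0 and 2 (NDVI ~0.20 and ~0.33); date 1 (~0.67) is dropped.
    # Pixel 1 is never bare (NDVI ~0.82), so its composite is NaN.
    print(n_bare, p50[:, 0, 0], p90[:, 0, 0])
```

Percentile compositing is robust to the occasional residual outlier (for example, a damp or residue-covered date that slipped through the mask); choosing P50 versus P90 simply shifts which part of each pixel's bare-soil reflectance distribution is retained.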
This is one of the first studies to apply the TabPFN architecture to SOC estimation, offering a novel, interpretable, and scalable workflow that bridges the gap between data scarcity and model complexity. The proposed framework provides a reliable tool for high-resolution SOC mapping in heterogeneous croplands, supporting precision agriculture and long-term carbon accounting initiatives.
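To make the interpretability claim concrete, the sketch below computes exact SHAP values for a linear model, where (assuming independent features) each feature's contribution reduces to its weight times its deviation from the background mean, and then ranks features by mean absolute SHAP value, the kind of global-importance summary used to single out variables such as B12 and precipitation. The models in the study (TabPFN, CNN, XGBoost) are nonlinear, so their SHAP values would in practice be estimated, for example with the explainers in the `shap` package; the linear model, weights, and feature names here are hypothetical and only show the mechanics.

```python
import numpy as np


def linear_shap(X, weights, background):
    """Exact interventional SHAP values for f(x) = b + x @ w.

    phi[j, i] = w_i * (X[j, i] - mean of background[:, i]),
    assuming features are independent of each other.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(background, dtype=float).mean(axis=0)
    return (X - mu) * w


def global_importance(phi, names):
    """Rank features by mean |SHAP value| across samples, largest first."""
    imp = np.abs(phi).mean(axis=0)
    order = np.argsort(imp)[::-1]
    return [(names[i], float(imp[i])) for i in order]


if __name__ == "__main__":
    names = ["B12", "precipitation", "B4"]  # hypothetical feature set
    X = np.array([[1.0, 0.0, 2.0], [3.0, 2.0, 2.0]])  # synthetic samples
    w = np.array([1.0, -2.0, 0.5])  # hypothetical model weights
    b = 10.0
    phi = linear_shap(X, w, background=X)
    f = b + X @ w
    # Local accuracy: per-sample SHAP values sum to f(x) - E[f(X)].
    assert np.allclose(phi.sum(axis=1), f - f.mean())
    print(global_importance(phi, names))
```

The additivity check in the demo is the property that makes SHAP attributions readable: every prediction is decomposed exactly into per-feature contributions relative to the average prediction.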
Data availability
The datasets analyzed during the current study are not publicly available due to existing agreements and data-use restrictions but are available from the corresponding author on reasonable request.
Funding
This research was funded by the “Study on the Retrogressive Erosion Mechanism of Gully Heads with Different Parent Materials in Black Soil Regions of Low Mountains and Hills” project of Natural Science Foundation of Jilin Province, China (20250102200JC).
Author information
Contributions
Conceptualization, Song Wu, Na Chen, and Zhikang Wei; methodology, Na Chen and Xuancheng Jin; software, Nan Lin; validation, Nan Lin; formal analysis, Zhikang Wei; resources, Ling Zhao and Na Chen; data curation, Xuancheng Jin; writing—original draft preparation, Song Wu, Zhikang Wei and Na Chen; writing—review and editing, Xuancheng Jin, Na Chen and Song Wu; visualization, Song Wu and Xuancheng Jin; supervision, Nan Lin; project administration, Song Wu; funding acquisition, Fan Yang. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chen, N., Wei, Z., Jin, X. et al. Integrating transformer-based learning and Sentinel-2 bare soil composites for soil organic carbon mapping in the black soil region of Northeast China. Sci Rep (2026). https://doi.org/10.1038/s41598-025-33682-4
Keywords
- Soil organic carbon
- Sentinel-2
- Digital soil mapping
- Bare soil composite
- TabPFN
- SHAP
