
Kernel mean matching enhances risk estimation under spatial distribution shifts


Abstract

Accurate risk estimation under distribution shift is critical for deploying machine learning models in real-world spatial applications, from ecological forecasting to medical image analysis. Conventional methods such as No Weighting (NW) and Importance Weighting (IW) fail on spatially structured data because of two challenges: (1) density ratio estimation in high-dimensional, clustered distributions and (2) non-stationarity arising from environmental gradients or sampling biases. Classifier-based approaches offer partial improvements but often yield miscalibrated risk estimates because they prioritize discriminative accuracy over distribution alignment. We systematically evaluate four risk estimation methods: NW, IW, Kernel Mean Matching (KMM), and classifier-based reweighting. The evaluation covers synthetic benchmarks (with controlled spatial clustering) and real-world datasets (species distributions and immune cell layouts). Results show that KMM is the most robust method, reducing Mean Absolute Percentage Error (MAPE) by 12.3–86.5% compared to the alternatives in high-dimensional settings. This advantage stems from KMM’s direct minimization of distributional divergence via kernel embeddings, which bypasses error-prone density ratio estimation. Our findings indicate that KMM is a principled solution for spatial risk estimation, particularly when source and target distributions exhibit complex clustering or sampling artifacts. Its consistency across ecological and biomedical domains suggests broad applicability for reliable model deployment in spatially heterogeneous environments.
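To make the comparison between NW and KMM concrete, the following is a minimal sketch of KMM-based risk estimation. It is not the authors' implementation. The function names (`gaussian_kernel`, `kmm_weights`, `weighted_risk`, `mape`), the Gaussian kernel with bandwidth `sigma`, the weight cap `upper_bound`, and the toy one-dimensional data are all assumptions made for illustration. KMM is usually solved as a quadratic program that also constrains the sum of the weights; this sketch swaps in plain projected gradient descent, keeps only the box constraint, and relies on a self-normalized estimate instead.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def kmm_weights(x_source, x_target, sigma=1.0, upper_bound=10.0, n_iter=2000):
    """Approximate KMM weights by projected gradient descent (illustrative).

    Minimizes 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= upper_bound,
    which matches the weighted source kernel mean to the target kernel mean.
    The usual constraint on the sum of the weights is omitted (assumption).
    """
    n, m = len(x_source), len(x_target)
    K = gaussian_kernel(x_source, x_source, sigma)
    kappa = (n / m) * gaussian_kernel(x_source, x_target, sigma).sum(axis=1)
    step = 1.0 / np.linalg.eigvalsh(K)[-1]  # 1 / largest eigenvalue of K
    beta = np.ones(n)  # start from the unweighted (NW) solution
    for _ in range(n_iter):
        grad = K @ beta - kappa
        beta = np.clip(beta - step * grad, 0.0, upper_bound)
    return beta


def weighted_risk(losses, weights=None):
    """Self-normalized weighted mean of per-sample losses (NW if weights is None)."""
    if weights is None:
        return float(np.mean(losses))
    return float(np.sum(weights * losses) / np.sum(weights))


def mape(estimates, truths):
    """Mean Absolute Percentage Error between estimated and reference risks."""
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    return float(100.0 * np.mean(np.abs(estimates - truths) / np.abs(truths)))


# Toy covariate shift: source centered at 0, target centered at 1 (synthetic).
rng = np.random.default_rng(0)
x_src = rng.normal(0.0, 1.0, size=(200, 1))
x_tgt = rng.normal(1.0, 1.0, size=(200, 1))
losses = x_src[:, 0] ** 2  # stand-in for per-sample model losses on the source

beta = kmm_weights(x_src, x_tgt)
risk_nw = weighted_risk(losses)         # No Weighting
risk_kmm = weighted_risk(losses, beta)  # KMM-reweighted estimate
```

In this sketch the weights are obtained without ever estimating the source or target density, which is the property the abstract credits for KMM's robustness. In practice the bandwidth and the weight cap would need tuning, and a dedicated QP solver would be the more standard choice.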

Data and code availability

The data, preprocessing steps, and modeling details needed to reproduce the calculations are available in the project repository: https://github.com/awesomeslayer/Importance-reweighting.

Funding

The work was supported by the grant for research centers in the field of AI provided by the Ministry of Economic Development of the Russian Federation in accordance with the agreement 000000C313925P4F0002 and the agreement with Skoltech №139-10-2025-033.

Author information

Contributions

Conceptualization: A.Z., E.S. and D.K.; methodology: E.S., A.Z., D.K.; software: E.S.; validation: E.S., A.Z.; formal analysis: E.S., D.K.; investigation: A.Z., E.S.; data curation: D.K., E.S.; writing—original draft preparation: E.S., D.K.; writing—review and editing: D.K., E.S. and A.Z.; visualization: E.S., D.K.; supervision: A.Z.; project administration: A.Z., D.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Diana Koldasbayeva.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Serov, E., Koldasbayeva, D. & Zaytsev, A. Kernel mean matching enhances risk estimation under spatial distribution shifts. Sci Rep (2026). https://doi.org/10.1038/s41598-026-36740-7

Keywords

  • Kernel mean matching
  • Spatial risk estimation
  • Spatial modeling
  • Importance reweighting
  • Distribution shift robustness


Source: Ecology - nature.com
