This dataset relies on two types of technical validation: ensuring the accuracy of (1) project attributes and, where applicable, (2) their geographic locations.
Project attribute validation: the double-verification method
Existing sources for Chinese overseas development finance rely on a variety of verification standards. The present dataset extends the most stringent of the existing “double verification” methods, pioneered by the China Africa Research Initiative at the Johns Hopkins University School of Advanced International Studies (SAIS-CARI), to create a harmonized global standard.
The double verification method is motivated by academic literature showing a tendency to overstate, rather than understate, finance commitments. For example, Ebeke and Ölçer49 show that announcements of major infrastructure projects are often timed to coincide with political campaigns. Regional case studies9,50 show that planners tend to avoid publishing projects’ environmental and social risks while maximizing the visibility of the projects and their financial commitments, often before these are finalized. For this reason, earlier datasets have struggled to identify and exclude projects that were publicized but never materialized, sometimes resulting in significant over-estimates51.
Under-counting nevertheless remains possible. As Horn, Reinhart, and Trebesch (2019)15 point out in reference to “hidden” Chinese finance, many overseas Chinese loans are never fully disclosed. We therefore cast the widest possible net for financing commitments and then narrow those findings by applying the double-verification standard. For the same reason, we perform annual updates that revisit previous years’ data, so that projects disclosed only at a much later date can still be included.
Our aim is to provide the best-evidenced data possible, in order to support a more empirically grounded understanding of Chinese overseas development finance. Because it errs on the side of caution, double verification admittedly yields more conservative estimates, but it gives scholars and stakeholders confidence that every record in the dataset does indeed exist.
In the absence of public reporting by CDB and ExImBank on their lending operations, we are limited to reports from government (and government-affiliated) sources, academia, civil society, and the press. The system of double verification ensures accuracy in this context by requiring at least one Chinese source and at least one international source to agree on the core characteristics of each loan agreement.
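The agreement rule can be sketched as a small check over the reports collected for one commitment. This is a minimal illustration, not the authors' code: the `SourceReport` layout, the `origin` labels, and the choice of lender, borrower country, year, and amount (rounded to the nearest US$ million, an assumed tolerance) as the "core characteristics" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SourceReport:
    """One report about a finance commitment (hypothetical record layout)."""
    origin: str            # assumed labels: "chinese" or "international"
    lender: str            # e.g. "CDB" or "ExImBank"
    borrower_country: str
    year: int
    amount_usd_m: float    # commitment amount in millions of US dollars


def core(r: SourceReport) -> tuple:
    """Core characteristics compared across sources (assumed set and tolerance)."""
    return (r.lender, r.borrower_country, r.year, round(r.amount_usd_m))


def is_double_verified(reports: List[SourceReport]) -> bool:
    """True if some Chinese and some international report agree on all core fields."""
    chinese = {core(r) for r in reports if r.origin == "chinese"}
    international = {core(r) for r in reports if r.origin == "international"}
    return bool(chinese & international)


reports = [
    SourceReport("chinese", "ExImBank", "Angola", 2015, 120.0),
    SourceReport("international", "ExImBank", "Angola", 2015, 120.2),
]
print(is_double_verified(reports))      # True
print(is_double_verified(reports[:1]))  # False: no international source
```

Using set intersection means any single matching Chinese/international pair suffices, which mirrors the "at least one ... and at least one ..." wording.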
For China-side verification, we rely on official and quasi-official sources associated with the Chinese government or Chinese Communist Party. We include the following sources:
1. Chinese government and DFI websites (including CDB.com.cn, ExImBank.gov.cn, and any other source with a domain ending in .gov.cn)
2. Websites of Chinese embassies abroad
3. Chinese government or CCP-affiliated press sites:
   a. China Daily, http://www.chinadaily.com.cn
   b. China Global Television Network, https://www.cgtn.com
   c. China News, http://www.chinanews.com
   d. China Plus, http://chinaplus.cri.cn
   e. Guangming Daily, http://www.gmw.cn
   f. People, http://www.people.cn
   g. Xinhua, http://www.xinhuanet.com
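The domain-based part of the China-side source list can be expressed as a URL filter. This is a hedged sketch: the host names are taken from the list, while the helper names are hypothetical. Embassy websites are only matched here if they sit under .gov.cn; any other embassy domains would need to be flagged separately.

```python
from urllib.parse import urlparse

# Press sites from the list of China-side sources, as bare host names.
CHINA_PRESS_HOSTS = {
    "chinadaily.com.cn", "cgtn.com", "chinanews.com", "chinaplus.cri.cn",
    "gmw.cn", "people.cn", "xinhuanet.com",
}
# CDB's site; ExImBank.gov.cn is already covered by the .gov.cn rule.
CHINA_DFI_HOSTS = {"cdb.com.cn"}


def host_matches(host: str, base: str) -> bool:
    """Match the base host itself or any of its subdomains (not look-alikes)."""
    return host == base or host.endswith("." + base)


def is_china_side_source(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(".gov.cn"):
        return True
    return any(host_matches(host, h) for h in CHINA_PRESS_HOSTS | CHINA_DFI_HOSTS)


print(is_china_side_source("http://www.xinhuanet.com/english/"))  # True
print(is_china_side_source("https://www.reuters.com/world/"))     # False
```

Matching on a dot-prefixed suffix rather than a plain substring keeps a look-alike host such as `notpeople.cn` from being counted as a China-side source.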
For international verification, we rely similarly on government reports, supplemented with academic, civil society, and private press reports. As mentioned above, when differences emerge among sources, we resolve these conflicts by giving government sources top priority, followed by academic sources, civil society sources, and private press sources. Government press sources, such as the Chinese sources listed above, are given the weight of government sources. This method coincides with that of other datasets with double verification7,8,21.
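The conflict-resolution ordering described in the preceding paragraph amounts to ranking sources and taking the value from the highest-ranked one. A minimal sketch, assuming hypothetical source-type labels; ties are broken by input order, which is an assumption the text does not address.

```python
# Lower rank = higher priority, following the ordering in the text.
PRIORITY = {"government": 0, "academic": 1, "civil_society": 2, "private_press": 3}
GOVERNMENT_PRESS = "government_press"  # given the weight of government sources


def source_rank(source_type: str) -> int:
    if source_type == GOVERNMENT_PRESS:
        return PRIORITY["government"]
    return PRIORITY[source_type]


def resolve(values):
    """values: (source_type, value) pairs that disagree.
    Returns the value reported by the highest-priority source;
    min() keeps the first of any tied entries."""
    return min(values, key=lambda pair: source_rank(pair[0]))[1]


conflict = [("private_press", 500.0), ("academic", 450.0), ("government_press", 420.0)]
print(resolve(conflict))  # 420.0
```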
Because of the stringency of the double-verification standard used here, we exclude the smallest finance agreements (those below US$25 million). Excluding these small loans necessarily involves a small degree of under-counting. For example, Brautigam et al. (2020)8 show that loans of less than $25 million each account for just $389 million in total commitments, out of $148 billion in financing commitments by CDB and ExImBank in Africa between 2008 and 2018: roughly 0.26% of the total. However, including these loans would introduce significant geographic bias toward countries with particularly transparent governments and open media environments, where small loans are far more likely to be publicly documented. As the purpose of the present effort is to enable more reliable geospatial analysis, including this additional activity was not judged worth the cost to the reliability of analyses based on the dataset.
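The share excluded by the US$25 million threshold can be checked directly from the Brautigam et al. (2020) totals quoted above. Variable names are illustrative; both amounts are in millions of US dollars.

```python
small_loans_usd_m = 389          # loans under US$25 million each, Africa, 2008-2018
total_commitments_usd_m = 148_000  # all CDB and ExImBank commitments, same scope

share = small_loans_usd_m / total_commitments_usd_m
print(f"{share:.2%}")  # 0.26%
```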
It is worth comparing these results to those of other datasets for context. Among other independent datasets of Chinese lending, only AidData11,12 and Horn, Reinhart, and Trebesch15 have global coverage, and of those two, only AidData differentiates by lender, allowing a strict comparison. As Fig. 1 shows, AidData includes $463 billion in policy bank loans between 2008 and 2014 that would meet the standard for inclusion in the present dataset if they could be validated. However, in that same time period, our methodology found that only $271 billion of loans could pass the validation standards introduced here.
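The gap between the two totals for 2008 to 2014 can be expressed as a share. Note that this compares aggregate amounts only; it does not imply that every validated record is also one of AidData's records.

```python
aiddata_usd_bn = 463    # AidData policy-bank loans meeting the inclusion criteria
validated_usd_bn = 271  # amount passing double verification, same period

retained = validated_usd_bn / aiddata_usd_bn
print(f"retained: {retained:.1%}, excluded: {1 - retained:.1%}")
# retained: 58.5%, excluded: 41.5%
```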
This process of double verification results in a dataset that excludes some countries that appear in other datasets. For four countries, the present dataset lists no loans, even though AidData, the largest global dataset, lists CDB and/or ExImBank loans to them that would qualify for inclusion here if they could be validated. Those four are: Central African Republic (for which we were unable to double verify the Boali No. 3 hydropower plant project), Dominica (for which we were unable to double verify the source of the loan for the rehabilitation of State College), Turkey (whose Turk Telecom was privatized before the loan listed in AidData), and Yemen (for which we were unable to find Chinese validation for the Bajal cement factory project). Three further countries appear in AidData but with no loans of $25 million or more: Burundi, Colombia, and Sierra Leone.
As with other researchers in this space7,8,21, we recognize that individual projects financed under lines of credit and framework agreements can remain hidden from public view until the line of credit or framework agreement is renewed or lapses unused. We therefore include such financing agreements when they are first drawn up, but withdraw them from subsequent updates if it comes to light that they were unused. Renewals, which are frequent for lines of credit, do not represent new financing but simply extend the period during which the original commitment can be used. For this reason, renewals are not counted separately.
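The update rules for lines of credit can be sketched as a filter applied at each annual update. This is an illustration under assumptions: the `Commitment` fields, in particular `renewal_of` and `found_unused`, are hypothetical flags standing in for whatever evidence the researchers record.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Commitment:
    commitment_id: str
    amount_usd_m: float
    is_credit_line: bool = False
    renewal_of: Optional[str] = None  # id of the original commitment, if a renewal
    found_unused: bool = False        # evidence that the facility lapsed unused


def apply_update(records: List[Commitment]) -> List[Commitment]:
    """Drop credit lines later shown to be unused; never count renewals separately."""
    kept = []
    for c in records:
        if c.renewal_of is not None:
            continue  # a renewal only extends the original commitment's window
        if c.is_credit_line and c.found_unused:
            continue  # withdrawn from this and subsequent updates
        kept.append(c)
    return kept


records = [
    Commitment("A", 500.0, is_credit_line=True),
    Commitment("B", 500.0, is_credit_line=True, renewal_of="A"),
    Commitment("C", 200.0, is_credit_line=True, found_unused=True),
]
print([c.commitment_id for c in apply_update(records)])  # ['A']
```

Skipping renewals (rather than merging their amounts) keeps the original commitment counted exactly once.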
Finally, not all projects in this dataset have been completed as of this writing. We have removed all projects that have been publicly cancelled, but ongoing projects with active financing commitments remain, even if construction has not yet begun or has been suspended. For this reason, we refer to each observation as a commitment or agreement, rather than a loan. Funds may or may not have been disbursed as of this writing, but commitments have been made and remain valid. In all, this double-verification process resulted in a final dataset of 857 finance commitments in 93 countries from 2008 through 2019.
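Taken together, the inclusion criteria in this section (CDB or ExImBank lender, double verification, the US$25 million threshold, the 2008 to 2019 window, and removal of publicly cancelled projects) can be summarized as a single predicate. The `Record` layout and status labels are hypothetical; suspended or not-yet-started projects are kept, as the text describes.

```python
from dataclasses import dataclass

MIN_AMOUNT_USD_M = 25        # commitments below US$25 million are excluded
LENDERS = {"CDB", "ExImBank"}


@dataclass
class Record:
    lender: str
    year: int
    amount_usd_m: float
    status: str              # hypothetical labels, e.g. "planned", "suspended", "cancelled"
    double_verified: bool


def include(r: Record) -> bool:
    return (
        r.lender in LENDERS
        and r.double_verified
        and r.amount_usd_m >= MIN_AMOUNT_USD_M
        and 2008 <= r.year <= 2019
        and r.status != "cancelled"  # only publicly cancelled projects are removed
    )


print(include(Record("CDB", 2016, 300.0, "suspended", True)))  # True
print(include(Record("CDB", 2016, 300.0, "cancelled", True)))  # False
```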
Location validation
Of the 857 finance commitments in the final dataset, 664 have a geographic footprint of some type. These projects – encompassing agriculture, extraction, manufacturing, utilities, infrastructure, and other installations – were located according to the following procedure.
Several of the existing datasets listed above include the locations of financed projects: AidData, CSIS, Dayant and Pryke, and the World Bank11,13,14,26. Among these, CSIS’ Reconnecting Asia merits special mention, as it displays project locations through embedded Google Maps. For projects that also appear in the CSIS dataset, we extracted the coordinates from these maps (using code available in R as CSIS_to_coord_str.R on the project repository). For these observations, we used the reported locations as initial estimates, to be validated visually thereafter. For energy projects not listed in these project datasets, we used the following sources for initial estimates of project locations:
Power plants: Global Power Plant Database52
Coal-fired power plants: Global Energy Monitor53
Fossil fuel pipelines and related infrastructure: Global Fossil Infrastructure Tracker54
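The order in which initial location estimates are sought can be sketched as a lookup cascade. This is a hedged illustration, not the authors' R code: the project-type labels and the `lookups` structure (source name mapped to project id and coordinates) are assumptions, and every estimate returned would still go through visual validation.

```python
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Existing geolocated project datasets, consulted first.
PROJECT_DATASETS = ["AidData", "CSIS", "Dayant and Pryke", "World Bank"]
# Energy-specific sources, keyed by hypothetical project-type labels.
ENERGY_SOURCES = {
    "power_plant": ["Global Power Plant Database"],
    "coal_power_plant": ["Global Energy Monitor"],
    "fossil_pipeline": ["Global Fossil Infrastructure Tracker"],
}


def initial_estimate(project_id: str, project_type: str,
                     lookups: Dict[str, Dict[str, Coord]]) -> Optional[Tuple[str, Coord]]:
    """Return (source, coord) from the first source listing the project, else None."""
    for source in PROJECT_DATASETS + ENERGY_SOURCES.get(project_type, []):
        coord = lookups.get(source, {}).get(project_id)
        if coord is not None:
            return source, coord
    return None  # fall back to a geocoding query


lookups = {
    "CSIS": {"p1": (1.0, 2.0)},
    "Global Power Plant Database": {"p2": (3.0, 4.0)},
}
print(initial_estimate("p2", "power_plant", lookups))
```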
For other observations, we wrote code that queries the Google Maps and OpenStreetMap APIs for the location of each project (available in R as GoogleMaps_OSM_API_query.R on the OSF project repository).
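The query step can be sketched in Python using the public Google Geocoding and Nominatim (OpenStreetMap) search endpoints. This is not a port of the authors' R script; the function names are hypothetical, the HTTP request itself is omitted (Nominatim's usage policy requires an identifying User-Agent and rate limiting), and only the first result is kept, which is an assumption.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def google_geocode_url(query: str, api_key: str) -> str:
    return "https://maps.googleapis.com/maps/api/geocode/json?" + urlencode(
        {"address": query, "key": api_key})


def nominatim_url(query: str) -> str:
    return "https://nominatim.openstreetmap.org/search?" + urlencode(
        {"q": query, "format": "json", "limit": 1})


def parse_google(payload: dict) -> Optional[Tuple[float, float]]:
    """First result's (lat, lng), or None when there is no usable response."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    loc = payload["results"][0]["geometry"]["location"]
    return (loc["lat"], loc["lng"])


def parse_nominatim(payload: list) -> Optional[Tuple[float, float]]:
    """Nominatim returns lat/lon as strings."""
    if not payload:
        return None
    return (float(payload[0]["lat"]), float(payload[0]["lon"]))


print(nominatim_url("Soyo, Angola"))
```

Observations for which both parsers return `None` correspond to the "no query response" group that is located entirely by visual inspection.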
For all observations (those included in previous geolocated datasets, those located by querying Google Maps and OpenStreetMap, and those with no query response), we validated the locations visually using Google Maps, OpenStreetMap, and OpenRouteService, as shown in Fig. 3 below.
Examples of point, line, and polygon footprints. Left to right: Rehabilitation of Sam Lord’s Castle, Barbados; Soyo-Kapary Electrical Transmission and Transformation Project, Angola; Kirirom III hydropower plant (reservoir), Cambodia.
This process substantially raises the bar that a project must clear to be reported as having a precise location, compared with previous geocoded datasets. For example, AidData allows a project to be reported in its most precise location category based on the precise boundaries of an area of uncertainty around the project, such as a populated place or the political seat of a geographic area, rather than the precise point or boundaries of the true project site(s). This high-precision category includes 579 sovereign finance commitments by CDB and ExImBank identified by AidData during our study period, of which only 105 geotags are associated with specific project sites. The remaining projects’ locations are defined by an administrative division or its political seat. Our dataset uses a more stringent precision classification scheme: projects marked with a precision code of “1” have all been visually located as site-specific project footprints. This new level of precision allows linear and polygonal projects to be represented by their complete footprints rather than by representative points, which enables a more thorough analysis of environmental risks and impacts, for example the impacts of the entire length of a highway or the entire area of a mine. Analysts using this dataset can thus avoid the under-estimation of environmental impacts that reliance on representative points necessarily introduces. Our first such analysis uses these precise footprints to compare the location-based social and ecological risks of Chinese overseas development finance with those of World Bank projects, based on their proximity to the boundaries of national protected areas, possible critical habitats, and indigenous territories48. The dataset also supports holistic environmental analysis of interconnected networks of projects, based on their collective footprints.
Yang et al. (2021) use these collective footprints to examine the environmental and social sensitivity of Chinese overseas development finance locations, and find that, compared with World Bank projects over the same period, the total footprint is significantly concentrated in more sensitive territory55.
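Why representative points under-estimate proximity-based risk can be shown with a toy example: a protected-area boundary point may lie close to a highway yet far from the highway's single representative point. This sketch assumes planar coordinates in kilometres, a hypothetical 10 km proximity threshold, and a vertex average as the representative point; real analyses would use projected geometries and a GIS library.

```python
import math
from typing import List, Tuple

XY = Tuple[float, float]  # planar (x, y) in km, assuming a projected CRS


def point_segment_distance(p: XY, a: XY, b: XY) -> float:
    """Shortest distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p: XY, line: List[XY]) -> float:
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def representative_point(line: List[XY]) -> XY:
    """Vertex average, standing in for a single geotag."""
    n = len(line)
    return (sum(x for x, _ in line) / n, sum(y for _, y in line) / n)


highway = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]  # hypothetical 100 km road
site = (2.0, 3.0)   # hypothetical point on a protected-area boundary
BUFFER_KM = 10.0    # hypothetical proximity threshold

d_line = distance_to_line(site, highway)
d_point = math.dist(site, representative_point(highway))
print(d_line <= BUFFER_KM, d_point <= BUFFER_KM)  # True False
```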
To accurately reflect the variety of types of footprints across various types of finance projects, we classified each geolocated observation as a point (or collection of points), line (or collection of discontinuous lines), or polygon (or collection of discontinuous polygons). Points are used for individual buildings or installations. Lines are used for linear infrastructure including roads, rails, power distribution, wired communications networks, and pipelines. Polygons show projects with footprints that are larger than single buildings or installations, with well-defined boundaries, including dam reservoirs, oil and gas fields, and clusters of buildings such as housing or stadium complexes. The distribution of projects among footprint types is listed in Table 4.
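The point/line/polygon scheme, including the "collection of" variants, maps naturally onto GeoJSON geometry types (RFC 7946). The encoding below is an assumption about how such footprints could be stored, not a description of the dataset's published file format.

```python
def geojson_geometry(footprint_type: str, parts: list) -> dict:
    """Build a GeoJSON geometry from one or more parts.

    Each part is, in [longitude, latitude] order:
      point   -> a position [lon, lat]
      line    -> a list of positions
      polygon -> a list of linear rings (each a closed list of positions)
    Several discontinuous parts become the corresponding Multi* type.
    """
    single, multi = {
        "point": ("Point", "MultiPoint"),
        "line": ("LineString", "MultiLineString"),
        "polygon": ("Polygon", "MultiPolygon"),
    }[footprint_type]
    if len(parts) == 1:
        return {"type": single, "coordinates": parts[0]}
    return {"type": multi, "coordinates": parts}


# Two discontinuous sections of a hypothetical transmission line.
g = geojson_geometry("line", [[[12.0, -6.1], [12.5, -6.4]], [[13.0, -6.9], [13.4, -7.2]]])
print(g["type"])  # MultiLineString
```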
A few examples merit further explanation regarding their footprint classification. First, wind farms consist of turbines placed along access roads; to show their total geographic footprint accurately, we represent them as linear infrastructure consisting of their access roads. Second, projects with lower levels of geographic precision (at the national level or the first- or second-level administrative division level) are shown as polygons that encompass these areas, following the corresponding municipal, provincial, or national boundaries48.
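The classification rules, including the two special cases, can be summarized as a lookup. The asset-type and precision-level labels are hypothetical stand-ins for the categories named in the text.

```python
POINT_TYPES = {"building", "installation"}
LINE_TYPES = {"road", "rail", "power_distribution", "wired_communications",
              "pipeline", "wind_farm"}  # wind farms drawn as their access roads
POLYGON_TYPES = {"dam_reservoir", "oil_gas_field", "building_cluster"}
LOW_PRECISION_LEVELS = {"national", "admin1", "admin2"}


def footprint_type(asset_type: str, precision_level: str = "site") -> str:
    if precision_level in LOW_PRECISION_LEVELS:
        return "polygon"  # boundary of the country or administrative division
    if asset_type in LINE_TYPES:
        return "line"
    if asset_type in POLYGON_TYPES:
        return "polygon"
    if asset_type in POINT_TYPES:
        return "point"
    raise ValueError(f"unclassified asset type: {asset_type}")


print(footprint_type("wind_farm"))           # line
print(footprint_type("building", "admin1"))  # polygon
```

Checking precision first reflects that a low-precision project is drawn as its administrative boundary regardless of what kind of asset it is.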
Source: Ecology - nature.com