
Assessing the population-wide exposure to lead pollution in Kabwe, Zambia: an econometric estimation based on survey data

Data collection and potential selection bias

We conducted two joint surveys in Kabwe from July to September 2017: the Kabwe Household Socioeconomic Survey (KHSS) 2017, conducted by the Central Statistical Office of Zambia and the University of Zambia under the supervision of the authors, and a BLL survey performed by the authors. The surveys were approved by the University of Zambia Research Ethics Committee (UNZAREC; REF. No. 012-04-16). Further approvals were granted by the Ministry of Health through the Zambia National Health Research Ethics Board and the Kabwe District Medical Office. The data were collected in accordance with the Declaration of Helsinki, and informed consent was obtained from all study participants, including the parents or legal guardians of minors.

The two surveys were designed consistently and targeted the same sample households, which were selected in two steps. In the first step, using Zambia’s national census frame, which divides the Kabwe district into 384 standard enumeration areas (SEAs), we randomly selected 40 SEAs across the entire district. In the second step, we randomly selected 25 households (and a few replacements) from each sampled SEA. Sampling weights were generated to account for population differences across the SEAs.
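As a rough illustration of how such weights can be constructed, the sketch below computes an inverse-probability design weight for a two-stage design, assuming simple random sampling at both stages. The function name and the count of 120 households in the SEA are hypothetical. The survey report describes the actual weighting procedure, which may differ.

```python
def two_stage_weight(total_seas, sampled_seas, households_in_sea, sampled_households):
    """Design weight = 1 / (P(SEA selected) * P(household selected | SEA))."""
    p_sea = sampled_seas / total_seas
    p_household = sampled_households / households_in_sea
    return 1.0 / (p_sea * p_household)

# 384 SEAs, 40 sampled SEAs and 25 households per SEA come from the text;
# the 120 households in the SEA is a made-up figure for illustration only.
weight = two_stage_weight(384, 40, 120, 25)
```

Under these assumptions, each sampled household in that SEA would stand in for roughly 46 households in the district.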

The KHSS 2017 interviewed 895 households (4,900 individuals) at their homes and collected socioeconomic, demographic and geographic data. The response rate was 88.2%, and we can therefore regard the data adjusted by the sampling weights as representative of the entire Kabwe population (for more details of the survey, see the report [33]).

To obtain BLL data, we conducted a blood sampling survey concurrently with the KHSS 2017. For hygienic and ethical reasons, we selected 13 local clinics to perform the blood sampling, instead of collecting blood at houses. We invited up to four members (two children aged 10 years or younger and their parents or guardians) from each sample household for the blood sampling. We prioritised young children over children older than 10 years. The invitations were made sequentially. We assigned identical venues and dates to households from the same SEA. The typical assigned dates had a 3-day window starting the day after the invitation. However, we allowed for some flexibility and sampled the blood of those who visited the clinic even after the assigned time window, as long as the clinic was operational for households from other SEAs. Therefore, the effective window for blood sampling was the number of days from the day after the invitation until the pre-set blood sampling period in each clinic was over, which varied substantially across households, from 3 days to a month. We revisit this feature of the survey window when setting up our econometric model later. A total of 372 households (41.6%) participated in the blood sampling and, on average, 2.2 members from the participating households provided blood samples.

We performed blood digestion and metal extraction as described in our previous study [34], with minor modifications, and measured BLLs using an Inductively Coupled Plasma-Mass Spectrometer (ICP-MS). We also measured BLLs with a portable analyser, LeadCare II, to obtain quick results [22]. In this study, however, we focus on the ICP-MS data, given their general accuracy. See the Supplementary Material Section S1 for details on the methods used to measure BLLs and the difference in the data between the two analysers.

Regardless of the accuracy of the techniques, however, we further need to account for the risk of selection bias in the BLL data. In the absence of formal and compulsory testing mechanisms, we relied on individuals’ voluntary (self-selected) visits to the clinics. However, the participants in blood sampling could have traits leading to higher or lower BLLs than the population. Such traits can include education, gender, age and living standards. The survey design did not prioritise children aged 11 years or older, and this could also contribute to the deviation of characteristics, although a small number of such children attended clinics. Moreover, certain unobservable characteristics affecting BLLs can further differ between the participants and non-participants. For example, those with greater preferences for health possibly had low BLLs but tended to participate in the blood sampling surveys, whereas those with a high innate physiological capacity for lead excretion possibly tended not to participate because they had low BLLs and did not perceive symptoms of lead poisoning. These issues can lead to selection bias, and the raw data observed from the voluntary participants can fail to illustrate the lead poisoning conditions of the population.

BLL estimation approach

To correct for potential selection bias, we first estimated the equations to explain BLLs of children aged 0–10 years and adults aged 19 years or above. Then, using the estimated equations, we calculated BLLs for all individuals, including children aged 11–18 years and those in the other age groups who did not participate in the blood sampling.

BLLs generally depend on the ambient pollution level, the opportunities of exposure to pollution, the physiological capacity of lead absorption and excretion, and the knowledge and technologies used to prevent lead poisoning. We controlled for ambient pollution levels by including the distance, direction, and altitude of the household location; the first two variables are measured with respect to the mine waste dumping site (Black Mountain). The remaining factors were measured by age and various other individual and household characteristics denoted by \(\boldsymbol{X}_{i}\). Data for these variables are available regardless of participation in blood sampling. We assumed the following equation for BLL:

$$\begin{aligned} \log BLL_{i} & = \beta_{dis} \log distance_{i} + \beta_{dir1} direction_{i} + \beta_{dir2} direction_{i}^{2} \\ & \quad + \beta_{alt} altitude_{i} + f\left( age_{i} \right) + \boldsymbol{X}_{i} \boldsymbol{\gamma}^{\prime} + \varepsilon_{i}. \end{aligned}$$

(1)

The logarithmic form for BLL adjusts its distribution to approximately normal (BLL is bounded from below and has a skewed distribution) and allows the factors on the right-hand side to have proportional effects rather than level effects. \(\varepsilon_{i}\) is the independent and identically distributed error term that captures noise, such as casual fluctuations and measurement errors in BLLs, and the effects of unobservable factors. While we presented a single equation above, we assumed different equations for children aged 0–10 years and adults aged 19 years or above.

Below, we discuss our specification in detail.

Geographic factors

Existing studies have examined the relationship between the geographic location and ambient pollution level [12,13,14]. Since lead is transported from the mine waste dumping site through the flow of wind and water, the distance from the site is negatively correlated with ambient lead levels. The soil lead contamination spreads to the western side of the site, particularly towards the west-northwest (WNW), which corresponds to the direction of the prevailing local wind. The contamination also slightly extends to the low-elevation south-eastern side, reflecting pollution transported by water. The northern and southern sides are the least contaminated.

We defined \(distance_{i}\) as the distance between the mine waste dumping site and the location of \(i\)’s household, with \(\beta_{dis}<0\) expected. Also, we assumed that WNW is the most contaminated direction and, accordingly, defined \(direction_{i}\) as the angle in radians, between 0 and \(\pi\), formed at the mine waste dumping site between the WNW direction and the location of \(i\)’s household. That is, the household location is WNW at \(direction_{i}=0\), either north-northeast or south-southwest at \(\pi /2\), and east-southeast (ESE) at \(\pi\). We employed a quadratic specification in Eq. (1), which allows BLLs to have two peaks at WNW and ESE if \(\beta_{dir1}<0\), \(\beta_{dir2}>0\) and \(-\beta_{dir1}/\left(2\beta_{dir2}\right)<\pi\); under these conditions the quadratic reaches its minimum strictly between the two directions. We statistically assessed the appropriateness of the specification for direction in Supplementary Material Section S2. We also used altitude in metres, \(altitude_{i}\), considering that elevated areas can be less exposed to dust and water flows, although the general tendency of land elevation can be absorbed by the direction variables.
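The direction variable can be sketched in code as follows. This is a minimal illustration, assuming household positions are given as east and north offsets from the dumping site and that WNW corresponds to a compass bearing of 292.5 degrees. The function names are hypothetical, and the authors' actual computation is not described here.

```python
import math

WNW_BEARING_DEG = 292.5  # compass bearing of west-northwest, clockwise from north

def direction_angle(dx_east, dy_north):
    """Angle in radians (0..pi) between WNW and the site-to-household ray.

    dx_east, dy_north: household offset from the dumping site (any length unit).
    """
    bearing = math.degrees(math.atan2(dx_east, dy_north)) % 360.0
    diff = abs(bearing - WNW_BEARING_DEG) % 360.0
    return math.radians(min(diff, 360.0 - diff))

def direction_trough(beta_dir1, beta_dir2):
    """Direction at which beta_dir1 * d + beta_dir2 * d**2 is lowest (beta_dir2 > 0)."""
    return -beta_dir1 / (2.0 * beta_dir2)
```

A household due ESE of the site maps to \(\pi\), and one due north-northeast or south-southwest maps to \(\pi/2\), matching the definition in the text. `direction_trough` returns the turning point used in the two-peak condition.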

Age and other covariates

For children, we assumed a non-linear relationship between their ages and BLLs and defined the following functional form:

$$f\left( age_{i} \right) = \left[ \phi_{0} + \phi_{1} mage_{i} + \phi_{2} mage_{i}^{2} \right] \times I\left( age_{i} < 2 \right) + \phi_{3} age_{i} \times I\left( age_{i} \ge 2 \right).$$

(2)

\(I\left( \cdot \right)\) is an indicator function that takes the value of 1 if the argument condition is satisfied, and \(mage_{i}\) denotes age in months. The functional form reflects the findings in the literature. Young children are generally at a high risk of lead poisoning. Playing outside and age-appropriate hand-to-mouth behaviours expose them to lead, and their gastrointestinal absorption of lead is high [4]. Foetuses and infants born to exposed mothers absorb lead in utero and through breastfeeding [35]. Consequently, BLLs often reach a peak at or before the age of 24 months and then decrease as children grow older, reflecting their physical and behavioural growth [1,36]. Thus, we employed a specification that allows an inverted U-shaped relationship between the logarithmic BLL and age up to 23 months, but assumes a linearly decreasing relationship between the two for children aged 2 years or above.
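The piecewise form of Eq. (2) can be written as a small function. The sketch below is illustrative only: the function name is hypothetical, and any coefficient values passed to it are placeholders, not estimates from this study.

```python
def f_age(age_years, age_months, phi0, phi1, phi2, phi3):
    """Eq. (2): quadratic in age in months below age 2, linear in years from age 2."""
    if age_years < 2:
        return phi0 + phi1 * age_months + phi2 * age_months ** 2
    return phi3 * age_years

def peak_month(phi1, phi2):
    """Month at which the quadratic branch peaks (requires phi2 < 0)."""
    return -phi1 / (2.0 * phi2)
```

With \(\phi_{1}>0\) and \(\phi_{2}<0\) the first branch is an inverted U, and with \(\phi_{3}<0\) the second branch declines linearly with age.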

For adults, the physiological foundation of the BLL-age relationship is not clear, but age-related changes in metabolism and lifestyle can affect BLLs. We simply assumed a log-linear relationship between BLL and age for adults.

In addition, we used the following individual and household characteristics, denoted as \(\boldsymbol{X}_{i}\), for children: a dummy variable for female; the mothers’ education level (grades), which reflects their general, health-related and lead-related knowledge; a dummy variable for children whose mothers were absent (the mothers’ education level was set at zero for such children); a dummy variable for female-headed households; household size; dependency ratio (the proportion of household members aged 0–15 years and 65 years or above); and the log of per capita household expenditure, which measures living standards. We also used dummy variables for household location: urban areas, small-scale farming areas, large-scale farming areas, and the Makululu compound, an area of informal settlement where public services are poorly delivered. We set urban area as the base category.

For adults, we continued to use the dummy variables for female and household location, household size and dependency ratio but dropped the variables related to mothers and household heads. The per capita household expenditure was not used either, because it is not exogenous for adults. Instead, we used their own education level, which reflects living conditions to a certain extent as well as knowledge levels. We also used a dummy variable for marital status, which takes the value of one for either married or co-habiting individuals, and the duration of residence in Kabwe (in years) to account for the effects of long-term lead exposure.

Econometric methods to estimate BLL equation

We considered two methods to estimate Eq. (1). The first is OLS, which directly estimates Eq. (1) from the data of the participants in the blood sampling survey. If the bias in BLLs is attributable only to differences in observable factors between the participants and non-participants, then the OLS estimate of Eq. (1) is unbiased and can be used to obtain estimates representing the population. However, as previously mentioned, unobservable characteristics can also affect both BLLs and participation decisions. This can disrupt the error term distribution and bias the OLS estimate of Eq. (1).

To account for this risk, we also adopted Heckman’s sample selection model [24]. This model corrects for the bias in unobservable factors by simultaneously estimating the probability of participation (selection equation) for the entire sample, including non-participants. Specifically, we considered the following selection equation:

$$\begin{aligned} \Pr \left( i\;\text{participates} \right) & = \Psi \{ \delta_{dis} \log distance_{i} + \delta_{dir1} direction_{i} + \delta_{dir2} direction_{i}^{2} \\ & \quad + \delta_{alt} altitude_{i} + g\left( age_{i} \right) + \boldsymbol{X}_{i} \boldsymbol{\xi}^{\prime} + \zeta window_{i} \} , \end{aligned}$$

(3)

where \(\Psi\) is the normal distribution function with the probability density function \(\psi\), \(\boldsymbol{X}_{i}\) is the same as in Eq. (1), and \(g\left( age_{i} \right)\) has the same functional form as \(f\left( age_{i} \right)\). The bias in Eq. (1) can be corrected by estimating Eq. (1) with the inverse Mills ratio, \(\psi /\Psi\), as an additional term.
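For intuition, the inverse Mills ratio can be computed from the fitted selection index, that is, the argument of \(\Psi\) in Eq. (3). The sketch below uses only the Python standard library and assumes the standard normal distribution, as in a probit selection equation. The function names are hypothetical, and whether the estimation used a two-step or a maximum-likelihood variant of the model is not specified here.

```python
import math

def normal_pdf(z):
    """Standard normal density psi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function Psi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(index):
    """psi(index) / Psi(index), evaluated at the fitted selection index."""
    return normal_pdf(index) / normal_cdf(index)
```

The ratio falls as the selection index rises, so individuals who were very likely to participate receive a smaller correction term than those who participated despite a low predicted probability.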

In the sample selection model, the use of an exclusion restriction variable, which affects the probability of participation but not BLL, is preferable. We used the number of days of the blood sampling window, denoted by \(window_{i}\), as an exclusion restriction. As described above, the blood sampling window was effectively the number of days that the assigned clinic remained operational for blood sampling after the day following the invitation. Other factors being equal, households that received early invitations and had longer time windows would more easily manage to attend clinics and would have higher probabilities of participation. The exogenous nature of the blood sampling window renders it irrelevant for BLLs.

Estimation of the representative BLLs

After obtaining the BLL equations, we estimated the BLLs of the representative sample individuals by inputting their characteristics on the right-hand side of the equations. We applied the survey’s sampling weights when aggregating the estimated BLLs.
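The aggregation step amounts to a weighted average of individual predictions. The following is a minimal sketch with made-up numbers: the predicted values and weights are hypothetical, and how predicted log BLLs are converted back to levels is not covered here.

```python
def weighted_mean(values, weights):
    """Sampling-weight-adjusted mean of individual predictions."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical predicted BLLs (ug/dL) and sampling weights for three individuals.
predicted_bll = [4.0, 10.0, 25.0]
sampling_weight = [50.0, 30.0, 20.0]
mean_bll = weighted_mean(predicted_bll, sampling_weight)
```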

To estimate the BLLs of adolescents aged 11–18 years, who were largely not covered in our BLL survey and thus not used in the BLL equation estimations, we used the equation for children aged 0–10 years, assuming that the age-BLL trend, which we expected to be negative, would hold up to the age of 18 years.

Next, we calculated the number of residents with BLLs above 5 μg/dL by multiplying the estimated proportion of those with such BLLs by the total population. To account for population growth, we used our own population estimates [33] and those of the Central Statistical Office of Zambia [37], both as of 2017, instead of 200,000 as of 2010.
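The headcount calculation can be sketched as a weighted share multiplied by a population total. All inputs below are hypothetical. In particular, the population of 200,000 is the 2010 figure mentioned in the text and is used only as a placeholder, since the 2017 estimates are not reproduced here.

```python
def weighted_share_above(values, weights, threshold=5.0):
    """Weighted proportion of individuals whose value exceeds the threshold."""
    above = sum(w for v, w in zip(values, weights) if v > threshold)
    return above / sum(weights)

def headcount_above(share, population):
    """Number of residents above the threshold, given a share and a population."""
    return share * population

# Hypothetical predicted BLLs (ug/dL) and sampling weights.
share = weighted_share_above([4.0, 10.0, 25.0], [50.0, 30.0, 20.0])
residents = headcount_above(share, 200_000)  # placeholder population
```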

Further, we present two graphical results. The first one is an in-depth examination of the mean BLLs across age groups. In the second one, we simulated the geographic variation of the mean BLLs. We divided the entire Kabwe district into 1 km × 1 km grids, and estimated the mean BLL in each grid cell. Distance and direction were measured for each cell and other independent variables were measured by the means in the ward—official inner-district division—to which the cell corresponds (we provide additional technical notes before showing results).

All estimations were performed using Stata 15 software.


Source: Ecology - nature.com
