Study population
The study is based on data from mothers and children participating in a longitudinal Southwest Finland cohort, Steps to Healthy development of Children (the STEPS Study) (described in detail in Lagström et al.31). The STEPS study is an ongoing population-based and multidisciplinary study that investigates children’s physical, psychological and social development, starting from pregnancy and continuing until adolescence. All Finnish- and Swedish-speaking mothers delivering a child between 1 January, 2008 and 31 March, 2010 in the Hospital District of Southwest Finland formed the cohort population (in total, 9811 mothers and their 9936 children). Altogether, 1797 mothers with 1805 neonates volunteered as participants for the intensive follow-up group of the STEPS Study. Mothers were recruited by midwives either during the first trimester of pregnancy while visiting maternity health care clinics, or after delivery on the maternity wards of Turku University Hospital or Salo Regional Hospital, or by a letter mailed to the mothers. The participating mothers differ slightly from the whole cohort population in some background characteristics (being older, with first-born child and higher socioeconomic status)31. The ethics committee of the Hospital District of Southwest Finland has approved the STEPS Study (2/2007) and all methods were performed in accordance with relevant guidelines and regulations. Written informed consent was obtained from all the participants and, for children, from one parent for study participation. Subjects have been and are free to withdraw from the study at any time without any specific reason. The STEPS Study have the appropriate government authorization to the use of the National birth register (THL/974/5.05.00/2017).
Breastmilk collection and HMO analysis
Mothers from the STEPS Study were asked to collect breastmilk samples when the infant was approximately 3 months old. In total, 812 of the 1797 mothers (45%) provided a breastmilk sample. There were only slight differences in maternal and child characteristics between the participants providing breastmilk samples and the total STEPS Study cohort40. Altogether, 795 breastmilk samples were included in this study (excluding the duplicate observations and the 2nd born twins, samples with technical unclarity or insufficient sample quantity, one breastmilk sample collected notably later than the other samples, at infant age of 14.5 months (range for the other breastmilk samples: 0.6–6.07 months), one sample with missing information on the date of collection and six mothers missing data on residential green environment) (Supplementary Fig. 2). Mothers received written instructions for the collection of breastmilk samples: samples were collected by manual expression in the morning from one single breast, first milking a few drops to waste before collecting the actual sample (~ 10 ml) into a plastic container (pre-feed sample). The samples were stored in the fridge and the mothers brought the samples to the research center or the samples were collected from their homes on the day of sampling. All samples were frozen and stored at − 70 °C until analysis.
High Performance Liquid Chromatography (HPLC) was used to identify HMOs in breastmilk as previously described40,57,58 at the University of California, San Diego (methods described in detail in Berger et al.58). Milk samples were spiked with raffinose (a non-HMO carbohydrate) as an internal standard to allow absolute quantification. HMOs were extracted by high-throughput solid-phase extraction, fluorescently labelled, and measured using HPLC with fluorescent detection (HPLC-FLD)58. Absolute concentrations for the 19 HMOs were calculated based on standard retention times and corrected for internal standard recovery. Quantified HMOs included: 2′-fucosyllactose (2′FL), 3-fucosyllactose (3FL), lacto-N-neotetraose (LNnT), 3′-sialyllactose (3′SL), difucosyllactose (DFlac), 6′-sialyllactose (6′SL), lacto-N-tetraose (LNT), lacto-Nfucopentaose (LNFP) I, LNFP II, LNFP III, sialyl-LNT (LST) b, LSTc, difucosyllacto-LNT (DFLNT), lacto-N-hexaose (LNH), disialyllacto-N-tetraose (DSLNT), fucosyllacto-Nhexaose (FLNH), difucosyllacto-N-hexaose (DFLNH), fucodisialyllacto-lacto-N-hexaose (FDSLNH) and disialyllacto-N-hexaose (DSLNH). HMOs were also summed up to seven groups based on structural features: small HMOs (2′FL, 3FL, 3′SL, 6′SL, and DFLac), type 1 HMOs (LNT, LNFP I, LNFP II, LSTb, DSLNT), type 2 HMOs (LNnT, LNFP III, LSTc), α-1-2-fucosylated HMOs (2’FL, LNFP I), terminal α-2-6-sialylated HMOs (6′SL, LSTc), internal α-2-6-sialylated HMOs (DSLNT, LSTb), terminal α-2-3-sialylated HMOs (3′SL, DSLNT). The total concentration of HMOs was calculated as the sum of the 19 oligosaccharides. HMO-bound fucose and HMO-bound sialic acid were calculated on a molar basis. The proportion of each HMO comprising the total HMO concentration was also calculated. HMO Simpson’s diversity was calculated as Simpson’s Reciprocal Index 1/D, which is the reciprocal sum of the square of the relative abundance of each of the measured 19 HMOs57,59. The higher the diversity value, the more heterogenous is the HMO composition in the sample.
Properties of the residential green environment
The selected residential green environment variables measure the properties of the green environments surrounding the homes of the participants and do not include any measures of the house characteristics, indoor environment or the actual use of green spaces by the participants. The residential green environment variables were selected due to their previously observed associations with residential microbiota and health33,34,35. The variables of the residential green environments were derived from multispectral satellite images series, with a 30 m × 30 m of spatial resolution (Landsat TM 5, National Aeronautics and Space Administration—NASA) and land cover data (CORINE). We used Landsat TM images obtained over the summertime (June–August, greenest months in Finland), to minimize the seasonal variation of living vegetation and cloud cover as well as to better identify vegetation areas and maximise the contrast in our estimated exposure. In each selected Landsat TM 5 images, the cloud was masked out, and the Normalized Difference Vegetation Index (NDVI)36 was calculated. The final NDVI map was the mean of NDVI images collected over three consecutive years (2008–2010), to make an NDVI map with non-missing values due to cloud cover for the study area. NDVI map measures the vegetation cover, vitality and density. The NDVI can get values ranging from − 1 to 1 where values below zero represent water surfaces, values close to zero indicate areas with low intensity of living vegetation and values close to one indicate high abundance of living vegetation. For the analyses, areas covered by water were removed and the value ranged from 0 to 1, to prevent negative values for underestimating the greenness values of the residential area like in some prior studies60. We assumed that summertime NDVI identified the green space and vegetation density well, but greenness intensity might vary seasonally.
Second, we used calculated indicators related to the diversity and naturalness of the land cover from CORINE Land Cover data sets of the year 201261. The 12 land cover types include: (1) Residential area, (2) Industrial/commercial area, (3) Transport network, (4) Sport/leisure, (5) Agriculture, (6) Broad-leave forest, (7) Coniferous forest, (8) Mixed forest, (9) Shrub/grassland, (10) Bare surface, (11) Wetland, and (12) Water bodies. From this information, we calculated two vegetation cover indexes. The Vegetation Cover Diversity Index (Simpson’s Diversity Index of Vegetation Cover, VCDI)37, only includes vegetation classes from CORINE land cover types (categories 5–9 and 11). VCDI approaches 1 as the number of different vegetation classes increases and the proportional distribution of area among the land use classes becomes more equitable. Furthermore, because we were particularly interested in the natural vegetation cover in the residential area, we calculated the area-weighted Naturalness Index (NI)38. This is an integrated indicator used to measure the human impact and degree of all human interventions on ecological components. The index is based on CORINE Land Cover data but reclassified to 15 classes. Residential areas have been divided to two classes: Continuous residential area (High density buildings) and Discontinuous residential area (Low density, mostly individual houses area). Agricultural area has also been divided to two classes: Agricultural area (Cropland) and Pasture as well as class 9 (Shrub/grassland) has been separated to Woodland and Natural grassland. Assignment of CORINE Land Cover classes to degrees of naturalness has been made based on Walz and Stein 201438. The area-weighted NI range from 1 to 7, where low values represent low human impact (≤ 3 = Natural), medium values moderate human impact (4–5 = Semi-natural) and high values strong human impact (6–7 = Non-Natural). To ease the interpretation of results and to correspond to the same direction than the other residential green environment variables, we have reverse-scaled the NI values, so that higher values illustrate more natural residential areas.
Background factors
As genetics is strongly linked to HMO composition, maternal secretor status was determined by high abundance (secretor) or near absence (non-secretor) of the HMO 2’FL in the breastmilk samples. Mothers with active secretor (Se) genes and FUT2 enzyme produce high amounts of α-1-2-fucosylated HMOs such as 2′-fucosyllactose (2′FL), whereas in the breastmilk of non-secretor mothers these HMOs are almost absent. Beyond genetics, other maternal and infant characteristics may influence HMO composition. So far, several associations have been reported, including lactation stage, maternal pre-pregnancy BMI, maternal age, parity, maternal diet, mode of delivery, infant gestational age and infant sex22,40. Information on the potential confounding factors, child sex, birth weight, maternal age at birth, number of previous births, marital status, maternal occupational status, smoking during pregnancy (before and during pregnancy), maternal pre-pregnancy BMI, mode of delivery, duration of pregnancy and maternal diseases [including both maternal disorders predominantly related to pregnancy such as pre-eclampsia and gestational diabetes and chronic diseases (diseases of the nervous, circulatory, respiratory, digestive, musculoskeletal and genitourinary systems, cancer and mental and behavioral disorders, according to ICD-10 codes, i.e. WHO International Classification of Diseases Tenth Revision)], were obtained from Medical Birth Registers. Self-administered questionnaires upon recruitment provided information on family net income and maternal education level. Those who had no professional training or a maximum of an intermediate level of vocational training (secondary education) were classified as “basic”. Those who had studied at a University of Applied Sciences or higher (tertiary education) were classified as “advanced”. The advanced level included any academic degree (bachelor’s, master’s, licentiate or doctoral degree). Maternal diet quality was assessed in late pregnancy using the Index of Diet Quality (IDQ62) which measures adherence to health promoting diet and nutrition recommendations. The IDQ score was used in its original form by setting the statistically defined cut-off value at 10, with scores below 10 points indicating unhealthy diets and non-adherence and scores of 10–15 points indicating a health-promoting diet and adherence dietary guidelines. Lactation time postpartum (child age) and season were received from the recorded breastmilk collection dates. Lactation status (exclusive/partial/unknown breastfeeding) at the time of breastmilk collection were gathered from follow-up diaries. From partially breastfeeding mothers (n = 277) 253 had started formula feeding and 28 solids at the time of milk collection. Last, a summary z score representing socio-economic disadvantage in the residential area was obtained from Statistics Finland grid database for the year 2009 and is based on the proportion of adults with low level of education, the unemployment rate, and proportion of people living in rented housing at each participant’s residential area55.
Statistical analyses
To harmonize the residential green environment variables we calculated the mean values for NDVI, VCDI and NI in 750 × 750 m squares (and 250 × 250 m) around participant homes in a Geographical Information System (QGIS, www.qgis.org). The same grid sizes were used to calculate residential socioeconomic disadvantage in the residential area55 at the time of child birth. The geographical coordinates (latitude/longitude) of the cohort participants’ home address (795 mothers) were obtained from the Population Register Centre at the time of their child birth.
The background characteristics of the mothers and children are given as means and standard deviations (SD) for continuous variables and percentages for categorical variables. Due to non-normal distribution, natural logarithmic transformation was performed for all HMO variables (19 individual components, sum of HMOs, HMO-bound sialic acid, HMO-bound fucose and HMO groups (all in nmol/mL)) except for HMO diversity. Associations between each background factor and HMO diversity and 19 individual HMO components were analysed with univariate generalized linear models to identify factors independently associated with HMO composition. All factors demonstrating a significant association (p < 0.05) with any HMO component were included as potential confounders in the subsequent models of residential green environment and HMO composition. In addition, we selected a priori the background factors observed in previous studies to associate with HMO (secretor status, lactation time and status (exclusive/partial/unknown breastfeeding), infant sex, birth mode, duration of pregnancy, maternal age at birth, maternal pre-pregnancy BMI and number of previous births)22,24. Before analyses, to allow for a consistent comparison of regression coefficients, all continuous variables of the residential green environment (NDVI, VCDI, NI) were standardized by subtracting the mean and dividing by the standard deviation. In addition, all associations with HMO composition are presented as standardized beta coefficients with 95% confidence intervals per 1 SD increase in residential green environment variables. For standardized beta coefficients also the outcome (HMO composition) variables have been standardized to get betas comparable between the models (presented in log scale in all other outcome variables expect Diversity, which is in the original scale). The multivariable generalized linear models adjusting for all the identified confounding background factors were conducted separately for each residential green environment variable. The analyses also tested for interactions between secretor status and residential green environment variable, to find out whether the green environment is associated differently with HMO composition in secretor vs. non-secretor mothers. In addition, curvilinear associations between the residential green environment variables and HMO composition were tested to investigate any nonlinear associations. Last, as green environment may differ between socio-economically different neighborhoods and thus the results could reflect socio-economic differences between residential areas, we conducted additional analyses for all residential green environment models adjusting for the residential socio-economic disadvantage. The level of significance was set at p value < 0.05. All analyses were conducted using the SAS 9.4 Statistical Package (SAS Institute Inc., Cary, North Carolina).
Source: Ecology - nature.com