in

Lineage-specific protection and immune imprinting shape the age distributions of influenza B cases

Case data

Medically attended influenza B cases in New Zealand were identified from samples taken from patients with influenza-like illness (ILI) attended by a network of general practitioners recruited for surveillance (2430 cases with an identified influenza B lineage) and from non-surveillance hospital samples (1606 cases with an identified lineage) analyzed by regional diagnostic laboratories and by the World Health Organization (WHO) National Influenza Centre at the Institute for Environmental Science and Research (ESR). Briefly, general practice surveillance operates from May to September, with participating practices collecting nasopharyngeal or throat swabs from the first ILI patient examined on each Monday, Tuesday, and Wednesday. ILI is defined as an “acute respiratory tract infection characterized by an abrupt onset of at least two of the following: fever, chills, headache, and myalgia”38. A subset of the New Zealand data (cases from 2002 to 2013) was previously compiled by Vijaykrishna et al.28 along with cases from Australia reported to the WHO Collaborating Centre for Reference and Research on Influenza in Melbourne.

Statistical model of influenza B susceptibility based on infection history

For lineage V (B/Victoria), we modeled the number of cases in people born in birth year b observed in season y as a multinomial draw with probabilities given by:

$${theta }_{V}(b,y)=D(b,y)beta (b,y){Z}_{V}(b,y)rho (b,y)$$

(1)

with an analogous equation defining the multinomial distribution θY(b, y) for lineage Y (B/Yamagata). D(b, y) is the fraction of the population that was born in year b as of observation season y. Z(b, y) is the susceptibility to lineage V during season y of a person born in year b relative to that of an unexposed person. β(b, y) is a baseline probability of infection with influenza B that captures differences in transmission associated with age (thus depending on b and y) and is equal to β1 if people born in year b are in preschool during season y (0–5 years old), β2 if they are school-age children or teenagers (6–17 years old), or β3 if they are 18 or older. ρ(b,y) is an age-specific factor equal to a parameter ρ if people are <2 years old and 1 otherwise. Whereas β(b, y) represents true differences in infection probabilities between preschoolers and post-preschool individuals, and thus affects the infection history probabilities in the calculation of Z(b, y), ρ(b, y) represents a differential probability of receiving medical attention and does not affect those probabilities. For each lineage and observation season, values of θ from Eq. (1) are normalized by their sum across birth years to make them proper multinomial probabilities.

We defined relative susceptibility to V, ZV(b, y), as an expectation over all possible immune histories in terms of the lineage of first infection and subsequent infection with the other lineage. We let susceptibility be 1 for a person never exposed to influenza B. Cross-immunity from any previously encountered strain of V or Y decreased susceptibility to the corresponding lineage by χVV and χYY, respectively. Susceptibility was further reduced by RV or RY if the lineage of first infection was V or Y. In the absence of a previous homologous infection, previous infection with Y decreased susceptibility to V by χVY and previous infection to V decreased susceptibility to Y by χYV. Protection due to homologous infections superseded cross-lineage protection. Protection against V from a previous Y infection was constrained to be a fraction of the within-lineage protection to V (and vice versa): χVY = χVVγVY, χYV = χYYγYV, where 0 ≤ γVY, γYV ≤ 1. Similarly, pre-1988 infections reduced susceptibility to V by χVA = χVVγVA and to Y by χYA = χYYγYA. Finally, ZV(b, y) was calculated as the sum of susceptibilities in Table 1 weighted by the probabilities of the corresponding infection histories (below). Relative susceptibility to Y, ZY(b, y) was defined analogously.

We estimated parameters by maximum likelihood using R (version 3.4.3) and package optimParallel. We calculated the total likelihood as the product of the likelihood for each lineage in each observation year, with the number of cases in each combination treated as an independent multinomial draw. For plotting, we summed the observed and predicted cases for each birth year across observation years.

Infection history probabilities

To calculate the probabilities in Table 1, we assumed infections occur in discrete time measured in units of annual influenza seasons. We considered possible infections in each season between birth and the last season before observation season y. For a person born in year b and previously unexposed to influenza B, we let ab,i be the probability of becoming infected in season i. This probability was then modified by protection from previous infections as in Table 1. Given that an infection occurred in season i, we assumed that the probability it was caused by lineages A (“ancestor”), V or Y was equal to their frequencies in that season, fA,i, fV,i, and fY,i, with fA,i = 1 for all i before 1988, and fV,i + fY,i = 1 since (Fig. 1). For simplicity, we assumed that people could not be infected more than once in each season (including simultaneous infections by the two lineages.)

Let ({{{Phi }}}_{i,j}^{B}) be the probability that no infections with influenza B occurred for a naive person born in b from seasons i to j (inclusive). It is given by

$${{{Phi }}}_{i,j}^{B}=mathop{prod }limits_{k=i}^{j}(1-{a}_{b,k})$$

(2)

where k indexes years (influenza seasons). Thus, the first probability in Table 1, that of being fully naive to influenza B, is given by:

$${P}_{0}(b,y)={{{Phi }}}_{b,y-1}^{B}$$

(3)

To shorten the expressions for the remaining probabilities, we let ({{{Phi }}}_{i,j}^{V}(h)), ({{{Phi }}}_{i,j}^{Y}(h)) and ({{{Phi }}}_{i,j}^{VY}(h)) be the probability that no V infections, no Y infections, and neither V nor Y infections occurred between seasons i and j (inclusive), respectively. Unlike ΦB, which applies to naive people, ΦV, ΦY, and ΦVY depend on the person’s infection history, h. To calculate the probability of an initial infection with A but no subsequent infections with V or Y, PA,0,0(b, y), we integrated, across all possible seasons i of first infection, the joint probability that the person’s first influenza B infection occurred in season i with the ancestral lineage A, and that no subsequent infections with V or Y occurred from i + 1 to y − 1:

$${P}_{A,0,0}(b,y)=mathop{sum }limits_{i=b}^{y-1}left[{{{Phi }}}_{b,i-1}^{B}cdot {a}_{b,i}cdot {f}_{A,i}cdot {{{Phi }}}_{i+1,y-1}^{VY}(A)right]$$

(4)

where the probability that the first infection with influenza B occurred in season i with the ancestor A is obtained by multiplying the probability of no previous infections (({{{Phi }}}_{b,i-1}^{B})) by the probability of an infection in season i (ab,i) and by the frequency of lineage A in season i (fA,i). The probability of no infections with either V or Y after the initial infection with A in season i, ({{{Phi }}}_{i+1,y-1}^{VY}(A)), depends on protection from the A infection against V (χVA) and Y (χYA):

$${{{Phi }}}_{i+1,y-1}^{VY}(A)=mathop{prod }limits_{k=i+1}^{y-1}{1-{a}_{b,k}[{f}_{V,k}(1-{chi }_{VA})+{f}_{Y,k}(1-{chi }_{YA})]}$$

(5)

Similarly, the joint probability of first infection with the ancestor A and subsequent infection with V, but not with Y, is given by

$${P}_{A,V,0}(b,y)=mathop{sum }limits_{i=b}^{y-1}left[right.{{{Phi }}}_{b,i-1}^{B}{a}_{b,i}{f}_{A,i}cdot mathop{sum }limits_{j=i+1}^{y-1}{{{Phi }}}_{i+1,j-1}^{VY}(A){a}_{b,j}{f}_{V,j}(1-{chi }_{VA}){{{Phi }}}_{j+1,y-1}^{Y}(A,V)left]right.$$

(6)

where we again integrated over all possible seasons i when the first infection with influenza B might have occurred. Given the first infection occurred in season i with the “ancestral” lineage A, we calculated the probability of subsequent infection with V, but not with Y, by integrating over all possible seasons j when the first infection with V may have occurred. Given the initial infection with A in season i, the joint probability that the first V infection occurred in j is given by the probability that neither V nor Y infections occurred from i + 1 to j − 1, ({{{Phi }}}_{i+1,j-1}^{VY}(A)), times the probability of a V infection in season j, given by ab,jfV,j(1 − χVA). The probability ({{{Phi }}}_{j+1,y-1}^{Y}(A,V)) of no subsequent Y infections after season j given the previous A and V infections is then given by

$${{{Phi }}}_{j+1,y-1}^{Y}(A,V)=mathop{prod }limits_{k=j+1}^{y-1}{1-{a}_{b,k}{f}_{Y,k}[1-max ({chi }_{YA},{chi }_{YV})]}$$

(7)

where we assumed that only the strongest cross-protection (from the previous A and V infections) applies. By analogy, for PA,Y,0(b, y) we have

$${P}_{A,Y,0}(b,y)=mathop{sum }limits_{i=b}^{y-1}left[right.{{{Phi }}}_{b,i-1}^{B}{a}_{b,i}{f}_{A,i}cdot mathop{sum }limits_{j=i+1}^{y-1}{{{Phi }}}_{i+1,j-1}^{VY}(A){a}_{b,j}{f}_{Y,j}(1-{chi }_{YA}){{{Phi }}}_{j+1,y-1}^{V}(A,Y)left]right.$$

(8)

where

$${{{Phi }}}_{j+1,y-1}^{V}(A,Y)=mathop{prod }limits_{k=j+1}^{y-1}{1-{a}_{b,k}{f}_{V,k}[1-max ({chi }_{VA},{chi }_{VY})]}$$

(9)

To calculate PA,{VY}(b, y), we first computed the probabilities for the particular cases where either V or Y were the second infection, PA,VY(b, y) and PA,YV(b, y), such that PA,{VY}(b, y) is the sum of the two. The first is given by

$${P}_{A,Vto Y}(b,y)=mathop{sum }limits_{i=b}^{y-1}left[right.{{{Phi }}}_{b,i-1}^{B}{a}_{b,i}{f}_{A,i}cdot mathop{sum }limits_{j=i+1}^{y-1}{{{Phi }}}_{i+1,j-1}^{VY}(A){a}_{b,j}{f}_{V,j}(1-{chi }_{VA})[1-{{{Phi }}}_{j+1,y-1}^{Y}(A,V)]left]right.$$

(10)

and the second is given by

$${P}_{A,Yto V}(b,y)=mathop{sum }limits_{i=b}^{y-1}left[right.{{{Phi }}}_{b,i-1}^{B}{a}_{b,i}{f}_{A,i}cdot mathop{sum }limits_{j=i+1}^{y-1}{{{Phi }}}_{i+1,j-1}^{VY}(A){a}_{b,j}{f}_{Y,j}(1-{chi }_{YA})[1-{{{Phi }}}_{j+1,y-1}^{V}(A,Y)]left]right.$$

(11)

Next, we write down the probabilities of infection histories with either V or Y as first infections and no subsequent infections. For PV,0(b, y),

$${P}_{V,0}(b,y)=mathop{sum }limits_{i=b}^{y-1}{{{Phi }}}_{b,i-1}^{B}cdot {a}_{b,i}cdot {f}_{V,i}cdot {{{Phi }}}_{i+1,y-1}^{Y}(V)$$

(12)

where

$${{{Phi }}}_{i+1,y-1}^{Y}(V)=mathop{prod }limits_{k=i+1}^{y-1}[1-{a}_{b,k}{f}_{Y,k}(1-{chi }_{YV})]$$

(13)

By analogy, for PY,0(b, y),

$${P}_{Y,0}(b,y)=mathop{sum }limits_{i=b}^{y-1}{{{Phi }}}_{b,i-1}^{B}cdot {a}_{b,i}cdot {f}_{Y,i}cdot {{{Phi }}}_{i+1,y-1}^{V}(Y)$$

(14)

where

$${{{Phi }}}_{i+1,y-1}^{V}(Y)=mathop{prod }limits_{k=i+1}^{y-1}[1-{a}_{b,k}{f}_{V,k}(1-{chi }_{VY})]$$

(15)

Finally, PV,Y(b, y) and PY,V(b, y) are given by

$${P}_{V,Y}(b,y)=mathop{sum }limits_{i=b}^{y-1}{{{Phi }}}_{b,i-1}^{B}cdot {a}_{b,i}cdot {f}_{V,i}cdot [1-{{{Phi }}}_{i+1,y-1}^{Y}(V)]$$

(16)

$${P}_{Y,V}(b,y)=mathop{sum }limits_{i=b}^{y-1}{{{Phi }}}_{b,i-1}^{B}cdot {a}_{b,i}cdot {f}_{Y,i}cdot [1-{{{Phi }}}_{i+1,y-1}^{V}(Y)]$$

(17)

As case data had information on age and not the exact birth year, we averaged each exposure history probability across the two possible birth years given the age and the observation year (for instance, a 10-year-old in 2000 may have been born in either 1989 or 1990). As the probabilities of different infection histories become very similar for cohorts born long before the lineages split, we used the probabilities calculated for the birth year 1970 for all previous cohorts to decrease computation time.

Season-specific attack rates

Let Pinf(b, i, t) be the probability that a previously unexposed person born in year b has been infected with influenza B after experiencing fraction t [0, 1] of season i. Assuming a constant instantaneous attack rate αb,i throughout the season, Pinf(b, i, t) is given by:

$${{P}}_{text{inf}}(b,i,t)=1-{e}^{-{alpha }_{b,i}t}$$

(18)

We let the instantaneous attack rate αb,i be equal to an age-specific baseline multiplied by an intensity score Si representing the strength of influenza B circulation in season i relative to other seasons:

$${alpha }_{b,i}=-{mathrm{ln}},[1-beta (b,y)]cdot {S}_{i},$$

(19)

where β(b, y) takes on value β1, if birth year b corresponds to an age of <5 in year y, β2, if the corresponding age is 6–17 years, and β3 for ages 18 and older. The probability of infection for an unexposed person born in year b across the entire season, ab, i, is obtained by substituting αb,i in Eq. (18) and setting t = 1:

$${a}_{b,i}=1-{e}^{-{alpha }_{b,i}}=1-{[1-beta (b,y)]}^{{S}_{i}}$$

(20)

The definition of αb,i in Eq. (19) was chosen such that for a season with average influenza intensity (Si = 1), the annual probability of infection for an unexposed person is equal to β(b,y).

For the season corresponding to the first year of life (i = b), people are only susceptible to infections during a fraction of the season, depending on when they were born and how long they were protected by maternal antibodies. In those cases, we defined ab,i as the expected probability of infection across all possible weeks of birth:

$${a}_{b,i}=frac{1}{{W}_{b}}mathop{sum}limits_{w}1-{e}^{-{alpha }_{b,i}{phi }_{i}(b,w+M)},quad ,{text{for}}; i=b$$

(21)

where ϕi(b, w) is the fraction of cases in season i observed in or after week w of year b and Wb is the number of weeks in year b. As people are assumed to be completely protected against infection by maternal antibodies for the first M weeks following birth, ϕi was computed for an effective birth week w + M. Based on the fraction of children under the age of 1 year with detectable antibodies to influenza B40, we set M to 26 weeks (~6 months). Averaging ϕi(b, w + M) over all possible birth weeks w in year b gives the expected fraction of season i experienced by a person born in year b assuming births are distributed uniformly in time. We estimated ϕi(b, w) by fitting the incomplete β-function to the cumulative fraction of cases in the seasons for which we had case data, using R package FlexParamCurve. Following Gostic et al.10, we truncated season-specific infection probabilities so that they never exceed 0.75 even in years of high estimated influenza B intensity.

Intensity scores

We defined the intensity score Si in Eq. (19) as:

$${S}_{i}=frac{ % {rm{Influenza}} {rm{B}} {rm{in}} {rm{ILI}} {rm{specimens}}times {rm{ILI}} {rm{incidence}},;{rm{for}} {rm{season}} i}{{rm{Mean}}[ % {rm{Influenza}} {rm{B}} {rm{in}} {rm{ILI}} {rm{specimens}}times {rm{ILI}} {rm{incidence}}];{rm{across}} {rm{seasons}}}$$

(22)

Annual influenza surveillance reports from New Zealand’s Institute of Environmental Science and Research (ESR) available from 2003 on give the “isolation” or “detection” rate (the number of influenza-positive swabs divided by the number of swabs tested), the percentage of influenza A and B viruses among all influenza-positive isolates (both “sentinel” and “non-sentinel”), and the estimated number of ILI cases in New Zealand for each season62. The reports do not directly give the fraction of ILI specimens that were influenza B positive. Instead, we calculated the fraction of ILI isolates that were influenza B positive in a season by multiplying the fraction of ILI isolates that were influenza positive by the fraction of influenza-positive specimens that were influenza B. For seasons without data on the fraction of influenza-positive specimens in ILI specimens (1988 to 2000), without data on the fraction of influenza B in influenza-positive specimens (1988–1989), or without estimates of the total number of ILI cases (1988–2001), we used the average values of those quantities across the remaining seasons. Although reports are not available for 1990–2002, the 2003 annual report lists the frequency of influenza B in influenza-positive specimens for those seasons.

The WHO’s FluNet has weekly data on the fraction of ILI specimens that were influenza B positive in Australia from 1997 to the present. However, in data from before 2003, the number of influenza-positive specimens was usually the same as the reported number of specimens processed for that week, suggesting strong case ascertainment or reporting bias. We thus used data on the percent of influenza-positive ILI cases from 2003 on. Annual influenza reports from 1994 to 2010 are available from the Australian government’s Department of Health website63. They report numbers of influenza A- and B-positive isolates but not the total number of specimens tested. Thus, only the fraction of influenza B in influenza isolates (but not in ILI isolates) can be estimated from those reports. We therefore used data from the WHO to calculate the fraction of influenza B-positive specimens in ILI specimens for 2003–2017. For 1994–2002, we multiplied the fraction of influenza B in influenza-positive specimens for each season (from the Department of Health reports) by the average annual fraction of influenza-positive specimens in ILI specimens from 2003 to 2017 (from the WHO data), to arrive at the fraction of influenza B-positive ILI specimens. Finally, for seasons where data were missing altogether (1988–1993), we used the average annual fraction of influenza B-positive specimens in ILI specimens for subsequent seasons (1994–2017).

To estimate ILI incidence in Australia, we used the maximum weekly number of ILI cases per 1000 consultations for each season from Department of Health annual and weekly reports (weekly reports are available for years since the last annual report in 2010). Different ILI definitions were used from 1994 to 2003 and from 2004 to 2010, and starting in 2009 reported weekly ILI rates were averaged from multiple branches of the Australian influenza surveillance system. We thus normalized values by the average value within each of those periods (1994–2003, 2004–2008, and 2009–2018) to arrive at a normalized peak number of ILI cases per 1000 consultations in Australia.

Historical frequencies of influenza B lineages

To estimate historical frequencies of B/Victoria and B/Yamagata, we downloaded data on lineage and date and country of isolation for all influenza B isolates on the GISAID website collected until 31 December 2019. To complement these data, we searched the NCBI Influenza Virus Database for all protein-coding HA sequences of influenza B viruses isolated from humans and excluding laboratory strains (information on passage history for GISAID entries was scarce and non-standardized, and so we did not filter out laboratory strains from the GISAID data). As lineage information was missing for virtually all sequences retrieved from NCBI, we used BLAST to assign each sequence to either B/Victoria or B/Yamagata, based on the highest bit score match with reference sequences B/Victoria/2/87 and B/Yamagata/16/1988.

We combined data from both databases to estimate the frequency of B/Yamagata and B/Victoria isolates in each season. Isolates collected in year y in Europe or North America were assigned to season y − 1/y, if collected before October and to season y/y + 1 if collected in October–December. As most European and North American isolates were collected before October of the respective year (median across years = 83% for Gisaid and 82% for NCBI), we assumed isolates with missing month of collection in those regions were collected before October (and thus belonged to the season beginning in the previous calendar year).

Isolates with the same name but reported for different countries or seasons were considered separately. We condensed multiple occurrences of the same isolate in the same country and season (within or across data sets) into one, disregarding isolates for which different lineages were assigned in different countries/seasons. Using isolates present in both databases, we found that our BLAST lineage assignment matched the lineage reported on GISAID in 98% (3159/3217) of cases. We disregarded isolates for which our BLAST assignment and the reported GISAID assignment disagreed. The final data set consisted of 35,158 isolates, 23 of which (0.07%) were represented more than once (in different countries or seasons). We estimated the frequency of a lineage as the number of isolates belonging to that lineage divided by the total number of influenza B isolates collected in a season.

As all 23 isolates collected in New Zealand and Australia in the 1990s were B/Yamagata (Supplementary Fig. 2), and as previous work suggests B/Victoria circulated only in East Asia during that period9, we assumed the frequency of B/Yamagata in New Zealand and Australia to be 100% from 1988 to 2000, even though no isolates were available for several individual years within that range. In 2001 and 2003, when both lineages are known to have been circulating in New Zealand and Australia, but fewer than ten isolates were available for the two countries combined in the sequence databases, we used frequencies estimated from all other countries combined. Frequencies estimated from all other countries combined were strongly correlated with estimates based on isolates from Australia and New Zealand only (Pearson’s correlation coefficient = 0.88, 95% CI 0.76–0.95). We also considered using frequencies estimated from isolates collected in the United States, which were also correlated with frequencies in Australia and New Zealand (Supplementary Fig. 2), but the correlation was weaker (0.69, 95% CI 0.42–0.85). For the sensitivity analysis of lineage frequencies in the 1990s, we re-fitted the model using frequencies from all countries combined for all years from 1988 on in which fewer than ten isolates were available for New Zealand and Australia combined, including years in the 1990s.

We hoped to use the sequence databases to get more reliable estimates of lineage frequencies in the 1980s than those provided by early antigenic characterization8, but fewer than ten isolates were available for each year before 1988. To accommodate uncertainty, we grouped infections with B/Victoria and B/Yamagata before 1988 with infections by the ancestral influenza B strains circulating before the lineages split.

We compared our estimates of lineage frequencies based on sequence data to estimates based on antigenic characterization of circulating strains from epidemiological surveillance reports (Supplementary Fig. 2). Surveillance reports from Australia are available from the Australian Government’s Department of Health website63. Surveillance reports from New Zealand are available from the website of New Zealand’s Institute of ESR62. Although reports from New Zealand are only available from 2003 on, Fig. 27 of the 2012 report shows B/Victoria and B/Yamagata frequencies from 1990 to 2002 (without reporting the number of isolates used to estimate those frequencies). Annual summaries of influenza surveillance in the United States are published by the Centers for Disease Control and Prevention (e.g., see ref. 64).

Age distributions of cases in non-surveillance data

We applied the model to the age distributions of influenza B isolates available on GISAID. In these analyses, isolate data are used in two different ways: both to estimate historical lineage frequencies and to estimate the age distributions of cases to which the model is fitted. Although historical lineage frequencies are calculated based on all GISAID isolates with lineage information (and complemented with isolates from the NCBI Influenza Virus Database; see above), only a subset of GISAID isolates are annotated with both the isolate’s lineage and the host’s age, and therefore only a subset can be used to estimate the age distributions of cases. We analyzed the age distributions of isolates from the EU and China, because those regions have many isolates with both lineage and age information (4702 isolates from 2006 to 2018 for the EU, 1880 isolates from 2005 to 2019 for China) and easily available demographic data for the general population (we excluded the United States, because historically high influenza vaccination coverage would require incorporating vaccination into the model). To increase statistical power for the estimates of historical lineage frequencies, we included isolates from European countries outside the EU and countries in East Asia besides China (South Korea, Japan, Mongolia, and Taiwan) represented in GISAID (Supplementary Fig. 2). Individual countries are represented in different proportions in the lineage frequency and age distribution data sets (Supplementary Fig. 24). Thus, if lineage frequencies vary in space within Europe or within East Asia, our estimates of infection history probabilities might be biased relative to the true infection histories associated with the age distributions used to fit the model.

As B/Yamagata was the only lineage circulating in Europe in the mid to late 1990s, when isolate data were more abundant, we assumed B/Yamagata was completely dominant throughout the entire 1990s, as we did for New Zealand. We relaxed this assumption by letting B/Victoria be the only circulating lineage in 1988–1993 in Europe, consistent with its high frequency in the scarce isolate data from that period.

For both the EU and China, we assumed constant intensity scores across seasons (Si = 1) and used the cumulative incidence of cases within seasons (ϕi(b, w)) from New Zealand to model infection in the first year of life.

Demographic data

We obtained annual age distributions for the general population by single year of age from governmental websites from New Zealand65, Australia66, China67, and the EU68.

Model validation with independent serological data

We compared the fraction of children predicted by our model to have been previously infected with B/Victoria, B/Yamagata, or either lineage with the fraction of children that had detectable antibodies against the corresponding lineage (or any influenza B strain) in the Netherlands40. Sera from children 0 to 7 years old collected between February 2006 and June 2007 were tested using the hemagglutination inhibition assay against a panel of reference B/Victoria and B/Yamagata strains, as well as of strains isolated in the Netherlands during the study period. Sera were considered positive if their hemagglutination inhibition titer was ≥10 against at least one strain from the corresponding set (all influenza B strains, B/Victoria strains, and B/Yamagata strains). We compared these data with predictions under the maximum likelihood parameter estimates of our model fitted to the complete New Zealand data.

We also used the seroprevalence data to independently estimate the annual probability of infection for preschoolers and school-age children, equivalent to the β1 and β2 parameters in our model. Assuming a constant instantaneous attack rate α, an individual of age A years is still naive (and therefore seronegative) to influenza B with probability PN(A) given by:

$${P}_{mathrm{N}}(A)={e}^{-alpha A}$$

(23)

The probability of observing X seronegative individuals in a sample of n individuals of age A can be calculated assuming X ~ Binomial[n,PN(A)], and α thus can be estimated by maximum likelihood. The annual attack rate can then be calculated from the instantaneous attack rate α as βNetherlands = 1 − eα.

We make two modifications to Eq. (23) to account for the presence of maternal antibodies early in life and for uncertainty in the age of individuals when their serum was collected. First, we assume individuals spend a time m (in units of years) fully protected against influenza B due to the presence of maternal antibodies. Consistent with the fraction of children under the age of 1 year with detectable antibodies to influenza B40, we assumed m = 0.5 year. Second, as ages were reported at the resolution of 1 year (e.g., an individual 2.6 years old is reported as being 2 years old), we assume individuals with recorded age A were sampled at a randomly distributed time T [0, 1) during the interval between the ages of A and A + 1. Thus, we let PN(A) be given by the expectation over T:

$${P}_{mathrm{N}}(A)=E[{e}^{-alpha (A+T-m)}]=int_{0}^{1}{e}^{-alpha (A+t-m)}f(t)dt$$

(24)

where f(t) is the probability density function of T. Assuming T is uniformly distributed between 0 and 1 (i.e., f(t) = 1), we have:

$${P}_{mathrm{N}}(A) = int _{0}^{1}{e}^{-alpha (A+t-m)}dt = , {e}^{-alpha (A-m)}int_{0}^{1}{e}^{-alpha t}dt = , {e}^{-alpha (A-m)}frac{(1-{e}^{-alpha })}{alpha }$$

(25)

valid for A > m and α > 0. Letting α1 and α2 be the instantaneous attack rates for preschoolers and school-age children:

$${P}_{mathrm{N}}(A)=left{begin{array}{ll}{e}^{-{alpha }_{1}(A-m)}frac{(1-{e}^{-{alpha }_{1}})}{{alpha }_{1}},hfill&,{text{if}},A; le; {A}_{mathrm{s}} {e}^{-{alpha }_{1}({A}_{mathrm{s}}-m)}{e}^{-{alpha }_{2}(A-{A}_{mathrm{s}})}frac{(1-{e}^{-{alpha }_{2}})}{{alpha }_{2}},&,{text{if}},A; > ; {A}_{mathrm{s}}end{array}right.$$

(26)

where As is the age at which children start going to school (4 years old in the Netherlands69). It is noteworthy that for school-age children (the equation for A > As on the bottom), the correction term for uncertainty in sampling is not necessary for the time spent in preschool (assumed to be exactly As years), only for the time after preschool (A − As).

Handling cases with missing lineage information

We assumed cases with missing lineage information in 2002 (n = 61), 2011 (n = 312), and 2019 (n = 206) belonged to B/Victoria, as 99% or more of identified cases in those seasons were B/Victoria (86/87 cases in 2002, 276/280 cases in 2011, and 552/552 cases in 2019) as were 94%, 92%, and 92%, respectively, of isolates from sequence databases (for Australia and New Zealand combined). We assumed cases with missing lineage information belonged to B/Yamagata in 2013 (n = 37), 2014 (n = 77), and 2017 (n = 87), when the majority of identified cases were B/Yamagata (268/272, 131/138, and 473/489, respectively), as were 99%, 94%, and 84%, respectively, of isolates in sequence databases. Unidentified cases in other seasons were disregarded, because both lineages were present at higher frequencies among identified cases. Removing unidentified cases altogether in all seasons led to similar parameter estimates.

Sequence divergence analysis

To estimate the amount of evolution within and between lineages, we analyzed all complete HA and NA sequences from human influenza B isolates available on GISAID in July 2019. The set of isolates used in this analysis differs from the set used to estimate lineage frequencies, because we required isolates to have complete sequences (although not all sequences listed as complete on GISAID were in fact complete). Two isolates collected in 2000 (B/Hong Kong/548/2000 and B/Victoria/504/2000) were deposited as B/Victoria but our BLAST assignment indicated they were in fact B/Yamagata (their low divergence from B/Yamagata strains was a clear outlier). NA sequences from isolates B/Kanagawa/73 and B/Ann Arbor/1994 were only small fragments (99 and 100 amino acids long) poorly aligned with other sequences and were thus excluded. We also excluded NA sequences from B/Yamagata isolates B/Catalonia/NSVH100773835/2018 and B/Catalonia/NSVH100750997/2018, because they were extremely diverged (60% and 38%) from the reference strain B/Yamagata/16/88 and aligned poorly with other sequences.

To compare sequence diversity within and between lineages over time, we aligned sequences using MAFFT v. 7.31070 and calculated percent amino acid differences in pairs of sequences from the same lineage and in pairs with one sequence from each. For each year, we sampled 100 sequences from each lineage (or used all sequences if 100 or fewer were available) to limit the number of pairwise calculations. To estimate how much B/Yamagata and B/Victoria evolved since the late 1980s, we calculated percent amino acid differences between each B/Yamagata and B/Victoria sequence, and the corresponding HA and NA sequences of reference strains B/Yamagata/16/88 and B/Victoria/2/87. Unlike in the analysis of pairwise divergence within each time point, we used all sequences from each lineage in each year. We excluded sites in which one or both sequences had gaps or ambiguous amino acids.

To compare HA and NA divergence between influenza B lineages with divergence between influenza A subtypes, we downloaded complete HA and NA sequences from H3N2 and H1N1 isolated since 1977 and available on GISAID in August 2019. Homologous sites in the HA of H3N2 and H1N1 are difficult to identify by conventional sequence alignment, and instead we used the algorithm by Burke and Smith71 implemented on the Influenza Research Database website72. Both H3N2 and H1N1 sequences were aligned with the reference H3N2 sequence A/Aichi/2/68. We verified that this method matched sites on the stalk and head of the H1N1 HA with sites on the stalk and head of H3N2 HA by comparing the resulting alignment with the alignment in Supplementary Fig. S2 of Kirkpatrick et al.73. To limit the total number of influenza A sequences analyzed, we randomly selected 100 H3N2 and 100 H1N1 sequences for years in which more than 100 sequences were available, and used all available sequences for the remaining years. Isolates A/Canterbury/58/2000, A/Canterbury/87/2000, and A/Canterbury/55/2000 were excluded, because both H1N1-like and H3N2-like sequences were available under the same isolate name on GISAID.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.


Source: Ecology - nature.com

Mechanisms and heterogeneity of in situ mineral processing by the marine nitrogen fixer Trichodesmium revealed by single-colony metaproteomics

Long term relationship between farming damselfish, predators, competitors and benthic habitat on coral reefs of Moorea Island