Neighbor GWAS: incorporating neighbor genotypic identity into genome-wide association studies of field herbivory

Neighbor GWAS

Basic model

We analyzed neighbor effects in GWAS as an inverse problem of the two-dimensional Ising model, named “neighbor GWAS” hereafter (Fig. 1). We considered a situation where a plant accession has one of two alleles at each locus, and a number of accessions occupied a finite set of field sites, in a two-dimensional lattice. The allelic status at each locus was represented by x, and so the allelic status at each locus of the ith focal plant and the jth neighboring plants was designated as x_i(j)∈{−1, +1}. Based on a two-dimensional Ising model, we defined a phenotype value for the ith focal individual plant y_ias:

$$y_i = beta _1x_i + beta _2mathop {sum }limits_{ < i,j > } x_ix_j$$

(1)

where β₁ and β₂ denoted self-genotype and neighbor effects, respectively. If two neighboring plants shared the same allele at a given locus, the product x_ix_j turned into (−1) × (−1) = +1 or (+1) × (+1) = +1. If two neighbors had different alleles, the product x_ix_j became (−1) × (+1) = −1 or (+1) × (−1) = −1. Accordingly, the effects of neighbor genotypic identity on a particular phenotype depended on the coefficient β₂ and the number of the two alleles in a neighborhood. If the numbers of identical and different alleles were the same near a focal plant, these neighbors offset the sum of the products between the focal plant i and all j neighbors (mathop {sum }nolimits_{ < i,j > } x_ix_j) and exerted no effects on a phenotype. When we summed up the phenotype values for the total number of plants n and replaced it as E = −β₂, H = −β₁ and ε_I = Σy_i, Eq. 1 could be transformed into ({it{epsilon }}_I = – {E}mathop {sum }nolimits_{ < i,j > } x_ix_j – {H}mathop {sum }x_i), which defined the interaction energy of a two-dimensional ferromagnetic Ising model (McCoy and Maillard 2012). The neighbor effect β₂ and self-genotype effect β₁ were interpreted as the energy coefficient E and external magnetic effects H, respectively. An individual plant represented a spin and the two allelic states of each locus corresponded to a north or south dipole. The positive or negative value of Σx_ix_j indicated a ferromagnetism or paramagnetism, respectively. In this study, we did not consider the effects of allele dominance because this model was applied to inbred plants. However, heterozygotes could be processed if the neighbor covariate x_ix_j was weighted by an estimated degree of dominance in the self-genotypic effects on a phenotype.

Association tests

For association mapping, we needed to determine β₁ and β₂ from the observed phenotypes and considered a confounding sample structure as advocated by previous GWAS (e.g., Kang et al. 2008; Korte and Farlow 2013). Extending the basic model (Eq. (1)), we described a linear mixed model at an individual level as:

$$y_i = beta _0 + beta _1x_i + frac{{beta _2}}{L}mathop {sum }limits_{ < i,j > }^L x_ix_j^{(s)} + u_i + e_i$$

(2)

where β₀ indicated the intercept; the term β₁x_i represented fixed self-genotype effects as tested in standard GWAS; and β₂ was the coefficient of fixed neighbor effects. The neighbor covariate (mathop {sum }nolimits_{ < i,j > }^L x_ix_j^{(s)}) indicated a sum of products for all combinations between the ith focal plant and the jth neighbor at the sth spatial scale from the focal plant i, and was scaled by the number of neighboring plants, L. The number of neighboring plants L was dependent on the spatial scale s to be referred. Variance components due to the sample structure of self and neighbor effects were modeled by a random effect (u_i in {boldsymbol{u}}) and ({boldsymbol{u}}sim {mathrm{Norm}}(textbf{0},sigma _1^2{boldsymbol{K}}_1 + sigma _2^2{boldsymbol{K}}_2)). The residual was expressed as (e_i in {boldsymbol{e}}) and ({boldsymbol{e}}sim {mathrm{Norm}}(textbf{0},sigma _e^2{boldsymbol{I}})), where I represented an identity matrix.

Variation partitioning

To estimate the proportion of phenotypic variation explained (PVE) by the self and neighbor effects, we utilized variance component parameters in linear mixed models. The n × n variance-covariance matrices represented the similarity in self-genotypes (i.e., kinship) and neighbor covariates among n individual plants as ({boldsymbol{K}}_1 = frac{1}{{q – 1}}{boldsymbol{X}}_1^{mathrm{T}}{boldsymbol{X}}_1) and ({boldsymbol{K}}_2 = frac{1}{{q – 1}}{boldsymbol{X}}_2^{mathrm{T}}{boldsymbol{X}}_2), where the elements of n plants × q markers matrix X₁ and X₂ consisted of explanatory variables for the self and neighbor effects as X₁ = (x_i) and ({boldsymbol{X}}_2 = (frac{{mathop {sum }nolimits_{ < i,j > }^L x_ix_j^{(s)}}}{L})). As we defined (x_{i(j)} in){+1, −1}, the elements of the kinship matrix K₁ were scaled to represent the proportion of marker loci shared among n × n plants such that ({boldsymbol{K}}_1 = left( {frac{{k_{ij}, + ,1}}{2}} right));(sigma _1^2)and (sigma _2^2) indicated variance component parameters for the self and neighbor effects.

The individual-level formula Eq. (2) could also be converted into a conventional matrix form as:

$${boldsymbol{y}} = {boldsymbol{X}}{boldsymbol{beta }} + {boldsymbol{Zu}} + {boldsymbol{e}}$$

(3)

where y was an n × 1 vector of the phenotypes; X was a matrix of fixed effects, including a unit vector, self-genotype x_i, neighbor covariate (({mathop {sum }nolimits_{ < i,j > }^L x_ix_j^{(s)}})/{L}), and other confounding covariates for n plants; β was a vector that represents the coefficients of the fixed effects; Z was a design matrix allocating individuals to a genotype, and became an identity matrix if all plants were different accessions; u was the random effect with Var(u) =(sigma _1^2{boldsymbol{K}}_1 + sigma _2^2{boldsymbol{K}}_2); and e was residual as Var(e) = (sigma _e^2{boldsymbol{I}}).

Because our objective was to test for neighbor effects, we needed to avoid the detection of false positive neighbor effects. The self-genotype value x_i and neighbor genotypic identity (mathop {sum }nolimits_{ < i,j > }^L x_ix_j^{(s)}) would become correlated explanatory variables in a single regression model (sensu colinear) due to the minor allele frequency (MAF) and the spatial scale of s. When MAF is low, neighbors (x_j^{(s)}) are unlikely to vary in space and most plants will have similar values for neighbor identity (mathop {sum }nolimits_{ < i,j > }^L x_ix_j^{(s)}). Furthermore, if the neighbor effects range was broad enough to encompass an entire field (i.e., s→∞), the neighbor covariate and self-genotype x_iwould become colinear according to the equation: (left(mathop {sum }nolimits_{ < i,j > }^L x_ix_j^{(s)}right)/L = x_ileft( {mathop {sum }nolimits_{j = 1}^L x_j^{left( s right)}} right)/L = x_ibar x_j), where (bar x_j) indicates a population-mean of neighbor genotypes and corresponds to a population-mean of self-genotype values (bar x_i), if s→∞. The standard GWAS is a subset of the neighbor GWAS and these two models become equivalent at s = 0 and (sigma _2^2) = 0. When testing the self-genotype effect β₁, we recommend that the neighbor effects and its variance component (sigma _2^2) should be excluded; otherwise, the standard GWAS fails to correct a sample structure because of the additional variance component at (sigma _2^2, ne ,0). To obtain a conservative conclusion, the significance of β₂ and (sigma _2^2) should be compared using the standard GWAS model based on self-effects alone.

Given the potential collinearity between the self and neighbor effects, we defined different metrics for the proportion of phenotypic variation explained (PVE) based on self or neighbor effects. Using a single-random effect model, we calculated PVE for either the self or neighbor effects as follows:

‘single’ PVE_self = (sigma _1^2/(sigma _1^2 + sigma _e^2))when s and (sigma _2^2) were set at 0, or

‘single’ PVE_nei = (sigma _2^2/(sigma _2^2 + sigma _e^2))when (sigma _1^2) was set at 0.

Using a two-random effect model, we could focus on one variable while considering relationships between two variables (sensu partial out) for either of the two variance components. We defined such a partial PVE as:

‘partial’ PVE_self = (sigma _1^2/(sigma _1^2 + sigma _2^2 + sigma _e^2)) and

‘partial’ PVE_nei = (sigma _2^2/(sigma _1^2 + sigma _2^2 + sigma _e^2)).

As the partial PVE_self was equivalent to the single PVE_self when s was set at 0, the net contribution of neighbor effects at s ≠ 0 was given as

‘net’ PVE_nei = (partial PVE_self + partial PVE_nei) − single PVE_self,

which indicated the proportion of phenotypic variation that could be explained by neighbor effects, but not by the self-genotype effects.

Simulation

To examine the model performance, we applied the neighbor GWAS to simulated phenotypes. Phenotypes were simulated using a subset of the actual A. thaliana genotypes. To evaluate the performance of the simple linear model, we assumed a complex ecological form of neighbor effects with multiple variance components controlled. The model performance was evaluated in terms of the causal variant detection and accuracy of estimates. All analyses were performed using R version 3.6.0 (R Core Team 2019).

Genotype data

To consider a realistic genetic structure in the simulation, we used part of the A. thaliana RegMap panel (Horton et al. 2012). The genotype data for 1307 accessions were downloaded from the Joy Bergelson laboratory website (http://bergelson.uchicago.edu/?page_id=790 accessed on February 9, 2017). We extracted data for chromosomes 1 and 2 with MAF at >0.1, yielding a matrix of 1307 plants with 65,226 single nucleotide polymorphisms (SNPs). Pairwise linkage disequilibrium (LD) among the loci was r² = 0.003 [0.00–0.06: median with upper and lower 95 percentiles]. Before generating a phenotype, genotype values at each locus were standardized to a mean of zero and a variance of 1. Subsequently, we randomly selected 1,296 accessions (= 36 × 36 accessions) without any replacements for each iteration and placed them in a 36 × 72 checkered space, following the Arabidopsis experimental settings (see Fig. S1).

Phenotype simulation

To address ecological issues specific to plant neighborhood effects, we considered two extensions, namely asymmetric neighbor effects and spatial scales. Previous studies have shown that plant–plant interactions between accessions are sometimes asymmetric under herbivory (e.g., Bergvall et al. 2006; Verschut et al. 2016; Sato and Kudoh 2017) and height competition (Weiner 1990); where one focal genotype is influenced by neighboring genotypes, while another receives no neighbor effects. Such asymmetric neighbor effects can be tested by statistical interaction terms in a linear model (Bergvall et al. 2006; Sato and Kudoh 2017). Several studies have also shown that the strength of neighbor effects depends on spatial scales (Hambäck et al. 2014), and that the scale of neighbors to be analyzed relies on the dispersal ability of the causative organisms (see Hambäck et al. 2009; Sato and Kudoh 2015; Verschut et al. 2016; Ida et al. 2018 for insect and mammal herbivores; Rieux et al. 2014 for pathogen dispersal) or the size of the competing plants (Weiner 1990). We assumed the distance decay at the sth sites from a focal individual i with the decay coefficient α as (wleft( {s,alpha } right) = {mathrm{e}}^{ – alpha (s – 1)}), since such an exponential distance decay has been widely adopted in empirical studies (Devaux et al. 2007; Carrasco et al. 2010; Rieux et al. 2014; Ida et al. 2018). Therefore, we assumed a more complex model for simulated phenotypes than the model for neighbor GWAS as follows:

$$y_i = beta _0 + beta _1x_i + frac{{beta _2}}{L}mathop {sum }limits_{ < i,j > }^L w(s,alpha )x_ix_j^{(s)} + beta _{12}frac{{x_i}}{L}mathop {sum }limits_{ < i,j > }^L w(s,alpha )x_ix_j^{(s)} + u_i + e_i$$

(4)

where β₁₂ was the coefficient for asymmetry in neighbor effects. By incorporating an asymmetry coefficient, the model (Eq. (4)) can deal with cases where neighbor effects are one-sided or occur irrespective of a focal genotype (Fig. 2). Total variance components resulting from three background effects (i.e., the self, neighbor, and self-by-neighbor effects) were defined as (u_i in {boldsymbol{u}}) and ({boldsymbol{u}}sim {mathrm{Norm}}(textbf{0},sigma _1^2{boldsymbol{K}}_1 + sigma _2^2{boldsymbol{K}}_2 + sigma _{12}^2{boldsymbol{K}}_{12})). The three variance component parameters (sigma _1^2), (sigma _2^2), and (sigma _{12}^2), determined the relative importance of the self-genotype, neighbor, and asymmetric neighbor effects in u_i. Given the elements of n plants × q marker explanatory matrix with ({boldsymbol{X}}_{12} = (frac{{x_i}}{L}mathop {sum }nolimits_{ < i,j > }^L w(s,alpha )x_ix_j^{(s)})), the similarity in asymmetric neighbor effects was calculated as ({boldsymbol{K}}_{12} = frac{1}{q-1}{boldsymbol{X}}_{12}^{mathrm{T}}{boldsymbol{X}}_{12}). To control phenotypic variations, we further partitioned the proportion of phenotypic variation into those explained by the major-effect genes and variance components PVE_β + PVE_u, major-effect genes alone PVE_β, and residual error PVE_e, where PVE_β + PVE_u + PVE_e = 1. The optimize function in R was used to adjust the simulated phenotypes to the given amounts of PVE.

Fig. 2: Numerical examples of the symmetric and asymmetric neighbor effects.

The intercept, distance decay, random effects, and residual errors are neglected, to simplify this scheme. a Symmetric neighbor effects represent how neighbor genotype similarity (or dissimilarity) affects the trait value of a focal individual y_i regardless of its own genotype. b Asymmetric neighbor effects can represent a case in which one genotype experiences neighbor effects while the other does not (b) and a case in which the direction of the neighbor effects depends on the genotypes of a focal individual (c). The case b was considered in our simulation as it has been empirically reported (e.g., Bergvall et al. 2006; Verschut et al. 2016; Sato & Kudoh 2017).

Full size image

Parameter setting

Ten phenotypes were simulated with varying combination of the following parameters, including the distance decay coefficient α, the proportion of phenotypic variation explained by the major-effect genes PVE_β, the proportion of phenotypic variation explained by major-effect genes and variance components PVE_β + PVE_u, and the relative contributions of self, symmetric neighbor, and asymmetric neighbor effects, i.e., PVE_self:PVE_nei:PVE_s×n. We ran the simulation with different combinations, including α = 0.01, 1.0, or 3.0; PVE_self:PVE_nei:PVE_s×n = 8:1:1, 5:4:1, or 1:8:1; and PVE_β and PVE_β + PVE_u = 0.1 and 0.4, 0.3 and 0.4, 0.3 and 0.8, or 0.6 and 0.8. The maximum reference scale was fixed at s = 3. The line of simulations was repeated for 10, 50, or 300 causal SNPs to examine cases of oligogenic and polygenic control of a trait. The non-zero coefficients (i.e., signals) for the causal SNPs were randomly sampled from −1 or 1 digit and then assigned, as some causal SNPs were responsible for both the self and neighbor effects. Of the total number of causal SNPs, 15% had self, neighbor, and asymmetric neighbor effects (i.e.,β₁ ≠ 0 and β₂ ≠ 0 and β₁₂ ≠ 0); another 15% had both the self and neighbor effects, but no asymmetry in the neighbor effects (β₁ ≠ 0 and β₂ ≠ 0 and β₁₂ ≠ 0); another 35% had self-genotypic effects only (β₁ ≠ 0); and the remaining 35% had neighbor effects alone (β₂ ≠ 0). Given its biological significance, we assumed that some loci having neighbor signals possessed asymmetric interactions between the neighbors (β₂ ≠ 0 and β₁₂ ≠ 0), while the others had symmetric interactions (β₂ ≠ 0 and β₁₂ ≠ 0). Therefore, the number of causal SNPs in β₁₂ was smaller than that in the main neighbor effects β₂. According to this assumption, the variance component (sigma _{12}^2) was also assumed to be smaller than (sigma _2^2). To examine extreme conditions and strong asymmetry in neighbor effects, we additionally analyzed the cases with PVE_self:PVE_nei:PVE_s×n = 1:0:0, 0:1:0, or 1:1:8.

Summary statistics

The simulated phenotypes were fitted by Eq. (2) to test the significance of coefficients β₁ and β₂, and to estimate single or partial PVE_self and PVE_nei. To deal with potential collinearity between x_i and neighbor genotypic identity (mathop {sum }nolimits_{ < i,j > }^L x_ix_j^{(s)}), we performed likelihood ratio tests between the self-genotype effect model and the model with both self and neighbor effects, which resulted in conservative tests of significance for β₂ and (sigma _2^2). The simulated phenotype values were standardized to have a mean of zero and a variance of 1, where true β was expected to match the estimated coefficients (hat beta) when multiplied by the standard deviation of non-standardized phenotype values. The likelihood ratio was calculated as the difference in deviance, i.e., −2 × log-likelihood, which is asymptotically χ² distributed with one degree of freedom. The variance components, (sigma _1^2) and (sigma _2^2), were estimated using a linear mixed model without any fixed effects. To solve the mixed model with the two random effects, we used the average information restricted maximum likelihood (AI-REML) algorithm implemented in the lmm.aireml function in the gaston package of R (Perdry and Dandine-Roulland 2018). Subsequently, we replaced the two variance parameters (sigma _1^2) and (sigma _2^2) in Eq. (2) with their estimates (hat sigma _1^2) and (hat sigma _2^2) from the AI-REML, and performed association tests by solving a linear mixed model with a fast approximation, using eigenvalue decomposition (implemented in the lmm.diago function: Perdry and Dandine-Roulland 2018). The model likelihood was computed using the lmm.diago.profile.likelihood function. We evaluated the self and neighbor effects for association mapping based on the forward selection of the two fixed effects, β₁ and β₂, as described below:

1.
Computed the null likelihood with (sigma _1^2, ne ,0) and (sigma _2^2 = 0) in Eq. (2).
2.
Tested the self-effect, β₁, by comparing with the null likelihood.
3.
Computed the self-likelihood with (hat sigma _1^2), (hat sigma _2^2), and β₁ using Eq. (2).
4.
Tested the neighbor effects, β₂, by comparing with the self-likelihood.

We also calculated PVE using the mixed model (Eq. (3)) without β₁ and β₂ as follows:

1.
Calculated single PVE_self or single PVE_nei by setting either (sigma _1^2) or (sigma _2^2) at 0.
2.
Tested the single PVE_self or single PVE_nei using the likelihood ratio between the null and one-random effect model.
3.
Calculated the partial PVE_self and partial PVE_nei by estimating (sigma _1^2) and (sigma _2^2) simultaneously.
4.
Tested the partial PVE_self and partial PVE_nei using the likelihood ratio between the two- and one-random effect model.

We inspected the model performance based on causal variant detection, PVE estimates, and effect size estimates. The true or false positive rates between the causal and non-causal SNPs were evaluated using ROC curves and area under the ROC curves (AUC) (Gage et al. 2018). An AUC of 0.5 would indicate that the GWAS has no power to detect true signals, while an AUC of 1.0 would indicate that all the top signals predicted by the GWAS agree with the true signals. In addition, the sensitivity to distinguish self or neighbor signals (i.e., either β₁ ≠ 0 or β₂ ≠ 0) was evaluated using the true positive rate of the ROC curves (i.e., y-axis of the ROC curve) at a stringent specificity level, where the false positive rate (x-axis of the ROC curve) = 0.05. The roc function in the pROC package (Robin et al. 2011) was used to calculate the ROC and AUC from −log₁₀(p value). Factors affecting the AUC or sensitivity were tested by analysis-of-variance (ANOVA) for the self or neighbor effects (AUC_self or AUC_nei; self or neighbor sensitivity). The AUC and PVE were calculated from s = 1 (the first nearest neighbors) to s = 3 (up to the third nearest neighbors) cases. The AUC was also calculated using standard linear models without any random effects, to examine whether the linear mixed models were superior to the linear models. We also tested the neighbor GWAS model incorporating the neighbor phenotype (y_j^{(s)}) instead of (x_j^{(s)}). The accuracy of the total PVE estimates was defined as PVE accuracy = (estimated total PVE − true total PVE) / true total PVE. The accuracy of the effect size estimates was evaluated using mean absolute errors (MAE) between the true and estimated β₁ or β₂ for the self and neighbor effects (MAE_self and MAE_nei). Factors affecting the accuracy of PVE and effect size estimates were also tested using ANOVA. Misclassifications between self and neighbor signals were further evaluated by comparing p value scores between zero and non-zero coefficients. If −log₁₀(p value) scores of zero β are the same or larger than non-zero β, it infers a risk of misspecification of the true signals.

Arabidopsis herbivory data

We applied the neighbor GWAS to field data of Arabidopsis herbivory. The procedure for this field experiment followed that of our previous experiment (Sato et al. 2019). We selected 199 worldwide accessions (Table S1) from 2029 accessions sequenced by the RegMap (Horton et al. 2012) and 1001 Genomes project (Alonso-Blanco et al. 2016). Of the 199 accessions, most were overlapped with a previous GWAS of biotic interactions (Horton et al. 2014) and half were included by a GWAS of glucosinolates (Chan et al. 2010). Eight replicates of each of the 199 accessions were first prepared in a laboratory and then transferred to the outdoor garden at the Center for Ecological Research, Kyoto University, Japan (Otsu, Japan: 35°06′N, 134°56′E, alt. ca. 200 m: Fig. S1a). Seeds were sown on Jiffy-seven pots (33-mm diameter) and stratified at a temperature of 4 °C for a week. Seedlings were cultivated for 1.5 months under a short-day condition (8 h light: 16 h dark, 20 °C). Plants were then separately potted in plastic pots (6 cm in diameter) filled with mixed soil of agricultural compost (Metro-mix 350, SunGro Co., USA) and perlite at a 3:1 ratio. Potted plants were set in plastic trays (10 × 40 cells) in a checkered pattern (Fig. S1b). In the field setting, a set of 199 accessions and an additional Col-0 accession were randomly assigned to each block without replacement (Fig. S1c). Eight replicates of these blocks were set >2 m apart from each other (Fig. S1c). Potted plants were exposed to the field environment for 3 weeks in June 2017. At the end of the experiment, the percentage of foliage eaten was scored as: 0 for no visible damage, 1 for ≤10%, 2 for >10% and ≤25%, 3 for >25% and ≤50%, 4 for >50% and ≤75%, and 5 for >75%. All plants were scored by a single person to avoid observer bias. The most predominant herbivore in this field trial was the diamond back moth (Plutella xylostella), followed by the small white butterfly (Pieris rapae). We also recorded the initial plant size and the presence of inflorescence to incorporate them as covariates. Initial plant size was evaluated by the length of the largest rosette leaf (mm) at the beginning of the field experiment and the presence of inflorescence was recorded 2 weeks after transplanting.

We estimated the variance components and performed the association tests for the leaf damage score with the neighbor covariate at s = 1 and 2. These two scales corresponded to L = 4 (the nearest four neighbors) and L = 12 (up to the second nearest neighbors), respectively, in the Arabidopsis dataset. The variation partitioning and association tests were performed using the gaston package, as mentioned above. To determine the significance of the variance component parameters, we compared the likelihood between mixed models with one or two random effects. For the genotype data, we used an imputed SNP matrix of the 2029 accessions studied by the RegMap (Horton et al. 2012) and 1001 Genomes project (Alonso-Blanco et al. 2016). Missing genotypes were imputed using BEAGLE (Browning and Browning 2009), as described by Togninalli et al. (2018) and updated on the AraGWAS Catalog (https://aragwas.1001genomes.org). Of the 10,709,466 SNPs from the full imputed matrix, we used 1,242,128 SNPs with MAF at >0.05 and LD of adjacent SNPs at r² < 0.8. We considered the initial plant size, presence of inflorescence, experimental blocks, and the edge or center within a block as fixed covariates; these factors explained 12.5% of the leaf damage variation (1.2% by initial plant size, Wald test, Z = 3.53, p value < 0.001; 2.4% by the presence of inflorescence, Z = −5.69, p value < 10⁻⁸; 8.3% by the experimental blocks, likelihood ratio test, χ² = 152.8, df = 7, p value < 10⁻²⁸; 0.5% by the edge or center, Z = 3.11, p value = 0.002). After the association mapping, we searched candidate genes within ~10 kb around the target SNPs, based on the Araport11 gene model with the latest annotation of The Arabidopsis Information Resource (TAIR) (accessed on 7 September 2019). Gene-set enrichment analysis was performed using the Gowinda algorithm that enables unbiased analysis of the GWAS results (Kofler and Schlotterer 2012). We tested the SNPs with the top 0.1% −log₁₀(p value) scores, with the option “–gene-definition undownstream10000,” “–min-genes 20,” and “–mode gene.” The GO.db package (Carlson 2018) and the latest TAIR AGI code annotation were used to build input files for gene ontologies (GOs). The R source codes, accession list, and phenotype data are available at the GitHub repository (https://github.com/naganolab/NeighborGWAS).

R package, “rNeighborGWAS”

To increase the availability of the new method, we have developed the neighbor GWAS into an R package, which is referred to as “rNeighborGWAS”. In addition to the genotype and phenotype data, the package requires a spatial map indicating the positions of individuals across a space. In this package, we generalized the discrete space example into a continuous two-dimensional space, allowing it to handle any spatial distribution along the x– and y-axes. Based on the three input files, the rNeighborGWAS package estimates the effective range of neighbor effects by calculating partial PVE_nei and performs association mapping of the neighbor effects using the linear mixed models described earlier. Details and usage are described in the help files and vignette of the rNeighborGWAS package available via CRAN at https://cran.r-project.org/package=rNeighborGWAS.

To assess its implementation, we performed standard GWAS using GEMMA version 0.98 (Zhou and Stephens 2012) and the rNeighborGWAS. The test phenotype data were the leaf damage scores for the 199 accessions described above or flowering time under long-day conditions for 1057 accessions (“FT16” phenotype collected by Atwell et al. 2010 and Alonso-Blanco et al. 2016). The flowering time phenotype was downloaded from the AraPheno database (https://arapheno.1001genomes.org/: Seren et al. 2017). The full imputed genotype data were compiled for 1057 accessions, whose genotypes and flowering time phenotype were both available. The cut-off value of the MAF was set at 5%, yielding 1,814,755 SNPs for the 1057 accessions. The same kinship matrix defined by K₁ above was prepared as an input file. We calculated p values using likelihood ratio tests in the GEMMA program, because the rNeighborGWAS adopted likelihood ratio tests.

Source: Ecology - nature.com