Study area
The study area is in Lintong District, Xi’an City, Shaanxi Province, in northwestern China (34° 21′ 59.94″ N, 109° 12′ 51.012″ E; Fig. 1) (Meteorologists, 2020b). The region has a warm temperate, semi-humid continental climate with distinct cold, warm, dry, and wet seasons. Winter is cold, windy, and foggy, with little rain or snow. Spring is warm, dry, windy, and variable. Summer is hot and rainy, with high winds, thunderstorms, and pronounced dry spells. Autumn is cool, with rapidly falling temperatures and frequent showers. The annual average temperature is 13.0–13.7 °C; the average temperature is −1.2 to 0 °C in January, the coldest month, and 26.3–26.6 °C in July, the hottest month. The recorded extreme minimum temperature is −21.2 °C (Lantian, December 28, 1991), and the recorded extreme maximum is 43.4 °C (Chang’an, June 19, 1966). Annual precipitation is 522.4–719.5 mm, increasing from north to south, with two distinct peaks in July and September. Annual sunshine ranges from 1646.1 to 2114.9 h. The dominant wind direction varies by location: northeast in Xi’an, west in Zhouzhi and Huxian, east-northeast in Gaoling and Lintong, southeast in Chang’an, and northwest in Lantian. Meteorological hazards include drought, continuous rain, heavy rain, flooding, urban flooding, hail, gales, dry hot wind, high temperature, lightning, sand and dust, fog, haze, cold waves, and low-temperature freezes.
Wheat (XiNong 805) was planted on September 24, 2019 and harvested at maturity on May 28, 2020. We warrant that we have the right to collect and manage wheat (XiNong 805), and the study complies with relevant institutional, national, and international guidelines. Among the six strategies in the experiment (Table 1), we focused on Strategies 1 and 4: optimization with fixed irrigation dates and optimization with fixed fertilizer application dates. Following local custom, three days of diffuse irrigation were selected for Strategy 1, and three days of urea fertilization and three days of irrigation were selected for Strategy 4. The best practice for Strategy 1 was a seasonal irrigation total of 201 mm, for which the simulated wheat yield was 7388 kg/ha. The best practice for Strategy 4 was a seasonal irrigation total of 197 mm and a seasonal fertilizer total of 282 kg/ha, for which the simulated wheat yield was 7894 kg/ha.
DSSAT model
DSSAT (Decision Support System for Agrotechnology Transfer), one of the most widely used crop growth models, is an integrated computer system developed by the University of Hawaii under the authority of the U.S. Agency for International Development (USAID). It aggregates various crop models and standardizes the format of model input and output variables to facilitate the dissemination and application of models7, thereby accelerating the transfer of agricultural technology and supporting decisions on the rational and efficient use of natural resources in developing countries.
The DSSAT 4.5 model integrates all crop models into the Cropping System Model (CSM), which uses a common set of code to simulate soil moisture, nitrogen, and carbon dynamics, while crop growth and development are simulated through the CERES37,38, CROPGRO39, CROPSIM, and SUBSTOR modules. DSSAT is applicable to single sites or homogeneous zones and can be extrapolated to the regional level through a Geographic Information System (GIS).
DSSAT–CSM simulates the growth of crops on a uniform land area under prescribed or simulated management40, together with the changes in soil water, carbon, and nitrogen under the cropping system. DSSAT is a decision support system built on crop simulation models: in addition to data support, it provides methods for calculating and solving problems and shows decision-makers the consequences of their decisions. It also gives farmers a scientific basis for choosing cultivation management measures (e.g., appropriate fertilization and irrigation) in different climatic years.
Inputs and outputs of the model
The DSSAT model has four main user-editable input files and various output files. The input files include crop management7,41, soil, weather, and cultivar parameter files; the output files include three types: (1) output files, (2) seasonal output files, and (3) diagnostic and management files.
Crop management data: Crop management data provide basic information about crop growth, and detailed, accurate parameters are the basis for accurate simulation. Crop management parameters include crop variety, soil type, weather station name, previous-season crop, sowing date, sowing density, sowing depth, irrigation amount and timing, fertilizer amount and timing, initial soil conditions, pest management, and tillage frequency and method. Some of these parameters are not easily measured in field experiments and can be obtained from other test sites or from existing documentation. Missing values, which are hard to avoid entirely, increase the simulation error. Therefore, in this study the parameters were selected on the principle of being both detailed and readily available.
Soil data: Soil data contain the parameters of the soil profile, including soil color, slope, soil capacity, organic carbon, soil nitrogen content, drainage properties, and the proportions of clay, particles, and stones in the soil. As with the crop management file, the more complete the parameters, the smaller the simulation error. The physical and chemical properties of the soil for this study were obtained from the China Soil Database at the time of the study.
Weather data: The DSSAT model uses daily weather data as input. The model requires at least four daily weather variables to accurately simulate the water cycle in the soil-plant system (Fig. 2): (1) daily solar radiation (MJ m⁻² d⁻¹); (2) daily maximum temperature (°C); (3) daily minimum temperature (°C); and (4) daily precipitation (mm). Weather data were obtained from the China Meteorological Administration.
Model calibration: Adjusting the cultivar parameters is essential for accurately simulating the local growing environment. In this experiment, we collected field data for 2019 and 2020 and adjusted the parameters in the cultivar parameter files by trial and error so that the simulation more closely matched the actual local crop growth process.
Multi-objective optimization algorithm
Multi-objective optimization techniques have been successfully applied to many real-world problems. In general42,43,44, multi-objective optimization problems (MOPs) have a set of optimal solutions that together represent a trade-off between conflicting objectives; these are called Pareto optimal solutions (PS). A Pareto optimal solution cannot be improved in one objective without worsening at least one other objective. Therefore, solving a multi-objective problem means finding many PS: some methods aim to find all of them, or at least a representative subset.
A multi-objective optimization problem can be stated as follows:
$$\min F(x) = \left(f_{1}(x), \dots, f_{k}(x)\right)^{T} \tag{1}$$
$$\text{subject to } x \in \Omega \tag{2}$$
where \(\Omega\) is the decision space, \(F: \Omega \to R^{k}\) consists of \(k\) real-valued objective functions, and \(R^{k}\) is called the objective space. The attainable objective set is defined as the set \(\{F(x) \mid x \in \Omega\}\).
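The dominance relation behind Pareto optimality can be made concrete with a minimal sketch. It is written in Python purely for illustration (the study itself reports using R), it assumes every objective is minimized as in Eq. (1), and the function names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is minimized:
    a is no worse in all objectives and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy bi-objective values (made up for illustration).
example = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
front = pareto_front(example)
```

Here `(3, 4)` is dominated by `(2, 3)` and `(5, 5)` by `(1, 5)`, so only the three remaining trade-off points survive.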
NSGA-II optimizer
We use the non-dominated sorting genetic algorithm II (NSGA-II) for multi-objective optimization, implemented in R. NSGA-II is a classical multi-objective evolutionary algorithm that performs well on 2-objective and 3-objective problems45. It maintains convergence speed and solution diversity through fast non-dominated sorting and crowding distance, and it selects the next population with an elitist selection strategy.
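The two ranking ingredients named above can be sketched as follows. This is an illustrative Python version, not the R implementation used in the study; it peels fronts one at a time instead of using the bookkeeping of the fast sorting procedure, assumes all objectives are minimized, and all function names are hypothetical.

```python
def dominates(a, b):
    """a dominates b when all objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated_sort(objs):
    """Split indices into fronts; front 0 holds the non-dominated solutions."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = sorted(i for i in remaining
                       if not any(dominates(objs[j], objs[i]) for j in remaining))
        fronts.append(front)
        remaining -= set(front)
    return fronts


def crowding_distance(objs, front):
    """Crowding distance of each index in one front (larger = less crowded)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda i: objs[i][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        # Boundary solutions are always kept.
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist


def environmental_selection(objs, n):
    """Elitist selection: fill by front rank, break ties by crowding distance."""
    chosen = []
    for front in nondominated_sort(objs):
        if len(chosen) + len(front) <= n:
            chosen.extend(front)
        else:
            d = crowding_distance(objs, front)
            chosen.extend(sorted(front, key=lambda i: -d[i])[: n - len(chosen)])
            break
    return chosen


objs = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]  # toy values
```

When a front does not fit entirely, the least crowded members are kept first, which is how the boundary points of the front are preserved.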
Objective function
A multi-objective optimization problem varies one or more decision variables to maximize or minimize two or more objective functions. In crop production, decision-makers change irrigation and fertilizer application to maximize benefits; this study focuses on when to irrigate or fertilize the field and how much irrigation water or nitrogen fertilizer to apply.
Many crop models could serve as the optimization objective function; we chose DSSAT because it is easy to use and well-proven36. The user runs the model by supplying soil, weather, cultivar, and crop management files, which are fed into the core of the model, the Cropping System Model (CSM). The model simulates the growth, development, and yield of crops grown on a uniform land area under management, as well as the changes in soil water, carbon, and nitrogen over time under the cropping system. The CSM itself is a highly modular system consisting of a number of sub-modules, whose combined output has been validated by researchers under various crops, climates, and soil conditions.
Using DSSAT, it is easy to design a set of objective functions and optimize them, as in our case.
$$\mathrm{Max}: Y = \mathrm{DSSAT}\left(i_{a0}, \dots, i_{aj}, f_{a0}, \dots, f_{ad}, D_{i}\right) \tag{3}$$
$$\mathrm{Min}: I = \sum_{n=0}^{j} i_{an} \tag{4}$$
$$\mathrm{Min}: F = \sum_{m=0}^{d} f_{am} \tag{5}$$
where \(Y\) is yield, \(I\) is the total amount of irrigation, \(F\) is the total amount of nitrogen application, \(i_{an}\) is the amount of irrigation at one time, \(f_{am}\) is the amount of nitrogen applied at one time, \(j\) is the number of irrigation applications, and \(d\) is the number of nitrogen applications. \(D_{i}\) is a random date combination of irrigation times and fertilizer application times.
All other variables (e.g., climate, soil, location, crop variety) are kept constant during the optimization process. Irrigation is expressed in mm and nitrogen application in kg/ha. By default, the irrigation and nitrogen application amounts are positive integers, because integer arithmetic reduces the running time of the program.
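As a rough illustration of how Eqs. (3) to (5) turn one decision vector into three objective values, the sketch below uses a hypothetical stand-in for the crop model. `toy_crop_model` and its coefficients are invented for this example and are not DSSAT; in the study the yield comes from an actual DSSAT run. Yield is negated so that all three objectives can be minimized together, and zero amounts are allowed here as an assumption.

```python
from typing import Callable, Sequence, Tuple


def toy_crop_model(irrigation_mm: Sequence[int], nitrogen_kg_ha: Sequence[int]) -> float:
    """Hypothetical stand-in for a crop simulation (NOT DSSAT): a saturating
    linear yield response in kg/ha with made-up coefficients."""
    water = min(sum(irrigation_mm), 250)
    nitrogen = min(sum(nitrogen_kg_ha), 300)
    return 3000.0 + 12.0 * water + 6.0 * nitrogen


def objectives(x: Sequence[int], n_irrigation: int,
               crop_model: Callable = toy_crop_model) -> Tuple[float, int, int]:
    """Map a decision vector to (-Y, I, F) for a fixed date combination.

    x holds the per-event irrigation amounts (mm) followed by the per-event
    nitrogen amounts (kg/ha), all integers.
    """
    if any(not isinstance(v, int) or v < 0 for v in x):
        raise ValueError("amounts must be non-negative integers")
    irrigation, nitrogen = x[:n_irrigation], x[n_irrigation:]
    yield_kg_ha = crop_model(irrigation, nitrogen)   # Eq. (3), maximized
    total_irrigation = sum(irrigation)               # Eq. (4), minimized
    total_nitrogen = sum(nitrogen)                   # Eq. (5), minimized
    return (-yield_kg_ha, total_irrigation, total_nitrogen)


# Three irrigation events followed by three nitrogen events (toy amounts).
result = objectives([60, 70, 70, 90, 100, 90], n_irrigation=3)
```

Keeping the crop model behind a plain callable means the same wrapper could front either an expensive simulation or a surrogate prediction.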
Data-driven evolutionary algorithms
In general, the key idea of data-driven evolutionary algorithms (DDEAs) is to reduce the number of required function evaluations (FEs) and to assist evolution through data. The data are generally exploited through a surrogate model, and a suitable surrogate can be used in place of real FEs46. DDEAs therefore have an advantage over conventional evolutionary algorithms (EAs) on expensive problems.
In terms of algorithmic framework, DDEAs contain two parts: surrogate model management (SMM) and the evolutionary optimization part (EOP)47,48. The SMM part is used to obtain better approximations, while the EOP uses the surrogate models within the EA to assist evolution. DDEAs can be divided into two types: online DDEAs and offline DDEAs23. Online DDEAs can obtain new data through real FEs during optimization; these data give the SMM more information and allow a more accurate surrogate model to be constructed49. In contrast, offline DDEAs can only drive evolution with historical data. Since DSSAT can produce new data through FEs during the EOP, the method used in this paper is an online DDEA.
A Radial Basis Function (RBF) network is a feedforward neural network with a single hidden layer that uses a radial basis function as the activation function of the hidden-layer neurons, while the output layer is a linear combination of the hidden-layer outputs. An RBF network was used to approximate each objective function. Studies of computationally expensive multi-objective optimization problems often use radial basis functions as the surrogate model, for several reasons: RBF networks can approximate arbitrary nonlinear functions to arbitrary accuracy and have global approximation capability, which avoids the local optimum problem of backpropagation (BP) networks; the topology is compact; the structural parameters can be learned separately; and convergence is fast.
In this paper, a new data-driven approach is proposed and placed in the lower-level optimization of the framework. RBF is used as the surrogate model and NSGA-II as the optimizer. Details are described in Algorithm 1.
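A Gaussian RBF surrogate of the kind described here can be sketched in a few lines. This illustrative Python version makes simplifying assumptions that differ from the configuration reported later in the paper: it places one hidden neuron on every training point, uses a single shared width, and solves for the output weights directly with a small ridge term. All names and default values are hypothetical.

```python
import math


def gaussian(r, width):
    """Gaussian radial basis function of distance r."""
    return math.exp(-(r * r) / (2.0 * width * width))


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_rbf(xs, ys, width=1.0, ridge=1e-8):
    """Fit an RBF network for one objective and return its predict function.

    Hidden layer: one Gaussian unit centred on each training point.
    Output layer: a linear combination of the hidden-unit outputs.
    """
    phi = [[gaussian(math.dist(a, b), width) for b in xs] for a in xs]
    for i in range(len(xs)):
        phi[i][i] += ridge  # small regularization for numerical stability
    weights = solve(phi, ys)

    def predict(x):
        return sum(w * gaussian(math.dist(x, c), width)
                   for w, c in zip(weights, xs))

    return predict


# Toy training data: y = x^2 sampled at four points.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 1.0, 4.0, 9.0]
surrogate = fit_rbf(train_x, train_y)
```

One such surrogate would be fitted per objective, and each prediction costs only a handful of exponentials compared with a full crop simulation run.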
Data-driven method details
In step 1, an initial parent population of size \(N\) is generated by randomly selecting points. In step 2, DSSAT is run \(N\) times to determine the objective function values of the \(N\) initial solutions. The algorithm then loops through the generations. At the beginning of each loop, the surrogate models are trained on the objective function values obtained so far (step 3.1); one surrogate is trained per objective, e.g., \(s_{t}^{(f_{1})}\) for the first objective. The trial offspring \(P'_{i}(t) = \{x'_{1}(t), \dots, x'_{u}(t)\}\) are generated by simulated binary crossover (SBX) and polynomial mutation (PM) (step 3.2), and the trained surrogate models are used to predict their objective function values (step 3.3). The predicted objective function values are sorted by Pareto non-dominated sorting and crowding distance (step 3.4), and \(r\) offspring \(Q_{i}(t) = \{x''_{1}(t), \dots, x''_{r}(t)\}\) are selected from the trial offspring (step 3.5). These offspring are evaluated by DSSAT (step 3.6). After the parent and offspring populations are combined (step 3.7), the new parent population is selected by Pareto non-dominated sorting and crowding distance (step 3.8).
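Steps 3.1 to 3.8 can be outlined as a single function whose components are passed in as callables. This Python sketch only illustrates the control flow, not Algorithm 1 itself: the plug-ins below (a nearest-neighbour surrogate, Gaussian perturbation instead of SBX and PM, and ranking by the sum of objectives instead of non-dominated sorting with crowding distance) are deliberately crude, hypothetical stand-ins chosen to keep the example short.

```python
import random


def one_generation(parents, parent_objs, expensive_eval, fit_surrogate,
                   vary, rank, u, r):
    """One loop of a surrogate-assisted evolutionary algorithm."""
    surrogate = fit_surrogate(parents, parent_objs)            # step 3.1
    trial = [vary(parents) for _ in range(u)]                  # step 3.2
    predicted = [surrogate(x) for x in trial]                  # step 3.3
    order = rank(predicted)                                    # step 3.4
    offspring = [trial[i] for i in order[:r]]                  # step 3.5
    offspring_objs = [expensive_eval(x) for x in offspring]    # step 3.6
    pool = parents + offspring                                 # step 3.7
    pool_objs = parent_objs + offspring_objs
    keep = rank(pool_objs)[: len(parents)]                     # step 3.8
    return [pool[i] for i in keep], [pool_objs[i] for i in keep]


# --- Hypothetical stand-ins, for illustration only ---
def nearest_neighbour_surrogate(xs, ys):
    """Predict the objectives of the closest already-evaluated point."""
    def predict(x):
        k = min(range(len(xs)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, xs[i])))
        return ys[k]
    return predict


def gaussian_variation(parents, rng):
    """Perturb a random parent (stand-in for SBX and PM)."""
    base = rng.choice(parents)
    return [v + rng.gauss(0.0, 0.1) for v in base]


def rank_by_sum(objs):
    """Order indices by the sum of objectives (stand-in for Pareto ranking)."""
    return sorted(range(len(objs)), key=lambda i: sum(objs[i]))
```

Only `r` of the `u` trial offspring reach the expensive evaluation in step 3.6, which is where the saving in simulation runs comes from.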
Maximum extension distance
The maximum extension distance (MED) guides a small number of individuals to approximate the entire Pareto front (PF). MED is defined as follows:
$$\mathrm{MED}\left(P_{t}^{(q)}\right) = \mathrm{ND}\left(P_{t}^{(q)}\right) \times \mathrm{TD}\left(P_{t}^{(q)}\right) \tag{6}$$
where
$$\mathrm{ND}\left(P_{t}^{(q)}\right) = \min_{z,\, q \ne z} \sum_{m=1}^{M} \left| f_{m}^{(q)} - f_{m}^{(z)} \right|$$
$$\mathrm{TD}\left(P_{t}^{(q)}\right) = \sum_{z=1}^{P} \sum_{m=1}^{M} \left| f_{m}^{(q)} - f_{m}^{(z)} \right|$$
\(P_{t}^{(q)}\) is the \(q\)th individual in population \(P_{t}\) at the \(t\)th generation. \(\mathrm{ND}\left(P_{t}^{(q)}\right)\) is the minimum distance between \(P_{t}^{(q)}\) and any other individual \(P_{t}^{(z)}\); a larger value means better individual diversity. \(\mathrm{TD}\left(P_{t}^{(q)}\right)\) is the sum of the distances between \(P_{t}^{(q)}\) and all individuals \(P_{t}^{(z)}\); a larger value means that the solution \(P_{t}^{(q)}\) lies farther from the other individuals. A larger MED value therefore means that an individual extends the overall boundary and has better diversity.
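Eq. (6) can be evaluated directly from the objective values of a population. The Python sketch below is illustrative only; it assumes at least two individuals, and it ranks all individuals once by their MED score, which is a simplifying assumption because the text does not state whether scores are recomputed after each pick. The function names are hypothetical.

```python
def med_scores(objs):
    """MED of every individual: ND (nearest distance) times TD (total distance),
    using the sum of absolute objective differences as the distance."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    scores = []
    for q, fq in enumerate(objs):
        dists = [l1(fq, fz) for z, fz in enumerate(objs) if z != q]
        nd = min(dists)
        # The distance of q to itself is zero, so leaving it out of the
        # sum gives the same TD as summing over the whole population.
        td = sum(dists)
        scores.append(nd * td)
    return scores


def select_by_med(objs, k):
    """Indices of the k individuals with the largest MED."""
    scores = med_scores(objs)
    return sorted(range(len(objs)), key=lambda q: -scores[q])[:k]


toy_objs = [(0, 0), (1, 0), (4, 0)]  # made-up objective vectors
```

In this toy population the isolated point `(4, 0)` scores 21 against 5 and 4 for the two clustered points, so it is the first candidate chosen for a real evaluation.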
Modeling process
To maximize crop yield and optimize the use efficiency of water and fertilizer in a given environment, the BSBOP framework is proposed. Crop growth is simulated by DSSAT, and the data-driven approach reduces the runtime of the overall framework while finding optimal management strategies. The framework has four main parts: upper-level screening (ULS), upper-level optimization (UOP), lower-level optimization (LOP), and lower-level screening (Fig. 3).
Upper-level screening: The DSSAT weather file is loaded in R. The precipitation and solar radiation data are pre-processed to narrow the date range for irrigation and fertilizer application. In other words, the ULS restricts the date ranges from which irrigation and fertilization dates are selected.
Upper-level optimization: Random combinations of dates are generated by Latin hypercube sampling (LHS). This step starts from two variables, the number of irrigation events and the number of nutrient application events, which LHS uses to generate a series of uniformly distributed random day combinations. For example, the date combinations generated by LHS could be May 15, July 18, and August 1 for irrigation and May 30, June 30, and July 18 for nutrient application. One combination at a time is selected from this series and passed to the lower-level optimization.
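Latin hypercube sampling of day combinations can be sketched as below. This Python version is an illustration under stated assumptions rather than the R code used in the study: days are integer offsets within an allowed window, each event is treated as one sampling dimension, and two events in a combination may fall on the same day. The function name and arguments are hypothetical.

```python
import random


def lhs_day_combinations(n_samples, n_events, first_day, last_day, rng):
    """Draw n_samples combinations of n_events days in [first_day, last_day].

    For each event the window is cut into n_samples equal strata, one day
    is drawn per stratum, and the strata are shuffled across samples.
    """
    span = last_day - first_day + 1
    columns = []
    for _ in range(n_events):
        days = [
            min(last_day, first_day + int((k + rng.random()) * span / n_samples))
            for k in range(n_samples)
        ]
        rng.shuffle(days)
        columns.append(days)
    # Sort within each combination so events read chronologically.
    return [sorted(col[i] for col in columns) for i in range(n_samples)]


rng = random.Random(1)
combos = lhs_day_combinations(n_samples=10, n_events=3,
                              first_day=1, last_day=120, rng=rng)
```

Compared with plain random sampling, the stratification spreads candidate dates over the whole window even when only a few combinations are generated.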
Lower-level optimization: The agricultural management strategy is optimized by the online data-driven approach proposed in Algorithm 1. Assuming three irrigation and three nitrogen application events are given, these events are passed to the LOP, which consists of the RBF surrogate and NSGA-II. The population size in this study is 105. The number of iterations varies with the strategy, and the objective function values are calculated by DSSAT. The main idea of applying evolutionary multi-objective (EMO) algorithms and RBF to DSSAT is to generate a large number of trial offspring by traditional Simulated Binary Crossover (SBX) and Polynomial Mutation (PM), and then evaluate them using the trained surrogate model50. The predicted objective values are ranked by Pareto non-dominated sorting and crowding distance, and the top 105 individuals are selected from the trial offspring. A small number of these 105 individuals are then selected by Maximum Extension Distance (MED) for real function evaluation, after which parents and offspring are combined and the next generation of parents is selected by Pareto non-dominated sorting and crowding distance. In the numerical experiments, we use a relatively simple RBF surrogate to preserve the algorithm's performance while reducing experimental complexity. NSGA-II can be used for both bi-objective and tri-objective problems, so the system can be optimized by starting with the most critical objective and then adding further objectives. For each solution in the population, the objective functions (1: maximize yield, 2: minimize irrigation application, 3: minimize nitrogen fertilizer application) are evaluated by invoking the DSSAT model with the given dates and the irrigation and fertilizer amounts applied. The population is then tested against the termination criterion (the maximum number of iterations allowed).
If the termination criterion is not satisfied, the population evolves and is evaluated again. The process is repeated until the termination criterion is satisfied, and the local Pareto front of the selected day combination is then stored. After each iteration of the UOP, the new local Pareto front is merged into the global Pareto front. If any day combinations remain, the above process is repeated for each of them until all generated random day combinations have been processed.
Lower-level screening: First, the K-means method is used to screen the global Pareto solutions for those with higher yield. Then, a secondary screening takes economic efficiency as the objective and optimizes it with the Differential Evolution (DE) algorithm. Finally, the solution best suited to local conditions is selected.
Optimization strategies and configuration
Because of the complexity of the problem, a BSBOP framework was proposed in this study. Irrigation and fertilization involve a large number of variables, so traversing every date during optimization is particularly difficult and time-consuming. If only irrigation is optimized over 120 days of the growth cycle and the decision-maker can apply 0–150 mm of water per day in integer amounts, there are \(151^{120}\) different solutions. If both irrigation and fertilization are considered, there are \(151^{120} \cdot 151^{120}\) different solutions. Therefore, this study tries to reduce the number of variables while minimizing the running time of the algorithm.
Here we hypothesize that more precise and effective agricultural management can be implemented through the proposed framework. Not only can crop yields be increased, but irrigation and fertilizer application can also be reduced, and the solutions obtained offer practical guidance for decision-makers, such as the selection of irrigation and fertilizer application dates during the growing season, the selection of irrigation and fertilizer amounts, and the relationship between economic benefits and application costs. To test this hypothesis, different optimization strategies were developed and evaluated (Table 1). Each optimization strategy aimed to maximize yield while minimizing resource wastage.
The strategies are as follows (Table 1).
Strategy 1, fixed irrigation dates: The number of irrigation days and all other parameters are kept constant and only the irrigation amount on each date is changed, reducing irrigation as much as possible so that the results are easy to compare with best practice.
Strategy 2, optimal irrigation dates: The irrigation dates are traversed to find a better combination of irrigation dates (optimal dates) and better irrigation amounts over the wheat growth cycle.
Strategy 3, optimal irrigation dates based on a surrogate model: RBF is added to Strategy 2, which substantially reduces the running time.
Strategy 4, fixed fertilizer application dates: Using the optimal irrigation dates found in Strategy 2 and keeping the number of fertilization days and all other parameters constant, irrigation and fertilization are optimized to minimize the amounts of irrigation and fertilizer applied.
Strategy 5, optimal fertilizer application dates: With the optimal irrigation dates fixed, the fertilizer application dates are traversed to find the potential yield of the crop.
Strategy 6, optimal fertilizer application dates based on a surrogate model: RBF is added to Strategy 5, which reduces the running time.
The optimization is stopped when the results converge visually. The population size was set to 105, and offspring were generated with traditional polynomial mutation. In the surrogate model, the number of hidden-layer neurons equals the dimension of the decision variables, the learning rate is 0.01, and the Gaussian kernel is the activation function of the hidden layer of the RBF network. The neuron centers are generated by K-means clustering, and the width parameter of each kernel is computed from the variance of its cluster. The output weights are fitted by recursive least squares (RLS) rather than ordinary least squares, because the ordinary method requires a matrix inversion that can be troublesome; RLS updates the required inverse recursively, which makes the computation easier.
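The size of this search space can be checked with exact integer arithmetic. The short Python sketch below only illustrates the counting argument; the three-date case at the end assumes, for illustration, the same 0 to 150 mm range on each of three fixed irrigation dates.

```python
LEVELS = 151   # integer amounts 0, 1, ..., 150 mm on a given day
DAYS = 120     # days of the growth cycle considered

irrigation_only = LEVELS ** DAYS
irrigation_and_fertilizer = irrigation_only * irrigation_only


def n_digits(n: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(n))


# Fixing three irrigation dates leaves only the three amounts to choose.
three_fixed_dates = LEVELS ** 3
```

The irrigation-only space has 262 decimal digits (roughly 3 × 10^261 candidates), while three fixed dates leave 3,442,951 candidates, which shows why restricting dates matters so much.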
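The recursive least squares update for the output weights can be sketched as follows. This is a generic textbook form in Python, shown for illustration; the initial value `delta`, the forgetting factor `lam`, and the function name are assumptions and are not taken from the study. Each row of `features` would hold the hidden-layer outputs for one evaluated solution.

```python
def rls_fit(features, targets, delta=1e6, lam=1.0):
    """Fit linear weights w so that features[i] . w approximates targets[i],
    updating the inverse correlation matrix P recursively instead of
    inverting a matrix."""
    n = len(features[0])
    w = [0.0] * n
    # P starts as a large multiple of the identity (weak prior on w = 0).
    p = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in zip(features, targets):
        p_phi = [sum(p[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(phi[i] * p_phi[i] for i in range(n))
        gain = [v / denom for v in p_phi]
        error = y - sum(w[i] * phi[i] for i in range(n))
        w = [w[i] + gain[i] * error for i in range(n)]
        # P is symmetric, so phi^T P equals (P phi)^T.
        p = [[(p[i][j] - gain[i] * p_phi[j]) / lam for j in range(n)]
             for i in range(n)]
    return w


# Toy check: recover y = 2 + 3 x from features [1, x].
feats = [[1.0, float(x)] for x in range(5)]
targs = [2.0 + 3.0 * x for x in range(5)]
weights = rls_fit(feats, targs)
```

Because the weights are refined one sample at a time, newly simulated solutions can update the surrogate without refitting it from scratch.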
Source: Ecology - nature.com