
Machine-learning algorithms for forecast-informed reservoir operation (FIRO) to reduce flood damages


This study applies SVM, ANN, RT, and GP to forecast monthly reservoir inflows with 1- and 2-month time lags. Historical inflow data for the Dez, Karkheh, and Gotvand reservoirs were collected and used to build the ML algorithms. The inputs to the algorithms for the Dez, Karkheh, and Gotvand reservoirs are the monthly inflows for 1965–2019, 1957–2019, and 1961–2019, respectively. Four projections were designed for the 1-month and 2-month time-lag patterns based on the input and output months, as depicted in Fig. 1. Figure 2 displays the flowchart of this paper’s methodology.
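
A minimal sketch (not the study’s code) of how the lagged input–output pairs behind such projections can be assembled from a monthly inflow series; the helper name `make_lagged_pairs` and the synthetic series are illustrative assumptions:

```python
import numpy as np

def make_lagged_pairs(inflow, lags=(1, 2)):
    """Build (X, y) pairs where each row of X holds the inflows at
    months t-1, t-2, ... and y is the inflow at month t."""
    inflow = np.asarray(inflow, dtype=float)
    max_lag = max(lags)
    X = np.column_stack(
        [inflow[max_lag - lag : len(inflow) - lag] for lag in lags]
    )
    y = inflow[max_lag:]
    return X, y

# Twelve months of synthetic inflow values (illustrative only)
series = np.array([110, 95, 130, 160, 210, 180, 90, 60, 55, 70, 100, 140])
X, y = make_lagged_pairs(series, lags=(1, 2))  # inputs: months t-1 and t-2
```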

Figure 1

Schematic of projections of 1-month and 2-month time-lag patterns.

Figure 2

Flowchart of this study’s methodology.


Support vector machine

Support Vector Machine (SVM) was introduced by Vapnik et al.43. SVM performs classification and regression based on statistical learning theory44. The regression form of SVM is named support vector regression (SVR). Vapnik et al.45 defined two functions for SVR design. The first is the error function (Eq. (1), see Fig. 3). The second is a linear function that calculates output values from the input, weight, and deviation values (Eq. (2)):

$$ \left| y - f(x) \right| = \begin{cases} 0 & \text{if } \left| y - f(x) \right| \le \varepsilon \\ \left| y - f(x) \right| - \varepsilon = \xi & \text{otherwise} \end{cases} $$

(1)

$$ f(x) = W^{T} x + b $$

(2)

Figure 3

Illustration of the error function of SVR.


where \(y\), \(f(x)\), \(\varepsilon\), \(\xi\), \(W\), \(b\), and \(T\) denote, respectively, the observed value, the output value calculated by SVR, the function sensitivity value, the model penalty, the weight applied to the variable \(x\), the deviation of \(W^{T} x\) from \(y\), and the vector/matrix transpose operator.

It is seen in Fig. 3 that the first function (Eq. 1) does not apply a penalty to points where the difference between the observed and the calculated value falls within the range \((-\varepsilon, +\varepsilon)\); otherwise, a penalty \(\xi\) is applied. SVR solves an optimization problem that minimizes the forecast error (Eq. 3) to improve the model’s forecast accuracy. Equations (4) and (5) represent the constraints of the optimization problem.

$$ \text{minimize} \quad \frac{1}{2}\left\| W \right\|^{2} + C\sum_{i=1}^{m} \left( \xi_{i}^{-} + \xi_{i}^{+} \right) $$

(3)

Subject to:

$$ \left( W^{T} x + b \right) - y_{i} \le \varepsilon + \xi_{i}^{+}, \quad i = 1, 2, \ldots, m $$

(4)

$$ y_{i} - \left( W^{T} x + b \right) \le \varepsilon + \xi_{i}^{-}, \quad i = 1, 2, \ldots, m $$

(5)

where \(C\), \(m\), \(\xi_{i}^{-}\), \(\xi_{i}^{+}\), \(y_{i}\), and \(\|\cdot\|\) denote, respectively, the penalty coefficient, the number of input data in the training phase, the penalty for violating the lower bound of \((-\varepsilon, +\varepsilon)\), the penalty for violating the upper bound of \((-\varepsilon, +\varepsilon)\), the i-th observed value, and the vector magnitude. The values of \(W\) and \(b\) are calculated by solving the optimization problem embodied by Eqs. (3)–(5) with the Lagrange method, and they are substituted in Eq. (2) to calculate the SVR output. SVR is capable of modeling nonlinear data, in which case it relies on transfer (kernel) functions to transform the data so that linear functions can be fitted to them. Reservoir inflow forecasting with SVR was performed with the Tanagra software. The transfer function selected in this study is the Radial Basis Function (RBF), which provided better results than the other transfer functions. The weight vector \(W\) was calculated using the Soft Margin method46, and the optimal values of the parameters \(\xi_{i}^{-}\), \(\xi_{i}^{+}\), and \(C\) were estimated by trial and error.
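
Purely as an illustrative sketch (the study ran SVR in Tanagra, not scikit-learn), an RBF-kernel SVR whose `C` and `epsilon` arguments play the roles of \(C\) and \(\varepsilon\) above can be configured as follows; the parameter values are placeholders, not the study’s trial-and-error results:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# RBF-kernel SVR: C is the penalty coefficient of Eq. (3) and
# epsilon the sensitivity of the error function of Eq. (1).
# Both values below are illustrative placeholders.
svr = make_pipeline(
    StandardScaler(),
    SVR(kernel="rbf", C=10.0, epsilon=0.1),
)
svr.fit(X, y)              # X, y: lagged pairs as sketched earlier
inflow_hat = svr.predict(X)
```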

Regression tree (RT)

RT involves a clustering tree with post-pruning processing (CTP). The clustering tree algorithm has been reported in the literature as the predictive clustering tree47 and the monothetic clustering tree48. The clustering tree algorithm is based on the top-down induction algorithm of decision trees49. It takes a set of training data as input and forms a new internal node, provided the best acceptable test can be placed in that node. The algorithm selects the best test based on variance reduction: the smaller the variance, the greater the homogeneity of the cluster and the higher the forecast accuracy. If none of the tests significantly reduces the variance, the algorithm generates a leaf and tags it as representative of the data47,48.

The CTP algorithm is similar to the clustering tree algorithm, except that a post-pruning step is performed with a separate pruning set to obtain a tree of the right size50.

The RT used in this study was programmed in MATLAB. The minimum leaf size, the minimum node size for branching, the maximum tree depth, and the maximum number of classification ranges were set by trial and error in this paper’s application.
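
As a rough analogue (the study’s RT is MATLAB code and is not reproduced here), scikit-learn’s `DecisionTreeRegressor` exposes corresponding hyperparameters; note that it implements cost-complexity pruning via `ccp_alpha` rather than CTP’s pruning-set procedure, so this is a sketch under that substitution:

```python
from sklearn.tree import DecisionTreeRegressor

# Splits are chosen to maximize variance reduction, mirroring the
# clustering-tree criterion described above. Values below are
# placeholders; the study set them by trial and error.
rt = DecisionTreeRegressor(
    min_samples_leaf=5,    # minimum leaf size
    min_samples_split=10,  # minimum node size for branching
    max_depth=6,           # maximum tree depth
    ccp_alpha=0.0,         # >0 enables cost-complexity post-pruning
)
rt.fit(X, y)
```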

Genetic programming (GP)

GP, developed by Cramer51 and Koza52, is a type of evolutionary algorithm that has been used effectively in water management for single- and multi-objective optimization53. GP finds functional relations between input and output data by combining operators and mathematical functions through structured tree searches44. GP starts the search by generating a random set of trees in the first iteration. The length of a tree defines its depth; the greater the depth, the more accurate the GP functional relation can be54. In a tree structure, the variables and the operators constitute the terminal set and the function set, respectively. Figure 4 shows a mathematical relation generated by GP. Genetic programming consists of the following steps:

  • Select the terminal sets: these are the problem-independent variables and the system state variables.

  • Select a set of functions: these include arithmetic operators (÷, ×, −, +), Boolean functions (such as “and” and “or”), mathematical functions (such as sin and cos), conditional expressions (such as if–then–else), and other statements required by the problem objectives.

  • Select an accuracy-measurement index: it determines to what extent the algorithm performs correctly.

  • Set the control components: these are numerical parameters and qualitative variables used to control the algorithm’s execution.

  • Define the stopping criterion: it determines when execution of the algorithm terminates.

Figure 4

Example of mathematical relations produced by GP based on a tree representation for the function \(f(X_{1}, X_{2}, X_{3}) = \left( 5 X_{1} / (X_{2} X_{3}) \right)^{2}\).

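To make the tree representation concrete, the sketch below encodes the tree of Fig. 4 with a hypothetical `Node` class: variables and constants form the terminal set (leaves), operators the function set (internal nodes). It illustrates only tree evaluation, not the evolutionary search itself:

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Function set: operator symbols mapped to their implementations
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "^": operator.pow}

@dataclass
class Node:
    value: object                  # operator symbol, variable name, or constant
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def eval(self, env):
        if self.left is None:      # terminal: look up variable, else constant
            return env.get(self.value, self.value)
        return OPS[self.value](self.left.eval(env), self.right.eval(env))

# Tree for f(X1, X2, X3) = (5*X1 / (X2*X3))^2, as in Fig. 4
tree = Node("^",
            Node("/",
                 Node("*", Node(5), Node("X1")),
                 Node("*", Node("X2"), Node("X3"))),
            Node(2))
print(tree.eval({"X1": 2.0, "X2": 1.0, "X3": 5.0}))  # (10 / 5)^2 = 4.0
```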

The GeneXproTools software was used in this study to program GP. The GP parameters, operators, and linking functions were chosen based on the lowest RMSE. The GP model’s parameters and operators applied in this study are listed in Table 1.

Table 1 Operators and range of parameters used in GP.

Artificial Neural Network (ANN)

ANN, developed by McCulloch and Pitts55, is an artificial-intelligence-based computational method featuring an information-processing system that employs interconnected data structures to emulate information processing by the human brain56. A neural network does not require precise mathematical algorithms and, like humans, can learn from input/output analysis without explicit instructions57. A simple neural network contains one input layer, one hidden layer, and one output layer; deep-learning networks have multiple hidden layers58. After training the functional relations between inputs and outputs, an ANN forecasts the output corresponding to new inputs with a specific algorithm.

This study applies the Multi-Layer Perceptron (MLP), a three-layer feed-forward ANN that features processing elements, an activation function, and a threshold function, as shown in Fig. 5. In the MLP, the weighted sum of the inputs and a bias term is passed through a transfer function to produce the output.

Figure 5

The general structure of a three-layer feed-forward ANN and its processing architecture.

Full size image

The output is calculated with a nonlinear function as follows:

$$ Y = f\left( \sum_{i=1}^{n} W_{i} X_{i} + b \right) $$

(6)

where \(W_{i}\), \(X_{i}\), \(b\), \(f\), and \(Y\) denote the i-th weight factor, the i-th input vector, the bias, the transfer function, and the output, respectively.

The ANN was coded in MATLAB. The number of epochs, the optimal number of hidden layers, and the number of neurons in the hidden layers were found through a trial-and-error procedure. The model’s output sensitivity was assessed with various training algorithms; the best forecasting skill was achieved with the Levenberg–Marquardt (LM) algorithm59, and the weight vector \(W\) was calculated using the Random Search method60. Furthermore, the Tangent Sigmoid and linear transfer functions, chosen by trial and error, were used in the hidden and output layers, respectively.
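
For illustration only, the forward pass of Eq. (6) with a tangent-sigmoid hidden layer and a linear output layer can be sketched as follows; the layer sizes and random weights are placeholders (in the study the weights were fitted with the LM algorithm in MATLAB):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 8                   # e.g. inflows at months t-1 and t-2
W1 = rng.normal(size=(n_hidden, n_in))  # hidden-layer weights (placeholders)
b1 = rng.normal(size=n_hidden)          # hidden-layer biases
W2 = rng.normal(size=n_hidden)          # output-layer weights
b2 = 0.0                                # output-layer bias

def mlp_forward(x):
    """Three-layer feed-forward pass: tansig hidden layer, linear output."""
    hidden = np.tanh(W1 @ x + b1)       # Eq. (6) applied per hidden neuron
    return W2 @ hidden + b2             # linear output layer

print(mlp_forward(np.array([110.0, 95.0])))
```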

Seventy percent of the data were randomly selected and used for training the SVM, ANN, RT, and GP models; the remaining 30% were used for testing the forecasting algorithms.
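
Sketched with scikit-learn (the seed is an arbitrary placeholder; the study does not report one):

```python
from sklearn.model_selection import train_test_split

# Random 70%/30% split of the lagged samples into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```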

Performance-evaluation indices

The forecasting skill of the ML algorithms (SVM, ANN, RT, and GP) was evaluated with the Correlation Coefficient (R), the Nash–Sutcliffe Efficiency (NSE), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE) in the training and testing phases. The closer the R and NSE values are to 1, and the closer the RMSE and MAE values are to 0, the better the performance of the algorithms20. Equations (7)–(10) describe the performance indices:

$$ NSE = 1 - \frac{\sum_{i=1}^{n} \left( Q_{fore,i} - Q_{obs,i} \right)^{2}}{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{mean\,obs} \right)^{2}} $$

(7)

$$ R = \frac{\sum_{i=1}^{n} \left( Q_{fore,i} - Q_{mean\,fore} \right)\left( Q_{obs,i} - Q_{mean\,obs} \right)}{\sqrt{\sum_{i=1}^{n} \left( Q_{fore,i} - Q_{mean\,fore} \right)^{2}}\,\sqrt{\sum_{i=1}^{n} \left( Q_{obs,i} - Q_{mean\,obs} \right)^{2}}} $$

(8)

$$ MAE = \frac{1}{n}\sum_{i=1}^{n} \left| Q_{fore,i} - Q_{obs,i} \right| $$

(9)

$$ RMSE = \sqrt{\frac{\sum_{i=1}^{n} \left( Q_{fore,i} - Q_{obs,i} \right)^{2}}{n}} $$

(10)

in which \(Q_{fore,i}\), \(Q_{obs,i}\), \(Q_{mean\,fore}\), \(Q_{mean\,obs}\), \(i\), and \(n\) denote the forecasted inflow, observed inflow, mean forecasted inflow, mean observed inflow, time step, and the total number of time steps during the training and testing phases, respectively.
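
These indices translate directly into a few lines of NumPy; a straightforward sketch, not the study’s own code:

```python
import numpy as np

def nse(q_obs, q_fore):
    """Nash–Sutcliffe Efficiency, Eq. (7)."""
    q_obs, q_fore = np.asarray(q_obs), np.asarray(q_fore)
    return 1.0 - np.sum((q_fore - q_obs) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

def r(q_obs, q_fore):
    """Correlation coefficient, Eq. (8)."""
    return np.corrcoef(np.asarray(q_obs), np.asarray(q_fore))[0, 1]

def mae(q_obs, q_fore):
    """Mean Absolute Error, Eq. (9)."""
    return np.mean(np.abs(np.asarray(q_fore) - np.asarray(q_obs)))

def rmse(q_obs, q_fore):
    """Root Mean Square Error, Eq. (10)."""
    return np.sqrt(np.mean((np.asarray(q_fore) - np.asarray(q_obs)) ** 2))
```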

Ethics approval

All authors complied with the ethical standards.

Consent to participate

All authors consent to participate.

Consent for publication

All authors consent to publish.

