An improved deep learning model for predicting daily PM2.5 concentration

Study area

The Beijing–Tianjin–Hebei (BTH) region of China is one of the most economical and active areas in China, containing Beijing, Tianjin, and 11 cities of Hebei Province. According to CSY (2018), the regional GDP of BTH in 2017 contributed 9.77% to the total GDP of China, and its population accounted for 8.09%. However, serious air pollution occurred, and its damage to public health cannot be ignored. According to the Ministry of Environmental Protection (MEP) (2018), among the top 20 most polluted cities, 9 cities belonged to Hebei Province, and Tianjin and Beijing ranked 15th and 19th, respectively. Thus, this study adopted the BTH region as the study area for constructing the PM2.5 concentration forecasting model. Figure 1 shows the Locations of air quality stations in the BTH region.

Figure 1

Locations of air quality stations in BTH region. The color represents the rank of the average daily PM2.5 concentration during 01 Jan., 2015 to 31 Dec., 2017 as described in the bottom of the Figure. (This Figure is drawn by using Matlab software).

Full size image

Data

The research data and their description used in this study are listed in Table 1. We collected data from the period of 01 Jan. 2015 to 31 Dec. 2017. The variables included the PM2.5 concentration, meteorological data, the latitude and longitude of the monitoring station, and time stamp data, i.e., the month and week of the observation.

Table 1 List of the research data.

Full size table

1.
PM2.5 concentration data: There are 110 air pollution monitoring stations distributed in the BTH region, as shown in Fig. 1. The hourly concentration of PM2.5, PM10, CO, NO2, O3, and SO2 are recorded and published by the Beijing Municipal Environmental Monitoring Center and http://pm25.in/. Station parameters are also recorded, including the latitude and longitude of the station, the month and the week of the observation. Here, we used the mean of the PM2.5 concentration from 0:00 to 23:00 to represent the daily PM2.5 concentration, and the rank of the average daily PM2.5 concentration from 2015 to 2017 is indicated by the color in Fig. 1. It can be found that air pollution is more serious in the south area.
2.
Meteorological data: The meteorological variables considered in this study included temperature, wind speed, wind direction, mean sea level pressure, dew point temperature and total column water vapor. Meteorological data were obtained from CAMS Near-real-time dataset of ECMWF with around 10 km spatial resolution. Particularly, this dataset only provides (U) and V component of wind (i.e., zonal and meridional wind speed), thus, the wind speed and wind direction were obtained by the Eqs. (1) and (2).
$$wind ; speed=sqrt{{u}^{2}+{v}^{2}}$$
(1)
$$wind ; direction=frac{pi }{2}-{mathit{tan}}^{-1}frac{v}{u}$$
(2)

where (u) and v refer to (U) and (V) components of wind, respectively. In addition, to further improve the spatial resolution of temperature, the 1 km spatial resolution Modis temperature product (MOD11A1) were also collected.

Data pre-processing

Due to critical failure or temporary power cutoff, missing values for a long or short periods happened in air pollution monitoring stations²⁴. Similarly, some missing values occur in the ECMWF dataset. We got rid of data of 20 stations which have over 10% missing values in PM2.5 concentration data or ECMWF data. The missing PM2.5 concentration values of remaining stations were interpolated by inverse distance weight method.

Next, time stamp data, including month and day, were one-hot encoded; PM2.5 concentration and meteorological data were centralized and standardized in accordance with Eq. (3):

$${x}_{i}^{*}=frac{{x}_{i}-stackrel{-}{x}}{sigma }$$

(3)

where ({x}_{i}) and ({x}_{i}^{*}) represent the original and transformed observation of a factor (x), respectively; (stackrel{-}{x}) and (upsigma) are the mean and standard deviation of all observations, respectively.

Finally, the temperature data collected from MOD11A1 and ECMWF dataset were merged together to enhance its reliability. Since the spatial resolution of MOD11A1 data is higher, we used MOD11A1 data as basis, and filled its missing values by ECMWF data. Linear regression model was built between the centralized and standardized ECMWF temperature data ((cs_E)) and MOD11A1 data ((cs_M)) as the Eq. (4) shows,

$$c{s}_{M}=0.953842*c{s}_{E}-0.074635$$

(4)

The ({R}^{2}) value of this model was 0.91, indicating a great consistency between them. Therefore, we filled the missing values of (cs_M) by the regression results of corresponding (cs_E) values.

Methods

Figure 2 shows the overall framework of the proposed WLSTME model. As shown in Fig. 2, the model is a hybrid model that integrates three neural networks, including:

(1)
An MLP network to generate the weighted PM2.5 by combing wind speed and direction, geographical distance with historical PM2.5 concentration.
(2)
An LSTM network to address spatiotemporal dependency simultaneously and extract spatiotemporal features.
(3)
Another MLP network to optimize the prediction by integrating the spatiotemporal features and weather forecast data.

Figure 2

(a) Overall framework of WLSTME. The red, blue and green flags stand for the central site, K nearest sites, and other sites, respectively. r represents the time lag; (b) Structure of MLP layer; (c) Structure of the memory cell of LSTM layer. x_t and h_t are the inputs.

Full size image

In detail, the three network works together to form an organic whole to achieve the daily PM2.5 prediction. Firstly, the MLP was used to combine historical wind speed and wind direction, the geographical distance between central sites and neighbor sites with corresponding days’ PM2.5 data of neighbor sites, and generate the weighted PM2.5. Then, the generated weighted PM2.5 of the neighbor sites in the past ten days were merged with the historical PM2.5 data of the central station, and input into the LSTM network to address the spatial and temporal dependence simultaneously and extract spatiotemporal features. Finally, another MLP was used to conduct bias adjustment by integrating the spatiotemporal features produced by LSTM with the central site’s meteorological data and time stamp data.

The input of WLSTME model consists of two parts (blue arrows in Fig. 2): (1) historical air quality (PM2.5), latitude and longitude; historical meteorological data (weather, temperature, pressure, humidity, wind speed, wind direction) of the target site and nearby sites. ; (2) Weather forecast data (weather, wind direction, wind level, up and down temperature) of the target site. The output of WLSTME model is the PM2.5 forecast value of the target site the next day.

MLP for generating the weighted PM2.5

MLP can theoretically approximate any Borel measurable function with arbitrary precision²⁵. We constructed a three-layer MLP spatial correlation processor to generate weighted PM2.5 series data for each neighbor site. Neighbor sites were defined as the K nearest surrounding sites to the central site. Since pollutants are transported among areas based on wind, air pollution of the central sites are spatially correlated with that of neighbor sites. However, the distribution of monitoring stations is not even. Consequently, the distance between neighbor sites and the central site is different for different central sites. For example, the density of stations in the south area is much sparser than Beijing, as Fig. 1 shows. Thus, for central sites in the south area, the selected neighbor sites were more distant, and the spatial correlation was lower than that for sites in Beijing. Based on the above consideration, the geographical distance of the selected neighbor sites should be considered in the model.

The three-layer MLP integrates the distance and wind of neighbor sites with its PM2.5 to generate weighted PM2.5 data for each neighbor site j of central site i. Figure 3 shows the structure of MLP. Given ({PM2.5}_{ jt}) and ({v}_{jt}) represent the PM2.5 concentration and wind speed of the (jth) neighbor at time t, respectively; ({d}_{ij}) represents the distance between the central site (i) and its (mathrm{jth}) neighbor site; ({uptheta }_{ijt}) represents the angle between the wind direction of (mathrm{jth}) neighbor site at time (t) and the edge between (i) and (j). ({H}_{1},dots ,{H}_{n}) are neurons of the hidden layer, and ({WPM2.5}_{jt}) is the weighted PM2.5 concentration, which is calculated by the Eqs. (5) and (6).

Figure 3

The structure of the three-layers MLP model.

Full size image

$${H}_{s}={g(omega }_{1s}^{1}{PM2.5}_{jt}+{omega }_{2s}^{1}{d}_{ij}+{omega }_{3s}^{1}{v}_{jt}+{omega }_{4s}^{1}{theta }_{ijt}$$

(5)

$${WPM2.5}_{ jt}={sum }_{s=1}^{n}{omega }_{s}^{2}{H}_{s}$$

(6)

where (g) is the activation function used for the nonlinear transformation of inputs. (omega) is the weight between the neuron of previous layer and the next layer.

LSTM for extracting spatial–temporal feature

LSTM is a special recurrent neural network, with its recurrent neuron simultaneously captures long and short dependencies in time series data. LSTM has been used in many fields, such as financial market predictions²⁶, epileptic seizures²⁷, and reservoir operation²⁸. All of the LSTM models used in these fields exhibited better performance than many other machine learning methods. The LSTM model used in our model was a two-layer stateful LSTM, which used the state of the current batch of LSTM samples as the initial state of the next batch of samples. It is more suitable for processing long-term time series data than the other models. The structure of the recurrent memory cell of the LSTM model is shown in Fig. 4.

Figure 4

The illustration of two-layers LSTM model. ({x}_{t}) and ({h}_{t}) are the inputs and outputs at time t respectively.

Full size image

The bottom layer of the proposed LSTM FRAME corresponds to the input layer. The middle core layer comprises two LSTM layers, and the output layer follows. Each neuron of the LSTM layer has an architecture similar to that in the right part of Fig. 4. Three key gates, namely, forget gate (({f}_{t})), input gate (({i}_{t})), and output gate (({o}_{t})), of LSTM are designed to control the memory of new information and to forget old information. The values of the three gates are updated with time respectively by Eqs. (7), (8), and (9).

$$f_{t} = sigma left( {W_{f} cdot left[ {h_{t – 1} ,x_{t} } right] + b_{f} } right),;;i_{t} = sigma left( {W_{i} cdot left[ {h_{t – 1} ,x_{t} } right] + b_{i} } right),$$

(7)

$$widetilde{{C_{t} }} = tanhleft( {W_{C} cdot left[ {h_{t – 1} ,x_{t} } right] + b_{C} } right),;;C_{t} = f_{t} cdot C_{t – 1} + i_{t} cdot widetilde{{C_{t} }}$$

(8)

$$o_{t} = { }sigma left( {W_{o} cdot left[ {h_{t – 1} ,x_{t} } right] + b_{o} } right),;;h_{t} = o_{t} cdot tanhleft( {C_{t} } right)$$

(9)

where ({x}_{t}) and ({h}_{t}) are the inputs and outputs of time (t), respectively; (sigma) and hyperbolic tangent are widely used activation functions; (W) and (b) are the weight matrix and bias vector, respectively.

In detail, LSTM is designed to extract spatiotemporal features from the pollution data of central site and neighbor sites to make a prediction. The weighted PM2.5 series data of neighbor sites and PM2.5 concentration observation data of the central site were merged as a 2D matrix, with each column represented the historical PM2.5 concentration of the central site or weighted PM2.5 concentration of a neighbor site. The size was (mathrm{r}times (mathrm{K}+1)), where r was time lag, and K was the number of selected neighbor sites. We placed this matrix into a two-layer LSTM model so that their spatial and temporal dependence and synthetic action could be considered simultaneously. The output of the LSTM model is called pre-prediction.

MLP weather forecasts optimizer

Finally, auxiliary variables were introduced to the WLSTME model to promote prediction accuracy. The auxiliary variables considered in this study included meteorological data (temperature, wind speed, dew point temperature, mean sea level pressure and total column water vapor), time stamp data (day of week and month of the year), and latitude of the central site at time T. We integrated the auxiliary variables with the spatiotemporal features extracted by LSTM and input them into MLP to output the prediction of the next day’s PM2.5 concentration of the central site. The structure of MLP was the same as Fig. 3, however, the input and output were substituted by the spatiotemporal features and PM2.5 concentration prediction, respectively.

Evaluation methods

We predict real-values of PM2.5 concentration for the next day. Three criteria, namely, mean absolute error (MAE), root mean square error (RMSE), and total accuracy index (p), were used in the experiments to evaluate the effectiveness of our model. Their definitions are given in Eqs. (10), (11) and (12):

$$MAE=frac{1}{n}sum_{begin{array}{c} i=1end{array}}^{n}left|{y}_{i}-{y}_{i}^{*}right|$$

(10)

$$RMSE=sqrt{frac{1}{n}sum_{i=1}^{n}{({y}_{i}-{y}_{i}^{*})}^{2}}$$

(11)

$$p=1-frac{sum_{i=1}^{n}left|{y}_{i}-{y}_{i}^{*}right|}{sum_{i=1}^{n}{y}_{i}}$$

(12)

where (n) is the number of samples, ({y}_{i}) is the observation of the (mathrm{i})th sample, and ({y}_{i}^{*}) is the (mathrm{i})th forecasting value. In addition, we employed the spatial anomaly correlation (ACC) and the temporal correlation coefficient (TCC) as the evaluation metrics, which are defined as in the following equations:

$$stackrel{-}{{y}_{i}}=frac{1}{N}{sum }_{j=1}^{N}{y}_{ij}, quad stackrel{-}{{y}_{i}^{*}}=frac{1}{N}{sum }_{j=1}^{N}{y}_{ij}^{*}$$

(13)

$${Delta y}_{ij}={y}_{ij}-stackrel{-}{{y}_{i}}, quad {Delta y}_{ij}^{*}={y}_{ij}^{*}-stackrel{-}{{y}_{i}^{*}}$$

(14)

$$ACC=frac{1}{N}{sum }_{j=1}^{N}frac{{sum }_{i=1}^{M}({Delta y}_{ij}-stackrel{-}{{Delta y}_{j}})({Delta y}_{ij}^{*}-stackrel{-}{{Delta y}_{j}^{*}})}{sqrt{{sum }_{i=1}^{M}{({Delta y}_{ij}-stackrel{-}{{Delta y}_{j}})}^{2}{sum }_{i=1}^{M}{({Delta y}_{ij}^{*}-stackrel{-}{{Delta y}_{j}^{*}})}^{2}}}$$

(15)

$$TCC=frac{1}{M}{sum }_{i=1}^{M}frac{{sum }_{j=1}^{N}({y}_{ij}-stackrel{-}{{y}_{i}})({y}_{ij}^{*}-stackrel{-}{{y}_{i}^{*}})}{sqrt{{sum }_{j=1}^{N}{({y}_{ij}-stackrel{-}{{y}_{i}})}^{2}{sum }_{j=1}^{N}{({y}_{ij}^{*}-stackrel{-}{{y}_{i}^{*}})}^{2}}}$$

(16)

where ({y}_{ij}) and ({y}_{ij}^{*}) represent the observed and predicted value of station (i) in day (j), respectively. (M) and (N) represent the count of stations and days of the test, respectively.

Source: Ecology - nature.com

An improved deep learning model for predicting daily PM2.5 concentration

Study area

Data

Data pre-processing

Methods

MLP for generating the weighted PM2.5

LSTM for extracting spatial–temporal feature

MLP weather forecasts optimizer

Evaluation methods

3 Questions: Hessam AzariJafari on mitigating climate change with reflective pavements

Early evidence of a shift in juvenile fish communities in response to conditions in nursery areas

ITALIAN LANGUAGE

ENGLISH LANGUAGE