Study area and dataset
The study area covers all of Estonia, located between 57.5° N, 21.5° E and 59.8° N, 28.2° E. The study area is relatively flat, with no steep slopes and altitudes ranging between 0 and 200 m above sea level. Data about events were collected directly from field books that contained the start and end date of each mowing activity and the covered area. Considering the main agricultural areas of the country, we consider 2000 fields whose events are geographically evenly distributed across Estonia, as shown in Fig. 1. In total, data about 1800 mowing and 200 non-mown events were collected in 2018, based on manual labelling. During manual labelling, the specific mowing days were labelled based on the following: a) information recorded by farmers in field books regarding mowing days, b) domain experts' knowledge about the most probable days for mowing given the climate, weather, and field conditions, c) a rapid decrease in the Normalized Difference Vegetation Index (NDVI) and a rapid increase in coherence compared to past measurements. The average field size is 6.0 ha, and around 95% of the fields were mown during the year. 90% of the fields are in the range of 0.5-10 ha. The greatest density of fields is in Lääne-Viru, Tartu and Jõgeva counties. The grassland parcel vector layer is provided by the Estonian Agricultural Registers and Information Board (ARIB)50. The satellite imagery used in the study is from the Copernicus programme, which provides free and open Earth observation data to help service providers, public authorities, and international organizations improve European citizens' quality of life.
Geographic distribution of events used in this study (This map was created by QGIS version 3.16, which can be accessed on https://qgis.org/en/site/).
Sentinel-1 and Sentinel-2 data
For Sentinel-1, in total, 400 S1A/B SLC IW products acquired between 1 May 2017 and 30 October 2018 were processed: 87 products from relative orbit number (RON) 160, 62 from RON 131, 84 from RON 87, 93 from RON 58, and 60 from RON 29. These were organised into S1A/S1B 6-day pairs. Sentinel-2 provides high-spatial-resolution optical imagery for terrestrial observations with global coverage of the Earth's land surface. Sentinel-2 data is provided by the European Space Agency (ESA) together with a cloud mask, which can filter clouds on the image with moderately good accuracy. 400 Sentinel-2A and -2B L2A products acquired between 1 May 2017 and 30 October 2018 were processed. Each Sentinel-2 image is at most three days off from the closest Sentinel-1 image. Only the NDVI was derived from Sentinel-2. NDVI has been widely used in the classification of grassland24,51, mainly due to its ability to limit spectral noise. The spatial resolution of the derived Sentinel-2 NDVI feature is 10 m.
Methods
The goal of the analysis is to detect mowing events from Sentinel-1 (S-1) and Sentinel-2 (S-2) data. For this, coherence time series were calculated for every field in the event database. The average coherence of a field, imaging geometry parameters, imaging time and average NDVI were stored in a database. The database formation process involved preprocessing many satellite images, where the average coherence and NDVI values were calculated for every parcel for every available date (constrained by image availability and cloud cover). The overall scheme of the proposed methodology is illustrated in Fig. 2. First, the time-series data from S-1 and S-2 images are preprocessed. Then, the most important features are used in a deep neural network to predict mowing events. The model has a reject-region option that enables it to abstain from prediction in case of uncertainty, which increases trust in the model.
We used the Sentinel Application Platform (SNAP) toolbox for processing S-1 data. More specifically, we followed the same pre-processing steps as in16: apply orbit file, back-geocoding (using Shuttle Radar Topography Mission (SRTM) data), coherence calculation, deburst, terrain correction, and reprojection to the local projection (EPSG:3301). Lastly, we resampled the data to 4 m resolution to preserve the maximum spatial resolution and square pixels. Because the study area's terrain is relatively flat, there are few topographic distortions in the SAR data. Each swath's coherence was calculated independently. Only pixels entirely inside the parcel boundaries (including the averaging window used for coherence computation) were used to calculate results, and any interference beyond the parcel limits was discarded. Pair-wise coherence was calculated with a 6-day time step. The data was stored in a database using a forward-looking convention: the coherence for date X refers to the coherence between S-1 images over the period between date X and X + 6 days. For preprocessing S-2 data, L1C and L2A Sentinel-2 products were obtained through the Copernicus Open Access Hub6. Next, a rule-based cloud mask solution was applied52. Finally, the fourth and eighth bands were extracted to compute NDVI values.
Flowchart of the proposed approach to detect mowing events.
Feature extraction from Sentinel-1 data
Coherence is a normalized measure of similarity between two consecutive (same relative orbit) S-1 images. The interferometric 6-day repeat-pass coherence in VV polarization (cohvv) and in VH polarization (cohvh) were chosen as features, as they have been shown to be sensitive to changes in vegetation and to agricultural events25. The shorter the time interval between a mowing event and the first interferometric acquisition, the higher the coherence value. Generally, coherence stays relatively high for 24 to 36 days after a mowing event. Precipitation causes the coherence to drop, which disturbs the detection of a mowing event. The spatial resolution of the S-1 6-day repeat-pass interferometric coherence is 70 m. Given two S-1 images (s_{1}) and (s_{2}), coherence is calculated as follows:
$$\begin{aligned} \gamma = \frac{|\langle s_{1}s_{2}^{*}\rangle |}{\sqrt{\langle s_{1}s_{1}^{*}\rangle \langle s_{2}s_{2}^{*}\rangle }} \end{aligned}$$
(1)
where (|\langle s_{1}s_{2}^{*}\rangle |) is the absolute value of the spatial average of the complex conjugate product.
Coherence between two S-1 images (s_1) and (s_2) reaches its maximum value of 1 when both images have the same position and physical characteristics of the scatterers. In contrast, the coherence value declines when the position or properties of the scatterers change.
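As an illustration, Eq. (1) can be computed over a parcel's co-registered complex (SLC) pixel samples with a few lines of NumPy. This is a minimal sketch of the coherence estimator, not the SNAP processing chain used in the study, and the function name is ours:

```python
import numpy as np

def coherence(s1, s2):
    """Coherence of two co-registered complex SLC pixel samples (Eq. 1).
    The angle brackets in Eq. (1) denote the spatial average, taken here
    over all pixel samples of one parcel."""
    num = np.abs(np.mean(s1 * np.conj(s2)))
    den = np.sqrt(np.abs(np.mean(s1 * np.conj(s1))) *
                  np.abs(np.mean(s2 * np.conj(s2))))
    return num / den
```

Identical images yield a coherence of 1, while independent speckle yields a value near zero, matching the behaviour described above.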
Feature extraction from Sentinel-2 data
NDVI is related to the amount of live green vegetation. Generally, NDVI increases and decreases over the season, reflecting the natural growth and decay of vegetation, while significant drops in NDVI indicate an agricultural event such as mowing. NDVI is derived from S-2 images and is calculated as follows:
$$\begin{aligned} NDVI=\frac{band8 - band4}{band8 + band4} \end{aligned}$$
(2)
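A minimal sketch of Eq. (2) in NumPy, using Sentinel-2 band 8 (near-infrared) and band 4 (red) reflectance arrays; the function name is illustrative:

```python
import numpy as np

def ndvi(band8, band4):
    """NDVI from Sentinel-2 band 8 (NIR) and band 4 (red), Eq. (2).
    Output values lie in [-1, 1]; live green vegetation gives high values."""
    band8 = np.asarray(band8, dtype=float)
    band4 = np.asarray(band4, dtype=float)
    return (band8 - band4) / (band8 + band4)
```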
Typical signature of NDVI and coherence in VV and VH polarisation for a non-mown field during the year.
Field with single mowing event during the year.
NDVI measurement for a field example with a single mowing event during the season.
Figures 3, 4 and 5 show different examples of mown and non-mown fields. NDVI measurements are shown in green; cohvh and cohvv are shown in blue and black, respectively. For a non-mown field, the typical signature of NDVI during the season is a half-oval curve, while the coherence is not stable but remains at almost the same level without apparent trend changes, as shown in Fig. 3. An example of a field with a single mowing event during the season is shown in Fig. 4. A mowing event is characterized by a rapid increase in both cohvh and cohvv and a sharp decrease in NDVI, as observed around day 150 (see Fig. 4). Forty days later, a similar signature appears that is probably not due to a mowing event but was likely caused by drought during summer.
Notably, NDVI measurements are irregular and relatively sparse. Around 75% of all NDVI measurements in Estonia are invalid due to cloud cover; the percentage is slightly lower in Southern Sweden and Denmark. The cloud mask indicates the percentage of cloud coverage and allows cloudy and cloud-free pixels to be identified. Using the standard cloud mask technique of the European Space Agency (ESA) leads to outliers, noticeable as sudden decreases in NDVI. Figure 5 shows an extreme NDVI value that is presumed to be an outlier due to its large differences from the preceding and subsequent values. The outlier is marked with a yellow dot (NDVI = 0.38); the nearest previous (NDVI = 0.75) and next (NDVI = 0.78) measurements are marked in blue.
Sentinel-1 and Sentinel-2 data preprocessing
To detect NDVI outliers effectively, a good understanding of the data is needed. NDVI outliers due to cloud mask errors rarely co-occur and hence can be treated as independent events53. NDVI outliers are usually identified by a sudden drop to almost zero and do not form a sequence. It is therefore enough to look at the neighbouring measurements (one before and one after) to detect individual outliers: if the differences to the adjacent measurements are high, this is an outlier signature. Hence, outliers can be handled by iterating through every three consecutive NDVI measurements for a given field and checking the difference between the second and first values and between the third and second values. Figure 6 shows the scatter plot of all consecutive NDVI triplets. The Y-axis shows the difference between the third and second NDVI values in a triplet, while the X-axis shows the difference between the second and first values. Triplets spanning up to 7 days are shown in blue, and triplets spanning 7 to 14 days in green. The points form a rhombus shape with a small cloud of probable outliers in the upper-left corner. To separate outliers from actual mowing events, we only consider triplets spanning up to 10 days (as a mowing-event signature can recover within 10 days). Knowing the rhombus equation (the centre is approximately at (0, 0), and the side length is around 0.6), the filtering rule can be applied as follows:
$$\begin{aligned} ndvi_3 - 2 \cdot ndvi_2 + ndvi_1 \ge 0.6 \end{aligned}$$
(3)
where (ndvi_1), (ndvi_2), and (ndvi_3) are consecutive NDVI measurements within a 10-day interval.
All detected outliers, representing around 0.1% of NDVI measurements, were removed.
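The triplet rule of Eq. (3) reduces to a one-line check. A sketch, with the paper's example values from Fig. 5 used below; the function name and argument layout are our choice:

```python
def is_ndvi_outlier(n1, n2, n3, days_span, max_span=10, threshold=0.6):
    """Flag the middle value of a consecutive NDVI triplet (n1, n2, n3)
    as an outlier using the rhombus rule of Eq. (3). Only triplets
    spanning at most 10 days are tested, since a genuine mowing
    signature can recover within 10 days."""
    return days_span <= max_span and (n3 - 2 * n2 + n1) >= threshold
```

For the example in Fig. 5, the triplet (0.75, 0.38, 0.78) gives 0.78 - 2·0.38 + 0.75 = 0.77 ≥ 0.6, so the middle measurement is flagged.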
Scatter plot of NDVI triplets.
Smoothing is an essential pre-processing step for noisy features. In this work, the cohvh and cohvv features are smoothed using different techniques, including the exponential moving average (EMA), the moving average54, and the Kalman filter55. Smoothing with a moving average is done by taking averages of raw data sequences; the length of the sequence over which the average is taken is called the filter width. Table 1 shows the performance of the moving average smoothing technique for different values of the filter width. The results show that the best AUC-ROC of 0.9671 is achieved at a filter width of 7. The Kalman filter produces estimates of the current state variables and their uncertainties. Once the outcome of the subsequent measurement is observed, these estimates are updated using a weighted average, giving more weight to estimates with higher certainty. The AUC-ROC achieved using the Kalman filter is 0.962. The EMA is computed by averaging sequences of data while assigning weights to every data point; as values get older, they are given exponentially decreasing weights. The smoothed EMA values for cohvh and cohvv are calculated using a recursive definition (i.e., from the previous value) as follows:
$$\begin{aligned}&cohvh\_sm(cohvh_n, \alpha ) = \alpha \cdot cohvh_n + (1 - \alpha ) \cdot cohvh\_sm(cohvh_{n-1}, \alpha ) \end{aligned}$$
(4)
$$\begin{aligned}&cohvv\_sm(cohvv_n, \alpha ) = \alpha \cdot cohvv_n + (1 - \alpha ) \cdot cohvv\_sm(cohvv_{n-1}, \alpha ) \end{aligned}$$
(5)
where (cohvh\_sm(cohvh_{n-1}, \alpha )) is the exponential moving average up to (cohvh_{n-1}), (cohvv\_sm(cohvv_{n-1}, \alpha )) is the exponential moving average up to (cohvv_{n-1}), and (\alpha ) is a smoothing parameter.
The higher the smoothing parameter, the more the smoothed signal reacts to fluctuations in the original signal; the lower it is, the stronger the smoothing. Experimentally, we found that the best value of (\alpha ) to achieve the best AUC-ROC of 0.968 is (\frac{1}{3}), as shown in Table 2. The different smoothing techniques achieve comparable performance; the EMA technique was selected as it achieves slightly higher performance.
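The recursion of Eqs. (4) and (5) applies identically to cohvh and cohvv; a minimal sketch (initialising the recursion with the first measurement is our assumption):

```python
def ema(values, alpha=1/3):
    """Exponential moving average (Eqs. 4-5). alpha = 1/3 gave the best
    AUC-ROC in the experiments; older values receive exponentially
    decreasing weights."""
    smoothed = [values[0]]          # initialise with the first measurement
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed
```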
Derived features
New features derived from S-1 and S-2 are extracted to improve the performance of the machine learning model. The features were derived based on the following knowledge about mowing events: coherence tends to increase and NDVI tends to decrease after a mowing event, and many farmers mow during the same time of the year due to good weather conditions. This knowledge was encoded in the derived features. In the following, we go through the list of derived features considered in this study. Mixed coherence is derived from S-1 features to capture the overall coherence trend. It is a non-linear combination of cohvh and cohvv and is calculated as follows:
$$\begin{aligned} Mixed\_coh = \sqrt{cohvh \cdot cohvv} \end{aligned}$$
(6)
The date is an important feature for the model, as mowing events are more likely in summer than in early spring, especially in Estonia. The day of the year is normalized because normalization improves the training process of the neural network. Some methods normalize features during the training process, such as the Batch Normalization used in this study56. However, neighbouring batches could have entirely different normalization variables (batch mean and variance). At the same time, DOY is a feature that is sensitive to small changes; e.g., a mowing prediction on day 108 versus day 109 could have a drastically different meaning (weekend or working day, a day with sunny weather or a day with heavy rain). This implies that unified normalization of the DOY feature before training helps avoid the unwanted impact of Batch Normalization and possible gradient-computation issues. The normalized day of the year is calculated as follows:
$$\begin{aligned} t = \frac{day\_of\_year}{365} \end{aligned}$$
(7)
where (day\_of\_year) is the day of the year, a number between 1 and 365 (January 1st is day 1).
In addition, we use another time feature, dt, to capture gaps in the time series. dt is defined as the normalized difference in days between the current measurement and the previous one. Normalization was performed with min-max scaling. dt is calculated as follows:
$$\begin{aligned} dt = \frac{diff - min\_diff}{max\_diff - min\_diff} \end{aligned}$$
(8)
where (min\_diff) and (max\_diff) are the minimum and maximum differences in days between two consecutive measurements, obtained from the training data.
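Eqs. (7) and (8) are simple scalings; a sketch with illustrative function names:

```python
def normalised_doy(day_of_year):
    """Normalised day of the year, Eq. (7)."""
    return day_of_year / 365

def normalised_dt(diff, min_diff, max_diff):
    """Min-max scaled gap in days between consecutive measurements,
    Eq. (8); min_diff and max_diff come from the training data."""
    return (diff - min_diff) / (max_diff - min_diff)
```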
Since mowing is characterized by an increase in the coherence and decline in the NDVI, it is important to capture the difference in the values of features and/or slopes of the features’ curves. In the following, we summarize the list of original and derived features extracted from Sentinel-1 and Sentinel-2 included in this study.
ndvi Normalized difference vegetation index, obtained from Sentinel-2.
cohvv Coherence in VV polarization, Sentinel-1 feature.
cohvh Coherence in VH polarization, Sentinel-1 feature.
t Normalized day of the year when the measurement is obtained.
dt Normalized difference in days between the current and previous measurement. The data was interpolated onto a daily grid; this feature differentiates interpolated data from real data by capturing the difference between valid (not interpolated) measurements.
cohvv_sm Smoothed cohvv with exponential moving average (with parameter (\frac{1}{3})).
cohvh_sm Smoothed cohvh with exponential moving average (with parameter (frac{1}{3})).
mixed_coh Geometric mean of cohvv and cohvh (the square root of their product, Eq. 6). The geometric mean is chosen as one of the simplest non-linear combinations.
ndvi_diff Difference between current and previous NDVI measurements. This feature captures the decrease in the ndvi, which is highly related to mowing detection.
cohvv_sm_diff Difference between current and previous (cohvv\_sm) measurements. This feature captures the increase in (cohvv\_sm), which is highly related to mowing detection.
cohvh_sm_diff Difference between current and previous (cohvh\_sm) measurements. This feature captures the increase in (cohvh\_sm), which is highly related to mowing detection.
ndvi_der The slope of the line between previous and current NDVI values.
cohvh_sm_der The slope of the line between previous and current (cohvh_sm) values. This feature captures the change in the smoothed cohvh.
cohvv_sm_der The slope of the line between previous and current (cohvv_sm) values. This feature captures the change in the smoothed cohvv.
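The derived features above can be computed per measurement from the previous and current values; a sketch with illustrative names (the dict-based interface is ours, and the slopes are taken as change per day):

```python
import math

def derive_features(prev, curr, days_between):
    """Derived features for one measurement. prev/curr hold the keys
    'ndvi', 'cohvh', 'cohvv', 'cohvh_sm', 'cohvv_sm'."""
    feats = {
        # Eq. (6): geometric mean of the two coherence polarisations
        'mixed_coh': math.sqrt(curr['cohvh'] * curr['cohvv']),
        'ndvi_diff': curr['ndvi'] - prev['ndvi'],
        'cohvh_sm_diff': curr['cohvh_sm'] - prev['cohvh_sm'],
        'cohvv_sm_diff': curr['cohvv_sm'] - prev['cohvv_sm'],
    }
    # slopes of the line between previous and current values
    feats['ndvi_der'] = feats['ndvi_diff'] / days_between
    feats['cohvh_sm_der'] = feats['cohvh_sm_diff'] / days_between
    feats['cohvv_sm_der'] = feats['cohvv_sm_diff'] / days_between
    return feats
```

A mowing event then shows up as a strongly negative ndvi_diff together with positive coherence diffs.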
Feature selection
The permutation feature importance measure was introduced by Breiman57. The importance of a particular feature is measured by the increase in the model's prediction error after permuting the values of this feature, which breaks the relationship between the feature and the outcome. A feature is important if shuffling its values increases the model error, and less important otherwise. The importance of the features considered in this study is ranked in Table 3. It is notable from Table 3 that the original features are significantly more important than the derived ones. We used backward elimination to select the optimal subset of features for the machine learning model. More specifically, we start with all the features and then, at each iteration, remove the least significant feature if this improves the model's overall performance; we repeat this until no improvement is observed from removing features. Figures 7 and 8 show the end-of-season (EOS) accuracy and the event accuracy, respectively, for training with different subsets of the most important features. We denote by (F_{x}-F_{y}) the set of important features from feature x to feature y in Table 3. Figure 7 shows that using only ndvi and (mixed\_coh) achieves an EOS accuracy of 93%, and increasing the number of most important features to 3 achieves performance comparable to the best. The results in Fig. 8 show that ndvi and (mixed\_coh) achieve around 73% event accuracy, while the performance declines as the number of features increases. As the outcome of the feature selection process, the developed machine learning model used all 14 features shown in Table 3, which achieve the highest combined performance.
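Permutation importance as described above can be sketched in a few lines; this is an illustrative implementation, not the authors' code, and the error metric (misclassification rate) is our assumption:

```python
import numpy as np

def permutation_importance(model_predict, X, y, n_repeats=5, seed=0):
    """Breiman-style permutation feature importance: shuffle one feature
    column at a time and measure the increase in prediction error.
    model_predict maps an (n, d) array to predicted labels."""
    rng = np.random.default_rng(seed)
    base_error = np.mean(model_predict(X) != y)
    importances = []
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # break the feature-target link
            errs.append(np.mean(model_predict(Xp) != y))
        importances.append(np.mean(errs) - base_error)
    return importances
```

A feature whose shuffling leaves the error unchanged gets an importance near zero, which is the criterion used to rank Table 3.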
End of season accuracy for different numbers of features.
Event-based accuracy for different numbers of features.
Machine learning model
Each record in our dataset represents specific features of a field during one season at a particular time, in addition to the target variable (mown or non-mown). In this work, we use a neural network to predict mowing events. We are interested only in observations during the vegetative season, so winter measurements are not included. More specifically, we only include data from the vegetative season, which is almost the same across all of Estonia, from April to October (215 days). The dataset is partitioned into 64% for training, 20% for testing and 16% for validation. All training and testing were performed using the TensorFlow58 deep learning framework with default parameters. The architecture of the neural network is shown in Fig. 9. To guarantee a fixed time interval of 1 day, all missing values in the S-1 and S-2 features are interpolated, as shown in Fig. 10. The data is processed in batches of size (64 \times 215 \times 14), where 64 is the number of fields per batch, 215 is the number of days in the vegetation season in Estonia, and 14 is the number of selected features.
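The daily-grid interpolation can be sketched as follows; a minimal NumPy illustration of resampling one field's sparse measurements onto a 1-day grid (the function name, and holding edge values constant via np.interp, are our assumptions rather than the authors' exact procedure):

```python
import numpy as np

def to_daily_grid(days, values, season_days=215):
    """Linearly interpolate one field's sparse measurements onto a
    1-day grid covering the 215-day vegetation season. np.interp holds
    the first/last measured value constant outside the measured range."""
    grid = np.arange(1, season_days + 1)
    return grid, np.interp(grid, days, values)
```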
Architecture of the proposed model.
The network's output is a vector of size 215, representing the probability of a mowing event on each day of the vegetation season. The network consists of three one-dimensional convolution layers. The first and second convolution layers are each followed by a Softmax activation function and a batch normalization layer, while the third convolution is followed by a Sigmoid activation function. The NN hyperparameters that govern the learning process can significantly affect model performance. These hyperparameters include the following56:
Number of epochs represents how many times the algorithm trains on the whole dataset.
Loss function represents the prediction error of the neural network.
Optimizer represents the algorithm or method used to change the attributes of the neural network, such as weights and learning rate, to reduce the loss.
Activation function is the function through which we pass the weighted sum to obtain a meaningful output, namely a vector of probabilities or a 0–1 output.
Learning rate refers to the step size of backpropagation, when parameters are updated according to an optimization function.
Time series mowing events before and after linear interpolation.
A good model uses the optimal combination of these hyperparameters and achieves good generalization capability. Training was performed with the conjugate gradient descent method and the binary cross-entropy loss function. The neural network was trained for 300 epochs; early stopping was used59. The optimizer used in our model is the Nadam optimizer60 with the following parameters: (\beta_1=0.9), (\beta_2=0.999), (\epsilon=None), (schedule\_decay=0.004), and learning rate (=0.001). Different activation functions, such as ReLU, Sigmoid, Linear, and Tanh, were experimentally evaluated on the testing dataset, as shown in Fig. 11. The results show that the Softmax activation function achieves the highest combined performance (event accuracy of 72.6% and EOS of 94.5%).
Performance of different activation functions.
A 1D convolution layer acts as a filter that slides along the time dimension, allowing the model to use past events when predicting mowing events. However, this approach is not suitable for real-time detection of mowing events; instead, we use it to predict mowing events within a fixed time frame (window). Such a time frame should be longer than half the 1-D convolution window length.
Model evaluation
To evaluate our model, we used two metrics: EOS accuracy and event-based accuracy. EOS accuracy measures the detection of a mowing event at least once during the season: if the predicted probability of at least one mowing event during the season is more than 50%, the field is considered mown; otherwise, not mown. Event-based accuracy evaluates how well our model predicts individual mowing events. The binary accuracy is defined as follows:
$$\begin{aligned} acc = \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
(9)
where TP is the number of times the model correctly predicted a mowing event, given that the start day of the predicted event is not more than 3 days earlier and not more than 6 days later than the actual start day of the mowing event. Within this 9-day window, several mowing events may be predicted; in this case, only the first predicted event is counted as a TP, and every subsequent one is counted as an FP. TN is the number of times the model correctly predicted the absence of mowing events. FP is the number of times the model incorrectly predicted a mowing event; it also includes the times the model predicted a mowing event whose start does not fit into a 9-day time frame around the actual start of any mowing event. FN is the number of times the model missed an actual mowing event.
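The matching rule can be made concrete with a short sketch; the function name and interface are illustrative, not the authors' code:

```python
def match_events(predicted_starts, actual_starts):
    """Count TP/FP for event-based accuracy: a predicted mowing event is
    a TP if its start day is at most 3 days before or 6 days after an
    actual start day; only the first match per actual event counts, and
    later matches of the same event are FPs."""
    tp, fp = 0, 0
    matched = set()
    for p in sorted(predicted_starts):
        hit = None
        for a in actual_starts:
            if a - 3 <= p <= a + 6:   # 9-day tolerance window
                hit = a
                break
        if hit is None or hit in matched:
            fp += 1
        else:
            matched.add(hit)
            tp += 1
    return tp, fp
```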
Reject region
Calibration plot for proposed model.
Sometimes the model is not confident enough to give a reliable decision about the state of a field. We cannot expect reliable and confident predictions from inaccurate, incomplete or uncertain data, so in cases of uncertainty it is better to allow the model to abstain from prediction. In this way, the obtained predictions are more accurate, while human experts can check the rejected fields. Given the true positive rate and the true negative rate on the validation set, the reject-region technique outputs a probability interval ((t_{low}), (t_{upper})) in which the model abstains from prediction, where (t_{low}) and (t_{upper}) are the minimum and maximum probabilities for which the model is uncertain about its prediction. Outside this interval, the model is confident about its prediction and predicts a field as mown if the probability is higher than (t_{upper}) and not mown if the probability is lower than (t_{low}). We select (t_{upper}) such that the desired true positive rate is reached: we sort all positives in descending order by their predicted probabilities and select the top percentage equal to the true positive rate. We choose (t_{low}) such that the desired true negative rate on the validation data is reached: we sort all negatives in ascending order by their predicted probabilities and select the top percentage equal to the true negative rate.
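The threshold selection can be sketched as follows; a minimal NumPy illustration of the sorting procedure described above (the function name, target-rate defaults, and ceil-based index are our assumptions):

```python
import numpy as np

def reject_thresholds(probs, labels, target_tpr=0.95, target_tnr=0.95):
    """Select (t_low, t_upper) for the reject region on validation data.
    t_upper: sort the positives' probabilities in descending order and
    keep the top target_tpr fraction; t_low analogously from the
    negatives sorted in ascending order. Predictions with probabilities
    strictly inside (t_low, t_upper) are abstained."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    pos = np.sort(probs[labels == 1])[::-1]   # descending
    neg = np.sort(probs[labels == 0])         # ascending
    t_upper = pos[int(np.ceil(target_tpr * len(pos))) - 1]
    t_low = neg[int(np.ceil(target_tnr * len(neg))) - 1]
    return t_low, t_upper
```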
Figure 12 shows the calibration plot for our proposed model. Notably, the predicted probabilities are close to the diagonal, which implies that the model is well-calibrated.