Data sources and pre-processing
Each of the predictor variables used in our analysis (Table 1), as well as the dependent variable (fire hotspots), underwent pre-processing to transform the data into a format suitable for input to our CNN model. Here we briefly outline these processes and describe the method of generating a training and validation dataset for model development. For further details on the pre-processing of each predictor variable, see Horton et al. (2021).
Fire hotspots
We used both Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS) fire hotspot data as the dependent variable for model development. As fire hotspots do not give precise locations, but rather indicate that a fire occurred within a grid cell at the native resolution of each dataset (MODIS 1 km, VIIRS 375 m), we represented each fire hotspot as a 500 m buffered area around the centre point of each identified grid square. We used all fire hotspot occurrences with a confidence rating >50%.
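As an illustration, the buffering step could be implemented with GeoPandas as in the minimal sketch below; the column names (latitude, longitude, confidence), the CSV layout, and the UTM zone used for the metric buffer are assumptions rather than details taken from the text.

```python
# Hedged sketch: buffer hotspot centre points by 500 m and filter by confidence.
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

def hotspot_buffers(csv_path, buffer_m=500.0, min_confidence=50):
    df = pd.read_csv(csv_path)
    df = df[df["confidence"] > min_confidence]            # keep confidence > 50%
    points = [Point(xy) for xy in zip(df["longitude"], df["latitude"])]
    gdf = gpd.GeoDataFrame(df, geometry=points, crs="EPSG:4326")
    # Buffer in a metric CRS (assumed here: UTM zone 50S), then return to WGS84
    # so the buffers align with the other rasters.
    gdf_utm = gdf.to_crs("EPSG:32750")
    gdf_utm["geometry"] = gdf_utm.buffer(buffer_m)
    return gdf_utm.to_crs("EPSG:4326")
```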
Landcover
We use a collection of historic land cover maps generated by the Ministry of Forestry Indonesia from 1996 to 2016 at 2–3 year intervals38. Before use, we re-designated the land cover map classifications to reduce their number from 25 to just 8 (supplementary Table S2): ‘Primary and secondary dry forest’, ‘Swamp forest’, ‘Swamp scrubland’, ‘Scrubland, transition, and bare land’, ‘Riceland’, ‘Plantation’, ‘Settlements’, and ‘Water and cloud’.
In addition to these 8 land cover classifications, we also derived a forest clearance index, which identifies areas cleared of forest and assigns an index value that is strongly negative (−10) immediately after clearing and decays back towards 0 yearly as time since clearing increases. Areas that are re-forested are assigned a strongly positive value (10) that decays towards 0 yearly as time since afforestation increases25.
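A minimal sketch of how such an index might be updated year-on-year is given below; the one-unit yearly decay step and the update rule are assumptions, since the exact formulation is not specified here.

```python
# Hedged sketch of a yearly update of the forest clearance index.
import numpy as np

def update_clearance_index(index, cleared, reforested, decay_step=1.0):
    """index: 2-D float array of last year's index values.
    cleared / reforested: 2-D boolean arrays marking this year's changes."""
    # Decay existing values towards zero by one step per year (assumed rate).
    index = np.sign(index) * np.maximum(np.abs(index) - decay_step, 0.0)
    index[cleared] = -10.0      # freshly cleared cells get the large negative value
    index[reforested] = 10.0    # freshly re-forested cells get the large positive value
    return index
```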
Vegetation indices
All vegetation indices were taken as pre-fire season 3-month averages from May to July. In addition to the original MODIS ET, PET, NDVI, and EVI products, we also included ‘normalised’ variables, whereby each vegetation index was expressed as a ratio to the same index at a reference site. The reference site was an area of dense primary forest outside of the EMRP area.
Proximity to anthropogenic factors
The distance to roads and settlements rasters were derived from OpenStreetMap data as the Euclidean distance to the nearest feature at 250 m resolution. The same was done for all water bodies, which were then classified by hand into either canals or rivers. These features are taken as those present in 2015 for all years, and therefore may misrepresent earlier years. However, the majority of canal development in the region took place between 1996 and 1998, and so the feature set should not differ dramatically from this date onwards.
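One way to derive such distance rasters, sketched below under the assumption that the OpenStreetMap features have already been rasterised onto the 250 m grid as a boolean mask, is SciPy's Euclidean distance transform (not necessarily the tool used by the authors).

```python
# Hedged sketch: distance-to-nearest-feature raster from a boolean feature mask.
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_to_features(feature_mask, cell_size_m=250.0):
    """feature_mask: 2-D boolean array, True where a road/settlement/canal/river
    cell is present. Returns distances in metres on the same grid."""
    # distance_transform_edt measures distance to the nearest zero-valued cell,
    # so invert the mask to make feature cells the "background".
    return distance_transform_edt(~feature_mask) * cell_size_m
```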
Oceanic Niño Index (ONI)
We use a single value for the entire study area taken as the three-month average for the early fire season each year (July–September).
Number of cloud days
Using the state_1km band in the daily MODIS terra product (MOD09GA version 6), which classifies each pixel as either ‘no cloud’, ‘cloud’, ‘mixed’, or ‘unknown’, we counted the number of ‘cloud’ or ‘mixed’ designations for each pixel for the pre-fire season period May–July.
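A sketch of the counting step is shown below, assuming the two-bit cloud-state field of state_1km has already been extracted for each daily image and that the values 1 and 2 correspond to ‘cloud’ and ‘mixed’ respectively (an assumption about the decoded bit field, not a detail from the text).

```python
# Hedged sketch: count of cloudy or mixed days per pixel for May-July.
import numpy as np

def count_cloud_days(daily_cloud_state):
    """daily_cloud_state: (days, rows, cols) integer array of the decoded
    cloud-state field; returns a (rows, cols) count of 'cloud' or 'mixed' days."""
    is_cloud_or_mixed = np.isin(daily_cloud_state, (1, 2))
    return is_cloud_or_mixed.sum(axis=0)
```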
Cross year normalisation
All predictor variables are normalised to values between 0 and 1 using the minimum and maximum values of each variable across all years, such that:
$$V_{\mathrm{norm}}=\frac{V-V_{\min}}{V_{\max}-V_{\min}}$$
where \(V_{\mathrm{norm}}\) is the normalised version of the predictor variable \(V\), \(V_{\max}\) is the maximum value within the training dataset across all years (2002–2019), and \(V_{\min}\) is the minimum value within the training dataset across all years.
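A minimal sketch of this normalisation, assuming the yearly rasters for one predictor variable are held in a single NumPy array:

```python
# Hedged sketch: cross-year min-max normalisation for one predictor variable.
import numpy as np

def cross_year_normalise(stack):
    """stack: array of shape (years, rows, cols) for one predictor variable."""
    v_min = np.nanmin(stack)   # minimum across all years (2002-2019)
    v_max = np.nanmax(stack)   # maximum across all years
    return (stack - v_min) / (v_max - v_min)
```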
Training and validation dataset assembly
Once pre-processed, all predictor variable rasters were resampled to the same dimensions (with a resolution of 0.002 degrees in the WGS84 co-ordinate system) and stacked yearly, so that each year (2002–2019) comprised a raster stack of 31 feature maps, with each feature map representing a different predictor variable. Each yearly stack was then split into tiles matching the input dimensions of the CNN model. Our final model was built to take an input size of 32 × 32 pixels (raster cells). Therefore, each yearly raster stack was split into many 32 × 32 × 31 raster stack tiles spanning the defined study area. These were then converted to 3D arrays holding the values of all predictor variables for each raster stack tile.
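The tiling step might look like the following sketch, where edge remainders smaller than a full tile are simply dropped (an assumption; the text does not state how partial tiles at the study-area boundary were handled).

```python
# Hedged sketch: split a yearly (rows, cols, 31) raster stack into 32 x 32 tiles.
import numpy as np

def tile_stack(stack, tile_size=32):
    """stack: array of shape (rows, cols, channels).
    Returns an array of shape (n_tiles, tile_size, tile_size, channels)."""
    rows, cols, _ = stack.shape
    tiles = []
    for r in range(0, rows - tile_size + 1, tile_size):
        for c in range(0, cols - tile_size + 1, tile_size):
            tiles.append(stack[r:r + tile_size, c:c + tile_size, :])
    return np.stack(tiles)
```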
The same process was repeated for the yearly fire hotspot rasters used as the dependent variable in building our model. Each year was split into 32 × 32 × 1 tiles across the study area, and then converted to 3D arrays, each of which pairs with one predictor variable array.
The 3D predictor variable arrays (dimensions: 32 × 32 × 31) were then stacked into one large 4D array containing all of these individual tiles across all years (dimensions: W × 32 × 32 × 31, where W is the total number of tiles across all years). The same was done with the 3D dependent variable arrays (dimensions: 32 × 32 × 1), preserving the order so that each element in this large 4D array (dimensions: W × 32 × 32 × 1) matches its counterpart in the predictor variable array.
The order of this large 4D training data array was then randomised along the first dimension to avoid bias in passing to the CNN training algorithm, but the randomised re-ordering was repeated with the dependent variable array so as to preserve the elementwise pairing for cross-validation.
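A sketch of the paired shuffle, assuming the predictor and dependent tiles are held in the two 4D NumPy arrays described above:

```python
# Hedged sketch: shuffle tiles along the first dimension while keeping
# each predictor tile paired with its fire-hotspot tile.
import numpy as np

def shuffle_pairs(x, y, seed=0):
    """x: (W, 32, 32, 31) predictor tiles; y: (W, 32, 32, 1) hotspot tiles."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(x.shape[0])   # one random re-ordering...
    return x[order], y[order]             # ...applied to both arrays
```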
Model development and application
Fire prediction requires the combination of spatial and temporal indicators to generate a probabilistic output for each location within a given study area. There is a need to preserve a certain level of proximity information, as the location of variables in relation to one another may have a substantial impact on the results. For example, a patch of secondary forest that is immediately adjacent to an area recently deforested may have a significantly higher probability of fire occurrence than an area surrounded entirely by primary forest.
CNNs retain spatial features by employing a moving window of reference, known as a kernel, over the input image, capturing these proximity relationships within the model structure. For this reason, CNNs are often used for image classification problems and are an ideal model configuration for the problem of predicting fire across an area. Therefore, we developed a CNN binary classification model using the Keras API package39, which builds on the TensorFlow machine learning platform40.
Model structure
CNN models typically apply a combination of kernel layers and dense layers that perform a series of transformations on the multi-channel input, either reducing it to a single value or outputting an image of the same width and height as the input with a single channel. These classification models can either assign a single value (binary classifier) or return one of many possible classifications.
Kernels act on a subsection of the input stack (31 feature maps), assigning weights according to each cell’s position within the subsection to transform and combine the values into a new form that is passed forward. As the kernel is applied across all subsections of the input stack, it builds a reconstituted image whose dimensions usually differ from those of the input. A dense layer performs the same operation but acts on a single grid cell of the input stack at a time: it takes all values at that location (i.e., the 1 × 1 subsection across all feature maps), transforms them according to its assigned weights, and passes forward a new set of channels to the corresponding grid cell of the output stack. Each layer, whether kernel or dense, may expand or contract the number of channels it passes forward, and a kernel layer may also change the width and height of the output it passes forward.
We require an output that corresponds to a map of fire-occurrences; therefore our model needs to perform a series of transforms that preserve the width and height of the input, but reduce it to a single channel. The single channel in the output then represents the probability of each cell being classified as fire or not-fire (0–1).
Our CNN model comprises 5 kernel layers (K1–K5 in Fig. 5), each of which acts on a 3 × 3 subsection and preserves width and height, passing forward a transformed section. Kernel K1 takes an input of 31 channels (predictor variables) but passes forward 128 channels to form the transformation T1 (Fig. 6). Kernels K2–K4 take inputs of 128 channels and pass forward 128 channels (T2–T4). Kernel K5 takes an input of 128 channels but passes forward 1 channel: the output. After each kernel applies its weights, an activation function is applied before the values are passed on, which modifies them to be valid inputs to the next process. Kernels K1–K4 have a rectified linear (relu) activation function, which returns the input value if positive and 0 if negative. Kernel K5 has a sigmoid activation function, which transforms the input values to between 0 and 1 such that negative values map to <0.5 and positive values map to >0.5.
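The layer arrangement described above could be expressed with the Keras functional API roughly as follows; settings not stated in the text (bias terms, weight initialisation, batch handling) are left at Keras defaults and should be treated as assumptions, while ‘same’ padding is used here so that width and height are preserved as described.

```python
# Hedged sketch of the five-kernel binary classification CNN described above.
from tensorflow import keras
from tensorflow.keras import layers

def build_model(tile_size=32, n_features=31):
    inputs = keras.Input(shape=(tile_size, tile_size, n_features))
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(inputs)  # K1 -> T1
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)       # K2 -> T2
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)       # K3 -> T3
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)       # K4 -> T4
    outputs = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)  # K5 -> output
    return keras.Model(inputs, outputs)
```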
Model training and validation
We used a stochastic gradient descent optimising function called Adam41 combined with a binary cross-entropy loss function to train the model against our fire-hotspot dataset iterated over 20 epochs. We split the data 70/30, using 70% as training data and 30% as validation data, recording accuracy, precision, and recall as the performance metrics, as well as the loss function itself.
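In Keras terms, the training configuration described above corresponds roughly to the following sketch; the batch size and the use of validation_split to perform the 70/30 split are assumptions, and build_model refers to the structural sketch in the previous section.

```python
# Hedged sketch: Adam optimiser, binary cross-entropy loss, 20 epochs, 70/30 split.
from tensorflow import keras

model = build_model()                       # model sketched under "Model structure"
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss="binary_crossentropy",
    metrics=["accuracy",
             keras.metrics.Precision(name="precision"),
             keras.metrics.Recall(name="recall")],
)
history = model.fit(
    x_tiles, y_tiles,                       # the shuffled 4D tile arrays
    validation_split=0.3,                   # 70% training, 30% validation
    epochs=20,
    batch_size=32,                          # assumed; not stated in the text
)
```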
After model training, we applied the model to each yearly raster stack and compared the output against the fire-hotspot data for further model validation. Before validating the model outputs, we applied a simple 3 × 3 moving average window as a smoothing function to reduce the edge effects of tiling that are a by-product of having to split the study area into smaller tiles (32 × 32) for passing to the model. For this yearly validation, we again used the metrics accuracy, precision, and recall, such that:
$$\mathrm{Accuracy}=100\,(\mathrm{TP}+\mathrm{TN})/(\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN})$$
$$\mathrm{Precision}=100\,\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$$
$$\mathrm{Recall}=100\,\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$$
where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. These comparisons were made on a raster cell to raster cell basis after designating a 500 m buffer around each fire hotspot observation (MODIS and VIIRS data) and converting the buffers to a raster image of the same resolution and extent as the model prediction.
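A sketch of the cell-to-cell comparison, assuming the thresholded model output and the rasterised buffered hotspots are boolean arrays on the same grid:

```python
# Hedged sketch: cell-wise accuracy, precision, and recall from boolean rasters.
import numpy as np

def raster_metrics(predicted, observed):
    """predicted, observed: 2-D boolean arrays on the same grid."""
    tp = np.sum(predicted & observed)
    tn = np.sum(~predicted & ~observed)
    fp = np.sum(predicted & ~observed)
    fn = np.sum(~predicted & observed)
    accuracy = 100 * (tp + tn) / (tp + tn + fp + fn)
    precision = 100 * tp / (tp + fp)
    recall = 100 * tp / (tp + fn)
    return accuracy, precision, recall
```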
Scenarios
After validating the model performance, we built future scenarios to investigate the impact on fire occurrence of managing key anthropogenic features of the landscape: canals and land cover (Table 2).
Studies have shown that unmanaged areas of heavily degraded or cleared swamp-forest are most susceptible to fires16,17,25,26,33,42. Therefore, we have built scenarios that investigate the possible impact of managing these areas by altering the model inputs to re-assign the land-cover designations ‘Swamp shrubland’ and ‘Scrubland’, as well as other land designation alterations. The first such restoration scenario investigates the impact of reforesting these areas by re-assigning the designations to ‘Swamp forest’. The second such scenario investigates the impact of converting these unmanaged areas to plantations by re-assigning the designations to ‘Plantation’. We also built two further land cover scenarios to investigate the impact of continued deforestation in the region by re-assigning the ‘Swamp forest’ designation to ‘Swamp shrubland’ and ‘Plantation’.
We then built a scenario to investigate the impact of canal blocking on fire occurrence, modifying the proximity to canals model input by reducing the number of canals included in our proximity analysis to just two major canals, one that runs north-south, and one that runs west-east (Fig. 1). These canals could not practically be blocked due to their size and importance as navigation conduits.
The final scenario simulates the combined impact of both re-foresting unmanaged degraded and cleared forest areas and the blocking of canals simultaneously.
To evaluate the impact of each scenario on fire occurrences, we calculated, for each year, the ratio of the number of raster cells with a predicted fire probability >0.5 (i.e., cells predicted to burn) under each scenario to the number under the baseline scenario for the same year.
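A sketch of this scenario comparison for a single year, assuming both model outputs are probability rasters on the same grid:

```python
# Hedged sketch: ratio of predicted-fire cells under a scenario vs the baseline.
import numpy as np

def scenario_ratio(scenario_prediction, baseline_prediction, threshold=0.5):
    """Both inputs: 2-D arrays of fire probabilities for the same year."""
    scenario_fires = np.sum(scenario_prediction > threshold)
    baseline_fires = np.sum(baseline_prediction > threshold)
    return scenario_fires / baseline_fires
```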
Model use as a predictive tool
To evaluate the model’s potential to predict future fire distribution across the wider ex-Mega Rice Project area, we trained a second version of the model following the same methodology outlined above, but included only data from 2002 to 2018 in the training and test data passed to the model fitting algorithm. We then applied the model to the predictor variables corresponding to 2019 and compared model outputs to the observations of fire-occurrences by again looking at the metrics accuracy, precision, and recall. We also present a visual comparison of the outputs from the full model (2019 included in training data), the predictive model (2019 not included), and the observation data (MODIS and VIIRS hotspots).