Study area
Penang is situated in the northern part of Peninsular Malaysia and lies within the latitudes 5°12’N to 5°30′ N and longitudes 100°09’E to 100°26’E (Fig. 7). Penang with a land area of 295 Km2, has an estimated population of 720,000 and is regarded as the most populated island in Malaysia. Penang shares the same border on the north and east with Kedah State and the south with Perak State. There are two main parts of Penang State: Penang Island and the mainland which is also regarded as Seberang Perai. These two parts of the State are connected by the two bridges. The eastern part of Penang Island is the most urbanized area comprising industries, commercial centres and residential buildings. However, the western part is less developed comprising mainly hilly terrain and forests22. This study is focused on the Island part of Penang. This island is endowed with a yearly equatorial climate (hot and humid). It has a mean annual temperature ranging between 27 and 30 °C while the mean annual relative humidity ranges between 70 and 90%. Also, the mean annual rainfall is about 267–624 cm.
Data acquisition
The flow chart of the methodology is presented in Fig. 8. Landsat satellite images were used for the assessment of changes in land use covering a period of 2010–2021 (11 years).
These images were gotten from the website of the United State Geological Survey (https://earthexplorer.usgs.gov). The Landsat images include the Landsat 5 TM (thematic mapper) and Landsat 8 OLI / TIRS (operational land imager / thermal infrared sensor). These were downloaded from the Landsat level 1 dataset (Table 6) with additional criteria which reduced the.
Determination of LST and NDVI for Landsat 5 and 8
Band 6 of Landsat 5 and band 10 of Landsat 8 were used for the determination of the land surface temperature (LST). The LST and normalized difference vegetation index were determined using the following steps:
Conversion of top of atmosphere (TOA) radiance
Using the radiance rescaling factor, thermal infra-red digital numbers were converted to TOA spectral radiance using the equation below29: (frac{Red – NIR}{{Red + NIR}}) (frac{Red – NIR}{{Red + NIR}}) For Landsat 8,
$$ {text{L}}lambda = left( {{text{ML}} times {text{ Qcal}}} right) + left( {{text{AL}} – {text{Oi}}} right) $$
(1)
For Landsat 5,
$$ {text{L}}lambda = left( {{ }frac{{{text{LMax}}lambda – {text{LMin}}lambda }}{{{text{QcalMax}} – {text{QcalMin}}}}} right) times left( {left( {{text{Qcal }} – {text{QcalMin}}} right) + {text{LMin}}lambda } right) $$
(2)
where Lλ is TOA spectral radiance, ML is radiance multiplicative band Number, AL is radiance add band number, Qcal is quantized and calibrated standard product pixel values (DN for band 6 or band 10), Oi is the correction value for the respective bands, LMaxλ is spectral radiance scaled to QcalMax, LMinλ is spectral radiance scaled to QcalMin, QcalMax is maximum quantized calibrated pixel value, and QcalMin is minimum quantized calibrated pixel value.
Conversion to TOA brightness temperature (BT)
Spectral radiance data were converted to TOA brightness temperature using the thermal constant values in the Metadata file29.
Kelvin (K) to Celcius (°C) degrees
$$ BT = {raise0.5exhbox{$scriptstyle {K2}$} kern-0.1em/kern-0.15em lower0.25exhbox{$scriptstyle {ln left( {frac{K1}{{{text{L}}lambda { } + { }1}}} right)}$}} – 273.15 $$
(3)
where BT is the Top of atmosphere brightness temperature (°C), Lλ is TOA spectral radiance (W.m−2 .sr−1 .µm−1)), K1 is the K1 constant band number, and K2 is the K2 constant band number. For Landsat 5, K1 is 607.76, and K2 is 1260.56.
Normalized difference vegetation index (NDVI)
The Normalized Difference Vegetation Index (NDVI) is a standardized vegetation index which reveals the intensity of greenness and surface radiant temperature of the area30,31. The index value of NDVI usually ranges from − 1 to 1. The higher NDVI value indicates that the vegetation of the area is denser and healthier. This shows that the NDVI values of normal healthy vegetation range from 0.1– 0.75, while it is almost zero for rock and soil, and negative value for water bodies24. The NDVI is calculated using the followings:
$$ {text{NDVI }} = frac{{left( {{text{NIR }}{-}{text{ RED}}} right){ }}}{{left( {{text{NIR }} + {text{ RED}}} right)}} $$
(4)
In Landsat 4–7
$$ {text{NDVI }} = , left( {{text{Band 4 }}{-}{text{ Band 3}}} right) , / , left( {{text{Band 4 }} + {text{ Band 3}}} right) $$
In Landsat 8
$$ {text{NDVI }} = , left( {{text{Band 5 }}{-}{text{ Band 4}}} right) , / , left( {{text{Band 5 }} + {text{ Band 4}}} right) $$
where: RED = DN values from the RED band, and NIR = DN values from the Near Infra-red band.
Land Surface Emissivity (LSE)
Land Surface Emissivity is the average emissivity of an element on the surface of the earth calculated from NDVI values.
$$ {text{PV }} = left{ {frac{{left( {{text{NDVI }} – {text{ NDVImin}}} right)}}{{left( {{text{NDVImax }} – {text{ NDVImin}}} right)}}} right}^{2} $$
(5)
where PV is the Proportion of vegetation, NDVI is the DN value from the NDVI image, NDVImin is the minimum DN value from the NDVI image, and NDVImax is the maximum DN value from the NDVI image.
$$ {text{E }} = left( {0.004{ } times {text{PV}}} right) + 0.986 $$
(6)
where E is land surface emissivity, PV is the Proportion of vegetation, 0.986 corresponds to a correction value of the equation.
Land Surface Temperature (LST)
Land Surface Temperature (LST) is the radiative temperature which is calculated using top of atmosphere brightness temperature, the wavelength of emitted radiance and land surface emissivity.
$$ {text{LST}} = {raise0.5exhbox{$scriptstyle {BT}$} kern-0.1em/kern-0.15em lower0.25exhbox{$scriptstyle {left( {1 + left( {{lambda } times { }frac{{{text{BT}}}}{{{text{c}}2}}} right) times {text{ln}}left( {text{E}} right)} right)}$}} $$
(7)
Here c2 is 14388. The value of λ for Landsat 5 (Band 6) is 11.5 µm and Landsat 8 (Band 10) is 10.8 µm.
Where BT is the top of atmosphere brightness temperature, λ is the wavelength of emitted radiance, and E is land surface emissivity.
c2 = h*c/s (1.4388*10–2 mK = 14388 mK), h is Planck’s constant (6.626*1034 Js), s is Boltzmann constant (1.38*1023 JK), c is velocity of light (2.998*108 m/s).
Determination of land use and land cover (LULC) of the study area
The Landsat images were pre-screened and subjected to clipping and classification32. The boundary shape file of Penang was used to clip out the area of study.
Image classification
The unsupervised method involving a random assignment of sample training points and supervised methods of satellite image classification was employed in this study for determining the LULC types. This mixture of image classification methods has been reported as vital in achieving a high accuracy level33. Bands 5, 4 and 3 were used to classify Landsat 8 while bands 4, 3 and 2 were used for classifying Landsat 5. We used the extraction by mask in the spatial analyst tool of ArcMap 10.2.1 software to extract the study area from the selected bands of the Landsat satellite images. A widely used supervised image classification method was adopted for classifying the Landsat bands in this study32,34. The principle of operation of this method involves the identification of known sample training points which are then used to classify other unknown points with related spectral signatures35. The three monochromatic satellite bands were combined to produce the false colour composite (FCC) using the data management tool36. This involves drawing polygons on the LULC type to select the training points. The LULC types adopted for this study include urbanized areas, agricultural land, rocks, forests, bare surfaces, and water bodies. These were modified LULC types from IPOC Good Practice Guidance37. To achieve this, a minimum of 40 sample points were selected randomly for each category of LULC type36. Having prior knowledge of the study area assisted in the selection of the training points38.
The multivariate maximum likelihood classification (MLC) technique was used for transforming the images. Other image transformation techniques have been used by researchers. These include the fuzzy set classifier, neural networks (NN) classifier, extraction and classification of homogenous objects (ECHO) classifier, per-field classifier, sub-pixel classifier, decision trees (DTs), support vector machines (SVMs), minimum distance classifier (MDC) and so-on39. The adoption of any of these techniques is dependent on the knowledge of the area of study, band selection, accessibility of data, the complexity of the landscape, the classification algorithm, and the proficiency of the analyst39. We preferred MLC to other techniques in this study due to its reported high level of accuracy in tropical regions32,34. Another reason for choosing MLC is that it is readily incorporated in many widely used GIS software packages. This MLC algorithm operates based on assigning pixels to the highest probability class and establishing the class ownership of such pixels. It is also regarded as a parametric classifier whose data follows almost a normal distribution39. We ensured the accuracy of this classifier by assigning a large number of training sample points using our prior knowledge of the study area.
Description of the LULC categories
The urbanized area is the developed part of the study area. This includes houses, roads, railways, and industries. This is also known to be a settlement in other literature40. Agricultural land is the part of the study area dominated by agricultural activities and herbaceous plants and grasses. Agricultural land is generally a product of deforestation36. Rocks are part of the study area comprising solid mineral materials (rocks). Bare land is the bare soil which is either made open by natural or human activities.
Forests are parts of the study area dominated by trees. They can be primary or secondary forests depending on the rate of disturbances. According to41, forest land is an area having more than 0.5 ha of flora comprising trees (height is above 5 m) with a canopy greater than 10%. The forests in Penang are generally both primary and secondary42. Water bodies are parts of the study area covered by water seasonally or permanently. These include seas, rivers, lakes, ponds, streams, or reservoirs40.
Determination of change in the LULC
The rate and extent of change in the LULC of Penang within the periods under consideration were determined following the formula below43:
$$ {text{Changed area }}left( {{text{C}}_{{text{a}}} } right) , = {text{ T}}_{{text{a}}} left( {text{year 2}} right) , {-}{text{ T}}_{{text{a}}} left( {text{year 1}} right) $$
(8)
$$ {text{Changed extent }}left( {{text{C}}_{{text{e}}} } right) , = {text{ C}}_{{text{a}}} /{text{ T}}_{{text{a}}} left( {text{year 1}} right) $$
(9)
$$ {text{Percentage of change }} = {text{ C}}_{{text{e}}} {text{x 1}}00 $$
(10)
where Ta means the total area.
Determination of relationship between LST and NDVI
The values of LST and NDVI at 20 random points of each LULC class were used. The relationship between the LST and NDVI across all the LULC classes in each year was determined using the bivariate linear regression analysis. This was done in Paleontological Statistical (PAST) package 3.0.
Classification accuracy assessment
The classification accuracy was assessed by taking ground truth coordinate data of the LULC of the study area using a geographical positioning system (GPS) device (Garmin Etrex 10). These data were compared with the LULC classified in this study32. Consequently, an error matrix was generated. This normally uses ground truth data to explain the accuracy of the classified LULC. The error matrix comprises the user’s accuracy, the producer’s accuracy, overall accuracy and the Kappa index32.
The producer’s accuracy (omission error) represents the probability of the correctly classified reference pixel and it is determined using this formula below:
$${text{Producer’s accuracy }}left( % right) , = { 1}00% , – {text{ error of omission}} $$
(11)
Also, the user’s accuracy (commission error) represents the probability that the classified pixel matches the one on the ground36 and it is determined using the formula below:
$$ {text{User’s accuracy }}left( % right) , = { 1}00% , – {text{ error of commission}} $$
(12)
The statistical accuracy of the matrix was determined using the Kappa coefficient44. This Kappa coefficient ranges from − 1 to + 145. Therefore, the overall accuracy of the classification was determined by dividing the total number of correctly classified pixels by the total number of sampled ground truth data40.
Source: Ecology - nature.com