An indirect approach to predict deadwood biomass in forests of Ukrainian Polissya using Landsat images and terrestrial data

An indirect approach to predict deadwood biomass in forests of Ukrainian Polissya using Landsat images and terrestrial data


Introduction
Spatially explicit estimation of forest structural compartments has become a main target of studies and forest monitoring during the last decades (Lim et al., 2003;Zimble et al., 2003;Zald et al., 2016). Deadwood as the key legacy component of forest ecosys-tems reflecting natural dynamics processes in the stand structure, is increasingly considered in conservation and management policies throughout the world (Franklin et al., 2002;Seidl et al., 2014). Consistent assessment of deadwood stocks has been recognized as an important task in the studies of biodiversity and habitat maintenance, carbon sequestration, nutrient cycling, pest controlling and other ecosystem functions and services (Angelstam & Kuuluvainen, 2004).
Landsat image time series with simultaneous pre-processing of spectral data that are freely available on the cloud-computing platform Google Earth Engine have created new capabilities for mapping of forests and their biophysical parameters and features (Cohen & Goward, 2004). New temporally continuous and spatially explicit approaches have significantly empowered a robust estimation of deadwood (Gonzalez et al., 2018). Temporal segmentation of Landsat seasonal mosaics allows an identification of natural disturbances which are strictly linked to deadwood accumulation Kennedy et al., 2010). Other remote-based methods have been developed (e.g., Oeser et al., 2017), focused on different natural drivers of tree mortality: either stand-replacing events (e.g., wildfires, intensive storms) or gap-scale disturbances (e.g., insect outbreaks). Significant improvement of deadwood estimation has been achieved with rising application of terrestrial-based and airborne LiDAR techniques coupled with empirical field-based and satellite-based data (Gonzalez et al., 2018).
Herewith, precise estimation of forested areas and quantification of ecosystems' biophysical values is linked to the various statistical classification and regression approaches. For the last decades, non-parametric machine learning methods have shown a strong capability to be applied for forest remote sensing data. A number of studies refer to successful implementation of numerous classifiers (e.g., decision trees (Ruefenacht et al., 2008); support vector machines (SVM, Li et al., 2013); artificial neural networks (Civco, 1993)) with application to coarse and medium (Nitze et al., 2015), high (Thanh Noi & Kappas, 2018) and extra-high (Hayes et al., 2014) spatial resolution of data. These methods have shown good performance in classify-ing land cover types (Gislason et al., 2006), forest successional stages (Liu et al., 2008), and tree species occupation (Evans & Cushman, 2009). Regression tasks like estimation of growing stock volume (GSV), live and dead biomass and sequestered biomass carbon have met more robust and user-friendly implementation since R-package "yaImpute" and similar applications were developed (Crookston & Finley, 2008;McRoberts, 2012).
To date, ensemble machine learning methods are successfully used in remote-sensing applications (Belgiu & Dragut, 2016). Using a set of so-called weak learners (mainly decision trees), these models apply either a bootstrap or bagging process (well-known Random Forest, RF), or boosting fitting (e.g., gradient boosting machines, GBM). The main challenges those ensemble and above-mentioned methods face are the quality of reference and ancillary data, while tuning of modelling hyper-parameters (e.g., learning rate of number of iterations) was reported as an issue more dependent on dataset features (Maxwell et al., 2018).
In the twenty-first century, Ukraine is facing both an increasing interest in a full carbon accounting of forest ecosystems, including estimation of deadwood stocks, and substantial challenges due to the absence of the main sources of reference data (e.g., sampling plots-based National Forest Inventory, LiDAR applications, Shvidenko et al., 2017). Polissya is the second forested region in Ukraine and can be described by specific circumstances that call for robust applications of deadwood estimation. Those are the presence of primeval forests of black alder (Alnus glutinosa L.), recovery processes on abandoned areas due to the radioactive contamination since the Chornobyl disaster in 1986 and economic shifts, and land degradation caused by illegal amber mining (Lesiv et al., 2018). Some studies refer to the possible assessment of deadwood stocks using field sample plots and multispectral satellite images in this region (Bilous et al., 2017;Lakyda et al., 2019).
Here we aim to (i) examine the performance of creating forest mask using Random Forest for the study site in Ukrainian Polissya; (ii) map stocks of coarse woody debris (CWD) biomass using k-Nearest Neighbours (k-NN) and gradient boosting models (GBM).

Study area
A study site in Ukrainian Polissya was chosen with the aim to fully capture regional geographical and forest composition features. The study was carried out for a site in the Chernihiv region (Northern Ukraine, Figure 1) with an area of ca 1,192 km 2 , between N 50°54′1″, E30°44′60″ and N51°14′15″, E31°12′16″. The dominated land cover types (according to the local operational forest management data) are croplands, grasslands in Desna river bank and homogeneous coniferous replanted forests (with Scots pine (Pinus sylvestris L.) as the main tree species). Some forest stands are formed by silver birch (Betula pendula Roth.) and common aspen (Populus tremula L.); natural forests of black alder are also presented. Pine-dominated forests are mainly pure, with insignificant admixtures (i.e., the share of mean GSV is ca less than 10%) of silver birch. Forests within the study area are managed by two main authorities: State Forest Enterprise "Osterskiy" and Regional Landscape Park "Mezhyrichensky".
The local climate represents the continental climate subtype, with a mean annual temperature of 7.2 ºC and an average total precipitation of 702 mm. Soils are mainly sod-podzolic, with alluvial soils in river banks. Considering the forest type classification implemented in Ukraine, Scots pine grows there in A 2 and B 2 types (i.e., poor and semi-poor level of soil fertility and fresh sites in terms of humidity), silver birch -B 2 and C 3 , and black alder -C 4 and C 3 (soils with a rich nutrient content and high humidity level; for more detailed classification of forest types see .

Data on deadwood biomass
Forest inventory maps delineating forest polygons have been incorporated in the study as a source of forest attributes derived from the forest planning and management (FPAM) data (available for 2011). We considered utilizing FPAM data instead of statistical forest inventory datasets, since these are currently limited to several regions (Myroniuk, 2018) which do not include the study area. FPAM data for these stands contain information on dominant tree species, mean diameter at breast height, mean height, age, site index, relative stocking, forest type, and GSV estimated during the last forest accounting. In total, we used data from 2,125 primary units available from the FPAM dataset for the study area. These polygons were given as ESRI shapefiles.
Stocks of CWD biomass were estimated using allometric linear models developed for the Ukrainian Polissya region (Lakyda et al., 2019). According to the classification implemented in Ukraine , CWD is divided into such well-common deadwood components as snags (i.e., dead standing trees), logs (i.e., downed stems and stumps) and litter of coarse (d > 1 cm) branches. In this study, only biomass stocks of snags and logs were estimated: according to Lakyda et al. (2019), the litter of coarse branches weakly correlates with spectral data and mainly depends on the soil and fine litter features. The sum of the studied stocks further is considered as deadwood biomass stock. The allometric model used is given as (1): where M is the biomass stock of snags or logs (t·ha -1 ), a 1 , a 2 , a 3 and a 4 are regression parameters, D is the diameter at breast height (cm), H is the mean height (m), S is the relative stocking (ranging from 0 to 1). According to the source of these models (Lakyda et al., 2019), there was a relatively small root mean square error (3% of the estimated mean). This error was ignored in the following modelling and uncertainties estimations. For more detailed information on the regression parameters and predictor sets across compartments and tree species, see the respective paper (Lakyda et al., 2019).

Remote sensing data
Accuracy and reliability of produced maps is strictly dependent on the agreement between the used terrestrial-based and remote sensing data. Considering that FPAM data for the study area was available only for 2011, the same point of time was chosen for obtaining remote sensing data. We used the open United States Geological Service archive presented in the Google Earth Engine (GEE, Gorelick et al., 2017) to download Landsat-5 Thematic Mapper (TM) images (with 30 m spatial resolution) and create a cloud-free mosaic for the given site and time frame from 22 May 2011 to 22 September 2011 (according to the duration of the vegetation season in Ukraine). Landsat-5 TM imagery was already geometrically corrected. Radiometric correction to the surface reflectance and mosaicking with the aim of masking pixels with clouds were conducted using the GEE. The mosaics were then downloaded from the GEE and processed in the R environment (R Core Team, 2019).

Geospatial modelling: classification and data imputation
The main workflow of this study is illustrated in Figure 2. Further, the methods of classification and data imputation are described as given in this flowchart. For the land cover classification task, the RF classifier was chosen as the most popular among the models used in the various scientific papers (e.g., Jhonnerie et al., 2015;Thanh Noi & Kappas, 2018). Then, this type of a model was utilized for the tree species classification task (e.g., Lang et al., 2018). The relatively novel approach of gradient boosting (e.g., Sachdeva et al., 2018) was compared to the k-NN, as the last method is commonly used for regression and imputation purposes (e.g. Lang et al., 2014;Latifi et al., 2015;Fu et al., 2019).
Workflow of the study. The flowchart illustrates utilization of the data, and the modelling process for classification and imputation purposes.

Classification reference data
We provided a two-step classification of Landsat mosaics. First, land cover classes were predicted using the RF machine learning algorithm to produce a forest cover mask. Then, the classification on dominant tree species across forest stands within the produced mask was performed.
For both steps we used the same set of 6 predictors: Normalized Vegetation Difference Index (NDVI), Normalized Water Difference Index (NDWI), Tasselled Cap Transformation (TCT, namely brightness TCB, greenness TCG and wetness TCW), and elevation from the Digital Elevation Model (DEM) downloaded from the GEE for the study area. We chose NDVI and TCT as predictors because of their ability to capture GSV and biomass stock changes (Kauth & Thomas, 1976). In order to better classify water bodies, we utilized the NDWI predictor here. As an auxiliary non-spectral variable, elevation from the DEM is useful for distinguishing black alder from other broad-leaved species, since these stands are more likely to grow in lower sites regarding elevation above sea level (Lakyda et al., 2019). Ancillary datasets are illustrated in Figure 3. Multispectral data for land cover classification was extracted for a set of 1,010 sample points, randomly distributed across the study area. Sample plots were generated in the Quantum Geoinformatics System (QGIS 3.4) software, and land cover classes were then visually interpreted using freely available software from Google Inc. (Google Earth for visual interpretation). Each sampling point was visually inspected using high-resolution images available on Google Earth as a reference to assign one of eight thematic classes: bog, croplands (including gardens), forests (including the stands of linear spatial shape in the fields, across roads and rivers), grasslands (including unforested sites surrounded by forests, but without signs of clear-cuts being identified with the time series of high-resolution images available on Google Earth), settlements, shrublands, water bodies (rivers and lakes) and other (sands, rocks).
The validation dataset for the land cover classification was also compiled inde- Figure 3. Ancillary datasets used for spatial modelling. The DEM variable is given in meters above sea level.
pendently using the Collect Earth plugin and included 964 points, different from points used in the training dataset. The dataset for tree species classification was extracted from the FPAM database for 742 points that fell within forested areas. All points were carefully interpreted on Google Earth to ensure they reliably represented characteristics of the inventory polygons which they were linked to. For the training model we used 557 points (75%) and the rest (185 points) were used for the validation task. We considered four dominating tree species in the study: PISY (acronym for Scots pine), BEPE (acronym for silver birch), ALGL (acronym for black alder), and POTR (acronym for common aspen). Other species were grouped in the thematic class OTHER.

Parameterizing classification models
RF is an ensemble method that decreases the variance of output, using a set of simple learners (here Classification and Regression Trees, CARTs) that preferably give outputs with low bias. The training dataset is split into small subsets used by each learner in the RF model in an independent way, then the results are averaged (Breiman, 2001) to get the best performance (a process called "bootstrap", or "bagging") and the lowest variance. We used the "randomForest" R-package (Liaw & Wiener, 2002) to classify land cover. Default parameters (number of decision trees n = 500 and number of variables tried at each split by model mtry = 3) were reported as sufficient for good performance of the RF classifier, so we did not tune these (Belgiu & Dragut, 2016).
Performance of the RF models was assessed in terms of the following metrics: overall accuracy and the kappa (Congalton, 1991) index for all datasets, and user accuracy and producer accuracy for the class "forest". Confusion matrixes were built in the "caret" R-package (Kuhn, 2008).

Imputation of deadwood biomass
We used a training dataset that consisted of 439 sample stands with information on deadwood biomass stocks calculated for each forest stand using the abovementioned allometric models. Such a number of stands and points was chosen because of full spatial consistency between FPAMbased polygon map data and the forest mask obtained by the machine classifier for these stands. Imputation of continuous values of deadwood biomass stocks was performed using k-NN and GBM models based on the extracted mean spectral values of all pixels within the polygon of each stand from the sample dataset.
Since the allometric models of deadwood biomass are consistent with the parameters of a growing forest stand, we directly model this compartment instead of linking to GSV values (Pflugmacher et al., 2012). This decision was made based on an apparently strong relationship between GSV and deadwood biomass values obtained from FMAP data for the study area ( Figure 4).

Tuning imputation models
For imputation purposes, the performance of the k-NN algorithm was compared to GBM. The first method searches for an average value between the so-called "nearest neighbors" for a given point in the features' space. The weighting of values of "neighbors" is defined by different schemes matrixes (McRoberts, 2012). In this study we used the "randomForest" method, which implies no weighting of matrixes in searching for the values of "nearest neighbors", while the number of "neighbors" (k) was iteratively increased, with consideration of the threshold when the k-NN model likely starts to overfit. We iteratively increased the number of k (from 1 to 10), thus training 10 k-NN models. Imputation was performed in the R package "yaImpute" (Crookston & Finley, 2008). GBM was implemented in the "gbm" (Greenwell et al., 2019) R-package. Similar to the RF, it is an ensemble model, but weak learners (CART here as well as in the RF classifier) in it "boost" each other instead of performing independently like RF models. It means that in gradient boosting, every additional CART amplifies the performance of the previous learner or is rejected if fails to provide more precise output. To tune the GBM model, it is necessary to control the various hyperparameters that define training velocity and performance on the datasets depending on their size. Here we built a hyperparameter grid (Table 1), thus running 540 different models in the algorithm that was searching for the lowest mean square error (MSE). Inner validation was provided by this tuning algorithm, which took for this purpose a random 25% subset for each model. The shrinkage parameter (Table 1) defines a learning rate applied to each CART in the expansion. Interaction depth shows the highest level of variable interactions allowed in each tree. The number of trees shows how many CARTs were utilized in the GBM model. Bagging fraction is a proportion of observations randomly chosen to propose the next CART in the expansion, i.e., making the given GBM a stochastic boosting model (if this fraction is less than 1.0). We assessed the produced GBM models, considering their root mean square error (RMSE) produced by a random inner validation subset (25% of the whole training dataset, as the training fraction for all models was 0.75). Performance of the GBM and k-NN models on the whole (without an inner validation subset) training dataset was compared in terms of RMSE, and visual analysis of fit against a 1:1 line.
Additionally, we compared model performance on a test landscape which was not covered by the forest stands from the training dataset. This landscape (1,020 ha) represents forest communities typical of the entire study area: Scots-pine dominated stands with admixtures of silver birch and rare deciduous forests. Mean and total k-NN and GBM outputs within this landscape were compared to the FPAM data. Moreover, we estimated the mean net difference between reference and model data at the stand (polygon) scale.

Land cover and tree species classification
The RF model produced has shown quite good overall accuracy (81.9%) and the kappa (74.6%) index. Land cover types "water", "forest" and "croplands" on the validation dataset were interpreted with a high level of user accuracy (93.1%, 92.2%, and 90.0%, respectively), while the largest misclassifications were met for the types "bog", "other" and "shrublands" (Table 2).  The tree classification model has shown relatively good performance (overall accuracy is 78.9%, the kappa index is 56.4%), with user accuracy ranging from 90.0% for Scots pine to 60.0% for silver birch and 80.0% for black alder. The class "OTHER" and common aspen achieved worse accuracies (23% and 29%, respectively).  Figure 5. Importance of predictors for specific tree species and in total given by the RF classification model. Figure 6. Land cover classes map (a) and tree species map within the forest mask (b) produced by the respective RF model.
Elevation was the most important variable to predict forests of black alder ( Figure 5), while TCG and TCW contributed the most to the prediction of Scots pine stands. All predictors rather weakly classified less abundant broadleaved tree species (silver birch, common aspen and other), which resulted in higher errors (Table 3). Figure  6 illustrates the outputs of both RF classification models.

Imputation of deadwood biomass stocks
Tuning the GBM algorithm has shown that the lowest RMSE (2.09 t·ha -1 , or 26% of the mean deadwood biomass stock) on the random 25% inner validation subset was achieved by a model with the following hyperparameters: number of trees = 1,000, shrinkage = 0.01, minimum number of observations in the terminal node = 15, bagging fraction = 0.5, interaction depth = 10. The other nine best GBM models produced quite similar errors (ranged from 2.087 to 2.093 t·ha -1 ). According to their hyper-parameters, the best learning rate for training the GBM in this case is quite high -0.01. As well, all the best models used a bagging fraction value of 0.5 and a minimum number of observations in the terminal node of 15. Interaction depths there are mostly high (> 6), i.e., decision trees with a low number of "branches" did not produce the lowest error. In addition, models with various numbers of CARTs utilized were almost equal in terms of RMSE on the inner validation subset (25% of the total dataset).
When predicting deadwood biomass on the training dataset without an inner validation subset, there was a lower RMSE -1.3 t·ha -1 , or 16% of the mean deadwood biomass stock. According to the validation results, all 10 k-NN models have shown rather similar performance: RMSE ranges from 1.42 to 1.51·ha -1 . The lowest RMSE was achieved by the model with k = 1 (1.42 t·ha -1 , or 17 % of the mean deadwood biomass stock). The best GBM and k-NN a b models thus were used to impute on the spatial dataset (Figure 7). Importantly, low deadwood biomass values (2-3 t·ha -1 ) were predicted by the GBM model (Figure 7b) as two or three times higher. Therefore, the k-NN model ( Figure 7a) visually has produced more "outliers" throughout the graph. A comparison of mean (with standard deviations, SD) and total deadwood biomass outputs for the entire study area is given in Table 4. The total deadwood biomass stock within the study area produced by the k-NN model is 0.34 Mt, while the best GBM resulted in similar stock (0.33 Mt, Table 4). The highest differences in the mean stock between model outputs were found for silver birch, and, with less extent, for Scots pine and black alder. Maps of predicted dead biomass stock for test landscapes (with 95% CI) are given in Figure 8, with a respective comparison in Table 5. While there were no likely large differences between model outputs for the test landscape (Figure 8), there was at least 13% total overestimation of deadwood biomass stock compared to reference FPAM data (Table 5). However, mean differences at polygon (stand) level could reach half of the mean dead biomass stock.

Challenges towards robust classification
To date, Ukraine is facing increasing challenges due to the lack of precise, statistically-based NFI data (Lesiv et al., 2018). A pilot project run in two regions (Sumy and Ivano-Frankivsk) a decade ago showed both new evidence (forest biometric parameters of stands located outside stateowned forests, deadwood stocks) on the state of local Ukrainian forests and a strong demand for further expanding such inventory (Myroniuk, 2018). However, Ukraine does not have a full national forest inventory due to a number of reasons . Since the quality of ancillary data strictly defines the performance of any classification of remote sensing data, uncertainties and mistakes always occur while working with imprecise data. In this study, FPAM data also had a limited temporal frame (2010)(2011), so only Landsat-5 TM imagery could be used for land cover classification and further processing, while the Landsat-8 Operational Land Imager archive has been available for years. FPAM data in Ukraine is also commonly recognized as biased and limited in precision, strictly focusing on management purpos-es and neglecting the ecological functions of forests (Lesiv et al., 2018). One possible suggestion is to carry out independent research of forest inventory (Bilous et al., 2017). However, the use of such an approach for larger areas as proposed in this study requires substantial time and funds. The spatial resolution of Landsat-5 TM imagery can still be considered as sufficient for the classification models while using the tools for validation purposes like Google Earth and Collect Earth. The cloudmasked mosaic, even with a limited period (one vegetation season in 2011), reliably produces a multispectral dataset without noises. The forest RF-based mask produced in this study may be compared with the forest cover map proposed by Global Forest Change (GFC, Hansen et al., 2013). The GFC mask was utilized with a 40% canopy closure threshold, which is recognized as a suitable value for distinguishing forests in Ukraine (Myroniuk, 2018). The forested area according to this data was 31.2%, thus closely matching the value produced by our RF model (32.0%).
Some challenges linked to local geographical and biophysical features were met in this study. Linear and relatively narrow forest stands are common in Ukraine (shelterbelts, across roads and rivers, etc.). Myroniuk (2018) reported that the spatial resolution of Landsat bands is too coarse for proper classification of such forests surrounded by fields (Figure 9b). However, utilization of the NDWI as a predictor can be sufficient to provide reliable distinction of forest patches in river banks from water bodies ( Figure 9a) and thus create a more precise forest mask. This task can be solved Because of the sparse distribution of silver birch stands within the study area, tree species classification (Table 3) has shown strong biases in distinction of this species from Scots pine forests. Lakyda et al. (2019) indicated that there was uncertainty in distinguishing between black alder and silver birch, since these species had similar spectral reflectance. This issue was solved by adding soil maps as an ancillary dataset, while in this study the non-spectral predictor DEM was important for delineating stands of black alder.

Reliability of deadwood estimation
Successful use of the k-NN method for forest inventory purposes throughout the world (McRoberts, 2012) has proven its capability to provide robust data imputation. However, this approach is sensitive to the quality of field-sampled data (Tomppo et al., 2017). Gradient boosting machines are commonly used for classification purposes and often compared to another ensemble method -RF, so there is some lack of information for tests of the imputation performance (Dube et al., 2014). In this study we have observed that GBM can slightly outperform the k-NN model in terms of accuracy (Figure 7). On the other hand, k-NN does not require control of as many modelling parameters, so it is more user-friendly. The GBM tuning algorithm needs several hours to run depending on the number of hyperparameter combinations and computation power. However, it allows inner validation using a training fraction parameter, k-fold cross-validation and stochastic nature through the bagging fraction hyperparameter. Because of these advantages, XGBoost (Chen & Guestrin, 2016), the next extension of the AdaBoost Figure 9. Subsets of predicted forest cover within the study area vs. actual linear stands across water bodies (a-b) and pathways (c-d). The green mask is produced by the RF model and the purple mask is created using GFC data.
(and thus GBM) framework, has shown the best accurate results in various data science competitions working with structured datasets. Relatively low attention to GBM capabilities in forest imputation tasks requires further utilization of this method to obtain more robust woody stock and biomass predictions. Possible limitations to such use, compared to the simple and classic k-NN method, could be associated with local spectral features, the size and quality of the training dataset, nature of the satellite spatial and temporal resolution.
In this study we used linear allometric models to obtain data on deadwood biomass stocks. These models were developed based on the biophysical parameters of a growing stand and thus are strictly linked to natural mortality processes in different successional stages and reflect local site productivity. However, such models cannot predict deadwood stocks after abrupt events (natural disturbances at different scales) and different management considerations common in Ukraine . A network of sample plots with a reasonable allocation scheme can be harnessed for this issue but requires data from statistical national inventory (currently available only for two regions in Ukraine), global datasets, such as Schepashenko et al. (2019), or substantial costs to carry out the LiDAR measurements (Gonzalez et al., 2018). While we estimated deadwood indirectly through satellite images, more precise data could be directly obtained with the use of laser scanning (Zald et al., 2016).

Conclusions
Deadwood stocks are strictly related to various issues ranging from biodiversity conservation to carbon management, but hitherto are not reflected in operational forest management data. However, those can be considered in the forest inventory agenda of countries with developing or transition economy utilizing free available remote sensing data. To meet climate mitigation and ecosystem diversity maintenance targets, there is credible capability to apply a bundle of novel ensemble machine learning methods, e.g. Random Forest and Gradient Boosting Machines, with satellite multispectral data. These non-parametric approaches, together with the traditional k-NN regression estimator, could be utilized in different cases, varying in their interpretability, tuning suitability, and biological linkages with real data.
Assessment and mapping tasks of deadwood in Ukraine require urgent implementation of a national statistical inventory, preferably with utilization of diverse sources of data (like point clouds provided by LiDAR). Also, the proposed approach matches with the necessity to estimate dead biomass, while this ecosystem compartment is crucial to maintain forest resilience under the changing global climate.