Options for estimating horizontal visibility in hemiboreal forests using sparse airborne laser scanning data and forest inventory data


Horizontal visibility v in hemiboreal forest transects was measured in the field and then predicted, both from forest inventory (FI) data and from airborne laser scanning (ALS) data. Stand density N and mean diameter at breast height D were used as arguments in an FI predictive model assuming a Poisson distribution of trees on a horizontal plane. A lack of FI data on forest regrowth and understorey trees was found to cause v to be overestimated. Point cloud metrics of sparse ALS data from summer 2017 and spring 2019 were used as predictive variables for v in regression models. The best models were based on three variables: the 10th percentile of the point cloud height distribution, the relative density of returns in a horizontal layer ranging 0.7–2.2 m above the ground, and canopy cover. The models had a coefficient of determination of up to 67% and a residual standard error of less than 25 m. In forests in which fertile soil produces rapid height growth of understorey woody vegetation after recent thinning, visibility was found to be substantially overestimated because the understorey was not detected by the lidar measurements.


Introduction
Estimates of visibility may be useful for planning initial placements of camera traps for wildlife detection and monitoring (Hofmeester et al., 2017), and for predetermining the visibility of signs, information panels, or media screens in places where visibility may be decreased by vegetation (Chmielewski & Tompalski, 2017). For the tactical planning of military operations, horizontal visibility is an important factor, with short observation zones characteristic of wooded, as opposed to open, terrain, and with visibility between trees a relevant determinant for orientation, movement, and fields of fire (ATP-3.2.1., 2018). In virtual platforms of communication, as proposed by Korjus et al. (2017) for online-streaming public participation in forest management planning, visibility plays an important role, as parts of a forest will be either exposed or hidden, thereby influencing the participants' overall impression of the constructed landscape. Preliminary path planning according to visibility will help to optimize field operations in the rapidly developing technology of unmanned ground vehicles and autonomous robots (Li et al., 2020), designed to drive through forested landscapes or to carry out particular forest management tasks.
If a flat terrain is assumed, then the question of "How far can one see in a forest?" may be answered with exact formulas (Adiceam, 2016). For this it is necessary to assume a particular pattern of tree locations, additionally taking as known quantities the density N of trees (the number of trees per unit area, in SI units m⁻²) and the diameter of blocking elements (trunks) at the sighting level. If plantations are excluded, the tree location pattern tends to be random or grouped. A random distribution of tree locations is therefore assumed, with (for example) a Poisson distribution found suitable by Nilson (1992) for predicting the probability of zero contacts with tree trunks along a line of sight of length l. Such models can be applied for the prediction of gap probability, provided the available forest management inventory data include stand density N and diameter of tree stems at breast height D. Gap probability measurements can be carried out by photographing vegetation against artificial coloured backgrounds, or alternatively by using terrestrial laser scanners (Straatsma et al., 2008; Zasada et al., 2013).
In contrast to gap probability (a well-defined quantity, amenable to instrumental measurement), visibility within a forest, as recorded visually by a person, depends on several factors. Drummond & Lackey (1956) carried out visibility measurements by visually targeting a standard green cylinder 1.8 m in height and 45 cm in diameter. The authors defined continuous visibility v_c as the greatest distance to which a quiet, erect, and stationary person can be kept in constant view as the observer recedes. They report a v_c range of 17–116 m for US forests, finding v_c to depend on illumination, and to be a function of tree species composition, season, and stand structure. As a proxy for v_c, the authors propose forest height H, with the caveat that the relationship is nonlinear, depending on tree species composition and on the availability of light, as an influence both on branch survival on dominant trees and on forest undergrowth. Anstey (1964) defines visibility v_pi as the maximum distance for positive visual identification, in the sense of permitting visual determination of specific characteristics (size, shape, colour, and position) of a remote object. Under this definition, the detection of merely a reflection, shadow, outline, flash, or movement is insufficient for a measurement of v_pi. Anstey finds the v_pi of a motionless standing object to be the smallest in a dense tropical forest (falling in the range of 3–17 m). In high- and mid-latitude mixed forests, Anstey finds motionless targets to become invisible at less than about a 50 m distance in 75% of cases. Anstey points out that field-observed values of v_pi are dependent on the relative movement of the observer and target, and also on the observer's level of experience.
In the present study, we seek options for predicting field measurements of horizontal visibility v_fm in hemiboreal forests in Estonia. Following Drummond & Lackey (1956), we define v_fm as "the greatest distance to which a quiet, erect, stationary man can be kept in constant view as the observer goes away from him". We (1) compare field-measured v_fm to values predicted from forest management inventory (FI) data, and (2) construct regression models to search for informative predictive variables from the list of metrics of sparse point clouds obtained by airborne laser scanning (ALS).

Selection of sample stands and measurements of visibility
The Estonian forest resource register database was used to select forest stands for establishing field transects for visibility measurements. For each stand element within each stand, the database gives the basic FI variables (age, density, mean tree height, and diameter at breast height). An element of a forest stand is defined as a combination of tree species, together with its membership in a tree layer. For each stand, visibility was first calculated using 1.5 / (D · N) as a proxy, with D given in metres and N given as the number of trees per square metre. This preliminary estimate allowed the selection of a sample of 100 stands, each large enough to establish a field transect. The sample set was collected to potentially cover a wide range of visibility values in the forest (as illustrated in Appendix 1).
The observer was positioned at a point with medium visibility in each direction in the forest, thereby defining the starting point of a transect. The observed subject, wearing camouflage, then paced away from the observer, stopping when the observer could no longer see movement. Visibility was determined in a single direction, chosen more or less randomly. The observer point, by contrast, was selected in an area where the average forest density was typical of the given stand. The location coordinates of the observer and the observed subject were determined with a Garmin GPSmap 60CSx receiver.
For the prediction of horizontal visibility with a theoretical model according to FI data, we subselected those stands that were inventoried in 2014 or later. To these we added some stands older than 80 years, assuming in their case only minor changes, and slow growth, since the most recent inventory. These old stands were inventoried in 2009 or later, except for four stands growing on poor soil that were inventoried earlier, in 2003. The complete subsample comprised 70 stands out of the original sample of 100 (Table 1).

Prediction of horizontal visibility using forest inventory (FI) data
Let (x_0, y_0) be a random point in a forest comprising similar trees, at density N (m⁻²), each of diameter at breast height D (m), located according to the Poisson distribution (Nilson, 1992). From the point (x_0, y_0) we draw a line of length l (m) to end point (x_1, y_1). We now seek the probability that no trees occur on the area S, consisting of the rectangle of area D · l and the two half-disks corresponding to trees at the ends of the line. The area of this restricted zone is

$$S = D \cdot l + \frac{\pi D^2}{4}. \qquad (1)$$

The probability P that the entire area is free from trees (free line-of-sight x greater than l) is, under the Poisson distribution,

$$P(x > l) = \exp(-N \cdot S) = \exp\left[-N \left(D \cdot l + \frac{\pi D^2}{4}\right)\right]. \qquad (2)$$

However, the contribution to S of the π · D²/4 term is small (amounting to just 1–2% of S), and so may as a permissible simplification be dropped, giving P(x > l) ≈ exp(−D · N · l). With this term dropped, the resulting expression continues to strictly predict values between 0 and 1. In the case of multiple tree classes (stand elements), the D_i · N_i values are summed before the exponential is evaluated.
For each stand element, the height from ground to live crown base was predicted with a regression model (Lang, 2001). If the height to live crown base was below 1.5 m, then the crown radius (from model (14) in Lang et al. (2007)) was used instead of D as an effective diameter. This recourse to an effective diameter accounted for the influence of lower tree layers, notably from Picea abies (L.) H. Karst. (a species frequently present in FI data). For each forest stand, we found the 90th, 75th, and 50th percentile of the FI-based horizontal visibility v_FI by incrementing l in 1 m steps from 0 to 300 m in our simplified version of equation (2). Exact values of l for any probability P may be calculated as l_p = −(ln P)/(D · N).
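The simplified Poisson model can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the mapping from a percentile q to the probability P = 1 − q is an assumption, inferred from the reported ordering in which the 90th percentile lies above the 50th. The example stand (D = 0.25 m, N = 0.08 m⁻², i.e. 800 trees per hectare) is invented for illustration.

```python
import math

def gap_probability(l, elements):
    """Probability of a free line of sight longer than l (m).

    elements: iterable of (D, N) pairs per stand element, with D the
    (effective) diameter in m and N the density in trees per m^2.
    Uses the simplified form exp(-l * sum(D_i * N_i)).
    """
    dn = sum(d * n for d, n in elements)
    return math.exp(-dn * l)

def visibility_percentile(q, elements):
    """Closed form l_p = -ln(P) / sum(D_i * N_i), assuming P = 1 - q."""
    dn = sum(d * n for d, n in elements)
    return -math.log(1.0 - q) / dn

def visibility_percentile_stepped(q, elements, l_max=300, step=1):
    """Same quantity found by incrementing l in fixed steps up to l_max."""
    l = 0
    while l <= l_max:
        if gap_probability(l, elements) <= 1.0 - q:
            return l
        l += step
    return l_max

# Hypothetical single-element stand: D = 0.25 m, N = 0.08 trees per m^2.
stand = [(0.25, 0.08)]
v75_exact = visibility_percentile(0.75, stand)
v75_stepped = visibility_percentile_stepped(0.75, stand)
```

For this stand the closed form gives about 69.3 m and the 1 m stepping returns 70 m, so the stepping resolution costs less than one metre.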

Airborne laser scanning (ALS) data and models
Airborne lidar data from the Riegl VQ-1560i scanning system, together with a digital elevation model, were provided by the Estonian Land Board. The point density of the archived lidar data was on average between 0.15 m⁻² and 2.0 m⁻², with the pulse footprint at ground level about 50 cm, and the scanning nadir angle confined to within 30°. We used the FUSION toolbox (McGaughey, 2020) to calculate point heights over ground, to cut point clouds (with a 10 m buffer flanking the transect) out of ALS data, and to calculate point cloud metrics. The point cloud height distribution was calculated using all points, and then again using only the points with a height (over ground) greater than 1 m. The relative density of returns was calculated for horizontal layers with point heights over ground h_r in the ranges 0.7 m ≤ h_r < 2.2 m, 2.2 m ≤ h_r < 5.0 m, 5.0 m ≤ h_r < 10.0 m, and h_r ≥ 10.0 m. As a proxy for canopy cover, the relative density K of all returns with the height threshold of 2 m over ground was calculated. We included all pulse returns because in dense forests the K-value based on first returns is not sensitive to differences in canopy cover (Arumäe & Lang, 2018).
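The point cloud metrics described above can be illustrated with a short Python sketch operating on a list of return heights over ground. It is not the FUSION implementation: the function names are hypothetical, the relative density is assumed to be the share of all returns falling in the layer, the canopy cover proxy is assumed to count returns strictly above the threshold, and the percentile uses simple linear interpolation between order statistics, which may differ from the rule used by FUSION. The sample heights are invented.

```python
def layer_density(heights, lower, upper=float("inf")):
    """Share of all returns with lower <= h < upper (assumed definition)."""
    if not heights:
        raise ValueError("empty point cloud")
    return sum(1 for h in heights if lower <= h < upper) / len(heights)

def canopy_cover_proxy(heights, threshold=2.0):
    """K: share of all returns above the height threshold (assumed: h > threshold)."""
    if not heights:
        raise ValueError("empty point cloud")
    return sum(1 for h in heights if h > threshold) / len(heights)

def height_percentile(heights, q, min_height=None):
    """Height percentile by linear interpolation between order statistics.

    If min_height is given, only returns above it are used, which mimics
    excluding points near the ground.
    """
    pts = sorted(h for h in heights if min_height is None or h > min_height)
    if not pts:
        raise ValueError("no returns above min_height")
    pos = q * (len(pts) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(pts) - 1)
    return pts[lo] + (pos - lo) * (pts[hi] - pts[lo])

# Hypothetical return heights over ground (m) for one transect buffer.
heights = [0.0, 0.1, 0.8, 1.5, 2.0, 3.0, 6.0, 12.0, 15.0, 18.0]
rho_l1 = layer_density(heights, 0.7, 2.2)           # 0.7 m <= h_r < 2.2 m
k_cover = canopy_cover_proxy(heights, 2.0)          # returns above 2 m
h_p10 = height_percentile(heights, 0.10, min_height=1.0)
```

With these ten invented returns, three fall in the lowest layer, five lie above 2 m, and the 10th percentile of the returns above 1 m is interpolated between 1.5 m and 2.0 m.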
In seeking predictor variables for regression models, we found, contrary to Drummond & Lackey (1956), that the upper height percentiles of the point cloud, used in FI practice to predict forest height, show no correlation with v_fm. We constructed four regression models, numbered (3) to (6), with ε denoting the model residual error and a, b, c, and d the model parameters. The first, model (3), was a nonlinear model based on the relative density ρ_L1 of pulse returns found in the first layer, where 0.7 m ≤ h_r < 2.2 m. The second, model (4), was a linear model based on the 10th height percentile H_P10 of the point cloud, with points near the ground excluded. The remaining two, models (5) and (6), were multiple linear models. Parameter values were estimated in the R statistical computing environment (R Core Team, 2018), using procedure nls() for nonlinear and lm() for linear modelling.
We excluded one outlier from the empirical dataset, a Picea abies stand growing on a fertile site of the Hepatica type according to Lõhmus (2004). The outlier stand had been thinned before 2017, and at the time of field measurements there was already abundant regrowth of Corylus avellana L., Prunus padus L., and Tilia cordata Mill. with heights of 1.5–2.0 m, substantially limiting visibility. This regrowth was not captured in the ALS data, as in 2017 the trees were still too short and in spring 2019 there were no leaves to trigger lidar pulse returns. With this outlier discarded, data from 99 field transects remained for the construction of ALS-based models.
The field observations from spring and from the end of summer were used as a joint empirical dataset. We tested a dummy variable, indicating season, in the multiple linear models. The parameter for the variable was found to be significant, showing the measured springtime v_fm to be on average 10–15 m greater than for the transects measured in summer. This outcome was expected, because deciduous trees were present in the forest understorey. However, on scatter plots of measured and predicted values, there were no distinct clusters corresponding to the choice of transect measurement season.
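As an illustration of how a multiple linear model of this kind can be fitted, the following Python sketch solves ordinary least squares through the normal equations. It is a stand-in for R's lm(), not the authors' code. The model form v = a + b · H_P10 + c · ρ_L1 + d · K + ε is an assumption chosen to match the three predictors named in the abstract, and the predictor rows and coefficient values are hypothetical, generated only to check that the fit recovers known parameters.

```python
def fit_linear_model(rows, y):
    """Ordinary least squares with an intercept.

    rows: list of predictor tuples; y: list of responses.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    # Normal equations: (X'X) beta = X'y
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (b[i] - s) / A[i][i]
    return beta

# Hypothetical predictors per transect: (H_P10 in m, rho_L1, K).
rows = [(1.5, 0.10, 0.60), (2.0, 0.05, 0.70), (3.0, 0.20, 0.80),
        (4.0, 0.02, 0.50), (5.0, 0.15, 0.90), (2.5, 0.30, 0.65)]
# Hypothetical "true" parameters a, b, c, d used to generate noise-free data.
true_params = (60.0, 5.0, -80.0, -30.0)
v = [true_params[0] + true_params[1] * h + true_params[2] * r + true_params[3] * k
     for h, r, k in rows]
params = fit_linear_model(rows, v)
```

A season effect could be tested in the same way by appending a 0/1 column (for example 1 for spring, 0 for summer) to each predictor tuple.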

Model statistics
The residual standard error of the models was calculated as

$$RSE = \sqrt{\frac{\sum (y - \hat{y})^2}{n_{DF}}},$$

where n_DF is the number of degrees of freedom, y is the observed value, and ŷ is the predicted value. The root mean square error was calculated to assess test results of regression models as

$$RMSE = \sqrt{\frac{\sum (y - \hat{y})^2}{n}},$$

where n is the number of observations. To test the model performance, we fitted parameter values using ALS data from summer 2017, and again using ALS data from spring 2019, in each case then applying the model to the other dataset. An estimate of the determination coefficient R² for the nonlinear regression model (3) was calculated as

$$R^2 = 1 - \frac{SSE}{SSY},$$

where SSE = Σ(ŷ − y)² and SSY = Σ(y − ȳ)², with ȳ as the mean of observed values.
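These statistics are straightforward to compute; the Python sketch below shows one way. Two points are assumptions rather than statements from the text: the number of degrees of freedom is taken as the number of observations minus the number of fitted parameters, and SSY is taken as the total sum of squares of the observed values about their mean (the usual definition). The function names and the four example values are hypothetical.

```python
import math

def residual_standard_error(observed, predicted, n_params):
    """sqrt(SSE / n_DF), with n_DF = n - n_params (assumed)."""
    n_df = len(observed) - n_params
    if n_df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    sse = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))
    return math.sqrt(sse / n_df)

def rmse(observed, predicted):
    """Root mean square error over n observations."""
    sse = sum((y - yh) ** 2 for y, yh in zip(observed, predicted))
    return math.sqrt(sse / len(observed))

def r_squared(observed, predicted):
    """1 - SSE / SSY, with SSY the sum of squares of y about its mean."""
    mean_y = sum(observed) / len(observed)
    sse = sum((yh - y) ** 2 for y, yh in zip(observed, predicted))
    ssy = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - sse / ssy

# Hypothetical observed and predicted visibility values (m).
observed = [30.0, 50.0, 70.0, 90.0]
predicted = [35.0, 45.0, 75.0, 85.0]
rse_value = residual_standard_error(observed, predicted, n_params=2)
rmse_value = rmse(observed, predicted)
r2_value = r_squared(observed, predicted)
```

For the cross-season test, the same rmse() function would be applied to predictions made on one season's ALS data with parameters fitted on the other season's data.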

Results
The relationship between the measured visibility and visibility as predicted from FI data was found to be somewhat scattered (Figure 1). The measured visibility v_fm fitted best with the 75th percentile of the predicted values, with the 90th percentile values lying overall above, and the 50th percentile values lying overall below, the 1:1 line. Field measurements made in spring and in summer were equally scattered and did not form distinct clusters. In many cases the predicted 75th percentile visibility values were overestimated in comparison with the overall relationship. This was attributed to the failure of the FI database to give information on forest understorey woody vegetation.
Figure 1. Observed visibility and the predicted 75th percentile of visibility using FI data. Transects were measured in April–May (black) and in September (green).
In the case of visibility predicted from ALS data, all the regression models (Table 2) were found to be significant. The variability explained lay in the range of 53–67%, and the residual standard error in the range of 21–28 m. The significance and values of parameters did not depend on the choice between summer and springtime ALS data. A cross-check of models on ALS data from vegetation periods other than those used for model fitting revealed almost no signs of the clustering that would be expected when mixing field measurements from springtime and summer (Figure 2), with the value of RMSE less than 23 m in both cases. A small overestimation of visibility occurred in denser stands when a model based on summer ALS data was applied to springtime data (Figure 2a).

Discussion
In the initial stage of a forest stand after its establishment by planting or through natural regeneration, the trees are small, and their early growth is not limited by competition with other trees. The stems are small, with visibility blocked mainly by needles and leaves. As time passes, trees grow taller, the canopy closes, and competition for light and nutrients drives changes in crown shapes and in the location of canopy foliage. Lower branches die, while for some time remaining attached to trunks. With insufficient light, the mortality of trees increases, as does the height to the live crown base. These changes, in turn, create more open space. A point is eventually reached at which the amount of photosynthetic energy transmitted through the upper canopy layer is sufficient to support regrowth of shade-tolerant bushes and trees in the understorey. In the case of managed forests, thinning alters the canopy cover, increasing the light available for lower vegetation layers. This feedback process, driven by light availability, is probably the reason why neither forest height nor canopy cover alone was found to be a sufficient variable for the prediction of horizontal visibility.
It was assumed that horizontal visibility determinations, as made by the human eye, would be best predicted by combining information on tree density, crown dimensions, and tree location patterns. The needed information is only partly available in FI databases: FI records are not updated each year, and information on regrowth is additionally limited by the concentration of FI interest on the dominant upper canopy. In our study, we found the 75th percentile of visibilities, as predicted by a theoretical model from FI data, to be in concordance with field measurements. However, in many stands the predicted visibility distance was found to substantially exceed the measured distance. The discrepancy was attributed to a lack of FI information on regrowth.
Field measurements of visibility using visual tracking are subject to many influences, including not only environmental conditions and stand structure, but also the observer's level of experience and the relative movement of observer and target. This combination of influences introduces random error, and also possibly systematic error, into the data, and is probably one reason why the field measurements from spring and summer did not form clusters in our analysis. We do not have repeated measurements on the same plots with leaf-on and leaf-off conditions. However, by using the measurement season as a factor in regression modelling, we were able to determine that on average the visibility distance was greater for the springtime measurements. The increase was probably driven on the one hand by the better illumination conditions under the forest when the upper canopy layer was without leaves, and on the other hand (in a particularly direct way) by the presence of shade-tolerant deciduous broadleaf tree species in the forest understorey.
The lack of FI database information regarding the forest understorey can be overcome by instead using ALS data. In ALS datasets, the pulse returns are triggered mainly by the upper dominant canopy layer and (because laser scanners are designed for measuring elevation of the terrain surface) by the ground. Properties of the point cloud are influenced by flight altitude, pulse incidence angle, and the extent to which the scanner is capable of recording multiple returns per emitted pulse, and also by canopy depth and canopy density. We found that up to 67% of variability in the measured horizontal visibility could be described by a regression model based on three variables: canopy cover, the relative proportion of returns in the lower layer of the point cloud, and the 10th percentile of the point cloud height distribution. The remaining variability is due to uncertainties both in lidar and in field measurements (including the definition and the recording of field-measured visibility; errors in transect location coordinates; local variations in point cloud properties resulting from lidar scan nadir angle; and the phenological status of each given transect, whether at the time of lidar flight or at the time of fieldwork). Two factors decrease the probability of obtaining accurate data for a key determinant of horizontal visibility, the understorey vegetation: the interception of lidar pulse energy by the upper canopy, and the structure of the lidar-system internal software, which searches preferentially for the last possible return position near the ground surface. Nevertheless, our tests indicate that the proposed models based on ALS data are applicable to measurements taken both at leaf-off and at leaf-on time, permitting the use of the models in practical applications where estimates of horizontal visibility within a forest are needed.