A percentile-based estimator for the log-logistic function: Application to forestry

. Developing a simplified estimation method without compromising the performance of the distribution is germane to forest modelling. Few estimation methods exist for the Log-Logistic distribution and are relatively complex. A simplified estimator for the Log-Logistic parameters will increase its application in diameter distribution yield systems. Therefore, in this study, a percentile-based estimator was applied for the Log-Logistic distribution. The Kolmogorov-Smirnov, Anderson-Darling and Cramer-von Mises statistics were used to evaluate the method in two natural forest stands and two monospecific plantations of Gmelina arborea Roxb. and Tectona grandis Linn. f. in Nigeria. The parameter recovery model (PRM) and parameter prediction model (PPM) were used to predict the diameter distributions of independent stands of G. arborea and T. grandis . The results showed that the percentile estimator did not compromise the quality of fits of the Log-Logistic function across the four forest stands and are comparable to the maximum likelihood estimator. The 25th and 75th, and 40th and 80th were the best sample percentiles for the estimator. The predicted diameter distributions of G. arborea and T. grandis stands from the PRM and PPM were reasonable and compare well with the observed distribution. Thus, either of the models can be incorporated into the growth and yield system of forest stand management.


Introduction
Statistical functions are often used to model diameter distributions of forest stands (e.g., Poudel & Cao, 2013;Gorgoso-Varela & Rojo-Alboreca, 2014;Ogana et al., 2017), and have become an important part of forest management and planning. Diameter distribution forms the link between stand level and tree level models (Nord-Larsen & Cao, 2006). It can be used to provide information on the growth, value and volume production of forest stands (Gorgoso et al., 2012;Ogana et al., 2017).
Some important statistical functions that have been adequately used to model diameter distribution of forest stands include: the beta, gamma, Johnson's SB and Weibull; more recently the Burr XII and Logit-Logistic functions. Extensive forestry literature is available on the application of these statistical functions. For example, Poudel & Cao (2013) compared several methodologies to estimate the parameters of Weibull for tree diameter characterisation. Similarly, Gorgoso-Varela & Rojo-Alboreca (2014) evaluated different estimation methods for fitting the Johnson's SB and Weibull functions to pedunculate oak (Quercus robur L.) and birch (Betula sp.) stands in Northwest Spain. In Nigeria, Ogana et al. (2017 modelled the diameter distributions of Gmelina arborea Roxb. and Eucalyptus camaldulensis Dehn. plantations, respectively. Recently, Sun et al. (2019) characterised tree diameter distributions for mixed stands of oak and pine in China. Other studies include: Zhang et al. (2003), Cao (2004), Wang & Rennolls (2005), Palahi et al. (2007), Zheng & Zhou (2010), Gorgoso et al. (2008Gorgoso et al. ( , 2012, etc. Although research on diameter distribution has spanned several decades in forestry, there are however, some simplified statistical distributions such as the Log-Logistic which have not been sufficiently applied to forestry. The two-parameter Log-Logistic distribution (hereafter referred to as LogL) is a continuous probability distribution characterised by shape (α) and scale (β) parameters. The LogL has been applied to several disciplines, including actuarial science, computer science, economics, hydrology, medicine, etc. The LogL also known as Fisk distribution in economics was applied to model income distribution (Kleiber & Kotz, 2003). It has also been used in survival analysis to model accelerated failure time (Bennett, 1983). In hydrology, Ashkar & Mahdi (2006) modelled precipitation and the rates of stream flow using the LogL function. The application of LogL to forestry has been limited. Ogana & Dau (2019) used the LogL function to derive crown distributions from diameter at breast height for Parkia biglobosa Benth. stands in Nigeria.
The choice of distribution function depends on its relative flexibility, simplicity in terms of model expression, ease of parameter estimation and ease of computing proportion of trees in diameter classes (Burkhart & Tomé, 2012). The LogL has a simplified expression and a closed-form cumulative distribution function (cdf), thus, it does not require numerical integration to estimate the proportion of trees in diameter classes. Furthermore, different estimation methods such as the maximum likelihood estimation (MLE), generalized moments, and the least square have been used to estimate the parameters of LogL with different levels of success (e.g., Kleiber & Kotz, 2003;Ashkar & Mahdi, 2006;Ogana & Dau, 2019). These methods vary in complexity; and as such, there is a need for developing a simplified method of estimating the parameters of LogL for characterising tree diameter distributions. The percentile estimator is a good substitute to the MLE (Clutter et al., 1983). This method has not been used to estimate the parameter of LogL in forestry. A simplified estimation method for the LogL function will increase its application in diameter distribution yield systems.
Generally, diameter distribution yield systems predict values for the parameters of distribution functions (Clutter et al., 1983). The parameters are related either directly with stand variables such as the quadratic mean diameter, basal area per ha, stand age, number of trees per ha, site index (or the average height of dominant trees), etc. (i.e., parameter prediction model) or diameter moment/percentiles derived from forest stand variables (parameter recovery model). This can be used for implicit prediction of future yield. To date, no published forestry literature exists that relate the parameters of LogL distribution with stand variables to the knowledge of the researcher. Therefore, the objectives of this study were to evaluate a percentile-based estimator for the fitting of the LogL function; and to develop models that could be used to derive the parameters of distribution using Nigerian forest stands as a case study.

Methodology Data
The data used in this study were collected from two natural forest stands (multiple species composition with indeterminate age structure) and two monoculture stands of Gmelina arborea Roxb. and Tectona grandis Linn. f. One of the natural stands is situated in Ikrigon Forest Reserve, Cross River State, Nigeria and lies between Latitude 6°17.597′-6°17.862′ N and Longitude 8°35.597′-8°35.276′ E with an area of 542.7 ha. The second natural forest and part of the G. arborea stand are in Oluwa Forest Reserve, Ondo State, Nigeria. The natural forest lies between 6.83°-6.91° N and Longitude 4.52°-4.59° E with an area of 629 km 2 . The G. arborea stand lies between Latitude 6°55′-7°20′ N and Longitude 3°45′-4°32′ E and occupies an area of 87,816 ha (Onyekwelu, 2001). T. grandis and G. arborea data were also collected from Omo Forest Reserve. Omo Forest Reserve is situated in Ogun State, Nigeria between Latitude 6°35′-7°05′ N and Longitude 4°10′-4°19′ E with an area of 130,500 ha.
Data were collected from 75 and 35 temporary sample plots (TSPs) in G. arborea and T. grandis stands, respectively; while 10 and 7 TSPs were used in Ikrigon and Oluwa natural forests, respectively. The plot sizes ranged from 0.04 to 0.25 ha, to achieve at least 30 trees in a plot. All living trees within the plots of the monoculture stands were measured, while in the natural forest, only trees with diameter at breast height (Dbh) ≥ 10 cm were measured. Details of the sampling procedures used for the data collection in Ikrigon natural forest, Oluwa natural forest, G. arborea and T. grandis can be found in T.E. Adeniyi (unpubl.) and Ogana et al. (2015Ogana et al. ( , 2017, and Chukwu & Osho (2018), respectively. The descriptive statistics of the Dbh, total height (Ht), quadratic mean diameter (dg), dominant height (i.e., average height of the 100 tallest trees per ha, H o ), basal area per ha (G), density (N) and stand age (A) are presented in Table 1.

Log-Logistic (LogL)
The probability density function (pdf) and cumulative distribution function (cdf) of the LogL are expressed as: The LogL has a closed-form cdf as to facilitate the estimation of tree proportion in different diameter classes without resorting to numerical integration.

The percentile estimator
The idea of the percentile estimator for the LogL stems from the principle developed for the Weibull distribution by Clutter et al. (1983). The idea is, if two sample percentiles are known, each can be related to its corresponding LogL cdf, and the resulting equations are iteratively solved to get the estimates of shape (α) and scale (β). Thus, the equations were developed as follows: "Let X p represent the p-percentile value in the sample so that 100p-percentile of the sample values are less than X p ". Therefore, from the definition of the LogL cdf: Equation (3) was then solved for X p to give equation (4) The parameters of the LogL distribution were derived from equation (4). Since this is the first attempt of applying this percentile-based estimator for the LogL, several sample percentiles were considered. The following sample percentiles were ana lysed:  (4), method 1 was derived as: where X 25 and X 75 are the diameters corresponding to the 25 and 75%, respectively. Equation (5) and (6) were solved iteratively using the Newton-Raphson algorithm of the 'lmfor' package (Mehtätalo, 2017) implemented in R (R Core Team, 2017). A similar procedure was used for M2 to M8 by substituting the various percentiles and solving the system of equation iteratively. To ascertain the suitability of the percentile method, it was compared with the well-known maximum likelihood estimator (MLE). The MLE involves taking the partial derivative of the log-likelihood function of LogL with respect to the pa-rameters (α and β) and setting the expression equal to zero. The resulting function is solved by numerical algorithm to get the estimates of the parameters. The log-likelihood function (ℓ ) of the LogL is expressed as: the partial derivative of equation (7) with respect to α and β will give equations (8) and (9), respectively: is diameter at breast height and i ranged from 1 to n, other parameters are previously defined in equation (1). Since there are no explicit solutions to equations (8) and (9), the estimates were obtained numerically using the mledist function from the fitdistrplus package (Delignette-Muller & Dutang, 2015) implemented in R (R Core Team, 2017). The R code for the MLE is presented in Appendix 1.

Evaluation statistics
Three goodness-of-fit statistics were used to evaluate the suitability of the percentile estimator. For each method, the Kolmogorov-Smirnov (KS), Anderson-Darling (AD) and Cramer-von Mises (W 2 ) statistics were computed. The smaller the statistics are, the better the method.

KS statistic:
AD statistic: W 2 statistic: where F(x i ) = observed cumulative frequency distribution for x i (i ranged from 1 to n), F 0 (x i ) = theoretical cumulative frequency distribution.

Ranking of methods
The relative rank introduced by Poudel & Cao (2013) and recently used by Sun et al. (2019) was used for this study. It is expressed as: where R i = relative rank of method i (i = 1, 2, …, m); m = number of methods assessed (8 percentiles plus the MLE, i.e., 9 altogether), S i = evaluation statistics value of method i; S min and S max = minimum and maximum value of S i , respectively. The relative rank is a real number between 1 (best) and 9 (worst). For each method, the relative ranks were summed across the three goodness-of-fit statistics to ascertain the best percentile point for the estimator.

Modelling the diameter distribution
A scatterplot was used to establish the form of association between the dependent (LogL parameters and diameter percentiles) and independent variables (stand variables, e.g., quadratic mean diameter (dg), dominant height (H o ), stand density (N), basal area per ha (G), stand age (A), etc.). Thereafter, a stepwise linear regression technique was used in the modelling process. This ensured that only the best predictor variables were included in the models. Relating the parameters directly with stand variables is known as the parameter prediction method (PPM); while modelling the diameter percentiles with stand variables from which the LogL parameters could be derived is referred to as the parameter recovery model (PRM). Both PPM and PRM were evaluated to determine the best option for the LogL distribution. The models were of this form: where Y i is a vector of the dependent variables (parameters of the LogL distribution and various diameter percentiles), X i is the vector of the predictor variables (quadratic mean diameter, dominant height, stand density, basal area per ha, stand age, etc.) and is the error term.
Models were only developed for the G. arborea and T. grandis stands because a sufficient number of plots is required for effective modelling. There were more than 30 plots from these stands compared to the few plots in the natural stands (10 and 7 in Ikrigon and Oluwa, respectively). Since sufficient data are not available for modelling, a k-fold cross-validation was used to ascertain the predictive ability of the models (Sileshi, 2014). Different fit indices, such as root mean square error (RMSE), coefficient of variation (CV), corrected Akaike Information Criterion (AICc), Bayesian Information Criterion (BIC) and adjusted coefficient of determination ( � � ) were used to evaluate the cross-validation results.

Fit of the LogL from percentile and MLE
The estimated parameters of the LogL from the different percentiles and those from the MLE are presented in Table 2 and Table 3 for the natural forests and plantations, respectively. Little variation was observed in the values of the parameters of the LogL estimated from the MLE and percentiles methods. Evaluation of the performance of the methods based on the fit statistics showed that MLE had the smallest relative rank sum (R) of 3.79 and 3.32 for Ikrigon and Oluwa natural forests, respectively. This was followed by method 4 (M4: 40th and 80th diameter percentiles), method 1 (M1: 25th and 75th) and method 7 (M7: 40th and 60th) with respect to Ikrigon forest. In Olwua natural forest, M4 had the largest R. The Cramer-von Mises (W 2 ) statistics of M1, M4 and M7 were smaller relative to the value of the MLE in Ikrigon natural forest. Just as with KS and AD, W 2 statistic is an important index that gives the overall squared difference between empirical distribution and the theoretical cdf (Wang, 2005). Similarly, in G. arborea stand, the MLE had the smallest relative rank sum of 3.05 which was closely followed by M4 with a value of 3.47. The performances of M1 and M7 were equally good. In the case of T. grandis stand, M4 had the smallest relative rank sum of 3.00 whereas MLE had 3.52. This was followed by M1 and M5 (30th and 70th diameter percentiles). Method 2 (17th and 97th diameter percentiles) with the largest relative rank sum had the worst performance in three out of the four stands. The 17th and 97th sample percentiles were recommended for the Weibull distribution by Shiver (1988). Thus, the study shows that these diameter percentiles are unsuitable for the LogL distribution. Gorgoso et al. (2007) also observed a poor fit with the percentiles for the Weibull distribution in the birch-dominated stands of Northwest Spain. Also, the most efficient sample percentiles -24th and 93rd recommended for the Weibull function by Dubey (1967) performed poorly (M3) for the LogL function.
The graph of the relative frequency of trees against the equal diameter class of the best diameter percentiles and MLE for the four stands are shown in Figure 1a, b, c and d. The graph showed that the estimation methods produce a good representation of the four stands, that is, the reverse J-shaped and the Gaussian distribution typical of natural forests (uneven-aged) and plantations (even-aged), respectively. However, the relative frequency of trees was underestimated in the diameter class of < 20 cm in Oluwa natural forest ( Figure  1b) and overestimated in the middle diameter class of G. arborea stand.
The results obtained with the LogL function in this study were better than those reported by Ogana & Wali (2018) for similar natural forest stands. Studies on LogL in forestry have used either the MLE (Ogana & Wali, 2018) or the least square regression method (Ogana & Dau, 2019). The least square method may seem handy, it is however, time-consuming. One important advantage of the MLE is the ability to evaluate the reliability of the estimates of the parameters using the standard error which can be derived from the hessian matrix. However, the MLE requires specifying the log-likelihood function for the distribution which may not converge due to poor initial guess for the parameters. The MLE is also affected by sample size (Shiver, 1988). Contrary to the MLE and least square method, the percentile estimator is more simplified, handier and does not require a complex iterative procedure. Therefore, when considering simplicity, vis-à-vis the ease of estimating the LogL parameters, the percentile estimator can be considered as a good alternative.

Modelling the diameter distribution
Two sets of models for predicting the diameter distributions of G. arborea and T. grandis stands were developed. The first set of models relate the best diameter percentiles with stand variables (parameter recovery model, PRM) in a simple linear relationship wherein the shape and scale parameters of the LogL were recovered. The quadratic mean diameter (dg) was the only suitable predictor variable for the dependent variables (diameter percentiles). The second set of models relate the LogL parameters directly with stand variables (parameter prediction model, PPM). The scale (β) parameter was predicted from the quadratic mean diameter (dg) in a simple linear relationship. The shape (α) parameter was predicted from the inverse of the natural logarithm of the ratio of the 25th diameter percentile and dg i.e., where a and b are the estimated parameters. The parameters and standard errors of the models are presented in Table 4. The results of the 5-fold cross-validation showed that both model forms had a relatively low RMSE, CV, AICc, BIC and a high R ̅ 2 . Thus, they can be relied on for prediction purpose. Sun et al. (2019) also found dg as one of the most important predictors of the Weibull parameters in the mixed stand of oak and pine. The graphs of the application of the PRM and PPM to independent stands of G. arborea (0.08 ha) and T. grandis (0.125 ha) are shown in Figure 2a and  The performances of the PRM and PPM were relatively the same for the LogL distribution. One major argument with the PPM is the difficulty of developing models that would account for high variations in the parameters of distributions (Borders & Patterson, 1990). While this statement may be valid for the shape parameters of some distributions, such as the well-known Weibull, it is not application to the LogL function. The models predicting the shape and scale parameters of the LogL explained more than 0.75 (> 75%) of the variation in the parameters for the stands. Thus, both the PRM and PPM could be used for the LogL in the G. arborea and T. grandis stands.

Conclusion
Developing a simplified estimation method without compromising the performance of the distribution is germane to forest modelling. The percentile estimator introduced for the Log-Logistic did not compromise the quality of fits of the model across the four forest stands in Nigeria and are comparable to the maximum likelihood estimator. The 25th and 75th, and 40th and 80th were the best sample percentiles for the estimator in the forest stands. The predicted diameter distributions of G. arborea and T. grandis stands from the PRM and PPM were reasonable and compared well with the observed diameter distribution. This implies that both the PRM and PPM can be incorporated into the growth and yield system of forest stand management.