LR and RF (072521)


A common approach to analyzing the driving factors of *** is a logistic regression (LR) model, which can predict *** with respect to different @@@ variables. LR has strong predictive power, but is limited by the assumptions of normality and linear relationships, whereas non-linear and complex relationships often exist between biophysical and social data. Random Forest (RF), a non-parametric technique, is thought to be a more flexible method for evaluating complicated interactions among variables, as it can automatically select the important variables, regardless of how many variables are used initially, and does not encounter the problem of over-fitting (Breiman 2001). However, RF also has some limitations: it does not calculate regression coefficients or confidence intervals (Cutler et al. 2007), and the individual trees cannot be examined separately (Prasad et al. 2006).

In order to conduct a comprehensive analysis of the driving factors of *** occurrence, we applied two different statistical methods: logistic regression (LR) and Random Forest (RF). We grouped the potential driving factors of *** into two broad categories: 'climate factors' and 'local factors', the latter of which includes vegetation, topography, infrastructure and socioeconomic factors. The analysis was conducted in two phases. In the first phase, we used the LR and RF methods to analyze each category of potential driving factors separately. In the second phase, the significant factors from each category were combined and then analyzed together using both the LR and RF methods. (...)
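
The two-phase procedure described above can be sketched roughly as follows. This is only an outline under stated assumptions: the function name `two_phase_analysis`, the factor names, and the stand-in selector `keep_flagged` are all hypothetical, and in a real analysis the selector would be replaced by fitting an LR or RF model and keeping the factors it finds significant.

```python
def two_phase_analysis(categories, select_significant):
    """Run the two-phase workflow.

    categories: dict mapping a category name to a list of factor names.
    select_significant: callable taking a list of factor names and
        returning the subset judged significant (e.g. by LR or RF).
    """
    # Phase 1: analyze each category of factors separately.
    phase1 = {name: select_significant(factors)
              for name, factors in categories.items()}
    # Phase 2: pool the significant factors and analyze them together.
    combined = [f for kept in phase1.values() for f in kept]
    phase2 = select_significant(combined)
    return phase1, phase2


# Hypothetical factor names and a stand-in selector (illustration only).
flagged = {"temperature", "slope"}


def keep_flagged(factors):
    return [f for f in factors if f in flagged]


categories = {
    "climate": ["temperature", "precipitation"],
    "local": ["slope", "road_distance"],
}
phase1, phase2 = two_phase_analysis(categories, keep_flagged)
```

Passing the selector in as a callable keeps the workflow identical for both methods, so the same outline could be run once with an LR-based selector and once with an RF-based one.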

Dependent variable

(...) The LR and RF models require that the target variable be binary. In order to meet this requirement, (...) For the purpose of analysis, we assigned a value of 1 to *** and 0 to not ***.
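
A minimal sketch of this encoding, assuming the raw observations are stored as text labels. The label strings and the name `encode_target` are hypothetical stand-ins, not taken from the analysis itself.

```python
def encode_target(labels, positive_label):
    """Map each label to 1 if it equals positive_label, otherwise 0."""
    return [1 if label == positive_label else 0 for label in labels]


# Hypothetical raw labels (illustration only).
raw = ["event", "no_event", "event", "no_event", "no_event"]
y = encode_target(raw, positive_label="event")
```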

Models and computing procedure

Logistic regression (LR)

Logistic regression ~ 

logit(Pi) = ln(Pi / (1 - Pi)) = alpha_0 + alpha_1 x1 + alpha_2 x2 + ... + alpha_n xn

where Pi is the probability of forest fire occurrence, n is the number of covariates, (alpha_0, alpha_1, ..., alpha_n) are the coefficients (alpha_0 being the intercept) and (x1, x2, ..., xn) are the variables that influence the occurrence of forest fires.
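
As a minimal sketch of how the logit relation turns coefficients into a probability, the snippet below inverts it, assuming the usual linear form alpha_0 + alpha_1 x1 + ... + alpha_n xn on the right-hand side. The coefficient values and the function names `logit` and `predict_probability` are illustrative assumptions, not values or code from this analysis.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict_probability(alpha_0, alphas, xs):
    """Invert the logit: P = 1 / (1 + exp(-(alpha_0 + sum(alpha_k * x_k))))."""
    if len(alphas) != len(xs):
        raise ValueError("need one coefficient per covariate")
    linear = alpha_0 + sum(a * x for a, x in zip(alphas, xs))
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for two covariates (illustration only).
p = predict_probability(-1.0, [0.8, -0.5], [2.0, 1.0])
# Applying the logit to p recovers the linear predictor
# alpha_0 + sum(alpha_k * x_k).
linear_predictor = logit(p)
```

Because the logit is linear in the covariates, a one-unit increase in x_k multiplies the odds P/(1-P) by exp(alpha_k), which is why LR coefficients are commonly reported as odds ratios.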
