A study on the effects of unbalanced data when fitting logistic regression models in ecology
First Author |
Salas-Eljatib, Christian
|
Co-authors |
Fuentes-Ramirez, Andres#Gregoire, Timothy G.#Altamirano, Adison#Yaitul, Valeska
|
Title |
A study on the effects of unbalanced data when fitting logistic regression models in ecology
|
Publisher |
ELSEVIER SCIENCE BV
|
Journal |
ECOLOGICAL INDICATORS
|
Language |
en
|
Abstract |
Binary variables have two possible outcomes: occurrence or non-occurrence of an event (usually coded as 1 and 0, respectively). Binary data are common in ecology, including studies of presence/absence, alive/dead, and change/no-change. Logistic regression analysis has been widely used to model binary response variables. Unbalanced data (i.e., a much larger proportion of zeros than ones) are often found across a variety of ecological datasets. Sometimes the data are balanced (i.e., made to contain the same number of zeros and ones) before fitting the model; however, the statistical implications of balancing (or not) the data remain unclear. We assessed the statistical effects of balancing data when fitting a logistic regression model by studying both the statistical properties of the estimated parameters and the model's predictive capabilities. We used a base forest-mortality model as a reference and, using stochastic simulations representing different configurations of 0/1 data in a sample (unbalanced-data scenarios), fitted the logistic regression model by maximum likelihood. For each scenario we computed the bias and variance of the estimated parameters and several prediction indexes. We found that the variability of the estimated parameters is affected, with the balanced-data scenario having the lowest variability, which in turn affects statistical inference. Furthermore, the prediction capabilities of the model are altered by balancing the data, with the balanced-data scenario having the best sensitivity/specificity ratio. Balancing, or not, the data used for fitting a logistic regression model may affect the conclusions that can be drawn from the fitted model and its subsequent applications.
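The simulation design summarized in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code or their forest-mortality model: the single standard-normal covariate, the "true" coefficients (-2.0, 1.0), the population and sample sizes, and the function names fit_logit and simulate_scenario are all hypothetical. Each scenario draws samples with a fixed proportion of ones, fits the logistic model by maximum likelihood (Newton-Raphson), and reports the empirical bias and variance of the estimates.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        grad = X.T @ (y - p)                    # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate_scenario(prop_ones, n=400, n_sims=200,
                      beta_true=(-2.0, 1.0), pop_size=100_000, seed=0):
    """Empirical bias and variance of the estimates for one 0/1 configuration.

    All defaults are assumptions chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    b0, b1 = beta_true

    # Hypothetical population generated from a known logistic model.
    x_pop = rng.normal(size=pop_size)
    p_pop = 1.0 / (1.0 + np.exp(-(b0 + b1 * x_pop)))
    y_pop = rng.binomial(1, p_pop)
    ones = np.flatnonzero(y_pop == 1)
    zeros = np.flatnonzero(y_pop == 0)

    n1 = int(round(prop_ones * n))  # number of ones in each sample
    n0 = n - n1                     # number of zeros in each sample

    estimates = np.empty((n_sims, 2))
    for s in range(n_sims):
        idx = np.concatenate([
            rng.choice(ones, size=n1, replace=False),
            rng.choice(zeros, size=n0, replace=False),
        ])
        X = np.column_stack([np.ones(n), x_pop[idx]])
        estimates[s] = fit_logit(X, y_pop[idx].astype(float))

    bias = estimates.mean(axis=0) - np.asarray(beta_true)
    variance = estimates.var(axis=0, ddof=1)
    return bias, variance


if __name__ == "__main__":
    for prop in (0.1, 0.3, 0.5):
        bias, variance = simulate_scenario(prop)
        print(f"proportion of ones = {prop:.1f}  bias = {bias}  variance = {variance}")
```

In this sketch the intercept is expected to shift whenever the sample proportion of ones differs from the population prevalence, so the slope is the more informative quantity to compare across scenarios. Prediction indexes such as sensitivity and specificity could be added by classifying held-out observations at a chosen probability threshold.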
|
Resource Type |
Original article
|
Description |
This study was supported by the Chilean research grant Fondecyt No. 1151495. AFR is supported by a Postdoctoral Scholarship from Vicerrectoria de Investigacion y Postgrado, Universidad de La Frontera, Temuco, Chile.
|
doi |
10.1016/j.ecolind.2017.10.030
|
Resource Format |
pdf
|
Keywords |
Statistical inference# Model prediction# Logit model# Binary variable# Bias# Precision
|
File Location |
http://dx.doi.org/10.1016/j.ecolind.2017.10.030
|
OECD Category |
Biodiversity Conservation# Environmental Sciences
|
Subjects |
Statistical inference# Model prediction# Logistic model# Binary variable# Bias# Precision
|
Web of Science ID |
WOS:000430634500051
|
Citation Title (Recommended, single) |
A study on the effects of unbalanced data when fitting logistic regression models in ecology
|
Resource Identifier (Mandatory, single) |
Original article
|
Resource Version (Recommended, single) |
published version
|
WOS Category |
Biodiversity Conservation# Environmental Sciences
|
ISSN |
1470-160X
|
Funder Reference (Mandatory if applicable, repeatable) |
ANID FONDECYT 1151495
|
Route Type |
subscription#green
|
Access Rights |
metadata
|
Start Page (Recommended, single) |
1495
|
End Page (Recommended, single) |
1501
|