Prediction of attrition in large longitudinal studies: tree-based methods versus multinomial logistic models

Publication type

Journal Article

Authors

Publication date

March 2, 2021

Summary:

Identifying predictors of attrition is essential for designing longitudinal studies so that attrition bias can be minimised, and for identifying variables that can serve as auxiliary variables in statistical techniques that help correct for non-random dropout. This paper provides a comparative overview of predictive techniques that can be used to model attrition and to identify the risk factors that are important for predicting it. Logistic regression and several tree-based machine learning methods were applied to Wave 2 dropout in an illustrative sample of 5000 individuals from a large UK longitudinal study, Understanding Society. Each method was evaluated on accuracy, AUC-ROC, plausibility of key assumptions and interpretability. Our results suggest a 10% improvement in accuracy for random forest compared to logistic regression methods. However, given the differences in estimation procedures, we suggest that both models could be used in conjunction to provide the most comprehensive understanding of attrition predictors.

Published in

SocArXiv

DOI

https://doi.org/10.31235/osf.io/tyszr

Subjects

Notes

Open Access

CC-By Attribution-NonCommercial-NoDerivatives 4.0 International

#536709
