Research Paper, Centre for Applied Statistics, Lancaster University, Working Paper Series 2002/01
Evaluating the impact of missing data in social research: simulations and applications using the BHPS and the NCDS
01 Jan 2002
In this paper we present a double selection model and a truncated double selection model for evaluating the impact of non-ignorable missing (NIM) data in social research. The substantive focus of the paper is the analysis of the determinants of employment and earnings, an area that has received considerable attention from quantitative social scientists. Simulations suggest that, where NIM is present, both classical OLS and the single (employment) selection model developed by Heckman and others fail to recover the true parameter values. The two models are applied to two data sets widely used in social research: the British Household Panel Survey (BHPS), a panel survey, and the National Child Development Study (NCDS), a longitudinal survey. We find NIM in both the BHPS and the NCDS. In the BHPS the missing data mechanism cannot be ignored: inference about covariate effects in the employment and earnings equations from the double selection models can differ from that obtained from OLS and the employment selection model. In the NCDS the missing data mechanism has a much weaker impact, and inference about covariate effects is much closer to that obtained from OLS and the employment selection model. A conclusion of this paper is that social researchers should always adopt the methods proposed here, both as a means of determining the nature of the missing data mechanism and as a validity check on the results of models which assume that the missing data can be ignored.
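To illustrate why complete-case OLS can fail under NIM, here is a toy simulation. It does not implement the paper's double selection model and does not reproduce the paper's simulation design. It assumes a hypothetical linear "earnings" equation y = 1 + 0.5x + e, in which y is observed only when a latent response index y + u exceeds a cutoff. Because the probability that y is missing depends on y itself, the missingness is non-ignorable. The function names `ols_slope` and `simulate` and all parameter values are illustrative assumptions.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=100_000, beta=0.5, cutoff=1.5, seed=2002):
    """Hypothetical NIM setup: compare OLS on full vs. observed-only data.

    Returns (full-data slope, complete-case slope, response rate).
    """
    rng = random.Random(seed)
    xs, ys = [], []          # full data (never available in practice)
    obs_x, obs_y = [], []    # cases where the outcome is observed
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1.0 + beta * x + rng.gauss(0, 1)  # illustrative earnings equation
        xs.append(x)
        ys.append(y)
        # Non-ignorable missingness: y is recorded only when a latent
        # response index that depends on y itself exceeds the cutoff.
        if y + rng.gauss(0, 1) > cutoff:
            obs_x.append(x)
            obs_y.append(y)
    return ols_slope(xs, ys), ols_slope(obs_x, obs_y), len(obs_y) / n


if __name__ == "__main__":
    full, complete_case, rate = simulate()
    print(f"true slope 0.5 | full-data OLS {full:.3f} | "
          f"complete-case OLS {complete_case:.3f} | response rate {rate:.2f}")
```

With the full data, OLS recovers a slope close to 0.5. On the observed cases alone, selection on the outcome truncates y from below and pulls the estimated slope towards zero. Correcting for this requires explicitly modelling the selection mechanism, which is the role the selection models discussed above play.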