Conference Paper Basel Workshop on Item Non-response and Data Quality in Large Social Surveys
Imputation Procedures and the Quality of Income Information in the ECHP
11 Oct 2003
The aim of the paper is to investigate the impact of the imputation procedures adopted in the European Community Household Panel (ECHP) on the quality of the information about income variables.
We evaluate the imputation methods adopted in the ECHP by looking for systematic differences in the distribution of income across different types of responding units. More precisely, we compare descriptive statistics of the imputed income variables (both in levels and in growth rates) for respondents and for different types of nonrespondents. While income levels do not seem to be much affected by imputation, income dynamics are. This occurs because the imputation procedure seems to alter the tails of the distribution of income growth. The effects of imputation are smaller for statistics that are robust to outliers, such as the median.
A different approach to evaluating the impact of the missing data problem on indicators of poverty follows the work of Manski (1989) and Horowitz and Manski (1998). These papers show how to derive bounds for the cumulative distribution function of a variable of interest without imposing any assumption on the missing data mechanism.
Net household income in the ECHP is affected by nonresponse in about 22.4% of the cases. Such a high nonresponse rate implies that Manski's bounds tend to be wide.
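To make the width of such bounds concrete, the following is a minimal Python sketch of worst-case bounds on the share of units with income at or below a threshold. It is an illustration under stated assumptions, not the procedure used in the paper: the function name `worst_case_bounds`, the use of `None` to mark nonresponse, and the toy data are all hypothetical. The lower bound treats every nonrespondent as above the threshold and the upper bound treats every nonrespondent as at or below it, so the width of the interval equals the nonresponse rate (0.224 for a rate of 22.4%).

```python
def worst_case_bounds(incomes, threshold):
    """Worst-case bounds on P(income <= threshold) with no assumption
    on the missing data mechanism.

    incomes: list of numbers, with None marking nonresponse (hypothetical coding).
    """
    n = len(incomes)
    if n == 0:
        raise ValueError("incomes must not be empty")
    n_missing = sum(1 for y in incomes if y is None)
    n_below = sum(1 for y in incomes if y is not None and y <= threshold)
    lower = n_below / n                # all nonrespondents assumed above the threshold
    upper = (n_below + n_missing) / n  # all nonrespondents assumed at or below it
    return lower, upper


# Toy data: five households, one of which did not report income.
toy_incomes = [5, 15, 25, None, 35]
print(worst_case_bounds(toy_incomes, threshold=10))  # (0.2, 0.4)
```

In this toy example the interval has width 0.2, which is exactly the share of nonrespondents (one household out of five).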
In many cases, however, the information on income is not completely absent because income may be reported partially, i.e. we may know that total net household income is above a known threshold. This information may be sufficient to identify poor people. In fact, if household income is known to be above the poverty line, then we can classify the members of the household as non-poor. Further, our ability to classify people as non-poor increases as the poverty line is lowered. This substantially lowers the effective nonresponse rate, narrowing Manski's bounds. Our aim is to evaluate whether, for a suitable choice of the poverty line, it is better to combine the information from full respondents and partial respondents and avoid using the imputed values.
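The way partial reports narrow the bounds can be sketched in Python as follows. This is an illustrative sketch only, assuming a hypothetical coding in which each unit is a pair `(kind, value)`: `"full"` with the reported income, `"partial"` with a known threshold that income exceeds, or `"missing"` with `None`. The function name `poverty_bounds` and the toy data are likewise assumptions. A partial respondent is classified as non-poor whenever the known threshold is at or above the poverty line; otherwise the unit's poverty status remains unknown.

```python
def poverty_bounds(units, poverty_line):
    """Worst-case bounds on the poverty rate using full and partial reports.

    units: list of (kind, value) pairs (hypothetical coding):
      ("full", y)     income y is fully reported
      ("partial", b)  income is only known to be above the threshold b
      ("missing", None) no income information
    """
    n = len(units)
    if n == 0:
        raise ValueError("units must not be empty")
    n_poor = 0
    n_unknown = 0
    for kind, value in units:
        if kind == "full":
            if value < poverty_line:
                n_poor += 1          # observed to be poor
        elif kind == "partial" and value >= poverty_line:
            pass                     # income > value >= poverty line: non-poor
        else:
            n_unknown += 1           # poverty status cannot be determined
    return n_poor / n, (n_poor + n_unknown) / n


# Toy data: two full respondents, two partial respondents, one nonrespondent.
toy_units = [("full", 8), ("full", 30), ("partial", 20),
             ("partial", 12), ("missing", None)]
print(poverty_bounds(toy_units, poverty_line=15))  # (0.2, 0.6)
print(poverty_bounds(toy_units, poverty_line=10))  # (0.2, 0.4)
```

With the lower poverty line, both partial respondents can be classified as non-poor, so the interval shrinks to the share of units with no income information at all.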