Conference Paper Econometric Institute Seminar
Correcting for Sample Selection Bias: alternative estimators compared
20 Nov 2003
This paper assesses three types of regression estimation procedures used to account for sample selection problems, in particular the missing data problem in sample surveys: propensity score estimation, imputation, and classical econometric selection models. All three rest on assumptions whose validity could be verified only if the missing (counterfactual) data were observed.
Nevertheless, by computing bounds instead of a point estimate, it is possible to avoid untestable assumptions and to carry out an informal check of the assumptions underlying the above estimators, as suggested in Manski (1989). The check involves two steps. The first step is to compute bounds, referred to here as Manski bounds, on a specific feature of interest in a regression model, for example the conditional mean or a conditional quantile, under very weak or no assumptions on the missing data mechanism. The second step is to check whether the estimates of that feature obtained from the alternative estimation methods lie inside the Manski bounds.
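The two steps can be illustrated for a binary outcome with a minimal sketch. It assumes the feature of interest is a probability within a single covariate cell and uses the worst-case logic in which every nonrespondent is counted first as having outcome 0 and then as having outcome 1. The function names and the data layout are hypothetical and are not taken from the paper.

```python
def manski_bounds(outcome, responded):
    """Worst-case bounds on P(outcome = 1) with no assumption on nonresponse.

    outcome:   list of 0/1 values (ignored where the unit did not respond)
    responded: list of booleans, True if the outcome was observed
    """
    n = len(responded)
    # Units observed with outcome 1 count in both bounds.
    n_ones = sum(1 for y, r in zip(outcome, responded) if r and y)
    # Units with a missing outcome could all be 0 (lower) or all be 1 (upper).
    n_missing = sum(1 for r in responded if not r)
    lower = n_ones / n
    upper = (n_ones + n_missing) / n
    return lower, upper


def inside_bounds(estimate, bounds):
    """Step two: does a point estimate lie inside the bounds?"""
    lower, upper = bounds
    return lower <= estimate <= upper
```

The width of these bounds equals the nonresponse rate in the cell, so a point estimate that falls outside them contradicts the data regardless of the missing data mechanism, while an estimate inside them is merely not rejected. In a regression setting the same computation would be repeated within each covariate cell, and sampling variability would also need to be taken into account.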
This checking procedure is applied to the estimation of the poverty probability in Italy using the European Community Household Panel Survey. Poverty is defined in terms of net household income, which is affected by nonresponse in more than 20% of the cases. Such a high nonresponse rate implies that Manski bounds on the probability of being poor tend to be wide. In many cases, however, the information on income is not completely absent because income may be partially reported, i.e. it is known that total net household income is above a known threshold. I use this information on partially reported income, together with some weak assumptions, to narrow the Manski bounds. I then check whether the conditional poverty probabilities estimated by the different methods contradict the Manski bounds.
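How partially reported income can narrow the bounds may be sketched as follows. The sketch assumes, for illustration only, that a nonresponding household whose income is known to exceed a threshold at or above the poverty line is certainly not poor. The weak assumptions actually used in the paper are not stated in this abstract, and the function and argument names are hypothetical.

```python
def narrowed_poverty_bounds(poor, responded, known_above_line):
    """Bounds on P(poor) using partial income reports.

    poor:             list of 0/1 values (ignored for nonrespondents)
    responded:        list of booleans, True if full income was reported
    known_above_line: list of booleans, True if a nonrespondent's partial
                      report shows income at or above the poverty line
    """
    n = len(responded)
    # Households observed to be poor count in both bounds.
    n_poor = sum(1 for p, r in zip(poor, responded) if r and p)
    # Only nonrespondents with no informative partial report remain
    # undetermined; the others are treated as not poor (an assumption).
    n_undetermined = sum(
        1 for r, k in zip(responded, known_above_line) if not r and not k
    )
    lower = n_poor / n
    upper = (n_poor + n_undetermined) / n
    return lower, upper
```

Under this assumption the lower bound is unchanged and the upper bound falls by the share of nonrespondents whose partial report places them above the poverty line.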