Conference Paper. BHPS-2007 Conference: the 2007 British Household Panel Survey Research Conference, 5-7 July 2007, Colchester, UK
A heap of trouble? Accounting for mismatch bias in retrospective data
Abstract
When researchers design and administer surveys, they face the challenge of how to collect information about events that happened in the distant past. Cross-sectional surveys are enriched when they collect detailed retrospective information, because that information allows researchers to study time-varying events and policies with data from a single cross-section. Longitudinal surveys are enriched not only when, as is typical, retrospective data are collected in the baseline survey, but also when subsequent waves of a panel study collect retrospective data about a social, economic, or physical phenomenon that either did not exist or whose importance was unrecognized when the survey began. Retrospective data yield greater benefits in longitudinal surveys than in cross-sectional studies because they can be merged with the rich history that longitudinal data provide for a given individual.
Retrospective data are collected on a wide variety of topics, including marital events (collected through marital histories), births (from fertility histories or constructed from data on age or birth dates), deaths, purchase behavior, and job transitions (getting, changing, or losing jobs). Heaping, the tendency of reported dates, ages, or durations to cluster at round or otherwise salient values, pervades most types of event-centered data collected through retrospective reports.
In this study we investigate heaping in the context of retrospective data on smoking behavior. We use retrospective smoking data from a US cross-sectional survey (Tobacco Use Supplements of the Current Population Surveys) and from longitudinal surveys conducted in the US (Panel Study of Income Dynamics, National Longitudinal Survey of Youth 1979), the UK (British Household Panel Survey), Germany (German Socio-Economic Panel), and China (China Health and Nutrition Study) to document that heaping occurs in strikingly similar ways in each data source. We show that the way in which heaping manifests itself depends partly on rounding rules defined by the categories respondents are offered and partly on natural rounding rules that may be independent of those categories. We then describe the general problem that heaping introduces, which is a particular type of measurement error. Briefly, heaping causes explanatory variables to be “mismatched” with the behavioral outcome they are used to explain. We next investigate the determinants of heaping. We show that factors related to recall accuracy also predict whether a person will round up or down the year or age at which an event occurred.
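As a rough illustration of how heaping can be documented in a set of reports, the sketch below compares the observed share of responses at multiples of five with the share expected if responses were spread evenly over the observed range. The function name `heaping_ratio`, the base of five, the uniform benchmark, and the sample values are all assumptions made for illustration; they are not taken from the surveys or from the analysis described here.

```python
def heaping_ratio(reports, base=5):
    """Observed share of reports at multiples of `base`, divided by the
    share expected under an evenly spread (uniform) benchmark.

    The uniform benchmark is an illustrative assumption, not the
    method used in the paper. Values well above 1 suggest heaping.
    """
    if not reports:
        raise ValueError("reports must be non-empty")
    lo, hi = min(reports), max(reports)
    support = range(lo, hi + 1)
    # Share of integer values in the observed range that are multiples of base
    expected = sum(1 for v in support if v % base == 0) / len(support)
    if expected == 0:
        raise ValueError("no multiple of base within the observed range")
    # Share of actual reports that fall on a multiple of base
    observed = sum(1 for v in reports if v % base == 0) / len(reports)
    return observed / expected


# Hypothetical reported "years since quitting" (invented data)
reports = [1, 2, 3, 5, 5, 5, 7, 10, 10, 10, 10, 12, 15, 15, 20]
ratio = heaping_ratio(reports)
print(f"heaping ratio: {ratio:.2f}")
```

A ratio near 1 is what an evenly spread set of reports would produce; in this invented sample, two thirds of the reports fall on multiples of five although only one fifth of the possible values do.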
Our main purpose is to develop algorithms researchers can use to mitigate the mismatch bias associated with heaping. We develop and test several algorithms that range from a simple averaging rule to a more complicated adjustment to the likelihood function that accounts for a particular (assumed) functional relationship through which heaping occurs. Using data from each country, we test each algorithm's performance and show how it alters the coefficients on time-varying covariates.
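To make the idea of an averaging rule concrete, the following sketch shows one possible reading of it: when a reported event year looks heaped, the time-varying covariate matched to that report is replaced by its average over a window of neighboring years. This is an assumption-laden illustration, not the algorithm developed in the paper; the function `matched_covariate`, the multiple-of-five heaping rule, the two-year window, and the price series are all hypothetical.

```python
def matched_covariate(series, reported_year, window=2, base=5):
    """Return the covariate value to match to a reported event year.

    series: dict mapping year -> covariate value (e.g., a price index).
    If the reported year is a multiple of `base` (a hypothetical marker
    of heaping), average the covariate over +/- `window` years;
    otherwise use the value in the reported year itself.
    """
    if reported_year % base != 0:
        return series[reported_year]
    # Keep only the years inside the window for which data exist
    years = [y for y in range(reported_year - window, reported_year + window + 1)
             if y in series]
    if not years:
        raise KeyError(f"no covariate data near {reported_year}")
    return sum(series[y] for y in years) / len(years)


# Hypothetical covariate series (invented values)
prices = {1988: 1.0, 1989: 1.2, 1990: 1.5, 1991: 1.9, 1992: 2.4, 1993: 2.5}

heaped = matched_covariate(prices, 1990)  # averaged over 1988 to 1992
exact = matched_covariate(prices, 1993)   # not heaped, used as reported
print(heaped, exact)
```

The design choice here is to leave non-heaped reports untouched and to spread the uncertainty of heaped reports evenly across the window; a likelihood-based adjustment would instead weight the candidate years by an assumed heaping process.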
Our results confirm that heaping biases coefficient estimates downwards and that our algorithms significantly reduce this bias. While we demonstrate the gains each algorithm affords in the context of models of smoking cessation, we conclude by discussing how our algorithms can be applied more generally to other types of retrospective data.