

Imputation of household survey data using mixed models (PhD thesis)


Publication date

Jun 2015


Household surveys collect information about a household as well as data items relating
to one or more people within the household. Developing an efficient
strategy for dealing with missing data is essential in the current
climate of falling response rates. People within the same household are more
likely to share characteristics than a random group of people, and this
homogeneity can be exploited when forming strategies for dealing with
nonresponse. Amongst single-value imputation methods, linear models and
donor models are commonly used, but they generally ignore relationships
within households. These strategies use auxiliary variables
available for nonrespondents to replace each missing value with a single
value, for example a mean or a donor's value. This thesis focuses on imputation
strategies for missing items at the person level. The goal
is to exploit correlation structures within households to form
improved imputed values for missing data.
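
The within-household correlation being exploited can be made concrete with the standard two-level random-intercept model, written here for person i in household h. This is a generic textbook formulation given only for illustration; the notation is assumed, the household effect u_h and person-level error e_hi are assumed independent, and the thesis's own models may include further terms.

```latex
y_{hi} = \mathbf{x}_{hi}^{\top}\boldsymbol{\beta} + u_h + e_{hi},
\qquad u_h \sim N(0,\sigma_u^2), \qquad e_{hi} \sim N(0,\sigma_e^2),
\\[4pt]
\rho = \operatorname{Corr}\left(y_{hi}, y_{hk} \mid \mathbf{x}_{hi}, \mathbf{x}_{hk}\right)
     = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}, \qquad i \neq k.
```

The intra-household correlation ρ measures the level of clustering: when ρ is close to 0, knowing a co-resident's response says little about a missing value, and household-based imputation has little to gain.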

Imputation models that exploit the hierarchical structure of people within
households are developed and assessed. They are investigated for both
continuous and binary missing response variables. Linear mixed
imputation models, generalized linear mixed imputation models and donor
imputation methods (random, within-class and nearest neighbour) are
investigated and compared with existing methods that do not exploit this
hierarchical structure. The imputation methods are evaluated using data
from two large-scale household surveys, the Household, Income and Labour
Dynamics in Australia Survey (HILDA) and the British Household Panel
Survey (BHPS), on a range of criteria relevant to household surveys.
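
The three donor methods named above can be sketched in their single-level form, i.e. the kind of baseline that ignores household membership. This is a minimal Python illustration, assuming missing values are coded as `None` and auxiliary variables are numeric; the function names are hypothetical, and the thesis's own implementations may differ (for example in how imputation classes are formed or how distances are scaled).

```python
import math
import random
from typing import Dict, List, Optional, Sequence

# Missing values are coded as None. None of these functions uses household
# membership: they are single-level donor baselines (hypothetical names).


def random_donor(values: Sequence[Optional[float]], rng: random.Random) -> List[float]:
    """Replace each missing value with the value of a randomly drawn respondent."""
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]


def within_class_donor(values: Sequence[Optional[float]], classes: Sequence[str],
                       rng: random.Random) -> List[float]:
    """Draw the donor at random from respondents in the same imputation class.

    Assumes every class containing a nonrespondent also contains a respondent.
    """
    pools: Dict[str, List[float]] = {}
    for v, c in zip(values, classes):
        if v is not None:
            pools.setdefault(c, []).append(v)
    return [v if v is not None else rng.choice(pools[c]) for v, c in zip(values, classes)]


def nearest_neighbour_donor(values: Sequence[Optional[float]],
                            aux: Sequence[Sequence[float]]) -> List[float]:
    """Use the respondent closest in Euclidean distance on the auxiliary
    variables (ties go to the respondent listed first)."""
    respondents = [j for j, v in enumerate(values) if v is not None]
    out: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)
        else:
            donor = min(respondents, key=lambda j: math.dist(aux[i], aux[j]))
            out.append(values[donor])
    return out
```

Because none of these functions receives a household identifier, a missing value is never preferentially filled from a co-resident, which is precisely the information the household methods try to use.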

For continuous variables, a proposed household nearest neighbour
method produces better imputed values than the other donor methods, and
the performance of the linear mixed model improves as the level of
within-household clustering increases. For binary variables, the household
nearest neighbour method and generalized linear mixed models both improve
on standard donor and generalized linear model methods.
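
As a rough illustration of why a linear mixed model gains from clustering, the sketch below imputes under a random-intercept model by adding a shrunken household effect to the fixed-effect prediction. It is a simplification and not the thesis's estimation procedure: β is fitted by ordinary least squares on respondents, and the variance components σ_u² and σ_e² are treated as known, whereas a full analysis would estimate them (for example by REML). The function `lmm_impute` is hypothetical.

```python
import numpy as np


def lmm_impute(y, X, household, sigma2_u, sigma2_e):
    """Hypothetical random-intercept imputation for y_hi = x_hi'beta + u_h + e_hi.

    Missing responses are coded as NaN. Simplifying assumptions: beta is fitted
    by OLS on respondents (not GLS/REML) and sigma2_u, sigma2_e are taken as known.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    household = np.asarray(household)
    observed = ~np.isnan(y)

    # Fixed effects estimated from respondents only.
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    resid = y - X @ beta  # NaN for nonrespondents

    y_imp = y.copy()
    for h in np.unique(household):
        in_h = household == h
        r_h = resid[in_h & observed]
        n_h = r_h.size
        # Predicted u_h: mean respondent residual shrunk by n_h*s2u / (n_h*s2u + s2e);
        # households with no respondents get u_hat = 0 (fixed-effect prediction only).
        u_hat = n_h * sigma2_u / (n_h * sigma2_u + sigma2_e) * r_h.mean() if n_h > 0 else 0.0
        missing_h = in_h & ~observed
        y_imp[missing_h] = X[missing_h] @ beta + u_hat
    return y_imp
```

The shrinkage factor n_h σ_u² / (n_h σ_u² + σ_e²) approaches 1 when clustering is strong, so the imputed value follows the responding co-residents; when σ_u² is small relative to σ_e² it approaches 0 and the method reduces to single-level regression imputation.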

The household imputation methods are most beneficial for improving
predictive accuracy and for reproducing within-household clustering in the
imputed dataset. They offer some benefit for variance estimation but
achieved little improvement over single-level methods in reducing bias.
The level of improvement often depends on the assumed
nonresponse mechanism: the linear mixed model is more beneficial than
the household donor method under informative nonresponse and higher
levels of clustering. Otherwise, the household donor method was
generally at least as good as the multilevel model and is less complex
to implement.
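
The abstract does not spell out the household nearest neighbour rule, so the following is only one plausible, hypothetical variant, included to show why a household donor method can be simpler to implement than a multilevel model. It prefers the nearest responding co-resident as donor and falls back to the nearest respondent in the whole sample when nobody else in the household responded. The function name, the Euclidean distance and the fallback rule are all assumptions.

```python
import math
from typing import List, Optional, Sequence


def household_nn_donor(values: Sequence[Optional[float]],
                       aux: Sequence[Sequence[float]],
                       household: Sequence[str]) -> List[float]:
    """Hypothetical household nearest neighbour donor imputation.

    Assumed rule (illustrative only): the donor is the nearest respondent
    (Euclidean distance on auxiliary variables) in the same household; if no
    co-resident responded, the nearest respondent in the whole sample is used.
    """
    respondents = [j for j, v in enumerate(values) if v is not None]
    out: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)
            continue
        same_hh = [j for j in respondents if household[j] == household[i]]
        pool = same_hh if same_hh else respondents  # fall back to whole sample
        donor = min(pool, key=lambda j: math.dist(aux[i], aux[j]))
        out.append(values[donor])
    return out


values = [50.0, None, 20.0, None]
aux = [[30.0], [32.0], [31.0], [60.0]]
household = ["A", "A", "B", "C"]
print(household_nn_donor(values, aux, household))  # [50.0, 50.0, 20.0, 20.0]
```

In the example, person 1 receives the co-resident's value even though person 2, in another household, is closer on the auxiliary variable; person 3 lives alone and falls back to the sample-wide nearest neighbour. No variance components need to be estimated.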


