Survey non-response biases
Collecting survey data is not easy. One issue is that the people who complete the survey (the survey respondents) may not be representative of the population it is meant to study in terms of their characteristics. If such characteristics are correlated with survey answers, non-response biases can occur, where survey estimates differ from those that would be obtained if all population members responded. These biases can lead to false conclusions being drawn from the survey data.
Because false conclusions are a real risk, survey designers try to minimise biases. Efforts are made to ensure that the people asked to complete the survey (the survey sample) are representative of the study population in terms of their characteristics, and that responses are obtained from all population subgroups. In addition, after data collection, methods are used to correct for remaining biases. The most widely used of these methods, and the topic of this explainer, is the production of survey weights.
What do survey weights do, and how are they calculated?
Survey weights balance the influence of answers from members of different subgroups in analyses of the data, so that analyses reflect the population distribution of those subgroups. These weights have several components. The first is the design weight, which aims to correct biases arising during sampling. If all population members have the same chance of being sampled, all sample members have the same design weight, which for the sake of simplicity we take to be equal to 1. However, this may not be the case: sometimes, for instance, certain groups are sampled more often than others, to ensure that enough data is collected from their members to enable analyses. To give a very simple example (surveys often make more complicated adjustments using more characteristics, though the principle remains the same), suppose that the country is divided into regions, all with the same population size except one – Region A – whose population is half the size of the others'. Suppose we also want the sample size in each region to be the same. Then people in Region A need to be twice as likely to be sampled as people in other regions. To ensure that any analysis of the resulting sample data is representative of the population, the design weight for those in Region A has to be 1/2 = 0.5, while everyone else's weight is 1, i.e. 1/0.5 = 2 times larger than the weight in Region A.
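To make the arithmetic concrete, here is a minimal sketch in Python (the region labels, probabilities, and variable names are illustrative assumptions, not any survey's actual code): the design weight is simply the inverse of each group's relative selection probability.

```python
# Relative chances of being sampled: Region A is twice as likely as the rest.
selection_prob = {"Region A": 2.0, "other regions": 1.0}

# Design weight = inverse of the selection probability.
design_weight = {region: 1.0 / p for region, p in selection_prob.items()}
print(design_weight)  # {'Region A': 0.5, 'other regions': 1.0}
```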
The second component is the non-response adjustment, which corrects biases arising from differential non-response across population subgroups. It is calculated in a similar way to the design weight. Staying with our example, suppose that the response rate among people in Region A is 50% – one in every two sampled people responds – so the non-response adjustment to the design weight there is 1/0.5 = 2. Now optimistically suppose that the response rate everywhere else is twice as high as in Region A, that is, 100%, so the adjustment of 1/1 = 1 changes nothing. As the overall survey weight contains the product of the design weight and the non-response adjustment, the weights end up the same everywhere: 0.5 x 2 = 1 in Region A and 1 x 1 = 1 in the other regions. In other words, in this example the combined weights happen to coincide across regions, although generally this will not be true!
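Continuing the illustrative sketch (same hypothetical names as before), the non-response adjustment is the inverse of each subgroup's response rate, and the combined weight is its product with the design weight:

```python
design_weight = {"Region A": 0.5, "other regions": 1.0}  # from the previous sketch
response_rate = {"Region A": 0.5, "other regions": 1.0}  # 50% vs 100% response

# Non-response adjustment = inverse of the response rate.
nr_adjustment = {r: 1.0 / rate for r, rate in response_rate.items()}

# Combined weight = design weight x non-response adjustment.
combined_weight = {r: design_weight[r] * nr_adjustment[r] for r in design_weight}
print(combined_weight)  # {'Region A': 1.0, 'other regions': 1.0}
```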
The third component is the poststratification adjustment. This adjusts the non-response weighted survey subgroup totals to match estimates of population subgroup totals, for example from a census. To explain this adjustment, we’ll use another example with only two subgroups: males and females (once more, surveys often make more complex adjustments). Assume that there are 1000 people of each sex in the population, and that we have responses from 250 males and 500 females. The poststratification weight for each person in a subgroup is calculated as the population subgroup total divided by the non-response weighted survey subgroup total. Given the non-response weights from the previous paragraph (all equal to 1 here), for males it is:
1000 / (1 x 250) = 4,
and for females, it is:
1000 / (1 x 500) = 2.
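The same calculation as an illustrative sketch (subgroup labels and totals are taken from the example; the variable names are assumed for demonstration):

```python
population_total = {"male": 1000, "female": 1000}
respondents = {"male": 250, "female": 500}
nr_weight = {"male": 1.0, "female": 1.0}  # non-response weights, all 1 in this example

# Poststratification weight = population total / non-response-weighted respondent total.
weighted_total = {g: nr_weight[g] * respondents[g] for g in respondents}
poststrat_weight = {g: population_total[g] / weighted_total[g] for g in respondents}
print(poststrat_weight)  # {'male': 4.0, 'female': 2.0}
```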
To produce the final survey weights, the poststratification weights are then rescaled to have a mean of 1, i.e. each is divided by the overall weight mean. In this case, the overall weight mean is:
((4 x (1 x 250)) + (2 x (1 x 500))) / ((1 x 250) + (1 x 500)) = 2000 / 750 = 2.667.
Hence, for each male, the final survey weight is:
4 / 2.667 = 1.5,
and for each female it is:
2 / 2.667 = 0.75.
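The rescaling step in the same illustrative sketch:

```python
poststrat_weight = {"male": 4.0, "female": 2.0}    # from the previous sketch
weighted_total = {"male": 250.0, "female": 500.0}  # non-response-weighted totals

# Overall weight mean across all respondents: 2000 / 750 = 2.667 (to 3 d.p.).
total_weight = sum(poststrat_weight[g] * weighted_total[g] for g in weighted_total)
mean_weight = total_weight / sum(weighted_total.values())

# Final survey weight = poststratification weight / overall mean.
final_weight = {g: w / mean_weight for g, w in poststrat_weight.items()}
print(final_weight)  # {'male': 1.5, 'female': 0.75}
```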
At the Institute for Social and Economic Research, we produce the long-running Understanding Society – the UK Household Longitudinal Study (UKHLS). The information it collects on the UK population is used by many researchers and policy makers. The team responsible for the survey, of which the author is a member, releases a range of weights with the survey datasets. Each has a different purpose: for example, separate weights are released for respondents to the adult and youth questionnaires, along with weights for all respondents to each wave of data collection (participants are surveyed annually) and weights for respondents to a given wave and all previous waves. All, though, have the same aim: to ensure that, when they are used, respondents reflect (the relevant part of) the population. All also include the three weight components described above in their calculation, though the exact procedures used are weight-specific and more complex. By using these weights in analyses, researchers and policy makers can help ensure that conclusions drawn from the UKHLS data are not distorted by non-response biases.
Jamie Moore is a Research Fellow at the Institute for Social and Economic Research