American Association for Public Opinion Research Conference
May 13, 2005
Survey designers often want to oversample specific sub-populations, usually to improve the precision and robustness of analyses of key differences. Nearly all existing methods for doing this in two-stage sampling have specific weaknesses, because the domains (sub-populations) targeted for oversampling usually cross-cut the clusters that form the PSUs.
This paper sets out a new sampling method for such circumstances that gives major gains over previous methods in both effectiveness and efficiency. It does so by simultaneously constraining three factors:
- sample sizes within domains that cross-cut clusters
- sample sizes within clusters
- variation in selection probabilities, both within domains and overall.
These constraints are imposed through a size measure which, instead of being a simple count of the second-stage units, is a weighted sum of the counts of units in each domain within the cluster. The method relies on knowledge of the distribution of units over domains within each cluster, which is often available from administrative data. These data can be exploited to produce samples in which the composition of PSUs is closely controlled. Such samples are more effective in boosting sub-samples than those from standard methods and, by reducing design effects and so increasing effective sample size, offer greater cost efficiency than would otherwise be possible.
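As an illustration only, the weighted-sum size measure might be sketched as follows. The abstract does not specify the weights; this sketch assumes they are the target overall sampling fractions for each domain, which is one common choice for composite size measures. The function names, the domain labels and the counts are all hypothetical.

```python
from typing import Dict

Counts = Dict[str, Dict[str, int]]  # cluster -> domain -> number of units


def composite_sizes(counts: Counts, fractions: Dict[str, float]) -> Dict[str, float]:
    """Size measure per cluster: a weighted sum of its domain counts.

    ASSUMPTION: the weight for each domain is its target overall
    sampling fraction; the source does not state the weights used.
    """
    return {
        cluster: sum(fractions[d] * n for d, n in domains.items())
        for cluster, domains in counts.items()
    }


def selection_plan(counts: Counts, fractions: Dict[str, float], n_psus: int) -> dict:
    """First-stage PPS probabilities and within-cluster rates by domain."""
    sizes = composite_sizes(counts, fractions)
    total = sum(sizes.values())
    plan = {}
    for cluster, domains in counts.items():
        # Probability proportional to the composite size measure.
        p_cluster = n_psus * sizes[cluster] / total
        if not 0 < p_cluster <= 1:
            raise ValueError(f"infeasible cluster probability for {cluster}")
        # Within-cluster rate chosen so that the overall probability
        # (p_cluster * rate) equals the domain's target fraction.
        within = {d: fractions[d] / p_cluster for d in domains}
        if any(rate > 1 for rate in within.values()):
            raise ValueError(f"within-cluster rate above 1 in {cluster}")
        plan[cluster] = {
            "p_cluster": p_cluster,
            "within": within,
            "expected_take": sum(within[d] * n for d, n in domains.items()),
        }
    return plan


# Hypothetical data: four clusters, one boosted domain.
counts = {
    "A": {"majority": 180, "minority": 20},
    "B": {"majority": 100, "minority": 100},
    "C": {"majority": 190, "minority": 10},
    "D": {"majority": 50, "minority": 150},
}
fractions = {"majority": 0.05, "minority": 0.20}
plan = selection_plan(counts, fractions, n_psus=2)
```

Under these assumptions every unit in a domain has the same overall selection probability, and every selected cluster has the same expected sample size (the total size measure divided by the number of PSUs), which corresponds to the three constraints listed above. A real design would also need to handle clusters where either probability exceeds 1, which this sketch simply rejects.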
The paper comprises a general description of the method and its underlying theoretical framework, followed by a case study of its application: the design and implementation of the initial sample for a longitudinal study of young people in England. The overall design was a conventional two-stage one with PPS selection, with schools as PSUs and individuals sampled from their registers. A critical design objective, however, was to boost the numbers in the initial sample from six ethnic minority groups. The paper sets out the advantages of the new method in this case, the problems encountered and their solutions. The conclusion is that this method has several substantial advantages over others and should be the default method in similar circumstances.