Matching
FAQs:
- How do I match respondents across waves?
- Respondents within households?
- Children to their parents?
- Spouses to one another?
There are three key identifying variables in the BHPS data that you need in order to match records together, either cross-sectionally or longitudinally.
They are:
- wHID is the household ID at a given wave (the prefix w stands for the wave)
- wPNO is the person number of the respondent within the household at that wave
- PID is the unique personal identifier which all sample members, including children under 16, are assigned and carry with them for the life of the survey, regardless of the household they are found in at a given wave. This is the key variable you need to use to match individuals' records longitudinally. Note that you cannot match households longitudinally using wHID, as this is not constant across waves.
Detailed guidelines on how to match respondent data across waves, locate respondents within households, match children to parents and spouses to one another, etc., can be found in the data documentation describing the BHPS data structure.
Individuals can be matched across waves using the variable PID, which can be found in all respondent-level data. Respondents can be linked to their household data for a given wave using wHID, the identifier variable found in the household record as well as in the individual-level data files.
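The two links described here can be sketched in a few lines of Python. This is a minimal illustration using small made-up records, not real BHPS data: the lowercase field names (`pid`, `hid`, `pno`, `hh_size`), the function names, and all values are assumptions made for the example, and in the actual files the wave-specific variables carry a wave prefix, as in wHID.

```python
# Illustrative records only; not real BHPS data.
# Individual-level records for two hypothetical waves.
wave1_individuals = [
    {"pid": 1001, "hid": 501, "pno": 1},
    {"pid": 1002, "hid": 501, "pno": 2},
]
wave2_individuals = [
    {"pid": 1001, "hid": 742, "pno": 1},
    {"pid": 1003, "hid": 742, "pno": 2},
]
# Household-level records for wave 2, keyed by the wave-2 household ID.
wave2_households = {742: {"hid": 742, "hh_size": 2}}


def match_across_waves(earlier, later):
    """Pair up records for people present in both waves, keyed by PID."""
    later_by_pid = {rec["pid"]: rec for rec in later}
    return {
        rec["pid"]: (rec, later_by_pid[rec["pid"]])
        for rec in earlier
        if rec["pid"] in later_by_pid
    }


def attach_household(individuals, households):
    """Attach the same-wave household record to each person via the household ID."""
    return [
        {**rec, "household": households.get(rec["hid"])}
        for rec in individuals
    ]


matched = match_across_waves(wave1_individuals, wave2_individuals)
linked = attach_household(wave2_individuals, wave2_households)
```

In this toy data only PID 1001 appears in both waves, and that person's household ID differs between them, which is why the household ID cannot serve as the longitudinal key.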
The record xWAVEID is indexed by PID and is updated with each release. It contains the interview outcome, sample status, and key cross-wave identifying information for each enumerated individual, including household identifiers for all waves.
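A PID-indexed cross-wave record of this kind can be used to trace a person's household at each wave. The sketch below is a hypothetical stand-in, not the actual xWAVEID layout: the `hid_by_wave` field, the numeric wave labels, the helper names, and the values are all assumptions made for illustration.

```python
# Hypothetical cross-wave index keyed by PID; field names and values are made up.
cross_wave_index = {
    1001: {"hid_by_wave": {1: 501, 2: 742, 3: 742}},
    1002: {"hid_by_wave": {1: 501, 2: None, 3: None}},  # None: not enumerated
}


def household_history(index, pid):
    """Return (wave, household ID) pairs for the waves in which the person was enumerated."""
    hids = index[pid]["hid_by_wave"]
    return [(wave, hid) for wave, hid in sorted(hids.items()) if hid is not None]


def same_household(index, pid_a, pid_b, wave):
    """True if both people were enumerated in the same household at the given wave."""
    hid_a = index[pid_a]["hid_by_wave"].get(wave)
    hid_b = index[pid_b]["hid_by_wave"].get(wave)
    return hid_a is not None and hid_a == hid_b


history = household_history(cross_wave_index, 1001)
together_wave1 = same_household(cross_wave_index, 1001, 1002, 1)
together_wave2 = same_household(cross_wave_index, 1001, 1002, 2)
```

Because the lookup starts from PID, it works even though the household identifier changes from wave to wave.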
The record wEGOALT is a data file containing one record for each pair of household members: each respondent is paired in turn with every other member of the household. For example, a three-person household is represented by six cases: PNO 1, called the “EGO”, matched with PNO 2 and PNO 3, called the “ALTERS”; PNO 2 as EGO matched with PNO 1 and PNO 3 as ALTERS; and lastly PNO 3 as EGO matched with PNO 1 and PNO 2 as ALTERS. Each record contains some basic details about the EGO and the ALTER in the pair, as well as the nature of their relationship (son, daughter, spouse, unmarried partner, etc.). Also included are the PID for the EGO and the PID for the ALTER, so that the full data for any two individuals within the household can be linked together. The record also indicates whether the ALTER was in the same household as the EGO at the previous wave (to identify household joiners) and whether the ALTER is still in the same household at the following wave (to identify household splits).
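The pairing scheme in the three-person example can be reproduced with a short Python sketch. It is only an illustration of the EGO/ALTER structure, not the procedure used to build the file: the field names (`ego_pno`, `alter_pid`, and so on) and the PID values are made up, and relationship details are left out.

```python
from itertools import permutations

# Made-up three-person household; PNO is the person number, PID the personal identifier.
household = [
    {"pno": 1, "pid": 1001},
    {"pno": 2, "pid": 1002},
    {"pno": 3, "pid": 1004},
]


def ego_alter_pairs(members):
    """Build one record per ordered (EGO, ALTER) pair of distinct household members."""
    return [
        {
            "ego_pno": ego["pno"],
            "ego_pid": ego["pid"],
            "alter_pno": alter["pno"],
            "alter_pid": alter["pid"],
        }
        for ego, alter in permutations(members, 2)
    ]


pairs = ego_alter_pairs(household)

# Look up every ALTER paired with the person whose PNO is 2.
alters_of_2 = [p["alter_pno"] for p in pairs if p["ego_pno"] == 2]
```

More generally, a household of n members yields n × (n − 1) such records, so three members give six. With both PIDs on each pair, either person's full record can then be fetched from the individual-level data.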