*============================================================*
* Ex1.do                                                     *
*============================================================*

*------------------------------------------------------------*
* Current version: 02-10-06.                                 *
* Creates panel data set from raw BHPS data                  *
*                                                            *
* DO NOT USE - THIS IS FOR ILLUSTRATION ONLY                 *
*                                                            *
*------------------------------------------------------------*

clear
version 9.2
set more off
set matsize 800
set mem 150m
capture log close
log using ex1.log, replace

*------------------------------------------------------------*
* Aim is to create a panel of individuals from raw           *
* BHPS data. We also want to include some household          *
* characteristics.                                           *
* First, a simple example using individual data only.        *
* Get individual data, beginning with wave a.                *
* Note: text following // at the end of a command is a       *
* comment, and will be ignored by Stata.                     *
*------------------------------------------------------------*

cd "O:\mydata"    // add path where data has been stored
dir               // display the content of the current directory
use aindresp_ss.dta

*------------------------------------------------------------*
* What does file contain?                                    *
*------------------------------------------------------------*

describe

*------------------------------------------------------------*
* We want a final file in long format, so need to remove     *
* the wave indicator from variable names.                    *
* Also need to create a wave indicator. Why?                 *
*------------------------------------------------------------*

renpfix a
gen byte wave = 1
label var wave "Wave identifier"

*------------------------------------------------------------*
* Save wave A data as the first part of the panel data file  *
*------------------------------------------------------------*

save ourpanel.dta, replace

*------------------------------------------------------------*
* Now we do the same thing for other waves, using a loop     *
* for simplicity                                             *
*------------------------------------------------------------*

local i = 2
foreach q in b c d e f g h i j k l m n {
    use `q'indresp_ss.dta
    renpfix `q'
    gen byte wave = `i'
    local i=`i'+1
    save `q'junk.dta, replace
}

*------------------------------------------------------------*
* Append the 14 wave files                                   *
* (How could we do this with a loop?)                        *
*------------------------------------------------------------*

use ourpanel.dta
append using bjunk.dta
append using cjunk.dta
append using djunk.dta
append using ejunk.dta
append using fjunk.dta
append using gjunk.dta
append using hjunk.dta
append using ijunk.dta
append using jjunk.dta
append using kjunk.dta
append using ljunk.dta
append using mjunk.dta
append using njunk.dta

*------------------------------------------------------------*
* Don't forget to clean up afterwards                        *
*------------------------------------------------------------*

foreach q in b c d e f g h i j k l m n {
    erase `q'junk.dta
}

*------------------------------------------------------------*
* How could we do the last three steps in a single loop?     *
* - see example later                                        *
* Now we should have a long file of individual data          *
*------------------------------------------------------------*

describe
describe pid wave
summarize pid wave
su pid wave

*------------------------------------------------------------*
* View data (after putting in order)                         *
* Are there missing cases?                                   *
*------------------------------------------------------------*

sort pid wave
browse pid wave sex age

*------------------------------------------------------------*
* Now create a file with household level data as well.       *
* Easiest way to do this is to start again:                  *
* We go back to each wave's                                  *
* individual file and -merge- in household variables.        *
* (Then afterwards, rebuild the longitudinal file).          *
* As before, let's start with wave A.                        *
*------------------------------------------------------------*

drop _all             // drop previous data as don't need
use aindresp_ss.dta
sort ahid             // order the data by the household identifier
save ourpanel.dta, replace
use ahhresp_ss.dta    // get household data
sort ahid
merge ahid using ourpanel.dta
tabulate _merge

*------------------------------------------------------------*
* _merge==1 obs. from master data                            *
*           (file currently in memory) ahhresp_ss.dta here   *
* _merge==2 obs. from using data                             *
*           (file being merged in) i.e. ourpanel.dta here    *
* _merge==3 obs. from both master and using data             *
*------------------------------------------------------------*

keep if _merge==3     // keep only cases with both hh and ind info
drop _merge           // don't need anymore

*------------------------------------------------------------*
* View data (after putting in order - note we sort by hh)    *
* Examine hh and person identifier, sex, age and             *
* relationship to reference person, and household type       *
*------------------------------------------------------------*

tab ahgr2r            // look at values of relation to reference person
tab ahhtype           // look at values of hh type
sort ahid
br ahid pid asex aage ahgr2r ahhtype

*------------------------------------------------------------*
* Save the new file for combining with other waves later.    *
* (first get rid of wave prefix and create wave indicator)   *
*------------------------------------------------------------*

renpfix a
gen byte wave = 1
save ourpanel.dta, replace    // overwrite existing file-don't need

*------------------------------------------------------------*
* And then create the new files for the other waves.         *
*------------------------------------------------------------*

foreach q in b c d e f g h i j k l m n {
    use `q'indresp_ss.dta
    sort `q'hid
    save `q'junk.dta, replace
    use `q'hhresp_ss.dta
    sort `q'hid
    merge `q'hid using `q'junk.dta
    tab _merge        // (always check _merge after merging files)
    keep if _merge==3
    drop _merge
    renpfix `q'
    gen byte wave = index("abcdefghijklmn","`q'")
    save `q'junk.dta, replace
    use ourpanel
    append using `q'junk.dta
    save ourpanel.dta, replace
    erase `q'junk.dta
}

*------------------------------------------------------------*
* Examine data                                               *
*------------------------------------------------------------*

tab wave
sort wave hid         // look at hhs within each wave
browse wave hid pid age sex hhtype hgr2r tenure
sort pid wave         // look at persons across waves
browse pid wave hid age sex hhtype hgr2r tenure

*------------------------------------------------------------*
* Data contain some household level variables from hh file   *
* But we may want to create our own based on individual data *
* e.g. number of employed and total labour earnings          *
*                                                            *
* Employment status can be derived from jbstat (note coding  *
* changed between wave a and other waves; fixed in our data) *
*------------------------------------------------------------*

describe jbstat
label list jbstat
tab jbstat
recode jbstat -9/-1 = .    // change negative values to missing
gen emp = jbstat==1 | jbstat==2 if jbstat~=.
label var emp "Employee or self-employed"
egen nemphh = total(emp), by(wave hid)
label var nemphh "Number of employed in hh"

*------------------------------------------------------------*
* View the data, and also compare the new variable nemphh    *
* with BHPS hh-level variable nemp (number in employment in  *
* hh). Any differences? Why?                                 *
*------------------------------------------------------------*

sort wave hid
browse wave hid pid jbstat emp nemphh nemp

*------------------------------------------------------------*
* Save final data set                                        *
*------------------------------------------------------------*

save ourpanel.dta, replace
set more on
log close
exit