*============================================================*
*                           Ex1.do                           *
*============================================================*

*------------------------------------------------------------*
*  Current version: 02-10-06.                                *
*  Creates panel data set from raw BHPS data                 *
*                                                            *
*  DO NOT USE - THIS IS FOR ILLUSTRATION ONLY                *
*                                                            *
*------------------------------------------------------------*


clear
version 9.2 
set more off
set matsize 800
set mem 150m
capture log close

log using ex1.log, replace

*------------------------------------------------------------*
* Aim is to create a panel of individuals from raw           *
* BHPS data. We also want to include some household          *
* characteristics.                                           *
* First, a simple example using individual data only.        *
* Get individual data, beginning with wave a.                *
* Note: text following // at the end of a command is a       *
* comment, and will be ignored by Stata.                     *
*------------------------------------------------------------*

cd "O:\mydata" // set this to the folder where the data are stored
dir // display the content of the current directory
use aindresp_ss.dta

*------------------------------------------------------------*
* What does the file contain?                                *
*------------------------------------------------------------*

describe

*------------------------------------------------------------*
* We want a final file in long format, so need to remove     *
* the wave indicator from variable names.                    *
* Also need to create a wave indicator. Why?                 *
*------------------------------------------------------------*

renpfix a 
gen byte wave = 1
label var wave "Wave identifier"

*------------------------------------------------------------*
* Save wave A data as the first part of the panel data file  *
*------------------------------------------------------------*

save ourpanel.dta, replace

*------------------------------------------------------------*
* Now we do same thing for other waves, using a loop         *
* for simplicity                                             *
*------------------------------------------------------------*

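local i = 2 // wave number of the first file in the loop (wave b)
foreach q in b c d e f g h i j k l m n {
    
    use `q'indresp_ss.dta, clear
    renpfix `q'
    gen byte wave = `i'
    local i = `i' + 1 // move on to the next wave number
    save `q'junk.dta, replace

}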

*------------------------------------------------------------*
* Append the 14 wave files                                   *
* (How could we do this with a loop?)                        *
*------------------------------------------------------------*

use ourpanel.dta
append using bjunk.dta
append using cjunk.dta
append using djunk.dta
append using ejunk.dta
append using fjunk.dta
append using gjunk.dta
append using hjunk.dta
append using ijunk.dta
append using jjunk.dta
append using kjunk.dta
append using ljunk.dta
append using mjunk.dta
append using njunk.dta
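
*------------------------------------------------------------*
* One possible answer to the question above, as a sketch:    *
* the same append done with a loop. -preserve- and           *
* -restore- leave the data already in memory unchanged.      *
* Assumes the junk files have not yet been erased.           *
*------------------------------------------------------------*

preserve // set aside the appended data currently in memory
use ourpanel.dta, clear // start again from the wave a file
foreach q in b c d e f g h i j k l m n {
    append using `q'junk.dta
}
tab wave // each of the 14 waves should appear
restore // return to the data built step by step above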


*------------------------------------------------------------*
* Don't forget to clean up afterwards                        *
*------------------------------------------------------------*


foreach q in b c d e f g h i j k l m n {
    
    erase `q'junk.dta

}


*------------------------------------------------------------*
* How could we do the last three steps in a single loop?     *
* - see example later                                        *
* Now we should have a long file of individual data          *
*------------------------------------------------------------*

describe
describe pid wave

summarize pid wave
su pid wave // same command, using the abbreviation of -summarize-

*------------------------------------------------------------*
* View data (after putting in order)                         *
* Are there missing cases?                                   *
*------------------------------------------------------------*

sort pid wave
browse pid wave sex age
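
*------------------------------------------------------------*
* One way to answer this, as a sketch: count the waves in    *
* which each person appears. nwaves and onepp are            *
* hypothetical helper names, dropped again afterwards.       *
*------------------------------------------------------------*

bysort pid (wave): gen byte nwaves = _N // waves observed per person
egen byte onepp = tag(pid) // flags one record per person
tab nwaves if onepp // fewer than 14 means absent in some waves
drop nwaves onepp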

*------------------------------------------------------------*
* Now create a file with household level data as well.       *
* Easiest way to do this is to start again:                  *
* We go back to each wave's                                  *
* individual file and -merge- in household variables.        *
* (Then afterwards, rebuild the longitudinal file).          *
* As before, let's start with wave A.                        *
*------------------------------------------------------------*

drop _all // drop the previous data, which are no longer needed

use aindresp_ss.dta
sort ahid // order the data by the household identifier 
save ourpanel.dta, replace

use ahhresp_ss.dta // get household data
sort ahid

merge ahid using ourpanel.dta
tabulate _merge

*------------------------------------------------------------*
* _merge==1    obs. from master data only                    *
*              (file currently in memory) ahhresp_ss.dta here*
* _merge==2    obs. from using data only                     *
*              (file being merged in) i.e. ourpanel.dta here *
* _merge==3    obs. from both master and using data          *
*------------------------------------------------------------*

keep if _merge==3 // keep only cases with both hh and ind info 
drop _merge // don't need anymore

*------------------------------------------------------------*
* View data (after putting in order - note we sort by hh)    *
* Examine hh and person identifier, sex, age and             *
* relationship to reference person, and household type       *
*------------------------------------------------------------*

tab ahgr2r // look at values of relation to reference person
tab ahhtype // look at values of hh type

sort ahid
br ahid pid asex aage ahgr2r ahhtype

*------------------------------------------------------------*
* Save the new file for combining with other waves later.    *
* (first get rid of wave prefix and create wave indicator)   *
*------------------------------------------------------------*

renpfix a
gen byte wave = 1
label var wave "Wave identifier"
save ourpanel.dta, replace // overwrite the earlier file, no longer needed

*------------------------------------------------------------*
* And then create the new files for the other waves.         *
*------------------------------------------------------------*

foreach q in b c d e f g h i j k l m n {
    
    use `q'indresp_ss.dta
    sort `q'hid 
    save `q'junk.dta, replace

    use `q'hhresp_ss.dta 
    sort `q'hid 

    merge `q'hid using `q'junk.dta
    tab _merge  // (always check _merge after merging files)
    keep if _merge==3 
    drop _merge

    renpfix `q'
    gen byte wave = index("abcdefghijklmn","`q'")

    save `q'junk.dta, replace

    use ourpanel.dta
    append using `q'junk.dta
    save ourpanel.dta, replace

    erase `q'junk.dta
}
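
*------------------------------------------------------------*
* A suggested check (not part of the original exercise):     *
* pid and wave together should identify each record.         *
* -duplicates report- lists any repeated pid-wave pairs      *
* without changing the data.                                 *
*------------------------------------------------------------*

duplicates report pid wave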



*------------------------------------------------------------*
* Examine data                                               *
*------------------------------------------------------------*

tab wave

sort wave hid // look at hhs within each wave
browse wave hid pid age sex hhtype hgr2r tenure

sort pid wave // look at persons across waves
browse pid wave hid age sex hhtype hgr2r tenure

*------------------------------------------------------------*
* Data contain some household level variables from hh file   *
* But we may want to create our own based on individual data *
* e.g. number of employed and total labour earnings          *
*                                                            *
* Employment status can be derived from jbstat (note: coding *
* changed between wave a and later waves; fixed in our data) *
*------------------------------------------------------------*

describe jbstat
label list jbstat
tab jbstat

recode jbstat -9/-1 = . // change negative values to missing
gen emp = jbstat==1 | jbstat==2 if jbstat~=.
label var emp "Employee or self-employed"
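
*------------------------------------------------------------*
* A quick check, as a sketch: cross-tabulate jbstat and emp  *
* to confirm that each code maps to the intended value.      *
*------------------------------------------------------------*

tab jbstat emp, missing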

egen nemphh = total(emp), by(wave hid)
label var nemphh "Number of employed in hh"

*------------------------------------------------------------*
* View the data, and also compare the new variable nemphh    *
* with BHPS hh-level variable nemp (number in employment in  *
* hh). Any differences? Why?                                 *
*------------------------------------------------------------*

sort wave hid
browse wave hid pid jbstat emp nemphh nemp
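
*------------------------------------------------------------*
* A sketch of one way to compare the two counts directly.    *
* Negative values of nemp are assumed to be BHPS missing     *
* codes, as for jbstat above.                                *
*------------------------------------------------------------*

tab nemphh nemp, missing
count if nemphh != nemp & nemp >= 0 & nemp < .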

*------------------------------------------------------------*
* Save final data set                                        *
*------------------------------------------------------------*

save ourpanel.dta, replace


set more on

log close
exit