***********************************************
** SC968 PANEL DATA METHODS for SOCIOLOGISTS
** DO-FILE FOR LECTURE 4
***********************************************
***********************************************
** 4.0 OBJECTIVES
***********************************************
***********************************************
** 4.1 GETTING STARTED
***********************************************
cd m:
log using "SC968 Lecture 7 Worksheet log.log",replace
use teaching.dta,clear
***********************************************
** 4.2 LINEAR RANDOM COEFFICIENTS MODELS
***********************************************
** Have a look at the variables
summarize jbstat hlghq* hlstat mrjsec sex age12
** Prepare data for analysis
mvdecode jbstat hlghq* hlstat mrjsec sex age12, mv(-9/-1)
generate empstat=jbstat
recode empstat 1/2=1 3=2 4/10=3
generate onssec=mrjsec
recode onssec 10/119=0 120/139=1
generate hlghq3 = hlghq2
recode hlghq3 0/2=0 3/12=1
tab1 empstat hlghq3 onssec
** Null model
xtmixed hlghq1 || pid:, mle variance
** Question: Is there evidence of variability in GHQ scores between individuals
** and within individuals over time?
** Answer: Yes, within individuals and between individuals, variation significantly different from 0
** Question: Is the variability between individuals greater than the variability within individuals?
** Answer: No, between = 12, within = 17
** Question: What is the average GHQ score overall?
** Answer: 11.23
** Calculate VPC
display 12.099/( 12.099+17.264)
** Question: What do you conclude about the variability in GHQ scores?
** Answer: 41% of the total variability can be attributed to differences between individuals
** Model 0:
** Null
**Fixed part
**Intercept 11.23 (0.07)
**Random part
**Within variation 17.26 (0.16)
**Between variation 12.10 (0.40)
**VPC 0.41
** Regress GHQ on employment status using a random intercepts specification
xi: xtmixed hlghq1 i.empstat || pid:, mle variance
** Calculate VPC
display 11.908/(11.908+17.185)
** Model 1:
** Random intercepts
**Fixed part
**Intercept 10.95 (0.08)
**Unemployed 1.84 (0.16)
**OLF 0.49 (0.08)
**Random part
**Within variation 17.18 (0.16)
**Between variation 11.91 (0.39)
**VPC 0.41
** Question: Which group has the highest GHQ scores on average?
** Answer: Unemployed
** Question: Have we explained any of the variability between individuals?
** Answer: Only very little. Variability reduced slightly but VPC same.
** Predictions from the random intercept model
** Variability in intercepts
predict model1_re*, reffects
bysort pid:generate tolist=(_n==1)
list pid model1_re1 in 1/100 if tolist, sep(10) noobs
generate intercept= _b[_cons] + model1_re1
list pid intercept in 1/100 if tolist, sep(10) noobs
** Question: What do you conclude?
** Answer: Large differences in mean GHQ of employed people.
*** But note in a later model that there is greater variability among the unemployed.
** Q-Q plot to check normality
predict rs, rstandard
qnorm rs
graph save Graph "SC968 Lecture 7 Worksheet graph 1.gph",replace
** Question: What does the graph show?
** Answer: Residuals do not follow a normal distribution – some departure from a straight line
** Save estimates
estimates store model1
** Add random slopes to the model
xi: xtmixed hlghq1 i.empstat || pid: i.empstat, mle cov(unstr) variance
** Question: Is there significant random variation in the slopes?
** Answer: Yes
** Likelihood ratio test on the significance of the slope
estimates store model2
lrtest model1 model2
** Question: Is this consistent with your interpretation above?
** Answer: Yes, Chisq with 5 df = 355, p < 0.00005 for random slopes model compared with random intercepts model
** Calculate VPC
display 9.565/(9.564+16.422)
** Model 2:
** Random coeffcients
**Fixed part
**Intercept 10.90 (0.07)
**Unemployed 2.02 (0.21)
**OLF 0.55 (0.10)
**Random part
**Unemployed 11.25 (0.68)
**OLF 5.70 (0.64)
**Within variation 16.42 (0.16)
**Between variation 9.56 (0.38)
**VPC 0.37
** Question: How do the findings from model 2 differ from model 1?
** Answer:
** Question: How would you interpret the change in the VPC from model 1 to model 2?
** Answer: The
** Individual regression lines
predict model2_re*, reffects
replace intercept = _b[_cons] + model2_re3
generate unemp = _b[_Iempstat_2] + model2_re1
generate olf = _b[_Iempstat_3] + model2_re2
list pid intercept unemp olf in 1/100 if tolist, sep(10) noobs
** Question: How do the individual regression estimates differ from the average effects?
** Answer:
** Add sex as a covariate
xi: xtmixed hlghq1 i.empstat i.sex || pid: i.empstat, mle cov(unstr) variance
display 8.87/(8.87+16.42)
** Model 3:
** Random coefficient and level-2 covariate
**Fixed part
**Intercept 10.11 (0.10)
**Unemployed 2.06 (0.21)
**OLF 0.45 (0.10)
**Gender 1.58 (0.13)
**Random part
**Unemployed 11.07 (1.66)
**OLF 5.78 (0.64)
**Within variation 16.42 (0.16)
**Between variation 8.87 (0.36)
**VPC 0.35
** Include an interaction between employment status and sex.
xi: xtmixed hlghq1 i.empstat*i.sex || pid: i.empstat, mle cov(unstr) variance
** Model 4:
** Random coefficient and level-1 by level-2 interaction
**Fixed part
**Intercept 10.09 (0.10)
**Unemployed 1.88 (0.28)
**OLF 0.61 (0.17)
**Gender 1.62 (0.14)
**Unemployed X gender 0.45 (0.43)
**OLF X gender -0.24 (0.21)
**Random part
**Unemployed 10.94 (1.65)
**OLF 5.77 (0.64)
**Within variation 16.42 (0.16)
**Between variation 8.86 (0.36)
**VPC 0.35
***********************************************
** 4.3 LOGISTIC RANDOM COEFFICIENTS MODELS
***********************************************
drop if wave>5
** Null model
xtmelogit hlghq3 || pid:, variance
** Model 0:
** Null
**Fixed part
**Intercept -1.45 (0.05)
**Random part
**Between variation 3.18 (0.24)
** Random intercepts model
xi: xtmelogit hlghq3 i.empstat || pid: ,variance
xi:xtmelogit , or
** Model 1:
** Random intercepts
**Fixed part
**Intercept -1.62 (0.07)
**Unemployed 0.72 (0.15)
**OLF 0.35 (0.09)
**Random part
**Between variation 3.17 (0.24)
** Random coefficients model
xi: xtmelogit hlghq3 i.empstat || pid: i.empstat, variance cov(unstr)
** Question: What problems do you have?
** Answer: the model fails to converge.
***********************************************
** 4.4 GROWTH CURVE MODELS
***********************************************
use teaching.dta,clear
** Derive variable for time
generate time = wave -1
generate timesq = time^2
** Linear growth curve
xtmixed hlstat time || pid:time , mle cov(unstr) variance
estimates store growth1
** Quadratic growth curve
xtmixed hlstat time timesq || pid:time, mle cov(unstr) variance
estimates store growth2
** Likelihood ratio test
lrtest growth1 growth2
** Question: Does a linear or a quadratic growth curve fit the data best?
** Answer: quadratic – lrtest highly significant
** Question: Is there any variation in individual growth curves?
** Answer: yes, there is significant variation in the intercepts and the rate of change
** Generate new age and cohort variables
summarize age12,detail
summarize age12 if wave==1,detail
generate age_cen = age12 - 35
bysort pid: generate cohort=age12[1]-35
** Add fixed effect for age
xtmixed hlstat time timesq age_cen || pid:time, mle cov(unstr) variance
** Add fixed effect for cohort
xtmixed hlstat time timesq cohort || pid:time, mle cov(unstr) variance
** Question: Which model fits the data best?
** Answer: Age model but difference fairly modest (difference in LL = 9)
** Add a time varying covariate for social class
xi: xtmixed hlstat time timesq cohort i.onssec || pid:time timesq, mle cov(unstr) variance
** Add interaction between social class and time
xi: xtmixed hlstat time timesq cohort i.onssec*time || pid:time timesq, mle cov(unstr) variance
** Question: Is health poorer for those in more disadvantaged social groups (onssec=1)?
** Answer:Yes, coefficient 0.089, p<0.0005
** Question: Does health decline faster over time for those in more disadvantaged social classes?
** Answer: No, the interaction is not significant
** Graph estimated mean growth curves by social group
gen Advantaged = _b[_cons] + (time*_b[time]) + (timesq*_b[timesq])
gen Disadvantaged = _b[_cons] + (time*_b[time]) + (timesq*_b[timesq]) + _b[_Ionssec_1]
twoway (line Advantaged wave, sort lcolor(navy) lwidth(thick)) ///
(line Disadvantaged wave, sort lcolor(pink) lwidth(thick))
graph save Graph "SC968 Lecture 7 Worksheet graph 2.gph",replace
log close