***********************************************
** SC968 PANEL DATA METHODS for SOCIOLOGISTS
** DO-FILE FOR LECTURE 5
***********************************************
***********************************************
** 5.0 OBJECTIVES
***********************************************
***********************************************
** 5.1 GETTING STARTED
***********************************************
cd D:\Home\savram\SC968
cap log close
log using "SC968 Lecture 9 Worksheet log.log",replace
use survival_2014.dta,clear
** check the data
describe
summarize
***********************************************
** 5.2 SUMMARISING TIME TO EVENT DATA
***********************************************
** prepare data for survival analysis
stset wave, failure(mastat==1/2) id(pid) exit(mastat==1/2 .)
list pid wave mastat _st _d _t _t0 in 1/100,sepby(pid) noobs
** Question: In the previous setting, Stata considers the start of the risk period to be wave 1. How can you tell that?
** Answer: By looking at _t0 and _t. _t0 = 0 on each person's wave 1 record, meaning that the start of the risk period is in wave 1.
** Look at ages at the start of the study
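** Illustration (a minimal sketch, not part of the original worksheet): the toy records
** below are made up, not taken from survival_2014.dta. They show how stset fills in
** _t0 and _t for each person-wave. preserve/restore keeps the real data and its
** stset settings intact.
preserve
clear
input pid wave mastat
1 1 6
1 2 6
1 3 1
2 1 6
2 2 6
end
stset wave, failure(mastat==1/2) id(pid) exit(mastat==1/2 .)
** each person's first record should start at _t0 = 0 and end at _t = 1
list pid wave mastat _st _d _t0 _t, sepby(pid) noobs
restore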
sum age if wave==1, detail
stset age, id(pid) failure(mastat==1/2) origin(time 16) entry(wave==1) exit(mastat==1/2 .)
list pid wave age mastat _st _d _t _t0 in 1/100,sepby(pid) noobs
stdes
** Question: Are there any gaps?
** Answer: No
stsum
** Question: What is the total person years of follow-up?
** Answer: 2539
** Question: What is the cohabiting relationship rate per year of follow up?
** Answer: 6.8%
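** Illustration (a sketch, assuming the stset on age above is still in force):
** the two answers above can be read back from the results that stsum stores.
** r(risk) is the total time at risk and r(ir) the incidence rate (failures per
** unit of analysis time, here per year), so r(ir)*r(risk) is the number of failures.
quietly stsum
display "person-years at risk = " r(risk)
display "rate per person-year = " r(ir)
display "implied failures     = " r(ir)*r(risk)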
stsum, by(sex)
** produce Kaplan-Meier graph
sts graph, by(sex)
graph save Graph "SC968 Lecture 9 Worksheet graph 1.gph",replace
** Question: What is the median time to cohabitation for men and women?
** Answer: Men 11 years, women 9 years
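** Illustration (a possible cross-check, not part of the original worksheet):
** stci reports the median survival time with a confidence interval, which is easier
** to read than the graph. Because of origin(time 16), analysis time is measured in
** years since age 16.
stci, by(sex)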
** log rank test of difference in survival
sts test sex
** Question: Is there a significant difference in time to cohabitation between men and women?
** Answer: Yes, the log rank test is significant at 5% level
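** Illustration (a sketch, assuming the stset on age above is still in force):
** sts test stores the chi-squared statistic and its degrees of freedom, so the
** p-value behind the answer above can be recomputed with chi2tail().
quietly sts test sex
display "log-rank chi2(" r(df) ") = " r(chi2)
display "p-value = " chi2tail(r(df), r(chi2))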
***********************************************
** 5.3 COX REGRESSION MODELS
***********************************************
xi:stcox i.sex
xi:stcox i.sex i.agegroup
** Table 1. Hazard ratios by gender for time to first cohabiting partnership
**                          Hazard ratio   95% C.I.    P value
** Unadjusted               1.66           1.22-2.24   0.001
** Adjusted for age group   1.64           1.21-2.23   0.001
** Question: What happens to the hazard ratio for gender when you adjust for age group?
** Answer: It remains more or less the same.
** Question: What does the hazard ratio for gender represent when you adjust for age?
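** Answer: It represents the hazard ratio of women relative to men in the same age group. Because the model has no sex-by-age interaction, this ratio is assumed to be the same in every age group, including the omitted category, i.e. 15-24.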
** Question: What’s the hazard ratio of women finding a partner relative to men when they are aged 35 and over?
** Answer: It's 1.64*2.01=3.30. This is the hazard ratio of women aged 35+ relative to the base category, which is men aged 15-24.
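** Illustration (a worked version of the arithmetic above): in a model with no
** interaction, hazard ratios multiply because the coefficients add on the log scale.
** 1.64 and 2.01 are the values quoted in the answer above.
display "product of hazard ratios = " %4.2f 1.64*2.01
display "same on the log scale    = " %4.2f exp(ln(1.64) + ln(2.01))
** The same combination with a confidence interval could be obtained with lincom
** after the adjusted model. The names below are hypothetical: check which _I*
** dummies xi actually created (describe _I*) before uncommenting.
** lincom _Isex_2 + _Iagegroup_3, hr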
** report whether variables vary over time
stvary nssec* hqual* income*
xi:stcox i.nssec_w1
xi:stcox i.nssec_w1 i.sex i.agegroup
xi:stcox i.income_w1
xi:stcox i.income_w1 i.sex i.agegroup
xi:stcox i.hqual_w1
xi:stcox i.hqual_w1 i.sex i.agegroup
xi:stcox i.sex i.agegroup i.nssec_w1 i.hqual_w1 i.income_w1
** Table 2. Hazard ratios for time to first cohabiting partnership
**                               Class   Education   Income
** Unadjusted                    1.36    1.43        1.48
** Adjusted for age/gender       1.25    1.35        1.33
** Adjusted for age/gender
**   and other SEP measures      1.04    1.36        1.25
** Question: Which measure of SEP has the highest hazard ratios? And which the lowest?
** Answer: Highest = education and income; lowest = social class
** Question: Are they all significant predictors of time to cohabitation?
** Answer: Only income is significant in the univariate models.
** Question: How would you interpret the differences between the unadjusted hazard ratios and the hazard ratios adjusted for age and sex?
** Answer: The hazard ratios decrease slightly, suggesting that the three SEP variables are confounded by age and sex, i.e. there are gender and age differences between education/class/income groups
** that explain part of the differences in the time to first partnership.
** Question: Does each SEP measure still predict time to cohabitation when you control for the other SEP measures? What do you conclude from this?
** Answer: None of the three predictors is significant at the conventional 5% level. However, education and income still have relatively high hazard ratios; they may simply be too imprecisely estimated to reach significance.
** Class appears to have no independent effect.
***********************************************
** 5.4 THE PROPORTIONAL HAZARDS ASSUMPTION
***********************************************
** Kaplan Meier plot by gender
stcoxkm, by(sex)
graph save Graph "SC968 Lecture 9 Worksheet graph 2.gph",replace
** Question: What do you notice? Are the observed lines close to the predicted lines?
** Answer: Yes, they appear to be reasonably close.
** plot of cumulative hazard by gender
sts graph, by(sex) cumhaz
graph save Graph "SC968 Lecture 9 Worksheet graph 3.gph",replace
** Question: Do the cumulative hazard curves cross?
** Answer: No
** log-log survival plot
stphplot, strata(sex) adjust(agegroup nssec_w1 hqual_w1 income_w1)
graph save Graph "SC968 Lecture 9 Worksheet graph 4.gph",replace
** Question: The lines should be approximately parallel. Are they?
** Answer: Yes, until the end of the analysis time
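** Illustration (a numerical sketch of why the lines should be parallel): under
** proportional hazards S1(t) = S0(t)^HR, so
**   -ln(-ln(S1)) = -ln(-ln(S0)) - ln(HR)
** and the vertical gap between the two curves is the constant -ln(HR) at every t.
** HR = 1.64 is taken from Table 1; the baseline survival values are arbitrary.
local hr = 1.64
foreach s0 in 0.9 0.5 0.1 {
    local s1 = `s0'^`hr'
    display "S0 = `s0': gap = " -ln(-ln(`s1')) + ln(-ln(`s0'))
}
display "-ln(HR) = " -ln(`hr')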
** interaction of gender with time
xi: stcox i.sex i.agegroup i.nssec_w1 i.hqual_w1 i.income_w1, tvc(i.sex) texp(log(_t))
** Question: Does the effect of gender vary by time?
** Answer: No. Look at the second part of the table, labelled tvc. It contains the interaction between log(analysis time) and sex.
** The p-value is 0.236, so the interaction term is not statistically significant.
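** Illustration (a sketch of how to read a tvc term): with texp(log(_t)) the hazard
** ratio at analysis time t is exp(b_main + b_tvc*ln(t)), so it equals the main-effect
** hazard ratio at t = 1 and drifts away from it as t grows.
** b_main uses the Table 1 value 1.64 purely for illustration; b_tvc = 0.10 is a
** hypothetical number, NOT an estimate from the model above.
local b_main = ln(1.64)
local b_tvc = 0.10
foreach t in 1 5 10 {
    display "t = `t': hazard ratio = " %5.2f exp(`b_main' + `b_tvc'*ln(`t'))
}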
** Schoenfeld residuals
xi:stcox i.sex i.agegroup i.nssec_w1 i.hqual_w1 i.income_w1, schoenfeld(sch*) scaledsch(sca*)
estat phtest, rank detail
** Question: Is the test statistic significant for sex?
** Answer: No, chisq= 2.44, p = 0.18
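** Illustration (a quick sanity check, not part of the original worksheet): a p-value
** can be recomputed from a chi-squared statistic with chi2tail(). This assumes the
** test for a single covariate has 1 degree of freedom; check the df column of the
** estat phtest output before relying on it.
display "p-value for chi2(1) = 2.44: " %5.3f chi2tail(1, 2.44)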
** Question: Is there any evidence that hazards are non proportional for any of the other covariates?
** Answer: Yes, education.
** Kaplan Meier plot by education
stcoxkm, by(hqual_w1)
graph save Graph "SC968 Lecture 9 Worksheet graph 5.gph",replace
** plot of cumulative hazard by education
sts graph, by(hqual_w1) cumhaz
graph save Graph "SC968 Lecture 9 Worksheet graph 6.gph",replace
** log-log survival plot by education
stphplot, strata(hqual_w1) adjust(sex nssec_w1 agegroup income_w1)
graph save Graph "SC968 Lecture 9 Worksheet graph 7.gph",replace
** interaction of education with time
xi: stcox i.sex i.agegroup i.nssec_w1 i.hqual_w1 i.income_w1, tvc(i.hqual_w1) texp(log(_t))
** Cox model stratified by education
xi:stcox i.sex i.nssec_w1 i.agegroup i.income_w1, strata(hqual_w1)
** run model with time varying variables
xi: stcox i.sex i.agegroup i.nssec i.hqual i.income
** Question: Why do you think the estimates are different from table 2?
** Answer: Income and social class show significant variation over time – see stvary output from the start of this session. Education does not change much.
*******************************************************
** 5.5 TRYING A NEW SURVIVAL ANALYSIS ON YOUR OWN!
*******************************************************
** analyse time to drop-out from survey
stset wave, id(pid) failure(wdrawn==1)
list pid wave wdrawn _st _d _t _t0 in 1/100,sepby(pid) noobs
stdes
stsum
xi:stcox i.sex i.agegroup i.nssec_w1 i.hqual_w1 i.income_w1
** Question: Is gender, age or SEP related to withdrawal from the survey?
** Answer: Gender and age only. However, you should also check whether the SEP measures have any univariate effect.
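** Illustration (one way to carry out the check suggested above, assuming the
** drop-out stset is still in force): fit each SEP measure on its own, mirroring
** the univariate models used in section 5.3.
xi: stcox i.nssec_w1
xi: stcox i.hqual_w1
xi: stcox i.income_w1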
log close