python - Format data for survival analysis using pandas




I'm trying to figure out the quickest way to get survival analysis data into a format that allows for time-varying covariates. Essentially, this would be a Python implementation of stsplit in Stata. To give a simple example, take the following set of information:

    id  start  end  x1  x2  exit
     1      0   18  12  11     1

This tells us that the observation started at time 0 and ended at time 18. exit tells us that this was a 'death' rather than right censoring. x1 and x2 are variables that are constant over time.

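A second table records the time-varying covariate age and the time t at which each value takes effect:

    id   t  age
     1   0   30
     1   7   40
     1  17   50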

I'd like to get:

    id  start  end  x1  x2  exit  age
     1      0    7  12  11     0   30
     1      7   17  12  11     0   40
     1     17   18  12  11     1   50

Here exit is 1 only in the last row, signifying that the death occurred at t=18.

Assuming:

    >>> df1
       id  start  end  x1  x2  exit
    0   1      0   18  12  11     1

And:

    >>> df2
       id   t  age
    0   1   0   30
    1   1   7   40
    2   1  17   50
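For anyone who wants to reproduce this, the two frames can be built roughly as follows. This is only a sketch that reuses the values shown above; the original post does not say how the data were loaded.

```python
import pandas as pd

# One row per subject: the observation window, the constant covariates
# x1 and x2, and the exit flag (1 = death, 0 = right censored).
df1 = pd.DataFrame(
    {"id": [1], "start": [0], "end": [18], "x1": [12], "x2": [11], "exit": [1]}
)

# One row per change in the time-varying covariate `age`, recorded at time `t`.
df2 = pd.DataFrame({"id": [1, 1, 1], "t": [0, 7, 17], "age": [30, 40, 50]})
```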

You can do:

    df = df2.copy()                                   # start from df2
    df['x1'] = df1.loc[0, 'x1']                       # x1 column
    df['x2'] = df1.loc[0, 'x2']                       # x2 column
    df = df.rename(columns={'t': 'start'})            # start column
    df['end'] = df['start'].shift(-1)                 # end column: next row's start
    df.loc[df.index[-1], 'end'] = df1.loc[0, 'end']   # last interval ends with the spell
    df['end'] = df['end'].astype(int)                 # shift() left a NaN, so restore ints
    df['exit'] = 0                                    # exit column
    df.loc[df.index[-1], 'exit'] = 1                  # event only on the last interval
    df = df[['id', 'start', 'end', 'x1', 'x2', 'exit', 'age']]  # reorder columns

Output:

    >>> df
       id  start  end  x1  x2  exit  age
    0   1      0    7  12  11     0   30
    1   1      7   17  12  11     0   40
    2   1     17   18  12  11     1   50
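The steps above handle a single id. A possible generalization to several subjects is sketched below, using a merge and a per-id shift. The function name `split_episodes`, the helper column names, and the two-subject demo data are hypothetical and not from the original post. The sketch also assumes that every id has a row in df2 at its start time and that all change times fall inside the observation window.

```python
import pandas as pd


def split_episodes(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Split each subject's spell in df1 at the change times listed in df2."""
    # Attach the constant covariates and the spell bounds to every change row.
    spells = df1.rename(
        columns={"start": "spell_start", "end": "spell_end", "exit": "spell_exit"}
    )
    out = df2.rename(columns={"t": "start"}).merge(spells, on="id", how="left")
    out = out.sort_values(["id", "start"]).reset_index(drop=True)

    # Each interval ends where the next interval for the same id starts;
    # the last interval of an id ends at the end of the spell.
    next_start = out.groupby("id")["start"].shift(-1)
    is_last = next_start.isna()
    out["end"] = next_start.fillna(out["spell_end"]).astype(out["spell_end"].dtype)

    # Only the last interval keeps the subject's exit flag; earlier ones get 0.
    out["exit"] = out["spell_exit"].where(is_last, 0)

    constant_cols = [c for c in df1.columns if c not in ("id", "start", "end", "exit")]
    varying_cols = [c for c in df2.columns if c not in ("id", "t")]
    return out[["id", "start", "end", *constant_cols, "exit", *varying_cols]]


# Hypothetical demo data: subject 1 dies at t=18, subject 2 is censored at t=10.
demo_df1 = pd.DataFrame(
    {"id": [1, 2], "start": [0, 0], "end": [18, 10],
     "x1": [12, 5], "x2": [11, 3], "exit": [1, 0]}
)
demo_df2 = pd.DataFrame(
    {"id": [1, 1, 1, 2, 2], "t": [0, 7, 17, 0, 4], "age": [30, 40, 50, 25, 35]}
)
result = split_episodes(demo_df1, demo_df2)
```

Taking the exit flag from df1 rather than hard-coding 1 means a right-censored subject keeps exit equal to 0 on every interval.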

python pandas stata survival-analysis
