python - Pandas - Create new dataframes based on ranked values in select columns -
I have a dataframe in which some columns contain numerical data and others contain text. It looks like this:
age  weight  blood sugar  study grouping  gender  notes
29   195     126          b               female  notes of kind
34   180     140          b               male    different set of notes
48   220     111          c               male    blah blah
55   189     109          c               male    more notes
I want to create sub-divisions of the dataframe based on rankings of the numerical columns. For example, if I need the 2 oldest patients, the new dataframe would look like this:
age  weight  blood sugar  study grouping  gender  notes
48   220     111          c               male    blah blah
55   189     109          c               male    more notes
The rank function looks useful. I figure I could run:
df2 = df.rank(axis=0)
and then find a way to use the index of df2 to pull rows from df into new dataframes. Something along the lines of:
cutoff = df2[df2 > 10]  # then drop rows with NaN values in the columns of interest
This feels a bit clunky, though. I'm hoping there's a more straightforward way to say,
"Pandas, I want a new dataframe with the 15 oldest people in it. Great! Now I want a new dataframe with the 20 youngest people, etc."
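For reference, the rank-based idea from the question can be made to work by ranking a single column and using the result as a boolean mask. This is only a minimal sketch: the dataframe is rebuilt from the sample rows in the question, and the names age_rank and oldest_two are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "age": [29, 34, 48, 55],
    "weight": [195, 180, 220, 189],
    "blood sugar": [126, 140, 111, 109],
    "study grouping": ["b", "b", "c", "c"],
    "gender": ["female", "male", "male", "male"],
    "notes": ["notes of kind", "different set of notes", "blah blah", "more notes"],
})

# Rank ages so that the oldest person gets rank 1.
# method="min" gives tied ages the same (lowest) rank.
age_rank = df["age"].rank(ascending=False, method="min")

# Keep the rows whose age rank is 1 or 2 (hypothetical cutoff of 2).
oldest_two = df[age_rank <= 2]
print(oldest_two)
```

Because the mask is aligned on the index of df, the original row order and all text columns are preserved.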
One alternative is to sort the dataframe by age:
df = df.sort_values('age')
Then the age of the n-th youngest person is df['age'].values[n - 1], and the age of the n-th oldest person is df['age'].values[-n].
Therefore, to view a dataframe of everyone at or above the age of the 15th-oldest person, do:
df[df['age'] >= df['age'].values[-15]]
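The threshold approach keeps every row tied with the cutoff age, so it can return more than n rows. A small sketch of that behaviour follows; the data is invented for illustration (the last row is a made-up patient who shares the oldest age), and n, cutoff and oldest are hypothetical names.

```python
import pandas as pd

# Illustrative data only: two patients share the oldest age of 55.
df = pd.DataFrame({
    "age": [29, 34, 48, 55, 55],
    "weight": [195, 180, 220, 189, 201],
})

# Sort ascending so that the oldest ages sit at the end.
df = df.sort_values("age")

n = 1  # ask for the single oldest age
cutoff = df["age"].values[-n]  # age of the n-th oldest person

# Every row at or above the cutoff is kept, including ties.
oldest = df[df["age"] >= cutoff]
print(len(oldest))
```

Here n is 1 but two rows come back, because both patients aged 55 meet the cutoff.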
Alternatively, if you want to limit the number of rows returned (e.g. you don't mind that there may be 20 people sharing the oldest age of, say, 55), use the head and tail methods on the sorted dataframe:
df_age = df.sort_values('age', ascending=False)
Then use df_age.head(15) to view 15 of the oldest people, or df_age.tail(20) to view 20 of the youngest people.
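As a possible shortcut, pandas also provides DataFrame.nlargest and DataFrame.nsmallest, which combine the sort and the row limit in one call. The sketch below uses the ages and weights from the question's sample rows and compares the result with the sort-then-head approach; the variable names are made up for illustration.

```python
import pandas as pd

# Ages and weights taken from the sample rows in the question.
df = pd.DataFrame({
    "age": [29, 34, 48, 55],
    "weight": [195, 180, 220, 189],
})

# Exactly 2 rows each, ordered from the most extreme value inward.
oldest = df.nlargest(2, "age")
youngest = df.nsmallest(2, "age")

# Equivalent selection using an explicit descending sort plus head.
df_age = df.sort_values("age", ascending=False)
oldest_via_head = df_age.head(2)

print(oldest)
print(youngest)
```

Both routes cap the result at the requested number of rows; which tied rows survive the cap depends on the method's tie handling, so the threshold comparison is the safer choice when ties must all be kept.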
python pandas dataframes