python - Transform slow pandas iterrows into apply -
python - Transform slow pandas iterrows into apply -
i have next dataframe:
value count recl_2007 recl_2008 recl_2009 a_a a_b a_c b_a b_b \ 0 189 149.5872 503 503 500 0 0 0 0 0 1 209 1939.6160 503 503 503 0 0 0 0 0 2 499 617.4784 503 500 503 0 0 0 0 0 3 585 73.0688 503 503 503 0 0 0 0 0 4 611 133.9072 503 500 503 0 0 0 0 0 5 645 278.7904 503 503 503 0 0 0 0 0 6 659 138.2976 500 503 503 0 0 0 0 0 7 719 769.5744 503 503 502 0 0 0 0 0 b_c c_a c_b c_c 0 0 0 0 0 1 0 0 0 0 2 0 0 0 0 3 0 0 0 0 4 0 0 0 0 5 0 0 0 0 6 0 0 0 0 7 0 0 0 0
also, values in columns:recl_2007, recl_2008 , recl_2009 correspond variables a, b , c follows:
= 500, b=502, c=503.
i want fill columns a_a...c_c using values in count column such recl_2007 value gives first part of column name , recl_2009 gives sec part of column name.
i.e, if recl_2007 == 503 , recl_2009 == 500, column c_a , value should updated whatever in count column of row.
currently iterating through pandas dataframe using iterrows:
for index, row in df.iterrows(): init = opp_lu[row[name_init]] # first part of column name finl = opp_lu[row[name_finl]] # sec part of column name col_name = init+'_'+finl df.loc[index,col_name] = row['count']
this slow, not sure how translate using apply. hints?
there 2 ways that.
you can utilizeapply
function, need work here(just simplify work). a dictionary help build naming
d={'500':'a','502':'b','503':'c'}
function naming
name= lambda x: "{0}_{1}".format(d[str(int(x['recl_2007']))],d[str(int(x['recl_2009']))])
then, go through items , re-create count item name similar.
df["c_a"] = df.apply(lambda x: x['count'] if name(x)=='c_a' else 0, axis=1)
the other solution, simpler filter info have, re-create count item
df.loc[(df['recl_2007']==503) & (df['recl_2009']==503), 'c_c']= df['count']
the code like, quick example, need work on other scenarios.
data= """value,count,recl_2007,recl_2008,recl_2009\n189,149.5872,503,503,500\n209,939.6160,503,503,503\n499,617.4784,503,500,503\n585,73.0688,503,503,503\n611,133.9072,503,500,503\n645,278.7904,503,503,503\n659,138.2976,500,503,503\n719,769.5744,503,503,502""" import pandas pd io import stringio df= pd.read_csv(stringio(data.decode('utf-8')),sep=',' ) #first approach: d={'500':'a','502':'b','503':'c'} name= lambda x: "{0}_{1}".format(d[str(int(x['recl_2007']))],d[str(int(x['recl_2009']))]) df['c_c']=[0]*len(df.value) df["c_a"] = df.apply(lambda x: x['count'] if name(x)=='c_a' else 0, axis=1) #second approach: df.loc[(df['recl_2007']==503) & (df['recl_2009']==503), 'c_c']= df['count'] print df
output:
value count recl_2007 recl_2008 recl_2009 c_c c_a 0 189 149.5872 503 503 500 0.0000 149.5872 1 209 939.6160 503 503 503 939.6160 0.0000 2 499 617.4784 503 500 503 617.4784 0.0000 3 585 73.0688 503 503 503 73.0688 0.0000 4 611 133.9072 503 500 503 133.9072 0.0000 5 645 278.7904 503 503 503 278.7904 0.0000 6 659 138.2976 500 503 503 0.0000 0.0000 7 719 769.5744 503 503 502 0.0000 0.0000
python pandas
Comments
Post a Comment