python - Groupby multiple columns
I have a dataframe and want to sum the values in 20 different columns, grouped by matching entries in the 'value' column.
Here is how I do it for a single column:
df.groupby('value', as_index=False).aggregate({'count': numpy.sum})
Is there a better way to extend this to 20 columns without writing the names out explicitly, i.e. a way to pass a list of column names?
Please see hernamesbarbara's reply below for an example that can be used to illustrate this issue.
You can select the columns to sum by passing a list of column names, using subscript notation on the pandas groupby object. Is this what you're looking for?
import numpy as np
import pandas as pd

data = {
    "dim1": [np.random.choice(['foo', 'bar']) for _ in range(10)],
    "measure1": np.random.randint(0, 101, 10),  # random integers from 0 to 100
    "measure2": np.random.randint(0, 101, 10),
}
df = pd.DataFrame(data)
df

Out[1]:
  dim1  measure1  measure2
0  bar         9        86
1  bar        24        64
2  bar        47        46
3  foo        60        98
4  bar        94        53
5  foo        95        89
6  foo        98         9
7  bar         4        95
8  foo        63        66
9  foo        40        47

# pass the list of column names to sum (note the double brackets)
df.groupby(['dim1'])[['measure1', 'measure2']].sum()

Out[2]:
      measure1  measure2
dim1
bar        178       344
foo        356       309
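Since the session above uses random data, here is a minimal reproducible sketch of the same idea. The values are copied from the sample output above; the name columns_to_sum is only an illustrative choice, and as_index=False is used on the assumption that you want the grouping key kept as a regular column, as in the question.

```python
import pandas as pd

# Fixed values copied from the sample output above, so the result is reproducible.
df = pd.DataFrame({
    "dim1": ["bar", "bar", "bar", "foo", "bar", "foo", "foo", "bar", "foo", "foo"],
    "measure1": [9, 24, 47, 60, 94, 95, 98, 4, 63, 40],
    "measure2": [86, 64, 46, 98, 53, 89, 9, 95, 66, 47],
})

# Illustrative name: any list of column names can be passed here.
columns_to_sum = ["measure1", "measure2"]

# as_index=False keeps the grouping key 'dim1' as an ordinary column.
totals = df.groupby("dim1", as_index=False)[columns_to_sum].sum()
print(totals)
```

The list can hold 20 names just as easily as two, which is the point of passing a list rather than spelling out one aggregation per column.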
Update 2015-01-02: a delayed reply to the comment below, but better late than never.
If you don't know how many columns you have but you do know the column naming convention, you can build the list of columns to aggregate dynamically. Here's one way:
colnames = ["measure{}".format(i + 1) for i in range(100)]  # create 100 fake columns
df = pd.DataFrame(np.ones((10, 100)), columns=colnames)
df['dim1'] = [np.random.choice(['foo', 'bar']) for _ in range(10)]  # add a fake dimension to group by
desired_columns = [col for col in df.columns if "94" in col or "95" in col]  # select columns 94 and 95
df.groupby(['dim1'])[desired_columns].sum()

Out[52]:
      measure94  measure95
dim1
bar           4          4
foo           6          6
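If the goal is to aggregate every column that follows the convention rather than a handful of them, a prefix test may be safer than a substring test such as "94" in col, which would also match a name like measure194. The sketch below assumes the columns to sum all start with "measure" and uses a fixed grouping column so the result is reproducible; measure_columns is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Same fake data as above: 100 columns of ones.
colnames = ["measure{}".format(i + 1) for i in range(100)]
df = pd.DataFrame(np.ones((10, 100)), columns=colnames)
df["dim1"] = ["foo", "bar"] * 5  # fixed grouping: 5 rows per group

# Assumed convention: every column to aggregate starts with "measure".
measure_columns = [col for col in df.columns if col.startswith("measure")]

# Sum all matching columns within each group.
summed = df.groupby("dim1")[measure_columns].sum()
print(summed.shape)
```

The grouping column 'dim1' does not match the prefix, so it is left out of the list automatically.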
python pandas