A similar question might have been asked before, but I couldn't find one that exactly fits my problem.
I want to group a dataframe by two columns.
For example, to turn this:
id product quantity
1 A 2
1 A 3
1 B 2
2 A 1
2 B 1
3 B 2
3 B 1
Into this:
id product quantity
1 A 5
1 B 2
2 A 1
2 B 1
3 B 3
That is, sum the "quantity" column for rows with the same "id" and the same "product".
You need groupby with the parameter as_index=False to return a DataFrame, and aggregate with sum:
df = df.groupby(['id','product'], as_index=False)['quantity'].sum()
print (df)
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
Or add reset_index:
df = df.groupby(['id','product'])['quantity'].sum().reset_index()
print (df)
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
You can use pivot_table with aggfunc='sum':
df.pivot_table('quantity', ['id', 'product'], aggfunc='sum').reset_index()
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
You can use groupby with the agg function:
import pandas as pd
df = pd.DataFrame({
'id': [1,1,1,2,2,3,3],
'product': ['A','A','B','A','B','B','B'],
'quantity': [2,3,2,1,1,2,1]
})
print(df)
id product quantity
0 1 A 2
1 1 A 3
2 1 B 2
3 2 A 1
4 2 B 1
5 3 B 2
6 3 B 1
df = df.groupby(['id','product']).agg({'quantity':'sum'}).reset_index()
print(df)
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
Related
I have a data frame representing the sales of an item:
import pandas as pd
data = {'id': [1,1,1,1,2,2], 'week': [1,2,2,3,1,3], 'quantity': [1,2,4,3,2,2]}
df_sales = pd.DataFrame(data)
>>> df_sales
id week quantity
0 1 1 1
1 1 2 2
2 1 3 3
3 2 1 2
4 2 3 2
I have another data frame that represents the available weeks:
data = {'week': [1,2,3]}
df_week = pd.DataFrame(data)
>>> df_week
week
0 1
1 2
2 3
I want to group by the id and the week and compute the mean, which I do as follows:
df = df_sales.groupby(by=['id', 'week'], as_index=False).mean()
>>> df
id week quantity
0 1 1 1
1 1 2 3
2 1 3 3
3 2 1 2
4 2 3 2
However, I want to fill the missing week values (present in df_week) with 0, such that the output is:
>>> df
id week quantity
0 1 1 1
1 1 2 3
2 1 3 3
3 2 1 2
4 2 2 0
5 2 3 2
Is it possible to merge the groupby with the df_week data frame?
We can reindex after groupby
# group and aggregate
df = df_sales.groupby(['id', 'week']).mean()
# define new MultiIndex
idx = pd.MultiIndex.from_product([df.index.levels[0], df_week['week']])
# reindex with fill_value=0
df = df.reindex(idx, fill_value=0).reset_index()
print(df)
id week quantity
0 1 1 1
1 1 2 3
2 1 3 3
3 2 1 2
4 2 2 0
5 2 3 2
Since all unique id and week combinations are needed in the result, one way is to first prepare a combinations frame with pd.merge, passing how="cross":
combs = pd.merge(df_sales.id.drop_duplicates(), df_week.week, how="cross")
or, for pandas versions below 1.2:
combs = pd.merge(df_sales.id.drop_duplicates().to_frame().assign(key=1),
df_week.week.to_frame().assign(key=1), on="key").drop(columns="key")
which gives
>>> combs
id week
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 3
Now we can left merge this with df (which holds the means), filling NaNs with 0:
result = combs.merge(df, how="left", on=["id", "week"]).fillna(0, downcast="infer")
where downcast converts back to integers from the float type caused by the NaNs that appeared in the intermediate step,
to get
>>> result
id week quantity
0 1 1 1
1 1 2 3
2 1 3 3
3 2 1 2
4 2 2 0
5 2 3 2
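If you would rather not use the downcast argument, a minimal alternative sketch (assuming the same combs and grouped df as above; the means here happen to be whole numbers, so the integer cast is safe) fills only the quantity column and casts it back explicitly:
# fill only the missing quantities with 0, then restore the integer dtype explicitly
result = (combs.merge(df, how="left", on=["id", "week"])
               .fillna({"quantity": 0})
               .astype({"quantity": int}))
print(result)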
UPDATED THE SAMPLE DATASET
I have the following data:
location ID Value
A 1 1
A 1 1
A 1 1
A 1 1
A 1 2
A 1 2
A 1 2
A 1 2
A 1 3
A 1 4
A 2 1
A 2 2
A 3 1
A 3 2
B 4 1
B 4 2
B 5 1
B 5 1
B 5 2
B 5 2
B 6 1
B 6 1
B 6 1
B 6 1
B 6 1
B 6 2
B 6 2
B 6 2
B 7 1
I want to count unique Values (only where the value equals 1 or 2) for each location and for each ID, to get the following output:
location ID_Count Value_Count
A 3 6
B 4 7
I tried df.groupby(['location'])[['ID', 'Value']].nunique(), but I am only getting the count of unique values per location, e.g. a Value_Count of 4 for A and 2 for B.
Try agg, slicing ID on the True values of the isin mask.
For your updated sample, you just need to drop duplicates before processing; the rest is the same:
df = df.drop_duplicates(['location', 'ID', 'Value'])
df_agg = (df.Value.isin([1,2]).groupby(df.location)
.agg(ID_count=lambda x: df.loc[x[x].index, 'ID'].nunique(),
Value_count='sum'))
Out[93]:
ID_count Value_count
location
A 3 6
B 4 7
IIUC, you can try Series.isin with groupby.agg:
out = (df.assign(Value_Count=df['Value'].isin([1,2])).groupby("location",as_index=False)
.agg({"ID":'nunique',"Value_Count":'sum'}))
print(out)
location ID Value_Count
0 A 3 6.0
1 B 4 7.0
Roughly the same as anky's answer, but using Series.where and named aggregations so we can rename the columns while creating them in the groupby.
grp = df.assign(Value=df['Value'].where(df['Value'].isin([1, 2]))).groupby('location')
grp.agg(
ID_count=('ID', 'nunique'),
Value_count=('Value', 'count')
).reset_index()
location ID_count Value_count
0 A 3 6
1 B 4 7
Let's try a very similar approach to the other answers. This time we filter first:
(df[df['Value'].isin([1,2])]
.groupby(['location'],as_index=False)
.agg({'ID':'nunique', 'Value':'size'})
)
Output:
location ID Value
0 A 3 6
1 B 4 7
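Note that this answer and the two before it appear to assume the original (unduplicated) sample, so 'sum'/'size' counts every matching row; with the updated sample the duplicate rows inflate those counts. A sketch combining the drop-duplicates step from the first answer with this filter-first approach (assuming pandas >= 0.25 for the named aggregation):
# dedupe (location, ID, Value) rows first, then filter to values 1/2 and aggregate
out = (df.drop_duplicates(['location', 'ID', 'Value'])
         .loc[lambda d: d['Value'].isin([1, 2])]
         .groupby('location', as_index=False)
         .agg(ID_Count=('ID', 'nunique'), Value_Count=('Value', 'size')))
print(out)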
I have a df1:
a b c
1 0 1 4
2 0 2 5
3 1 1 3
and a second df2:
a b c
1 0 1 5
2 0 2 5
3 1 1 4
These dfs have the same groups in a and b. Within each group of 'a' and 'b', I want df2 underneath df1:
a b c
1 0 1 4
2 0 1 5
3 0 2 5
4 0 2 5
5 1 1 3
6 1 1 4
How can I combine groupby() and concat() to get the desired output?
You can do concat then sort_values:
df = pd.concat([df1, df2]).sort_values(['a', 'b']).reset_index(drop=True)
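A minimal runnable sketch of the same one-liner, assuming the sample frames above (with this data the df1 row lands above the df2 row inside each group, matching the desired output):
import pandas as pd

df1 = pd.DataFrame({'a': [0, 0, 1], 'b': [1, 2, 1], 'c': [4, 5, 3]})
df2 = pd.DataFrame({'a': [0, 0, 1], 'b': [1, 2, 1], 'c': [5, 5, 4]})

# stack the two frames, then order the rows by the group keys
df = (pd.concat([df1, df2])
        .sort_values(['a', 'b'])
        .reset_index(drop=True))
print(df)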
I have the following pandas dataframe :
a
0 0
1 0
2 1
3 2
4 2
5 2
6 3
7 2
8 2
9 1
I want to store the values in another dataframe such that every group of consecutive identical values makes a labeled group, like this:
A B
0 0 2
1 1 1
2 2 3
3 3 1
4 2 2
5 1 1
The column A represents the value of the group and B represents the number of occurrences.
This is what I've done so far:
df = pd.DataFrame({'a':[0,0,1,2,2,2,3,2,2,1]})
df2 = pd.DataFrame()
for i,g in df.groupby([(df.a != df.a.shift()).cumsum()]):
vc = g.a.value_counts()
df2 = df2.append({'A':vc.index[0], 'B': vc.iloc[0]}, ignore_index=True).astype(int)
It works but it's a bit messy.
Can you think of a shorter/better way of doing this?
Use GroupBy.agg in pandas >= 0.25.0:
new_df= ( df.groupby(df['a'].ne(df['a'].shift()).cumsum(),as_index=False)
.agg(A=('a','first'),B=('a','count')) )
print(new_df)
A B
0 0 2
1 1 1
2 2 3
3 3 1
4 2 2
5 1 1
For pandas < 0.25.0:
new_df= ( df.groupby(df['a'].ne(df['a'].shift()).cumsum(),as_index=False)
.a
.agg({'A':'first','B':'count'}) )
I would try:
df['blocks'] = df['a'].ne(df['a'].shift()).cumsum()
(df.groupby(['a', 'blocks'], sort=False)
   .size()
   .reset_index(name='B')
   .drop('blocks', axis=1)
)
Output:
a B
0 0 2
1 1 1
2 2 3
3 3 1
4 2 2
5 1 1
I have a dataframe:
>>> df
Category Score
0 A 1
1 A 2
2 A 3
3 B 5
4 B 9
I expect the output below, with Score sorted in descending order within each Category:
>>> df
Category Score
2 A 3
1 A 2
0 A 1
4 B 9
3 B 5
Any ideas?
Use sort_values, specifying the sort order for each column via the ascending list:
In [17]: df.sort_values(by=['Category', 'Score'], ascending=[True, False])
Out[17]:
Category Score
2 A 3
1 A 2
0 A 1
4 B 9
3 B 5
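An equivalent pattern, as a sketch, is two chained sorts where the second uses a stable algorithm so the descending Score order survives within each Category:
# sort by Score descending first, then re-sort by Category with a stable sort
out = (df.sort_values('Score', ascending=False)
         .sort_values('Category', kind='mergesort'))
print(out)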