I have a dataframe:
>>> df
Category Score
0 A 1
1 A 2
2 A 3
3 B 5
4 B 9
I expect the output below, with Score sorted in descending order within each Category:
>>> df
Category Score
2 A 3
1 A 2
0 A 1
4 B 9
3 B 5
Any ideas?
Use sort_values, passing a list of columns to by and a matching list of flags to ascending:
In [17]: df.sort_values(by=['Category', 'Score'], ascending=[True, False])
Out[17]:
Category Score
2 A 3
1 A 2
0 A 1
4 B 9
3 B 5
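If you also want a fresh 0..n-1 index after sorting, a minimal variant (assuming pandas >= 1.0, where sort_values gained ignore_index):
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'A', 'A', 'B', 'B'],
                   'Score': [1, 2, 3, 5, 9]})

# Category ascending, Score descending, old index dropped
out = df.sort_values(by=['Category', 'Score'],
                     ascending=[True, False],
                     ignore_index=True)
print(out)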
UPDATED THE SAMPLE DATASET
I have the following data:
location ID Value
A 1 1
A 1 1
A 1 1
A 1 1
A 1 2
A 1 2
A 1 2
A 1 2
A 1 3
A 1 4
A 2 1
A 2 2
A 3 1
A 3 2
B 4 1
B 4 2
B 5 1
B 5 1
B 5 2
B 5 2
B 6 1
B 6 1
B 6 1
B 6 1
B 6 1
B 6 2
B 6 2
B 6 2
B 7 1
I want to count unique Values (only where the value equals 1 or 2) for each location and each ID, for the following output:
location ID_Count Value_Count
A 3 6
B 4 7
I tried df.groupby(['location'])['ID','value'].nunique(), but that gives only the count of distinct values per location: Value_Count comes out as 4 for A and 2 for B.
Try agg, slicing ID on the True values of the isin mask.
For your updated sample, you just need to drop duplicates before processing; the rest is the same:
df = df.drop_duplicates(['location', 'ID', 'Value'])
# x is the boolean isin mask per group; x[x].index are the matching rows
df_agg = (df.Value.isin([1, 2]).groupby(df.location)
            .agg(ID_count=lambda x: df.loc[x[x].index, 'ID'].nunique(),
                 Value_count='sum'))
Out[93]:
ID_count Value_count
location
A 3 6
B 4 7
IIUC, you can try Series.isin with groupby.agg (after dropping duplicates as above):
out = (df.assign(Value_Count=df['Value'].isin([1, 2]))
         .groupby('location', as_index=False)
         .agg({'ID': 'nunique', 'Value_Count': 'sum'}))
print(out)
location ID Value_Count
0 A 3 6.0
1 B 4 7.0
Roughly the same as anky's, but using Series.where and named aggregation so we can rename the columns while creating them in the groupby.
grp = df.assign(Value=df['Value'].where(df['Value'].isin([1, 2]))).groupby('location')
grp.agg(
    ID_count=('ID', 'nunique'),
    Value_count=('Value', 'count')
).reset_index()
location ID_count Value_count
0 A 3 6
1 B 4 7
Let's try a very similar approach to other answers. This time we filter first:
(df[df['Value'].isin([1,2])]
.groupby(['location'],as_index=False)
.agg({'ID':'nunique', 'Value':'size'})
)
Output:
location ID Value
0 A 3 6
1 B 4 7
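Combining the dedup step from the first answer with the filter-first idea, a compact sketch using query and named aggregation (pandas >= 0.25 assumed):
out = (df.drop_duplicates(['location', 'ID', 'Value'])
         .query('Value in [1, 2]')
         .groupby('location', as_index=False)
         .agg(ID_Count=('ID', 'nunique'),      # distinct IDs per location
              Value_Count=('Value', 'size')))  # remaining rows per location
print(out)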
Input DF:
Index Parameters A B C
1 Apple 1 2 3
2 Banana 2 4 5
3 Potato 3 5 2
4 Tomato 1 x 4 1 x 6 2 x 12
Output DF:
Index Parameters A B C
1 Apple 1 2 3
2 Banana 2 4 5
3 Potato 3 5 2
4 Tomato_P 1 1 2
5 Tomato_Q 4 6 12
Problem Statement:
I want to convert a row of data into multiple rows based on a particular column value (Tomato), with ' x ' as the split delimiter.
Code/Findings:
I have code which works if I transpose this dataset, apply the answer from here or here, and then transpose it back.
I am looking for a solution that works directly on the given dataframe.
Solution if there is always at most one 'x' per value: first split each listed column with Series.str.split, then Series.explode each of them, join the remaining columns back with DataFrame.join, and append the _P/_Q suffixes with Index.duplicated and numpy.select:
import numpy as np
import pandas as pd

# assumes the Index column is the dataframe index
cols = ['A','B','C']
df[cols] = df[cols].apply(lambda x: x.str.split(' x '))
# explode each listed column and align on the original index
df1 = pd.concat([df[x].explode() for x in cols], axis=1)
#print (df1)
df = df[df.columns.difference(cols)].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'),
                               df.index.duplicated()],
                              ['_P', '_Q'],
                              default='')
df = df.reset_index(drop=True)
print (df)
Parameters A B C
0 Apple 1 2 3
1 Banana 2 4 5
2 Potato 3 5 2
3 Tomato_P 1 1 2
4 Tomato_Q 4 6 12
EDIT:
Answer with no explode:
cols = df.columns[1:]
df1 = (pd.concat([df[x].str.split(' x ', expand=True).stack() for x in cols],
                 axis=1, keys=cols)
         .reset_index(level=1, drop=True))
print (df1)
A B C
Index
1 1 2 3
2 2 4 5
3 3 5 2
4 1 1 2
4 4 6 12
df = df.iloc[:, [0]].join(df1)
df['Parameters'] += np.select([df.index.duplicated(keep='last'),
df.index.duplicated()],
['_P','_Q'],
default='')
df = df.reset_index(drop=True)
print (df)
Parameters A B C
0 Apple 1 2 3
1 Banana 2 4 5
2 Potato 3 5 2
3 Tomato_P 1 1 2
4 Tomato_Q 4 6 12
This is more like an explode problem; Series.explode is available from pandas 0.25:
df[['A','B','C']]=df[['A','B','C']].apply(lambda x : x.str.split(' x '))
df
Index Parameters A B C
0 1 Apple [1] [2] [3]
1 2 Banana [2] [4] [5]
2 3 Potato [3] [5] [2]
3 4 Tomato [1, 4] [1, 6] [2, 12]
df.set_index(['Index','Parameters'],inplace=True)
pd.concat([df[x].explode() for x in ['A','B','C']],axis=1)
A B C
Index Parameters
1 Apple 1 2 3
2 Banana 2 4 5
3 Potato 3 5 2
4 Tomato 1 1 2
Tomato 4 6 12
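If the _P/_Q suffixes from the expected output are still wanted, a minimal follow-up sketch; the suffix rule (_P for the first split row, _Q for the second) is assumed from the expected output:
import numpy as np
import pandas as pd

res = pd.concat([df[x].explode() for x in ['A', 'B', 'C']], axis=1).reset_index()

# rows sharing an Index value came from the same original row
size = res.groupby('Index')['Index'].transform('size')
order = res.groupby('Index').cumcount()   # 0 for the first row, 1 for the second

# suffix only the rows that were actually split into several rows
res['Parameters'] = np.where(size > 1,
                             res['Parameters'] + order.map({0: '_P', 1: '_Q'}),
                             res['Parameters'])
print(res)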
I have this table
A B C E
1 2 1 3
1 2 4 4
2 7 1 1
3 4 0 2
3 4 8 3
Now, I want to remove duplicates based on columns A and B and at the same time sum up column C. For E, it should take the value from the row where C is at its maximum. The desired result should look like this:
A B C E
1 2 5 4
2 7 1 1
3 4 8 3
I tried df.groupby(['A', 'B']).sum()['C'], but my data frame does not change at all, and I don't think I incorporated the E column part properly... Can somebody advise?
Thanks so much!
If rows are duplicated in the first two columns, we can group by those columns.
In [20]: df
Out[20]:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 3
In [21]: df.groupby(['A', 'B'])['C'].sum()
Out[21]:
A B
1 1 6
3 3 8
Name: C, dtype: int64
I tried this: df.groupby(['A', 'B']).sum()['C'] but my data frame does not change at all
Yes, that's because pandas does not overwrite the initial DataFrame: groupby returns a new object.
In [22]: df
Out[22]:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 3
You have to overwrite it explicitly.
In [23]: df = df.groupby(['A', 'B'])['C'].sum()
In [24]: df
Out[24]:
A B
1 1 6
3 3 8
Name: C, dtype: int64
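The grouping above keeps only C and drops E. A minimal sketch for the full requirement (sum C, take E from the row with the maximal C), using a sort so that 'last' picks the row with the largest C:
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3],
                   'B': [2, 2, 7, 4, 4],
                   'C': [1, 4, 1, 0, 8],
                   'E': [3, 4, 1, 2, 3]})

# after sorting by C ascending, the last row of each (A, B) group
# is the one with the maximal C, so 'last' on E picks its E value
out = (df.sort_values('C')
         .groupby(['A', 'B'], as_index=False)
         .agg({'C': 'sum', 'E': 'last'}))
print(out)
#    A  B  C  E
# 0  1  2  5  4
# 1  2  7  1  1
# 2  3  4  8  3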
I have a dataframe with a count column giving the frequency of each y value, sorted descending on that count.
Code:
df['count'] = df.groupby(['y'])['y'].transform(pd.Series.value_counts)
df = df.sort('count', ascending=False)
Output:
x y count
1 a 4
3 a 4
2 a 4
1 a 4
2 c 3
1 c 3
2 c 3
2 b 2
1 b 2
Now, I want to sort the x column by its frequency within each y group, keeping equal values adjacent, like below:
Expected Output:
x y count
1 a 4
1 a 4
2 a 4
3 a 4
2 c 3
2 c 3
1 c 3
2 b 2
1 b 2
It seems you need groupby with value_counts, and then numpy.repeat to expand the index values by their counts into a DataFrame:
s = df.groupby('y', sort=False)['x'].value_counts()
#alternative
#s = df.groupby('y', sort=False)['x'].apply(pd.Series.value_counts)
print (s)
y x
a 1 2
2 1
3 1
c 2 2
1 1
b 1 1
2 1
Name: x, dtype: int64
df1 = pd.DataFrame(np.repeat(s.index.values, s.values).tolist(), columns=['y','x'])
#change column order; reindex replaces the removed reindex_axis
df1 = df1.reindex(columns=['x','y'])
print (df1)
x y
0 1 a
1 1 a
2 2 a
3 3 a
4 2 c
5 2 c
6 1 c
7 1 b
8 2 b
If you are using an older pandas version where df.sort_values is not supported, you can use:
df.sort(columns=['count','x'], ascending=[False,True])
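On recent pandas, a shorter sketch that keeps the original columns, using a hypothetical helper column x_count for the per-(y, x) frequency; ties within a group (such as the b rows) may come out in a different order than the expected output shows:
import pandas as pd

# frequency of each (y, x) pair, aligned back onto the rows
df['x_count'] = df.groupby(['y', 'x'])['x'].transform('size')

out = (df.sort_values(['count', 'x_count', 'x'],
                      ascending=[False, False, True])
         .drop(columns='x_count')
         .reset_index(drop=True))
print(out)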
A similar question might have been asked before, but I couldn't find one that fits my problem exactly.
I want to group a dataframe by two columns.
For example, to turn this:
id product quantity
1 A 2
1 A 3
1 B 2
2 A 1
2 B 1
3 B 2
3 B 1
Into this:
id product quantity
1 A 5
1 B 2
2 A 1
2 B 1
3 B 3
Meaning: sum the "quantity" column over rows with the same "id" and "product".
You need groupby with the parameter as_index=False to return a DataFrame, aggregating with sum:
df = df.groupby(['id','product'], as_index=False)['quantity'].sum()
print (df)
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
Or add reset_index:
df = df.groupby(['id','product'])['quantity'].sum().reset_index()
print (df)
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
You can use pivot_table with aggfunc='sum'
df.pivot_table('quantity', ['id', 'product'], aggfunc='sum').reset_index()
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
You can use groupby with an aggregate function:
import pandas as pd
df = pd.DataFrame({
'id': [1,1,1,2,2,3,3],
'product': ['A','A','B','A','B','B','B'],
'quantity': [2,3,2,1,1,2,1]
})
print(df)
id product quantity
0 1 A 2
1 1 A 3
2 1 B 2
3 2 A 1
4 2 B 1
5 3 B 2
6 3 B 1
df = df.groupby(['id','product']).agg({'quantity':'sum'}).reset_index()
print(df)
id product quantity
0 1 A 5
1 1 B 2
2 2 A 1
3 2 B 1
4 3 B 3
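On pandas 0.25+, named aggregation gives the same result while naming the output column explicitly; a minimal sketch, assuming df is the original input frame from above:
out = df.groupby(['id', 'product'], as_index=False).agg(quantity=('quantity', 'sum'))
print(out)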