How to sort rows within a group? [duplicate] - python

This question already has answers here:
How to sort a dataFrame in python pandas by two or more columns?
(3 answers)
Closed 1 year ago.
Hi, I have a pandas DataFrame. I want to sort the data grouped by id, and within each group sorted by order.
id title order
2 A 2
2 B 1
2 C 3
3 H 2
3 T 1
Output:
id title order
2 B 1
2 A 2
2 C 3
3 T 1
3 H 2

Since you're not aggregating, you can sort by multiple columns to get the output you want.
import pandas as pd
df = pd.DataFrame({'id': [2, 2, 2, 3, 3],
                   'title': ['A', 'B', 'C', 'H', 'T'],
                   'order': [2, 1, 3, 2, 1]})
df = df.sort_values(by=['id', 'order'])
print(df)
Output:
id title order
1 2 B 1
0 2 A 2
2 2 C 3
4 3 T 1
3 3 H 2
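If the within-group order needs to differ from the group order, sort_values also accepts a list for ascending. A small sketch on the same toy frame, keeping id ascending but order descending:

```python
import pandas as pd

df = pd.DataFrame({'id': [2, 2, 2, 3, 3],
                   'title': ['A', 'B', 'C', 'H', 'T'],
                   'order': [2, 1, 3, 2, 1]})

# Each entry in `ascending` pairs with the column at the same position:
# groups stay in ascending id order, rows within a group descend by order.
df_sorted = df.sort_values(by=['id', 'order'], ascending=[True, False])
print(df_sorted)
```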

Tag occurrences of a value multiple times in a column based on dates using pandas [duplicate]

This question already has answers here:
cumulative number of unique elements for pandas dataframe
(4 answers)
Closed 10 months ago.
Below is my dataframe:
df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                   'date': ['2020-12-1', '2020-12-2', '2020-12-3', '2020-12-4',
                            '2020-12-10', '2020-12-11', '2020-12-12', '2020-12-13',
                            '2020-12-25', '2020-12-26', '2020-12-27', '2020-12-28'],
                   'name': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'a', 'a', 'a', 'a']})
I want an extra column added: it should assign a new unique id each time a new run of values starts in the 'name' column (1 for the first run of 'a', 2 for the run of 'b', 3 for the next run of 'a', and so on).
Chain cumsum with shift
df['id2'] = df.name.ne(df.name.shift()).cumsum()
df
Out[456]:
ID date name id2
0 1 2020-12-1 a 1
1 1 2020-12-2 a 1
2 1 2020-12-3 a 1
3 1 2020-12-4 a 1
4 1 2020-12-10 b 2
5 1 2020-12-11 b 2
6 1 2020-12-12 b 2
7 1 2020-12-13 b 2
8 1 2020-12-25 a 3
9 1 2020-12-26 a 3
10 1 2020-12-27 a 3
11 1 2020-12-28 a 3
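To see why this works, it helps to run the chain one step at a time on a tiny Series (a sketch, not part of the original answer):

```python
import pandas as pd

s = pd.Series(['a', 'a', 'b', 'b', 'a'])

# shift() moves every value down one row (the first becomes NaN),
# so ne() flags each row whose value differs from the previous one.
changed = s.ne(s.shift())   # [True, False, True, False, True]
# cumsum() then numbers each consecutive run: 1, 1, 2, 2, 3
run_id = changed.cumsum()
print(run_id.tolist())
```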

How do I flatten a Python DataFrame by an index and a list of values? [duplicate]

This question already has answers here:
Unnest (explode) a Pandas Series
(8 answers)
Closed 2 years ago.
Suppose that I have the following code
import pandas as pd
cars = {'Index': [1, 2, 3, 4],
        'Values': ['A, B, C, D', 'A, B', 'C', 'D']}
df = pd.DataFrame(cars, columns=['Index', 'Values'])
print (df)
which creates a DataFrame that looks like this...
Index Values
0 1 A, B, C, D
1 2 A, B
2 3 C
3 4 D
How do I take that Dataframe, and create a new one which looks like this...
Index Values
0 1 A
1 1 B
2 1 C
3 1 D
4 2 A
5 2 B
6 3 C
7 4 D
df.Values = df.Values.str.split(", ")  # split on ", " so the pieces carry no leading space
df = df.explode('Values').reset_index(drop=True)
print(df)
Output:
Index Values
0 1 A
1 1 B
2 1 C
3 1 D
4 2 A
5 2 B
6 3 C
7 4 D
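If the spacing around the commas is not perfectly consistent, a slightly more defensive variant (my sketch, not from the answer) splits on the bare comma and strips each piece afterwards:

```python
import pandas as pd

df = pd.DataFrame({'Index': [1, 2, 3, 4],
                   'Values': ['A, B, C, D', 'A,B', 'C', ' D']})

# Split on the comma alone, explode, then strip stray whitespace,
# so 'A, B' and 'A,B' produce identical values.
out = (df.assign(Values=df['Values'].str.split(','))
         .explode('Values')
         .assign(Values=lambda d: d['Values'].str.strip())
         .reset_index(drop=True))
print(out)
```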

Pandas index in groupby operation [duplicate]

This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Closed 3 years ago.
I am trying to get the index (or running count, if you will) of each individual record in a groupby object into a column. It doesn't have to be a groupby, but the order has to remain the same; for example, I want to number the rows within each group of column C:
df = pd.DataFrame([[1, 2, 'Foo'],
                   [1, 3, 'Foo'],
                   [4, 6, 'Bar'],
                   [7, 8, 'Bar']],
                  columns=['A', 'B', 'C'])
Out[72]:
A B C
0 1 2 Foo
1 1 3 Foo
2 4 6 Bar
3 7 8 Bar
My desired output would be:
Out[75]:
A B C sorted
0 1 2 Foo 1
1 1 3 Foo 2
2 4 6 Bar 1
3 7 8 Bar 2
It seems like this should be really easy, but nothing I've tried really comes close without looping through the entire data frame, which I would prefer to avoid. Thanks
Try with cumcount:
>>> df = pd.DataFrame([[1, 2, 'Foo'],
... [1, 3, 'Foo'],
... [4, 6,'Bar'],
... [7,8,'Bar']],
... columns=['A', 'B', 'C'])
>>> df["sorted"] = df.groupby("C").cumcount() + 1
>>> df
A B C sorted
0 1 2 Foo 1
1 1 3 Foo 2
2 4 6 Bar 1
3 7 8 Bar 2
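The question also mentions sorting by column C first. If the within-group numbering should follow some other column, one sketch (assuming B is the intended tiebreaker) is to sort before calling cumcount, then restore the original row order:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 'Foo'],
                   [1, 3, 'Foo'],
                   [4, 6, 'Bar'],
                   [7, 8, 'Bar']],
                  columns=['A', 'B', 'C'])

# Sort so cumcount numbers rows by B within each C group,
# then sort_index() puts the rows back in their original order.
df = df.sort_values(['C', 'B'])
df['sorted'] = df.groupby('C').cumcount() + 1
df = df.sort_index()
print(df)
```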

pandas.groupby.apply executed too many times? [duplicate]

This question already has answers here:
Pandas GroupBy.apply method duplicates first group
(3 answers)
Closed 3 years ago.
I am trying to understand how to use the groupby().apply() function in Pandas, so I made a simple dummy program that would print the grouped dataframe for each group:
import pandas as pd
def dummy(df):
    print(df)
    return df
df_original = pd.DataFrame({'A': ['a,a,a,a','b,b,b','c','d,d,d', 'e'], 'B': [0, 0, 1, 1, 2]})
print(df_original)
df2 = df_original.groupby('B').apply(dummy)
The output I get however, shows that the first group is printed twice, as if the apply function iterated twice over it:
# original dataframe
A B
0 a,a,a,a 0
1 b,b,b 0
2 c 1
3 d,d,d 1
4 e 2
# output of dummy()
A B
0 a,a,a,a 0
1 b,b,b 0
A B
0 a,a,a,a 0
1 b,b,b 0
A B
2 c 1
3 d,d,d 1
A B
4 e 2
I cannot understand how something so simple can go wrong.
As @Gwendal suggested, this is a known pandas quirk: in some pandas versions, GroupBy.apply calls the function on the first group twice to decide whether it can take a fast path, which is why the first group is printed twice.
If you want a quick fix, use this:
df_original = pd.DataFrame({'A': ['a,a,a,a', 'b,b,b', 'c', 'd,d,d', 'e'],
                            'B': [0, 0, 1, 1, 2]})
for b in df_original['B'].unique():
    print(df_original[df_original['B'] == b])
Output
A B
0 a,a,a,a 0
1 b,b,b 0
A B
2 c 1
3 d,d,d 1
A B
4 e 2
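A more idiomatic fix (a sketch on the same data) is to iterate over the GroupBy object directly; it yields each group exactly once, so nothing is evaluated twice the way apply can be:

```python
import pandas as pd

df_original = pd.DataFrame({'A': ['a,a,a,a', 'b,b,b', 'c', 'd,d,d', 'e'],
                            'B': [0, 0, 1, 1, 2]})

# Iterating the GroupBy yields (key, sub-frame) pairs, one per group.
for key, group in df_original.groupby('B'):
    print(group)
```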

Count the frequency of two different values in a column that share the same value in a different column?

Say I have two columns within a large transportation dataset: one with a trip id and another with a user id. How can I count the number of times two people have ridden on the same trip together, i.e. different user id but same trip id?
df = pd.DataFrame([[1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5],
                   ['A', 'B', 'C', 'A', 'B', 'A', 'B', 'B', 'C', 'D', 'D', 'A']]).T
df.columns = ['trip_id', 'user_id']
print(df)
trip_id user_id
0 1 A
1 1 B
2 1 C
3 2 A
4 2 B
5 3 A
6 3 B
7 4 B
8 4 C
9 4 D
10 5 D
11 5 A
The ideal output would be a sort of aggregated pivot table or crosstab that displays each user_id and their count of trips with other user_id's, so as to see who has the highest counts of trips together.
I tried something like this:
df5 = pd.crosstab(index=df4['trip_id'], columns=df4['user_id'])
df5['sum'] = df5[df5.columns].sum(axis=1)
df5
user_id A B C D sum
trip_id
1 1 1 1 0 3
2 1 1 0 0 2
3 1 1 0 0 2
4 0 1 1 1 3
5 1 0 0 1 2
which I can use to get the average users per trip, but not the frequency of unique user_ids riding together on a trip.
I also tried some variations with this:
df.trip_id = df.trip_id+'_'+df.groupby(['user_id','trip_id']).cumcount().add(1).astype(str)
df.pivot('trip_id','user_id')
but I'm not getting what I want. I'm not sure if I need to approach this by iterating with a for loop or if I'll need to stack the dataframe from a crosstab to get those aggregate values. Also, I'm trying to avoid having the trip_id and user_id in the original data be aggregated as numerical datatypes since they should not be treated as ints but strings.
Thank you for any insight you may be able to provide!
Here is an example dataset
import pandas as pd
df = pd.DataFrame([[1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
                   ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B']]).T
df.columns = ['trip_id', 'user_id']
print(df)
Gives:
trip_id user_id
0 1 A
1 1 B
2 1 C
3 2 A
4 2 B
5 2 C
6 3 A
7 3 B
8 3 C
9 3 A
10 3 B
I think what you're asking for is:
df.groupby(['trip_id', 'user_id']).size()
trip_id user_id
1 A 1
B 1
C 1
2 A 1
B 1
C 1
3 A 2
B 2
C 1
dtype: int64
Am I correct?
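The answer above counts riders per trip, not pairs of riders. One sketch for the pair counts the question actually asks about (not from the original thread) is a self-merge on trip_id:

```python
import pandas as pd

df = pd.DataFrame({'trip_id': [1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5],
                   'user_id': ['A', 'B', 'C', 'A', 'B', 'A', 'B',
                               'B', 'C', 'D', 'D', 'A']})

# The self-merge pairs every rider on a trip with every other rider on
# that trip; keeping user_id_x < user_id_y drops self-pairs and counts
# each unordered pair exactly once.
pairs = df.merge(df, on='trip_id')
pairs = pairs[pairs['user_id_x'] < pairs['user_id_y']]
counts = pairs.groupby(['user_id_x', 'user_id_y']).size()
print(counts)
```

On this toy data, A and B share three trips, B and C share two, and every other pair shares one.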
