I am trying to create a column (is_max) that holds 1 if column B is the maximum within its group of column A values, and 0 if it is not.
Example:
[Input]
A B
1 2
2 3
1 4
2 5
[Output]
A B is_max
1 2 0
2 3 0
1 4 1
2 5 1
What I'm trying:
df['is_max'] = 0
df.loc[df.reset_index().groupby('A')['B'].idxmax(),'is_max'] = 1
Fix your code by removing the reset_index. idxmax returns index labels, so it must be computed on the same index that df.loc uses:
df['is_max'] = 0
df.loc[df.groupby('A')['B'].idxmax(),'is_max'] = 1
df
Out[39]:
A B is_max
0 1 2 0
1 2 3 0
2 1 4 1
3 2 5 1
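A self-contained sketch of the corrected idxmax approach on the question's sample data. The non-default index (10, 20, 30, 40) is an assumption added for illustration: it shows one way the reset_index version can go wrong, because idxmax then reports labels of the reset frame (0 to n-1) rather than labels of df.

```python
import pandas as pd

# Question's data; the index 10..40 is a hypothetical non-default index
df = pd.DataFrame({'A': [1, 2, 1, 2], 'B': [2, 3, 4, 5]}, index=[10, 20, 30, 40])

# idxmax returns the index *label* of the row holding each group's maximum B
labels = df.groupby('A')['B'].idxmax()
print(labels.tolist())  # [30, 40]

df['is_max'] = 0
df.loc[labels, 'is_max'] = 1

# After reset_index the labels are 0..n-1, which no longer match df's index
wrong_labels = df.reset_index().groupby('A')['B'].idxmax()
print(wrong_labels.tolist())  # [2, 3]
```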
I assume A is your grouping column, since you did not state it:
df['is_max']=(df['B']==df.groupby('A')['B'].transform('max')).astype(int)
or
df['is_max'] = df.groupby('A', group_keys=False)['B'].apply(lambda x: x == x.max()).astype(int)
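The transform('max') answer and the idxmax answer differ when a group has ties. A minimal sketch on hypothetical data (not from the question) where group A=1 has two rows sharing the maximum:

```python
import pandas as pd

# Hypothetical data: both rows of group A=1 share the maximum B=4
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [4, 4, 3, 5]})

# transform('max') broadcasts each group's max back to every row,
# so every row equal to the max is flagged, ties included
df['is_max'] = (df['B'] == df.groupby('A')['B'].transform('max')).astype(int)

# idxmax picks only the first row reaching the max in each group
df['is_max_idx'] = 0
df.loc[df.groupby('A')['B'].idxmax(), 'is_max_idx'] = 1

print(df)
```

transform('max') flags every tied row, while idxmax flags only the first one, so pick whichever matches the intended meaning of is_max.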
I have a df1:
a b c
1 0 1 4
2 0 2 5
3 1 1 3
and a second df2:
a b c
1 0 1 5
2 0 2 5
3 1 1 4
These DataFrames have the same groups in a and b. Within each group of 'a' and 'b', I want the df2 row underneath the df1 row:
a b c
1 0 1 4
2 0 1 5
3 0 2 5
4 0 2 5
5 1 1 3
6 1 1 4
How can I combine groupby() and concat() to get the desired output?
You can do concat, then sort_values:
df = pd.concat([df1, df2]).sort_values(['a', 'b']).reset_index(drop=True)
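A runnable sketch of the concat-and-sort idea using the question's df1 and df2. The src column is an addition not in the answer above: sorting on it as a third key places df1's row before df2's row in each (a, b) group explicitly, instead of relying on the sort preserving concat order. It assumes each (a, b) pair appears at most once per frame, as in the example.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 0, 1], 'b': [1, 2, 1], 'c': [4, 5, 3]}, index=[1, 2, 3])
df2 = pd.DataFrame({'a': [0, 0, 1], 'b': [1, 2, 1], 'c': [5, 5, 4]}, index=[1, 2, 3])

# Tag rows with their source frame (0 = df1, 1 = df2) and use it as a tiebreak
out = (
    pd.concat([df1.assign(src=0), df2.assign(src=1)])
    .sort_values(['a', 'b', 'src'])
    .drop(columns='src')
    .reset_index(drop=True)
)
print(out)
```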
I have a dataframe that looks like this:
data metadata
A 0
A 1
A 2
A 3
A 4
B 0
B 1
B 2
A 0
A 1
B 0
A 0
A 1
B 0
df.data contains two different categories, A and B. df.metadata stores a running count of the number of times a category has appeared consecutively before the category changes. I want to create a column consecutive_count that assigns the max value of metadata per consecutive group to every row in that group. It should look like this:
data metadata consecutive_count
A 0 4
A 1 4
A 2 4
A 3 4
A 4 4
B 0 2
B 1 2
B 2 2
A 0 1
A 1 1
B 0 0
A 0 1
A 1 1
B 0 0
Please advise. Thank you.
Method 1:
Label each run of consecutive identical values in data, then take transform('max') of metadata within each run:
s = df.data.ne(df.data.shift()).cumsum()
df['consecutive_count'] = df.groupby(s).metadata.transform('max')
Out[96]:
data metadata consecutive_count
0 A 0 4
1 A 1 4
2 A 2 4
3 A 3 4
4 A 4 4
5 B 0 2
6 B 1 2
7 B 2 2
8 A 0 1
9 A 1 1
10 B 0 0
11 A 0 1
12 A 1 1
13 B 0 0
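A self-contained sketch of Method 1 on the question's data, printing the intermediate run labels: ne(shift()) is True wherever data changes from the previous row (including the first row, which is compared with NaN), and cumsum turns those change points into one integer id per run.

```python
import pandas as pd

df = pd.DataFrame({
    'data': list('AAAAABBBAABAAB'),
    'metadata': [0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 0, 0, 1, 0],
})

# True at the start of every run of identical values
changed = df['data'].ne(df['data'].shift())
# Cumulative count of run starts = one id per run
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6]

# Broadcast each run's maximum metadata back to all of its rows
df['consecutive_count'] = df.groupby(run_id)['metadata'].transform('max')
print(df)
```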
Method 2:
Since metadata increases within each run, you can reverse the DataFrame and take the groupby cummax:
s = df.data.ne(df.data.shift()).cumsum()
df['consecutive_count'] = df[::-1].groupby(s).metadata.cummax()
Out[101]:
data metadata consecutive_count
0 A 0 4
1 A 1 4
2 A 2 4
3 A 3 4
4 A 4 4
5 B 0 2
6 B 1 2
7 B 2 2
8 A 0 1
9 A 1 1
10 B 0 0
11 A 0 1
12 A 1 1
13 B 0 0
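A sketch of Method 2 that also checks it against transform('max'). It relies on the question's premise that metadata increases within each run; if that did not hold, the reversed cummax would not equal the run maximum on every row.

```python
import pandas as pd

df = pd.DataFrame({
    'data': list('AAAAABBBAABAAB'),
    'metadata': [0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 0, 0, 1, 0],
})
s = df['data'].ne(df['data'].shift()).cumsum()

# Walking a run backwards, the first value seen is its largest metadata,
# so cummax repeats it for every row; the grouper s is aligned on the index
rev = df[::-1].groupby(s)['metadata'].cummax()
# Column assignment aligns on the index, restoring the original row order
df['consecutive_count'] = rev

expected = df.groupby(s)['metadata'].transform('max')
print((df['consecutive_count'] == expected).all())  # True
```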
I have a data frame and I want to count the number of consecutive entries of one column and record the counts in a separate variable. Here is an example:
ID Class
1 A
1 A
2 A
1 B
1 B
1 B
2 B
1 C
1 C
2 A
2 A
2 A
Within each ID, I want to count the number of consecutive occurrences of each class, so the output would look like this:
ID Class Counts
1 A 0
1 A 1
2 A 0
1 B 0
1 B 1
1 B 2
2 B 0
1 C 0
1 C 1
2 A 0
2 A 1
2 A 2
I am not looking for the frequency of occurrence of specific entries like here, but rather for the consecutive occurrences of an entry at the ID level.
You can use cumcount grouped by a Series that is created by concatenating the ID and Class values, comparing the result with its shift, and taking the cumsum:
#use a separator that does not appear in the data, like _ or ¥
s = df['ID'].astype(str) + '¥' + df['Class']
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
ID Class Counts
0 1 A 0
1 1 A 1
2 2 A 0
3 1 B 0
4 1 B 1
5 1 B 2
6 2 B 0
7 1 C 0
8 1 C 1
9 2 A 0
10 2 A 1
11 2 A 2
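A self-contained version of the first solution on the question's data. The separator here is _ (any string that cannot occur inside ID or Class works); it keeps different (ID, Class) pairs from producing the same combined key, e.g. ID 1 with Class '1A' versus ID 11 with Class 'A'.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 2, 2],
    'Class': list('AAABBBBCCAAA'),
})

# One string key per (ID, Class) pair
key = df['ID'].astype(str) + '_' + df['Class']
# A new run starts wherever the key differs from the previous row
run_id = key.ne(key.shift()).cumsum()
# cumcount numbers the rows 0, 1, 2, ... inside each run
df['Counts'] = df.groupby(run_id).cumcount()
print(df)
```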
Another solution with ngroup (pandas 0.20.2+):
s = df.groupby(['ID','Class']).ngroup()
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
ID Class Counts
0 1 A 0
1 1 A 1
2 2 A 0
3 1 B 0
4 1 B 1
5 1 B 2
6 2 B 0
7 1 C 0
8 1 C 1
9 2 A 0
10 2 A 1
11 2 A 2
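A sketch of the ngroup variant on the same data, printing the group labels it produces. ngroup numbers each (ID, Class) combination (in sorted key order by default), so no string conversion or separator is needed; only changes in the label matter, not the label values themselves.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 2, 2],
    'Class': list('AAABBBBCCAAA'),
})

# One integer label per (ID, Class) combination
s = df.groupby(['ID', 'Class']).ngroup()
print(s.tolist())  # [0, 0, 3, 1, 1, 1, 4, 2, 2, 3, 3, 3]

# A new run starts wherever the label changes; cumcount numbers rows within a run
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print(df)
```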
I have a pandas DataFrame:
id tag
1 A
1 A
1 B
1 C
1 A
2 B
2 C
2 B
I want to add a column that computes the cumulative number of unique tags at the id level. More specifically, I would like to have
id tag count
1 A 1
1 A 1
1 B 2
1 C 3
1 A 3
2 B 1
2 C 2
2 B 2
For a given id, count will be non-decreasing. Thanks for your help!
I think this does what you want:
unique_count = df.drop_duplicates(subset=['id', 'tag']).groupby('id').cumcount() + 1
df['count'] = unique_count.reindex(df.index).ffill().astype(int)
The +1 is because cumcount starts at zero. This only works if all rows of the same id are contiguous (for example, sorted by id). Was that intended? You can always sort beforehand.
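A runnable sketch of this answer on the question's data. subset=['id', 'tag'] limits the duplicate check to those two columns in case the real frame has others, and astype(int) undoes the float dtype that the NaN values from reindex introduce.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2],
                   'tag': ['A', 'A', 'B', 'C', 'A', 'B', 'C', 'B']})

# First occurrence of each (id, tag) pair, numbered 1, 2, 3, ... within each id
unique_count = df.drop_duplicates(subset=['id', 'tag']).groupby('id').cumcount() + 1

# Rows dropped as duplicates come back as NaN; ffill copies the previous row's
# count, which is only correct if all rows of an id are contiguous
df['count'] = unique_count.reindex(df.index).ffill().astype(int)
print(df['count'].tolist())  # [1, 1, 2, 3, 3, 1, 2, 2]
```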
You can find some other approaches in R and Python here
import pandas as pd
df = pd.DataFrame({'id':[1,1,1,1,1,2,2,2],'tag':["A","A", "B","C","A","B","C","B"]})
df['count']=df.groupby('id')['tag'].transform(lambda x: (~x.duplicated()).cumsum())
id tag count
0 1 A 1
1 1 A 1
2 1 B 2
3 1 C 3
4 1 A 3
5 2 B 1
6 2 C 2
7 2 B 2
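How about this (X is 1 on the first occurrence of each id/tag pair, and a running sum per id counts them):
df['X'] = (~df.duplicated(['id', 'tag'])).astype(int)
df.groupby('id').X.cumsum()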
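A self-contained sketch of the running-sum idea on the question's data. It assumes X is 1 only on the first occurrence of each (id, tag) pair; setting X = 1 on every row would count rows rather than unique tags.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2, 2],
                   'tag': ['A', 'A', 'B', 'C', 'A', 'B', 'C', 'B']})

# 1 on the first row of each (id, tag) pair, 0 on repeats
df['X'] = (~df.duplicated(subset=['id', 'tag'])).astype(int)
# Running sum of X within each id = number of distinct tags seen so far
df['count'] = df.groupby('id')['X'].cumsum()
print(df['count'].tolist())  # [1, 1, 2, 3, 3, 1, 2, 2]
```

Because duplicated looks at the whole frame and the cumsum runs within each id, this does not need the rows of one id to be contiguous.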
import pandas as pd
idt=[1,1,1,1,1,2,2,2]
tag=['A','A','B','C','A','B','C','B']
df=pd.DataFrame(tag,index=idt,columns=['tag'])
df=df.reset_index()
print(df)
index tag
0 1 A
1 1 A
2 1 B
3 1 C
4 1 A
5 2 B
6 2 C
7 2 B
# number the occurrences of each (index, tag) pair, starting at 1
df['uCnt']=df.groupby(['index','tag']).cumcount()+1
print(df)
index tag uCnt
0 1 A 1
1 1 A 2
2 1 B 1
3 1 C 1
4 1 A 3
5 2 B 1
6 2 C 1
7 2 B 2
# uCnt // uCnt**2 is 1 when uCnt == 1 and 0 when uCnt > 1, so only first occurrences keep a 1
df['uCnt']=df['uCnt']//df['uCnt']**2
print(df)
index tag uCnt
0 1 A 1
1 1 A 0
2 1 B 1
3 1 C 1
4 1 A 0
5 2 B 1
6 2 C 1
7 2 B 0
# running total per index = cumulative number of unique tags
df['uCnt']=df.groupby(['index'])['uCnt'].cumsum()
print(df)
index tag uCnt
0 1 A 1
1 1 A 1
2 1 B 2
3 1 C 3
4 1 A 3
5 2 B 1
6 2 C 2
7 2 B 2
df=df.set_index('index')
print(df)
tag uCnt
index
1 A 1
1 A 1
1 B 2
1 C 3
1 A 3
2 B 1
2 C 2
2 B 2
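A condensed sketch of the step-by-step answer above, using the same idt and tag lists. It replaces the uCnt // uCnt**2 trick with an explicit cumcount() == 0 test and asserts that the two flags agree.

```python
import pandas as pd

idt = [1, 1, 1, 1, 1, 2, 2, 2]
tag = ['A', 'A', 'B', 'C', 'A', 'B', 'C', 'B']
# reset_index turns the unnamed index into a column called 'index'
df = pd.DataFrame(tag, index=idt, columns=['tag']).reset_index()

# 0 for the first occurrence of an (index, tag) pair, 1 for the second, ...
occurrence = df.groupby(['index', 'tag']).cumcount()
first = (occurrence == 0).astype(int)

# Same flag as the floor-division trick: n // n**2 is 1 for n == 1, 0 for n > 1
u = occurrence + 1
assert (u // u**2 == first).all()

# Running total of first occurrences within each index value
df['uCnt'] = first.groupby(df['index']).cumsum()
print(df['uCnt'].tolist())  # [1, 1, 2, 3, 3, 1, 2, 2]
```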