The value counts: df['ID'].value_counts().values
-----> array([4, 3, 3, 1], dtype=int64)
input:
ID emp
a 1
a 1
b 1
a 1
b 1
c 1
c 1
a 1
b 1
c 1
d 1
When I jumble the ID column and run:
df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp']= df['ID'].value_counts().values
output:
ID emp
a 4
c 3
d 3
c 1
b 1
a 1
c 1
a 1
b 1
b 1
a 1
expected result:
ID emp
a 4
c 3
d 1
c 1
b 3
a 1
c 1
a 1
b 1
b 1
a 1
Problem: the counts are not matched against the IDs before being assigned to emp.
The problem is that df['ID'].value_counts() returns a Series with a different number of values than the original data. For a new column filled with the counter values, use Series.map:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df['ID'].map(df['ID'].value_counts())
Or GroupBy.transform with size:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df.groupby('ID')['ID'].transform('size')
The output Series with 4 values cannot be assigned back directly, because its index, df['ID'].value_counts().index, differs from df.index:
print (df['ID'].value_counts())
a 4
b 3
c 3
d 1
Name: ID, dtype: int64
If you convert it to a NumPy array, the 4 values are assigned positionally: this DataFrame has 4 groups (a, b, c, d), so ~df.duplicated(subset=['ID']) returns True 4 times, and the counts arrive in the fixed order 4, 3, 3, 1 regardless of which ID each of those rows holds. That is the reason for the wrong output:
print (df['ID'].value_counts().values)
[4 3 3 1]
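A minimal sketch of the failure mode (the shuffled ID order is reconstructed from the output shown in the question):

import pandas as pd

# shuffled frame reconstructed from the question's output
df = pd.DataFrame({'ID': list('acdcbacabba'), 'emp': 1})

mask = ~df.duplicated(keep='first', subset=['ID'])
print(df.loc[mask, 'ID'].tolist())     # ['a', 'c', 'd', 'b']  (row order)
print(df['ID'].value_counts().values)  # [4 3 3 1]             (count order)
# positional assignment pairs 'd' with 3 and 'b' with 1: the wrong output
df.loc[mask, 'emp'] = df['ID'].value_counts().values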
What is needed is a new column (a Series) with the same index as df:
print (df['ID'].map(df['ID'].value_counts()))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
print (df.groupby('ID')['ID'].transform('size'))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
The original expression, df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp'] = df['ID'].value_counts().values, happens to give the desired output for your given sample DataFrame, but to be safe you can try:
cond=~df.duplicated(keep='first', subset=['ID'])
df.loc[cond,'emp']=df.loc[cond,'ID'].map(df['ID'].value_counts())
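Putting it together, a minimal end-to-end sketch on the jumbled frame (row order reconstructed from the question's output):

import pandas as pd

df = pd.DataFrame({'ID': list('acdcbacabba'), 'emp': 1})

cond = ~df.duplicated(keep='first', subset=['ID'])
# map aligns each first occurrence with the count for its own ID
df.loc[cond, 'emp'] = df.loc[cond, 'ID'].map(df['ID'].value_counts())
print(df.head())
#   ID  emp
# 0  a    4
# 1  c    3
# 2  d    1
# 3  c    1
# 4  b    3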
How can I drop duplicates in the following specific way?
Index B C
1 2 1
2 2 0
3 3 1
4 3 1
5 4 0
6 4 0
7 4 0
8 5 1
9 5 0
10 5 1
Desired output:
Index B C
3 3 1
5 4 0
So I want to drop duplicates on B, keeping one sample/record per group, but only when C is the same on all rows of that group.
For example, B = 3 for indices 3/4, and since C = 1 for both, I keep one of them.
But B = 5 for indices 8/9/10 has C equal to both 1 and 0, so all of those rows get dropped.
Try this, using transform with nunique and drop_duplicates:
df[df.groupby('B')['C'].transform('nunique') == 1].drop_duplicates(subset='B')
Output:
B C
Index
3 3 1
5 4 0
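A self-contained sketch with the sample data from the question:

import pandas as pd

df = pd.DataFrame(
    {'B': [2, 2, 3, 3, 4, 4, 4, 5, 5, 5],
     'C': [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]},
    index=pd.Index(range(1, 11), name='Index'),
)

# keep only the B groups where C takes a single value, then one row per group
out = df[df.groupby('B')['C'].transform('nunique') == 1].drop_duplicates(subset='B')
print(out)
#        B  C
# Index
# 3      3  1
# 5      4  0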
I have a DataFrame
name value
A 2
A 4
A 5
A 7
A 8
B 3
B 4
B 8
C 1
C 3
C 5
And I want to get the value differences within each name, like this:
name value dif
A 2 0
A 4 2
A 5 1
A 7 2
A 8 1
B 3 0
B 4 1
B 8 4
C 1 0
C 3 2
C 5 2
Can anyone show me the easiest way?
You can use GroupBy.diff to compute the difference between consecutive rows within each group, then fill the missing values (the first row of every group) with 0 and finally cast the result to integers.
df['dif'] = df.groupby('name')['value'].diff().fillna(0).astype(int)
df
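A runnable sketch with the sample data:

import pandas as pd

df = pd.DataFrame({
    'name': list('AAAAABBBCCC'),
    'value': [2, 4, 5, 7, 8, 3, 4, 8, 1, 3, 5],
})

# diff() yields NaN for the first row of each group (no predecessor),
# so those are filled with 0 before casting back to int
df['dif'] = df.groupby('name')['value'].diff().fillna(0).astype(int)
print(df)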
I have the following pandas DataFrame:
a b c
1 s 5
1 w 5
2 s 5
3 s 6
3 e 6
3 e 5
I need to count the rows for each unique value of a to obtain the following result:
a qty
1 2
2 1
3 3
How can I do this in Python?
You can use groupby:
g = df.groupby('a').size()
This returns:
a
1 2
2 1
3 3
dtype: int64
EDIT: if you need the counts as a named column in a new DataFrame, reset the index and rename the single column of counts:
g = df.groupby('a').size().reset_index().rename(columns={0:'qty'})
to obtain:
a qty
0 1 2
1 2 1
2 3 3
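A runnable sketch; as a side note, Series.reset_index also accepts a name argument, which avoids the separate rename call:

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 3, 3, 3],
                   'b': ['s', 'w', 's', 's', 'e', 'e'],
                   'c': [5, 5, 5, 6, 6, 5]})

# count rows per value of 'a' and name the count column in one step
g = df.groupby('a').size().reset_index(name='qty')
print(g)
#    a  qty
# 0  1    2
# 1  2    1
# 2  3    3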
I have a data frame:
>>> data
Name Score
0 a 3
1 b 2
2 a 1
3 c 4
4 c 5
5 d 3
I want to combine the rows with the same name, summing their scores, so I want to get the following result:
Name Score
0 a 4
1 b 2
2 c 9
3 d 3
Is there an efficient solution?
data.groupby('Name')['Score'].sum().reset_index()
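A self-contained sketch; selecting the Score column before aggregating avoids summing every column first, and as_index=False is an equivalent way to keep Name as a regular column:

import pandas as pd

data = pd.DataFrame({'Name': ['a', 'b', 'a', 'c', 'c', 'd'],
                     'Score': [3, 2, 1, 4, 5, 3]})

# sum Score within each Name; as_index=False keeps Name as a column
result = data.groupby('Name', as_index=False)['Score'].sum()
print(result)
#   Name  Score
# 0    a      4
# 1    b      2
# 2    c      9
# 3    d      3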