Python value difference in dataframe by group key

I have a DataFrame:
name value
A 2
A 4
A 5
A 7
A 8
B 3
B 4
B 8
C 1
C 3
C 5
I want the difference between consecutive values within each name, like this:
name value dif
A 2 0
A 4 2
A 5 1
A 7 2
A 8 1
B 3 0
B 4 1
B 8 4
C 1 0
C 3 2
C 5 2
Can anyone show me the easiest way?

You can use GroupBy.diff to compute the difference between consecutive rows within each group. The first row of every group has no predecessor, so its difference is NaN; optionally fill those missing values with 0 and then cast the result to integers.
df['dif'] = df.groupby('name')['value'].diff().fillna(0).astype(int)
df
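As a self-contained sketch of the answer above, the following rebuilds the sample DataFrame from the question (the construction itself is an assumption about how the data was loaded) and applies the same GroupBy.diff expression:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "name": list("AAAAABBBCCC"),
    "value": [2, 4, 5, 7, 8, 3, 4, 8, 1, 3, 5],
})

# diff() runs inside each group, so the first row of every group is NaN;
# fillna(0) replaces those and astype(int) restores an integer dtype.
df["dif"] = df.groupby("name")["value"].diff().fillna(0).astype(int)

print(df)
```

Because the difference restarts at every new name, the first row of B and C is 0 rather than the difference to the last row of the previous group.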

Related

pandas dataframe duplicate values count not properly working

The value counts are:
df['ID'].value_counts().values
array([4, 3, 3, 1], dtype=int64)
input:
ID emp
a 1
a 1
b 1
a 1
b 1
c 1
c 1
a 1
b 1
c 1
d 1
When I shuffle the ID column and run:
df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp']= df['ID'].value_counts().values
output:
ID emp
a 4
c 3
d 3
c 1
b 1
a 1
c 1
a 1
b 1
b 1
a 1
expected result:
ID emp
a 4
c 3
d 1
c 1
b 3
a 1
c 1
a 1
b 1
b 1
a 1
Problem: the counts are assigned to emp without checking which ID each row has.

The problem is that df['ID'].value_counts() returns a Series with one entry per unique ID, so it has a different length and a different index than the original data. To fill a new column with each row's count, use Series.map:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df['ID'].map(df['ID'].value_counts())
Or GroupBy.transform with size:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df.groupby('ID')['ID'].transform('size')
The 4-value Series returned by value_counts cannot be assigned back directly, because df.index and df['ID'].value_counts().index are different:
print (df['ID'].value_counts())
a 4
b 3
c 3
d 1
Name: ID, dtype: int64
If you convert it to a NumPy array, the 4 values are assigned by position. The DataFrame has 4 groups (a, b, c, d), so ~df.duplicated(subset=['ID']) is True 4 times, but the counts arrive in the order 4, 3, 3, 1 rather than in the order in which the IDs first appear, which is the reason for the wrong output:
print (df['ID'].value_counts().values)
[4 3 3 1]
What is needed is a new column (Series) with the same index as df:
print (df['ID'].map(df['ID'].value_counts()))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
print (df.groupby('ID')['ID'].transform('size'))
0 4
1 4
2 3
3 4
4 3
5 3
6 3
7 4
8 3
9 3
10 1
Name: ID, dtype: int64
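A minimal sketch of the Series.map fix, using the shuffled ID order from the question's output (the DataFrame construction and the names first and counts are assumptions for illustration):

```python
import pandas as pd

# Shuffled ID order taken from the question's "output" table; emp starts at 1.
df = pd.DataFrame({"ID": list("acdcbacabba"), "emp": 1})

# True only for the first occurrence of each ID.
first = ~df.duplicated(subset=["ID"])

# One count per unique ID, indexed by the ID value.
counts = df["ID"].value_counts()

# map() looks every row's ID up in counts, so the result shares df's index
# and the assignment is aligned by label, not by position.
df.loc[first, "emp"] = df["ID"].map(counts)

print(df)
```

Since the lookup is keyed on the ID itself, the result does not depend on the order in which value_counts happens to list the IDs.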

For your sample DataFrame in its original row order, df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp'] = df['ID'].value_counts().values alone gives the desired output, but that depends on the row order. A version that does not is:
cond=~df.duplicated(keep='first', subset=['ID'])
df.loc[cond,'emp']=df.loc[cond,'ID'].map(df['ID'].value_counts())

Pandas drop duplicates based on 2 columns having different values

How can I drop duplicates in this specific way?
Index B C
1 2 1
2 2 0
3 3 1
4 3 1
5 4 0
6 4 0
7 4 0
8 5 1
9 5 0
10 5 1
Desired output :
Index B C
3 3 1
5 4 0
So I want to drop duplicates on B, but keep a group only if C is the same in all of its rows, and then keep one record from it.
For example, B = 3 for indexes 3 and 4, and since C = 1 for both, one of them is kept.
But B = 5 for indexes 8, 9 and 10, where C is 1 or 0, so all of them are dropped.

Try this, using transform with nunique and drop_duplicates:
df[df.groupby('B')['C'].transform('nunique') == 1].drop_duplicates(subset='B')
Output:
B C
Index
3 3 1
5 4 0
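The same expression, split into two steps on a hand-built copy of the question's data (the construction and the names same_c and result are assumptions for illustration):

```python
import pandas as pd

# Sample data from the question, with the original 1-based index.
df = pd.DataFrame(
    {"B": [2, 2, 3, 3, 4, 4, 4, 5, 5, 5],
     "C": [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]},
    index=pd.RangeIndex(1, 11, name="Index"),
)

# For every row, the number of distinct C values in that row's B group;
# a group qualifies only when that number is exactly 1.
same_c = df.groupby("B")["C"].transform("nunique") == 1

# Keep the qualifying groups, then one (the first) row per B value.
result = df[same_c].drop_duplicates(subset="B")

print(result)
```

transform returns a Series aligned with df, which is what makes it usable directly as a boolean mask.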

search column name based on matching row values

I have a data frame like below:
A B C D E F Input
1 2 3 4 5 6 1
1 2 3 4 5 6 3
I want an output column where I can get the column name, something like below:
A B C D E F Input Output
1 2 3 4 5 6 1 A
1 2 3 4 5 6 3 C
As you can see above, in row 1 Input has value 1 and column A also has value 1, so the output is A.

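We can use idxmax (the axis is passed by keyword, because the positional form df.drop('Input', 1) no longer works in pandas 2.0 and later):
df['Output'] = df.drop(columns='Input').eq(df['Input'], axis=0).idxmax(axis=1)
df['Output']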
0 A
1 C
dtype: object

Alternative with .dot:
df.drop(columns='Input').eq(df['Input'], axis=0).dot(df.columns.difference(['Input']))
0 A
1 C
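A runnable sketch of the idxmax approach with keyword arguments. The DataFrame construction, the intermediate names candidates and matches, and the extra guard for rows without any match are assumptions added for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": [1, 1], "B": [2, 2], "C": [3, 3],
    "D": [4, 4], "E": [5, 5], "F": [6, 6],
    "Input": [1, 3],
})

# Compare every candidate column with Input, row by row.
candidates = df.drop(columns="Input")
matches = candidates.eq(df["Input"], axis=0)

# idxmax returns the first column holding True in each row. On a row with
# no match it would still return the first column, so such rows are
# masked to NaN with where().
df["Output"] = matches.idxmax(axis=1).where(matches.any(axis=1))

print(df)
```

If several columns equal Input in the same row, this keeps only the first one in column order.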

search for duplicated consecutive rows and put in additional column pandas

I have a df:
df1
a b c d
0 2 4 1
0 2 5 1
0 1 6 2
1 2 7 2
1 1 8 1
1 1 4 1
I need to group by a and b, and if two consecutive values in d are equal to 1 within a group, I want the c value of the second row in a new column next to the first row, like:
df1
a b c d c1
0 2 4 1 5
0 1 6 2 nan
1 2 7 2 nan
1 1 8 1 4
Any ideas?
I tried
df1.groupby([df1.a, df1.b, df1.d.diff().ne(0)])
and then using loc to select only the rows with 1s and merging the two DataFrames again, but the first step is not completely correct.
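One possible approach is sketched below. It is not taken from the thread. It assumes that a pair means two adjacent rows of the same (a, b) group that both have d == 1, that the second row of the pair is dropped, and that runs of three or more 1s do not occur. The names nxt_c, nxt_d, prev_d, pair_start and is_second are illustrative.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "a": [0, 0, 0, 1, 1, 1],
    "b": [2, 2, 1, 2, 1, 1],
    "c": [4, 5, 6, 7, 8, 4],
    "d": [1, 1, 2, 2, 1, 1],
})

g = df1.groupby(["a", "b"])

# Neighbouring values inside each (a, b) group; NaN at group edges.
nxt_c = g["c"].shift(-1)
nxt_d = g["d"].shift(-1)
prev_d = g["d"].shift(1)

# A row starts a pair when it and the next row in its group both have d == 1.
pair_start = df1["d"].eq(1) & nxt_d.eq(1)
# A row is the second half of a pair when it and the previous row have d == 1.
is_second = df1["d"].eq(1) & prev_d.eq(1)

# Put the partner's c next to the first row, then drop the second row.
df1["c1"] = nxt_c.where(pair_start)
out = df1[~is_second]

print(out)
```

On the sample data this keeps the rows with c equal to 4, 6, 7 and 8, with c1 filled only for the two rows that start a pair.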

Count duplicate rows for each unique row value

I have the following pandas DataFrame:
a b c
1 s 5
1 w 5
2 s 5
3 s 6
3 e 6
3 e 5
I need to count duplicate rows for each unique value of a to obtain the following result:
a qty
1 2
2 1
3 3
How to do this in python?

You can use groupby:
g = df.groupby('a').size()
This returns:
a
1 2
2 1
3 3
dtype: int64
EDIT: If you need the counts as a named column, reset the index and rename only the single new column of counts:
g = df.groupby('a').size().reset_index().rename(columns={0:'qty'})
to obtain:
a qty
0 1 2
1 2 1
2 3 3
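A compact variant, sketched on a hand-built copy of the question's data: reset_index accepts a name argument for the values column of a Series, which avoids the separate rename step. The variable name qty is an assumption for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": [1, 1, 2, 3, 3, 3],
    "b": ["s", "w", "s", "s", "e", "e"],
    "c": [5, 5, 5, 6, 6, 5],
})

# size() counts the rows per group; reset_index(name=...) turns the
# resulting Series into a DataFrame and names the count column directly.
qty = df.groupby("a").size().reset_index(name="qty")

print(qty)
```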
