Combine rows of a DataFrame in pandas - python

I have a data frame:
>>> data
  Name  Score
0    a      3
1    b      2
2    a      1
3    c      4
4    c      5
5    d      3
I want to combine the rows with the same name, summing their scores, to get the following result:
  Name  Score
0    a      4
1    b      2
2    c      9
3    d      3
Is there an efficient solution?

data.groupby('Name')['Score'].sum().reset_index()
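As a self-contained sketch, the same aggregation run end to end on the sample data from the question:

```python
import pandas as pd

# Sample frame from the question
data = pd.DataFrame({"Name": ["a", "b", "a", "c", "c", "d"],
                     "Score": [3, 2, 1, 4, 5, 3]})

# Sum Score per Name; reset_index turns the group keys back into a column
result = data.groupby("Name")["Score"].sum().reset_index()
print(result)
```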

pandas dataframe duplicate values count not properly working

The value counts are:
df['ID'].value_counts().values  # -> array([4, 3, 3, 1], dtype=int64)
input:
ID  emp
a     1
a     1
b     1
a     1
b     1
c     1
c     1
a     1
b     1
c     1
d     1
When I jumble the ID column and run:
df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp']= df['ID'].value_counts().values
output:
ID  emp
a     4
c     3
d     3
c     1
b     1
a     1
c     1
a     1
b     1
b     1
a     1
expected result:
ID  emp
a     4
c     3
d     1
c     1
b     3
a     1
c     1
a     1
b     1
b     1
a     1
Problem: the counts are assigned positionally, without checking which ID each row actually holds.
The problem is that df['ID'].value_counts() returns a Series of counts whose length and index differ from the original data, so its raw values cannot be aligned back. To fill the new column with the per-ID counts, use Series.map:
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df['ID'].map(df['ID'].value_counts())
Or GroupBy.transform with 'size':
df.loc[~df.duplicated(subset=['ID']), 'emp'] = df.groupby('ID')['ID'].transform('size')
The Series of 4 counts cannot be assigned back directly, because its index differs from df.index:
print(df['ID'].value_counts())
a    4
b    3
c    3
d    1
Name: ID, dtype: int64
If you convert it to a NumPy array, the values are assigned purely by position: there are 4 groups (a, b, c, d), so ~df.duplicated(subset=['ID']) is True exactly 4 times, and the counts land in the fixed order 4, 3, 3, 1 regardless of which ID each of those rows holds, which is the reason for the wrong output:
print(df['ID'].value_counts().values)
[4 3 3 1]
What is needed is a new column (Series) aligned with df.index:
print(df['ID'].map(df['ID'].value_counts()))
0     4
1     4
2     3
3     4
4     3
5     3
6     3
7     4
8     3
9     3
10    1
Name: ID, dtype: int64
print(df.groupby('ID')['ID'].transform('size'))
0     4
1     4
2     3
3     4
4     3
5     3
6     3
7     4
8     3
9     3
10    1
Name: ID, dtype: int64
The original assignment
df.loc[~df.duplicated(keep='first', subset=['ID']), 'emp'] = df['ID'].value_counts().values
happens to give the desired output for the original (unjumbled) sample DataFrame, but for the general case you can try:
cond = ~df.duplicated(keep='first', subset=['ID'])
df.loc[cond, 'emp'] = df.loc[cond, 'ID'].map(df['ID'].value_counts())
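Putting the Series.map fix together on the jumbled sample from the question, as a self-contained sketch:

```python
import pandas as pd

# Jumbled sample from the question: IDs are no longer grouped together
df = pd.DataFrame({"ID": list("acdcbacabba"), "emp": 1})

# Map each row's ID to its total count, then assign only on the first
# occurrence of each ID; index alignment puts the right count on the right row
df.loc[~df.duplicated(subset=["ID"]), "emp"] = df["ID"].map(df["ID"].value_counts())
print(df)
```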

Pandas drop duplicates based on 2 columns having different values

How do I drop duplicates in this specific way:
Index  B  C
    1  2  1
    2  2  0
    3  3  1
    4  3  1
    5  4  0
    6  4  0
    7  4  0
    8  5  1
    9  5  0
   10  5  1
Desired output:
Index  B  C
    3  3  1
    5  4  0
So drop duplicates on B, but only when C has the same value in every row of the group, keeping one sample/record.
For example, B = 3 for index 3/4, but since C = 1 for both, I do not drop them all; I keep one.
But B = 5 for index 8/9/10 has C values of both 1 and 0, so the whole group gets dropped.
Try this, using transform with nunique and drop_duplicates:
df[df.groupby('B')['C'].transform('nunique') == 1].drop_duplicates(subset='B')
Output:
       B  C
Index
3      3  1
5      4  0
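A self-contained version of the answer, reproducing the sample with an explicit Index:

```python
import pandas as pd

# Sample data from the question, with the Index column as the index
df = pd.DataFrame({"B": [2, 2, 3, 3, 4, 4, 4, 5, 5, 5],
                   "C": [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]},
                  index=range(1, 11))
df.index.name = "Index"

# Keep only B-groups where C is constant (nunique == 1),
# then keep one row per B
out = df[df.groupby("B")["C"].transform("nunique") == 1].drop_duplicates(subset="B")
print(out)
```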

Search for the column name based on matching row values

I have a data frame like below:
A  B  C  D  E  F  Input
1  2  3  4  5  6      1
1  2  3  4  5  6      3
I want an output column where I can get the column name, something like below:
A  B  C  D  E  F  Input  Output
1  2  3  4  5  6      1       A
1  2  3  4  5  6      3       C
As you can see above that in row 1, Input has value 1 and column A also has value 1, so the output is A.
We can use idxmax:
df['Output'] = df.drop(columns='Input').eq(df['Input'], axis=0).idxmax(axis=1)
df['Output']
0 A
1 C
dtype: object
Alternative with .dot:
df.drop(columns='Input').eq(df['Input'], axis=0).dot(df.columns.difference(['Input']))
0 A
1 C
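A runnable sketch of the idxmax approach on the sample data (using keyword axis arguments, which recent pandas versions require):

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"A": [1, 1], "B": [2, 2], "C": [3, 3],
                   "D": [4, 4], "E": [5, 5], "F": [6, 6],
                   "Input": [1, 3]})

# Compare every non-Input column to Input row-wise; idxmax returns the
# first column label where the comparison is True
df["Output"] = df.drop(columns="Input").eq(df["Input"], axis=0).idxmax(axis=1)
print(df["Output"])
```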

How to pandas groupby one column and filter dataframe based on the minimum unique values of another column?

I have a data frame that looks like this:
CP  AID  type
 1    1     b
 1    2     b
 1    3     a
 2    4     a
 2    4     b
 3    5     b
 3    6     a
 3    7     b
I would like to groupby the CP column and filter so it only returns rows where the CP has at least 3 unique 'pairs' from the AID column.
The result should look like this:
CP  AID  type
 1    1     b
 1    2     b
 1    3     a
 3    5     b
 3    6     a
 3    7     b
You can groupby in combination with unique:
m = df.groupby('CP').AID.transform('unique').str.len() >= 3
print(df[m])
   CP  AID type
0   1    1    b
1   1    2    b
2   1    3    a
5   3    5    b
6   3    6    a
7   3    7    b
Or as RafaelC mentioned in the comments:
m = df.groupby('CP').AID.transform('nunique').ge(3)
print(df[m])
   CP  AID type
0   1    1    b
1   1    2    b
2   1    3    a
5   3    5    b
6   3    6    a
7   3    7    b
You can also do it with a plain count, though note that count() counts rows rather than unique AID values (the two coincide in this sample), and "at least 3" calls for >= rather than ==:
count = df1[['CP', 'AID']].groupby('CP').count().reset_index()
df1 = df1[df1['CP'].isin(count.loc[count['AID'] >= 3, 'CP'])]
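The nunique variant as a self-contained sketch on the sample data:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"CP":   [1, 1, 1, 2, 2, 3, 3, 3],
                   "AID":  [1, 2, 3, 4, 4, 5, 6, 7],
                   "type": list("bbaabbab")})

# Broadcast the number of unique AIDs per CP to every row, then keep
# rows belonging to CPs with at least 3 unique AIDs
m = df.groupby("CP")["AID"].transform("nunique").ge(3)
out = df[m]
print(out)
```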

Count duplicate rows for each unique row value

I have the following pandas DataFrame:
a  b  c
1  s  5
1  w  5
2  s  5
3  s  6
3  e  6
3  e  5
I need to count duplicate rows for each unique value of a to obtain the following result:
a  qty
1    2
2    1
3    3
How can I do this in Python?
You can use groupby:
g = df.groupby('a').size()
This returns:
a
1    2
2    1
3    3
dtype: int64
If you need the counts as a named column in a new DataFrame, reset the index and rename the single new count column:
g = df.groupby('a').size().reset_index().rename(columns={0: 'qty'})
to obtain:
   a  qty
0  1    2
1  2    1
2  3    3
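A self-contained sketch of the whole answer; `reset_index(name='qty')` is an equivalent shortcut that names the count column in one step instead of renaming afterwards:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"a": [1, 1, 2, 3, 3, 3],
                   "b": ["s", "w", "s", "s", "e", "e"],
                   "c": [5, 5, 5, 6, 6, 5]})

# Count rows per unique value of a and name the count column directly
g = df.groupby("a").size().reset_index(name="qty")
print(g)
```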
