pandas select rows with no duplicates [duplicate] - python

This question already has answers here:
Remove pandas rows with duplicate indices
(7 answers)
Closed 2 years ago.
I have one DataFrame and I want to select all rows that don't have duplicates.
My df:
Name Age
Jp 4
Anna 15
Jp 4
John 10
My output should be:
Name Age
Anna 15
John 10
I am using a pandas DataFrame.
Any suggestions?

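You want to drop rows that are duplicated across multiple columns. Pass keep=False so that every copy of a duplicated row is removed (the default keep='first' would still keep one Jp row):
df.drop_duplicates(['Name','Age'], keep=False)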
Please see the pandas documentation on basic methods of dataframes.
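A minimal, self-contained sketch on the question's data, assuming a recent pandas version. It shows drop_duplicates with keep=False next to the equivalent boolean mask built from DataFrame.duplicated(keep=False), which is True for every copy of a duplicated row.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({'Name': ['Jp', 'Anna', 'Jp', 'John'],
                   'Age': [4, 15, 4, 10]})

# keep=False drops *all* copies of a duplicated (Name, Age) pair
unique_rows = df.drop_duplicates(['Name', 'Age'], keep=False)

# Same result with a mask: duplicated(keep=False) flags every copy
same_result = df[~df.duplicated(['Name', 'Age'], keep=False)]

# Only the Anna and John rows (index 1 and 3) remain
print(unique_rows)
```

The mask form is handy when you also want to inspect the duplicated rows themselves, for example with df[df.duplicated(['Name', 'Age'], keep=False)].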

Related

Can I use one pandas table as a mapping table for another? [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I'm not sure how to do this. I have a table structured as follows:
true_name reference_name
abc 123
xyz 098
and another table with a single column:
true_name
abc
abc
xyz
How can I use the first DataFrame to map all values in the second one?
Create a key:value mapping (a dictionary) from the two df1 columns using dict(zip()) and map it over df2:
df2['reference_name']=df2['true_name'].map(dict(zip(df1.true_name,df1.reference_name)))
true_name reference_name
0 abc 123
1 abc 123
2 xyz 98
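The output above shows 98 rather than 098 for xyz, which suggests reference_name was stored as integers. The sketch below assumes the values are strings so the leading zero survives (the question does not state the dtype). It also shows that Series.map accepts a Series as the mapper, using its index as the lookup key, so the dictionary step is optional. This assumes true_name is unique in df1; names missing from df1 map to NaN.

```python
import pandas as pd

# Reconstruction of the question's tables; reference_name kept as strings (assumption)
df1 = pd.DataFrame({'true_name': ['abc', 'xyz'],
                    'reference_name': ['123', '098']})
df2 = pd.DataFrame({'true_name': ['abc', 'abc', 'xyz']})

# Index of `lookup` is true_name, values are reference_name
lookup = df1.set_index('true_name')['reference_name']

# Each true_name in df2 is looked up in lookup's index
df2['reference_name'] = df2['true_name'].map(lookup)

print(df2)
```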

Transform the row to a column and count the occurrence by doing a group by [duplicate]

This question already has answers here:
Pandas, Pivot table from 2 columns with values being a count of one of those columns
(2 answers)
Most efficient way to melt dataframe with a ton of possible values pandas
(2 answers)
How to form a pivot table on two categorical columns and count for each index?
(2 answers)
Closed 2 years ago.
I am trying to turn the values into columns and count the occurrences of each value, grouped by id.
Dataframe:
id value
A cake
A cookie
B cookie
B cookie
C cake
C cake
C cookie
Expected:
id cake cookie
A 1 1
B 0 2
C 2 1
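One way to get the expected table is pd.crosstab, which counts each (id, value) pair and fills pairs that never occur with 0. A minimal sketch on the question's data, with an equivalent groupby/size/unstack form for comparison:

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({'id': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'value': ['cake', 'cookie', 'cookie', 'cookie',
                             'cake', 'cake', 'cookie']})

# Rows are ids, columns are distinct values, cells are pair counts
counts = pd.crosstab(df['id'], df['value'])

# Same counts via groupby: count pairs, then pivot 'value' into columns
counts_gb = df.groupby(['id', 'value']).size().unstack(fill_value=0)

print(counts)
```

crosstab is the shorter spelling; the groupby form is easier to extend if you later need a different aggregation than a plain count.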

Drop pandas rows if a value repeats more than X times [duplicate]

This question already has answers here:
Python: Removing Rows on Count condition
(4 answers)
Closed 2 years ago.
How can I remove the rows whose value in a column is repeated more than 2 times? The repeats may or may not be consecutive. For example:
NAME EMAIL
Joe joe#email.com
John joe#email.com
Eric eric#mymail.com
Melissa mel#email.com
Ron joe#email.com
I would like to remove all rows with joe#email.com because it repeats more than 2 times.
First, create your DataFrame:
import pandas as pd
data = {'Name': ['Michael', 'Larry', 'Shaq', 'barry'], 'email': ['asf#gmail.com', 'akfd#gmail.com', 'asf#gmail.com', 'asf#gmail.com'] }
df1 = pd.DataFrame.from_dict(data)
print(df1)
Name email
0 Michael asf#gmail.com
1 Larry akfd#gmail.com
2 Shaq asf#gmail.com
3 barry asf#gmail.com
Then keep only the groups of rows whose email occurs at most 2 times (groups with more than 2 rows are dropped):
fil = df1.groupby('email').filter(lambda x: len(x) <= 2)
print(fil)
Name email
1 Larry akfd#gmail.com
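Applied to the question's own data, a sketch that avoids calling a Python lambda per group: count each email once with value_counts, map those counts back onto every row, and filter. max_repeats is a hypothetical name standing in for the "X" in the title.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    'NAME': ['Joe', 'John', 'Eric', 'Melissa', 'Ron'],
    'EMAIL': ['joe#email.com', 'joe#email.com', 'eric#mymail.com',
              'mel#email.com', 'joe#email.com'],
})

max_repeats = 2  # hypothetical name for the "X" in the title

# value_counts() gives occurrences per email; map() looks up each row's email
email_counts = df['EMAIL'].map(df['EMAIL'].value_counts())

# Keep rows whose email appears at most max_repeats times (row order preserved)
result = df[email_counts <= max_repeats]
print(result)
```

All three joe#email.com rows are removed, leaving Eric and Melissa.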

Count the number of times a pair of values occurs in a DataFrame [duplicate]

This question already has answers here:
Pandas, groupby and count
(3 answers)
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
Adding a 'count' column to the result of a groupby in pandas?
(2 answers)
Pandas create new column with count from groupby
(5 answers)
Closed 3 years ago.
I have the following dataframe:
print(df)
Product Store Quantity_Sold
A NORTH 10
A NORTH 5
A SOUTH 8
B SOUTH 8
B SOUTH 5
(...)
I would like to count the number of times the same pair of product and store is present; to illustrate:
print(final_df)
Product Store count
A NORTH 2
A SOUTH 1
B SOUTH 2
(...)
I tried with:
df["Product"].value_counts()
But it only works with single columns. How can I create final_df?
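A minimal sketch using the rows shown in the question (the elided "(...)" rows are left out): group by both columns, count rows per pair with size(), and turn the resulting Series back into a DataFrame with a count column.

```python
import pandas as pd

# Rows shown in the question
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'B', 'B'],
                   'Store': ['NORTH', 'NORTH', 'SOUTH', 'SOUTH', 'SOUTH'],
                   'Quantity_Sold': [10, 5, 8, 8, 5]})

# size() counts rows per (Product, Store) pair;
# reset_index(name='count') moves the keys back into columns
final_df = df.groupby(['Product', 'Store']).size().reset_index(name='count')

print(final_df)
```

size() counts rows regardless of missing values, whereas count() would count non-null entries per column, so size() is the closer match to "how many times the pair is present".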

How to pick only duplicate values based on 2 columns in a pandas df? [duplicate]

This question already has answers here:
how do I remove rows with duplicate values of columns in pandas data frame?
(4 answers)
How to create a list in Python with the unique values of a CSV file?
(3 answers)
Closed 3 years ago.
I have df like this:
df1:
PL IN
22 NE22
22 NE22
22 NE22
33 DE33
33 DE33
66 NL66
66 NL66
66 NL66
I need to save a CSV with only the unique rows, so the result should be:
22 NE22
33 DE33
66 NL66
I know the .unique() method, but it seems to work only on a Series, and I need to pick 2 columns. Can someone give me some advice?
Drop the duplicates, then write to CSV.
df1 = df1.drop_duplicates(subset=['PL', 'IN'], keep='first')
df1.to_csv('my_unique_csv.csv', index=False)
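A self-contained version on the question's data. To check the result without touching the filesystem, this sketch calls to_csv with no path, in which case pandas returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({'PL': [22, 22, 22, 33, 33, 66, 66, 66],
                    'IN': ['NE22', 'NE22', 'NE22', 'DE33', 'DE33',
                           'NL66', 'NL66', 'NL66']})

# Keep the first occurrence of each (PL, IN) pair
unique_df = df1.drop_duplicates(subset=['PL', 'IN'], keep='first')

# No path argument: to_csv returns the CSV content as a string
csv_text = unique_df.to_csv(index=False)
print(csv_text)
```

Passing a filename instead, as in the answer, writes the same content to disk.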
