I have the following two dataframes, df1 and df2:
final raw st
abc 12 10
abc 17 15
abc 14 17
and
final raw
abc 12
abc 14
My expected output is
final raw st
abc 17 15
I would like to delete rows based on common column values.
My try:
df1.isin(df2)
This gives me a Boolean result. I also tried
df3 = pd.merge(df1, df2, on=['final', 'raw'], how='inner') so that df3 contains the rows common to df1 and df2.
You are close with merge; you just need one extra step. First, perform an outer join to keep all rows from both dataframes and enable the merge indicator, then filter on this indicator to keep only the rows that come from df1 alone (left_only). Finally, keep only the columns of df1:
df3 = pd.merge(df1, df2, on = ['final', 'raw'], how='outer', indicator=True) \
.query("_merge == 'left_only'")[df1.columns]
print(df3)
# Output
final raw st
1 abc 17 15
You need to refer to the correct column when using isin.
result = df1[~df1['raw'].isin(df2['raw'])]
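For completeness, here is that snippet applied to the sample frames from the question (a minimal sketch; the column names are taken from the tables above). Note that it compares only the raw column, not the (final, raw) pair:
import pandas as pd

df1 = pd.DataFrame({'final': ['abc', 'abc', 'abc'], 'raw': [12, 17, 14], 'st': [10, 15, 17]})
df2 = pd.DataFrame({'final': ['abc', 'abc'], 'raw': [12, 14]})

# keep only rows of df1 whose 'raw' value does not appear in df2['raw']
result = df1[~df1['raw'].isin(df2['raw'])]
print(result)
#   final  raw  st
# 1   abc   17  15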
I need to compare two dataframes, df1 (blue) and df2 (orange), store only the rows of df2 (orange) that are not in df1 in a separate dataframe, and then add that to df1, assigning function 6 and sector 20 to the employees that were not present in df1 (blue).
I know how to find the differences between the data frames and store that in a third data frame, but I'm stuck trying to figure out how to store only the rows of df2 that are not in df1.
You can try this:
Get a list of the orange (df2) IDs you want to keep
Filter df2 with that list
Append the result to df1
df1 --> blue, df2 --> orange
import pandas as pd
df2['Function'] = 6
df2['Sector'] = 20
ids_df2_keep = [e for e in df2['ID'] if e not in list(df1['ID'])]
df2 = df2[df2['ID'].isin(ids_df2_keep)]
df1 = pd.concat([df1, df2])  # df1.append(df2) is deprecated/removed in newer pandas
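For what it's worth, here is a self-contained sketch of those steps with made-up frames (the ID/Name/Function/Sector column names and the sample values are assumed purely for illustration):
import pandas as pd

# hypothetical sample data
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Ana', 'Bob', 'Cid'],
                    'Function': [1, 2, 3], 'Sector': [10, 11, 12]})   # blue
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Name': ['Bob', 'Cid', 'Dee']})  # orange

df2['Function'] = 6
df2['Sector'] = 20
ids_df2_keep = [e for e in df2['ID'] if e not in list(df1['ID'])]  # IDs only in df2
df2 = df2[df2['ID'].isin(ids_df2_keep)]
df1 = pd.concat([df1, df2])  # only ID 4 (Dee) is appended, with Function 6 and Sector 20
print(df1)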
This has been answered in pandas get rows which are NOT in other dataframe
Store the common rows with a merge, then simply select from df2 the rows that do not share those common values.
~ negates the expression, selecting all rows that are NOT IN instead of IN.
common = df1.merge(df2,on=['ID','Name'])
df = df2[(~df2['ID'].isin(common['ID']))&(~df2['Name'].isin(common['Name']))]
This was tested using some of your data:
df1 = pd.DataFrame({'ID':[125,134,156],'Name':['John','Mary','Bill'],'func':[1,2,2]})
df2 = pd.DataFrame({'ID':[125,139,133],'Name':['John','Joana','Linda']})
Output is:
ID Name
1 139 Joana
2 133 Linda
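A sketch of the merge-indicator alternative described in that linked answer, applied to the same test frames (this reuses the indicator=True pattern shown in the first answer above):
merged = df2.merge(df1[['ID', 'Name']], on=['ID', 'Name'], how='outer', indicator=True)
only_in_df2 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_df2)
#     ID   Name
# 1  139  Joana
# 2  133  Linda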
So, I have 2 dataframes like:
DataframeA:
ID,CLASS,DIVISION
1,123,3G
2,456,5G
3,123,4G
DataframeB:
ID,CLASS,DIVISION
1,123,3G
2,456,4G
I would like to subtract DataframeB from DataframeA so that only the records that are in DataframeA and not in DataframeB remain. But the comparison should be on the CLASS and DIVISION columns only.
Expected Output:
ID,CLASS,DIVISION
2,456,5G
3,123,4G
Now I can do a left join between DataframeA and DataframeB on [CLASS, DIVISION] and then keep only the rows where DataframeB's CLASS and DIVISION columns are null, like so:
new_df = pd.merge(DataframeA, DataframeB, how='left', left_on=fileA_headerList, right_on=fileB_headerList)
new_df = new_df[new_df[fileB_headerList].isnull().all(axis=1)]
But I would like to know whether there's a more elegant or Pythonic way to do this.
You can use pd.merge() with indicator=True:
res = pd.merge(df1,df2[['CLASS','DIVISION']],on=['CLASS','DIVISION'],how='outer',indicator=True)
res = res[res['_merge']=='left_only'].drop(['_merge'], axis=1)
print(res)
ID CLASS DIVISION
1 2.0 456 5G
2 3.0 123 4G
With left join (df1 - left frame, df2 - right frame) and filtering out matched rows:
In [1157]: df3 = df1.merge(df2, on=df1.columns.drop('ID').tolist(), how='left', suffixes=('', '_'))
In [1158]: df3[df3['ID_'].isna()].drop('ID_', axis=1)
Out[1158]:
ID CLASS DIVISION
1 2 456 5G
2 3 123 4G
I have two dataframes as follows:
DF1
A B C
1 2 3
4 5 6
7 8 9
DF2
Match Values
1 a,d
7 b,c
I want to match DF1['A'] with DF2['Match'] and append DF2['Values'] to DF1 where a match exists.
So my result will be:
A B C Values
1 2 3 a,d
7 8 9 b,c
Now I can use the following code to match the values, but it's returning an empty dataframe.
df1 = df1[df1['A'].isin(df2['Match'])]
Any help would be appreciated.
Instead of doing a lookup, you can do this in one step by merging the dataframes:
pd.merge(df1, df2, how='inner', left_on='A', right_on='Match')
Specify how='inner' if you only want records that appear in both, how='left' if you want all of df1's data.
If you want to keep only the Values column:
pd.merge(df1, df2.set_index('Match')['Values'].to_frame(), how='inner', left_on='A', right_index=True)
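Putting it together with the sample frames from the question, and dropping the now-redundant Match column so the result matches the expected output (a sketch, assuming the data shown above):
import pandas as pd

df1 = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})
df2 = pd.DataFrame({'Match': [1, 7], 'Values': ['a,d', 'b,c']})

# inner merge keeps only rows of df1 whose A value appears in df2['Match']
result = pd.merge(df1, df2, how='inner', left_on='A', right_on='Match').drop(columns='Match')
print(result)
#    A  B  C Values
# 0  1  2  3    a,d
# 1  7  8  9    b,c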
I have 2 data frames:
df1 has ID and count of white products
product_id, count_white
12345,4
23456,7
34567,1
df2 has IDs and counts of all products
product_id,total_count
0009878,14
7862345,20
12345,10
456346,40
23456,30
0987352,10
34567,90
df2 has more products than df1. I need to search df2 for products that are in df1 and add total_count column to df1:
product_id,count_white,total_count
12345,4,10
23456,7,30
34567,1,90
I could do a left merge, but I would end up with a huge file. Is there any way to add specific rows from df2 to df1 using merge?
Just perform a left merge on 'product_id' column:
In [12]:
df1.merge(df2, on='product_id', how='left')
Out[12]:
product_id count_white total_count
0 12345 4 10
1 23456 7 30
2 34567 1 90
Perform a left join/merge:
df1 = df1.merge(df2, on='product_id', how='left')
The output matches the expected result shown above.
I have two DataFrames, df1:
ID value 1
0 5 162
1 7 185
2 11 156
and df2:
ID Comment
1 5
2 7 Yes!
6 11
... which I want to join using ID, with a result that looks like this:
ID value 1 Comment
5 162
7 185 Yes!
11 156
The real DataFrames are much larger and contain more columns, and I essentially want to add the Comment column from df2 to df1. I tried using
df1 = df1.join(df2['Comment'], on='ID')
... but that only gets me a new empty Comment column in df1, like .join somehow fails to use the ID column as the index. I have also tried
df1 = df1.join(df2['Comment'])
... but that uses the default indexes, which don't match between the two DataFrames (they also have different lengths), giving me a Comment value on the wrong place.
What am I doing wrong?
You can just do a merge to achieve what you want:
In [30]:
df1.merge(df2, on='ID')
Out[30]:
ID value1 Comment
0 5 162 None
1 7 185 Yes!
2 11 156 None
[3 rows x 3 columns]
The problem with join is that by default it aligns against the other dataframe's index; because your dataframes do not have common index values that match, your Comment column ends up empty.
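If you do want to stick with join, a minimal sketch (assuming df2's ID values are unique) is to move ID into df2's index first, so that on='ID' has something to align against:
# look up df1['ID'] against df2's ID-based index
df1 = df1.join(df2.set_index('ID')['Comment'], on='ID')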
EDIT
Following on from the comments, if you want to retain all values in df1 and add just the comments that are not empty and have IDs that exist in df1, then you can perform a left merge:
df1.merge(df2.dropna( subset=['Comment']), on='ID', how='left')
This drops any rows with empty comments, then merges df1 and df2 on the ID column. Because it is a left merge, all rows on the left-hand side (df1) are retained and only the comments whose IDs match are brought in; the default how='inner' would retain only IDs that are present in both the left and right dataframes.
See the pandas documentation on merge for further information and examples.