I have two DataFrames as follows:
df1 = pd.DataFrame({'Group1': [0.5,5,3], 'Group2' : [2,.06,0.9]}, index=['Dan','Max','Joe'])
df2 = pd.DataFrame({'Name' : ['Joe','Max'], 'Team' : ['Group2','Group1']})
My goal is to look up, for each Name in df2, the value in df1 from the column named in 'Team'.
So the result should look something like this:
I tried it with a merge but I failed because I don't know how to merge on these conditions.
What's the best way in Python to reach my goal?
You can unstack df1, reset its index, rename the columns, and merge on Name and Team:
out = (df1.unstack()
.reset_index()
.rename({'level_0':'Team', 'level_1':'Name', 0:'Value'}, axis=1)
.merge(df2, on=['Name','Team']))
Output:
Team Name Value
0 Group1 Max 5.0
1 Group2 Joe 0.9
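For reference, here is a self-contained sketch of the same idea using the sample frames from the question. It differs from the snippet above in one respect, which is a choice made here rather than part of the original answer: the merge starts from df2 with how='left', so the result keeps df2's row order and has exactly one row per requested Name. The intermediate name long_df is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Group1': [0.5, 5, 3], 'Group2': [2, .06, 0.9]},
                   index=['Dan', 'Max', 'Joe'])
df2 = pd.DataFrame({'Name': ['Joe', 'Max'], 'Team': ['Group2', 'Group1']})

# Long format: one row per (Team, Name) pair with its value
long_df = (df1.unstack()
              .reset_index()
              .rename({'level_0': 'Team', 'level_1': 'Name', 0: 'Value'}, axis=1))

# Left merge from df2 keeps df2's rows and their order
out = df2.merge(long_df, on=['Name', 'Team'], how='left')
print(out)
```

A Name/Team pair that does not exist in df1 would come back with NaN in Value instead of being dropped.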
I have two DataFrames, df1 and df2. In my code I used the pandas.concat method to find the differences between them.
df1 = pd.read_excel(latest_file, 0)
df2 = pd.read_excel(latest_file, 1)
# Reads the first and second sheets of the spreadsheet.
new_dataframe = pd.concat([df1,df2]).drop_duplicates(keep=False)
This works perfectly; however, I want to know which rows come from df1 and which come from df2. To show this, I want to add a column to new_dataframe that says 'Removed' if the row is from df1 and 'Added' if it is from df2. I can't seem to find any documentation on how to do this. Thanks in advance for any help.
Edit: My current code removes all rows that are identical in both DataFrames. The solution still has to remove the common rows.
Consider using pd.merge with indicator=True instead. This creates a new column named _merge that indicates which DataFrame each row came from (left_only, right_only, or both). You can then map those labels to Removed and Added:
df1 = pd.DataFrame({'col1': [1,2,3,4,5]})
df2 = pd.DataFrame({'col1': [3,4,5,6,7]})
m = {'left_only': 'Removed', 'right_only': 'Added'}
new_dataframe = pd.merge(df1, df2, how='outer', indicator=True) \
.query('_merge != "both"') \
.replace({'_merge': m})
Output:
col1 _merge
0 1 Removed
1 2 Removed
5 6 Added
6 7 Added
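A variant of the above, sketched under a few assumptions: the label column is called Status (a hypothetical name, not from the question), and the indicator is converted to plain strings before mapping so that the categorical dtype of _merge does not get in the way.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'col1': [3, 4, 5, 6, 7]})

# Outer merge on all shared columns, tagging each row with its origin
merged = pd.merge(df1, df2, how='outer', indicator=True)

# Drop the rows present in both frames
diff = merged[merged['_merge'] != 'both'].copy()

# 'Status' is an illustrative column name; astype(str) avoids categorical dtype quirks
labels = {'left_only': 'Removed', 'right_only': 'Added'}
diff['Status'] = diff['_merge'].astype(str).map(labels)
diff = diff.drop(columns='_merge').reset_index(drop=True)
print(diff)
```

One difference from the concat approach is worth knowing: drop_duplicates(keep=False) also discards rows that are duplicated within a single sheet, whereas the merge keeps them and labels them by origin.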
As someone who is very new to merge/append in Python, I am trying to merge two different DataFrames.
DF1 has 2 columns (Text and ID) and 100 rows.
DF2 has 3 columns (Text, ID, and Match) and 20 rows.
My goal is to combine the two so that the "Match" column from DF2 is merged into DF1.
Every value in the Match column is True, so after the merge the other 80 rows of DF1 can be NaN and I can fix them later.
Thank you to everyone for the help and support!
Try a left merge using .merge(), like this:
DF_out = DF1.merge(DF2, on=['Text', 'ID'], how='left')
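A small sketch of what this does, using made-up rows in place of the real 100-row and 20-row frames (only the column names come from the question). The last step, which turns the missing values into False, is one possible way to "fix it later" and is an assumption about what is wanted.

```python
import pandas as pd

# Hypothetical stand-ins for DF1 and DF2
DF1 = pd.DataFrame({'Text': ['a', 'b', 'c', 'd'], 'ID': [1, 2, 3, 4]})
DF2 = pd.DataFrame({'Text': ['b', 'd'], 'ID': [2, 4], 'Match': [True, True]})

# Every row of DF1 is kept; Match is NaN where DF2 has no matching Text/ID pair
DF_out = DF1.merge(DF2, on=['Text', 'ID'], how='left')

# Optional cleanup: NaN compares unequal to True, so unmatched rows become False
DF_out['Match'] = DF_out['Match'].eq(True)
print(DF_out)
```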
I have two pandas data frames df1 and df2. df1 contains 2 columns and 750 rows, df2 has 2 columns and 88 rows. I want to compare the two data frames and return the values from df1 that are present in df2 and store the matching values in a new column in df2.
Ex.
df1
A B
emp_table emp_id
emp_table emp_name
pay_table basic_amount
pay_table da_amount
df2
A B
emp_table emp_id
emp_table emp_department
pay_table da_amount
I want to add another column in df2 which has the matching values.
df2
A B
emp_table emp_id
pay_table da_amount
I want to perform a one-to-many comparison of each element of df1 with each element of df2.
I think you need merge without the on parameter, so the join uses all columns that the two DataFrames have in common:
df = pd.merge(df1, df2)
print(df)
A B
0 emp_table emp_id
1 pay_table da_amount
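If the aim is to keep every row of df2 and mark which ones also occur in df1, a left merge with indicator=True is one way to sketch it. The column name in_df1 is hypothetical, and drop_duplicates is applied to df1 so that repeated rows there cannot multiply rows of df2.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['emp_table', 'emp_table', 'pay_table', 'pay_table'],
                    'B': ['emp_id', 'emp_name', 'basic_amount', 'da_amount']})
df2 = pd.DataFrame({'A': ['emp_table', 'emp_table', 'pay_table'],
                    'B': ['emp_id', 'emp_department', 'da_amount']})

# With no `on`, merge joins on the columns common to both frames (A and B here)
common = pd.merge(df1, df2)

# Keep all rows of df2 and flag those that also appear in df1
flagged = df2.merge(df1.drop_duplicates(), how='left', indicator=True)
flagged['in_df1'] = flagged['_merge'] == 'both'   # 'in_df1' is an illustrative name
flagged = flagged.drop(columns='_merge')
print(flagged)
```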
I have 3 dataframes:
df1 :
ip name
df2 :
name country
df3:
country city
I have to match them by IP. What is the correct way to do this? Currently we match df1 with df2, and then match that result with df3 after changing the index. I don't think that is the correct way.
It seems you need a double merge; the on parameter can be omitted if the only columns the DataFrames share are the join columns:
df = df1.merge(df2).merge(df3)
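A minimal sketch of the chained merge with invented sample rows (only the column names ip, name, country, and city come from the question). The on arguments are spelled out here to make the join keys explicit; leaving them out gives the same result for these frames.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question
df1 = pd.DataFrame({'ip': ['10.0.0.1', '10.0.0.2'], 'name': ['alpha', 'beta']})
df2 = pd.DataFrame({'name': ['alpha', 'beta'], 'country': ['DE', 'FR']})
df3 = pd.DataFrame({'country': ['DE', 'FR'], 'city': ['Berlin', 'Paris']})

# First join on name, then join the result on country
df = df1.merge(df2, on='name').merge(df3, on='country')
print(df)
```

The default is an inner join, so an ip whose name or country has no match is dropped; passing how='left' to both merges would keep every row of df1.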
I have 2 data frames:
df1 has ID and count of white products
product_id, count_white
12345,4
23456,7
34567,1
df2 has IDs and counts of all products
product_id,total_count
0009878,14
7862345,20
12345,10
456346,40
23456,30
0987352,10
34567,90
df2 has more products than df1. I need to search df2 for products that are in df1 and add total_count column to df1:
product_id,count_white,total_count
12345,4,10
23456,7,30
34567,1,90
I could do a left merge, but I would end up with a huge file. Is there any way to add specific rows from df2 to df1 using merge?
Just perform a left merge on 'product_id' column:
In [12]:
df1.merge(df2, on='product_id', how='left')
Out[12]:
product_id count_white total_count
0 12345 4 10
1 23456 7 30
2 34567 1 90
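On the concern about ending up with a huge file: a left merge that starts from df1 returns one row per row of df1, provided product_id is unique in df2. The sketch below uses the sample data from the question and makes two assumptions: the IDs are stored as strings (so values such as 0009878 keep their leading zeros), and validate='many_to_one' is added as an optional check that raises if df2 contains duplicate IDs.

```python
import pandas as pd

df1 = pd.DataFrame({'product_id': ['12345', '23456', '34567'],
                    'count_white': [4, 7, 1]})
df2 = pd.DataFrame({'product_id': ['0009878', '7862345', '12345', '456346',
                                   '23456', '0987352', '34567'],
                    'total_count': [14, 20, 10, 40, 30, 10, 90]})

# Only df1's products survive; the extra products in df2 are ignored
out = df1.merge(df2, on='product_id', how='left', validate='many_to_one')
print(out)
```

When reading such data from CSV, passing dtype={'product_id': str} to pd.read_csv for both files keeps the key types consistent.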
Perform left join/merge:
Data frames are:
left join:
df1 = df1.merge(df2, on='product_id', how='left')
The output will look like this: