Pandas merge and keep only non-matching records [duplicate] - python

How can I merge/join these two dataframes ONLY on "id", using pandas in Python, and produce 3 new dataframes:
1) R1 = Merged records
2) R2 = (DF1 - Merged records)
3) R3 = (DF2 - Merged records)
First dataframe (DF1)
| id | name |
|-----------|-------|
| 1 | Mark |
| 2 | Dart |
| 3 | Julia |
| 4 | Oolia |
| 5 | Talia |
Second dataframe (DF2)
| id | salary |
|-----------|--------|
| 1 | 20 |
| 2 | 30 |
| 3 | 40 |
| 4 | 50 |
| 6 | 33 |
| 7 | 23 |
| 8 | 24 |
| 9 | 28 |
My solution for R1:
R1 = pd.merge(DF1, DF2, on='id', how='inner')
I am unsure whether that is the easiest way to get R2 and R3.
R2 should look like:
| id | name |
|-----------|-------|
| 5 | Talia |
R3 should look like:
| id | salary |
|-----------|--------|
| 6 | 33 |
| 7 | 23 |
| 8 | 24 |
| 9 | 28 |

You can turn on the indicator option in merge and filter on the corresponding _merge values:
total_merge = df1.merge(df2, on='id', how='outer', indicator=True)
R1 = total_merge[total_merge['_merge']=='both']
R2 = total_merge[total_merge['_merge']=='left_only']
R3 = total_merge[total_merge['_merge']=='right_only']
Update: Ben's suggestion would be something like this:
dfs = {k:v for k,v in total_merge.groupby('_merge')}
and then you can do, for example:
dfs['both']
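For completeness, here is a minimal runnable sketch of the indicator approach (assuming the DF1/DF2 shown above); it also trims the indicator and the columns belonging to the other frame, so R2 and R3 come out in the requested shapes:
import pandas as pd

DF1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'name': ['Mark', 'Dart', 'Julia', 'Oolia', 'Talia']})
DF2 = pd.DataFrame({'id': [1, 2, 3, 4, 6, 7, 8, 9],
                    'salary': [20, 30, 40, 50, 33, 23, 24, 28]})

total_merge = DF1.merge(DF2, on='id', how='outer', indicator=True)

# matched rows keep both name and salary
R1 = total_merge.loc[total_merge['_merge'] == 'both', ['id', 'name', 'salary']]
# rows only in DF1 / only in DF2, restricted to that frame's own columns
R2 = total_merge.loc[total_merge['_merge'] == 'left_only', ['id', 'name']]
R3 = total_merge.loc[total_merge['_merge'] == 'right_only', ['id', 'salary']]
# note: salary shows up as float here because the outer merge introduces NaNs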

Map IDs in one dataframe to the corresponding names in another dataframe [duplicate]

I have the 2 dataframes below:
df_1:
| | assign_to_id |
| | ------------ |
| 0 | 1, 2 |
| 1 | 2 |
| 2 | 3,4,5 |
df_2:
| | id | name |
| | ------------| -----------|
| 0 | 1 | John |
| 1 | 2 | Adam |
| 2 | 3 | Max |
| 3 | 4 | Martha |
| 4 | 5 | Robert |
I want to map the IDs in df_1 to the names in df_2 by matching their ids.
final_df:
| | assign_to_name |
| | ----------------- |
| 0 | John, Adam |
| 1 | Adam |
| 2 | Max,Martha,Robert |
I don't know how to achieve this. Looking forward to some help.
The idea is to split the column on ',', map each id through a dictionary, and then join the names back with ',':
# id -> name lookup keyed by string ids, to match the split values
d = df_2.assign(id=df_2['id'].astype(str)).set_index('id')['name'].to_dict()
# split on ',', map each id through the lookup, join the names back
f = lambda x: ','.join(d[y] for y in x.split(',') if y in d)
df_1['assign_to_name'] = df_1['assign_to_id'].replace(r'\s+', '', regex=True).apply(f)
print(df_1)
assign_to_id assign_to_name
0 1, 2 John,Adam
1 2 Adam
2 3,4,5 Max,Martha,Robert
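For comparison, a sketch of the same mapping done with str.split + explode + map instead of a dictionary and apply (df_1/df_2 are rebuilt here so the snippet runs on its own); which variant is faster will depend on the data:
import pandas as pd

df_1 = pd.DataFrame({'assign_to_id': ['1, 2', '2', '3,4,5']})
df_2 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                     'name': ['John', 'Adam', 'Max', 'Martha', 'Robert']})

# id -> name lookup keyed by string ids, to match the split values
lookup = df_2.set_index(df_2['id'].astype(str))['name']

df_1['assign_to_name'] = (
    df_1['assign_to_id']
        .str.replace(r'\s+', '', regex=True)   # drop stray spaces
        .str.split(',')                        # lists of id strings
        .explode()                             # one row per id
        .map(lookup)                           # id string -> name
        .groupby(level=0)                      # regroup by original row
        .agg(','.join)
)
print(df_1)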

Pandas DataFrame: replace one column with another df

I have two pandas DataFrames (Python 3). One is:
| A | B | C |
|---|---|---|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| ......... |
and another is:
| D | E | F |
|---|---|---|
| 3 | 4 | 6 |
| 8 | 7 | 9 |
| ......... |
I want to get this expected result:
| A | D | E | F | C |
|---|---|---|---|---|
| 1 | 3 | 4 | 6 | 3 |
| 4 | 8 | 7 | 9 | 6 |
| ................. |
i.e., replace df1['B'] with the columns of df2.
I have tried
pd.concat([df1, df2], axis=1, sort=False)
and then dropping column df1['B'], but it doesn't seem very efficient.
Could it be solved by using insert() or another method?
I think your method is good; you can also remove the column before the concat:
pd.concat([df1.drop('B', axis=1), df2], axis=1, sort=False)
Another method with DataFrame.join:
df1.drop('B', axis=1).join(df2)
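One caveat: both versions place C before D/E/F (A, C, D, E, F), while the expected output shows A, D, E, F, C. If that exact order matters, you can reorder afterwards; a minimal sketch using the sample values above:
import pandas as pd

df1 = pd.DataFrame({'A': [1, 4], 'B': [2, 5], 'C': [3, 6]})
df2 = pd.DataFrame({'D': [3, 8], 'E': [4, 7], 'F': [6, 9]})

out = df1.drop('B', axis=1).join(df2)   # columns: A, C, D, E, F
out = out[['A', 'D', 'E', 'F', 'C']]    # match the expected layout
print(out)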

Merge columns from two dataframes when values in the columns are not equal [duplicate]

I have a Pandas df which looks like this:
| | yyyy_mm_dd | id | product | status | is_50 | cnt |
|---|------------|----|------------|--------|-------|-----|
| | 2002-12-15 | 7 | prod_rs | 2 | 0 | 8 |
| | 2002-12-15 | 16 | prod_go | 2 | 0 | 1 |
| | 2002-12-15 | 16 | prod_mb | 2 | 0 | 3 |
| | 2002-12-15 | 29 | prod_er | 2 | 0 | 2 |
| | 2002-12-15 | 29 | prod_lm | 2 | 0 | 2 |
| | 2002-12-15 | 29 | prod_ops | 2 | 0 | 2 |
I also have a second dataframe which is similar:
| | id | product | cnt |
|---|----|------------|-----|
| | 7 | prod_rs | 8 |
| | 16 | prod_go | 1 |
| | 16 | prod_mb | 3 |
| | 29 | prod_er | 2 |
| | 29 | prod_lm | 2 |
| | 29 | prod_ops | 6 |
How can I create a third dataframe which will only store the rows which do not have an equal count? Based on the above, only the last row would be returned as the cnt for the id / product combination differs. Example output:
| | id | product | cnt_df1 | cnt_df2 |
|---|----|---------|---------|---------|
| | 29 | prod_ops| 2 | 6 |
The second df is one row larger in size so not all id / product combinations may be present in both dataframes.
I've been looking at merge, but I'm unsure how to use it when the cnt columns are not equal.
You would still use merge and just check whether the count columns are different in a second step:
In [40]: df = pd.merge(df1.drop(["yyyy_mm_dd", "status", "is_50"], axis=1), df2, on=['id', 'product'], suffixes=['_df1', '_df2'])
In [41]: df
Out[41]:
id product cnt_df1 cnt_df2
0 7 prod_rs 8 8
1 16 prod_go 1 1
2 16 prod_mb 3 3
3 29 prod_er 2 2
4 29 prod_lm 2 2
5 29 prod_ops 2 6
Now you can simply filter out all rows with the same cnt, e.g. with query():
In [42]: df.query("cnt_df1 != cnt_df2")
Out[42]:
id product cnt_df1 cnt_df2
5 29 prod_ops 2 6
You can achieve this in two steps like so:
# Merge the DataFrames
df3 = df1.merge(df2, on=["id", "product"])
# Filter for where `cnt` are not equal
df3 = df3[df3["cnt_x"].ne(df3["cnt_y"])]
# yyyy_mm_dd id product status is_50 cnt_x cnt_y
# 5 2002-12-15 29 prod_ops 2 0 2 6
You can use the suffixes parameter on merge if you don't want to use the default _x and _y.
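Putting that second approach together with suffixes, a minimal sketch (assuming you only want id, product and the two counts in the result, as in the example output):
# keep only the key columns and cnt from df1, so no drop list is needed
df3 = df1[['id', 'product', 'cnt']].merge(
    df2, on=['id', 'product'], suffixes=('_df1', '_df2'))
df3 = df3[df3['cnt_df1'].ne(df3['cnt_df2'])]
#    id   product  cnt_df1  cnt_df2
# 5  29  prod_ops        2        6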

Intersect two dataframes in Pandas with respect to first dataframe?

I want to intersect two Pandas dataframes (1 and 2) based on two columns (A and B) present in both dataframes. However, I would like to return a dataframe that only has data with respect to the data in the first dataframe, omitting anything that is not found in the second dataframe.
So for example:
Dataframe 1:
A | B | Extra | Columns | In | 1 |
----------------------------------
1 | 2 | Extra | Columns | In | 1 |
1 | 3 | Extra | Columns | In | 1 |
1 | 5 | Extra | Columns | In | 1 |
Dataframe 2:
A | B | Extra | Columns | In | 2 |
----------------------------------
1 | 3 | Extra | Columns | In | 2 |
1 | 4 | Extra | Columns | In | 2 |
1 | 5 | Extra | Columns | In | 2 |
should return:
A | B | Extra | Columns | In | 1 |
----------------------------------
1 | 3 | Extra | Columns | In | 1 |
1 | 5 | Extra | Columns | In | 1 |
Is there a way I can do this simply?
You can use df.merge:
df = df1.merge(df2, on=['A','B'], how='inner').drop('2', axis=1)
how='inner' is the default; I just put it there for your understanding of how df.merge works.
As #piRSquared suggested, you can do:
df1.merge(df2[['A', 'B']], how='inner')
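A minimal runnable sketch of that second form, with hypothetical column names standing in for the "extra columns" in each frame:
import pandas as pd

df1 = pd.DataFrame({'A': [1, 1, 1], 'B': [2, 3, 5], 'extra_1': ['x', 'y', 'z']})
df2 = pd.DataFrame({'A': [1, 1, 1], 'B': [3, 4, 5], 'extra_2': ['p', 'q', 'r']})

# only A and B are taken from df2, so the result keeps df1's columns only
out = df1.merge(df2[['A', 'B']], on=['A', 'B'], how='inner')
print(out)
#    A  B extra_1
# 0  1  3       y
# 1  1  5       z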

How can I assign columns from one data frame to another, when the ID is the same? Using isin() - Python [duplicate]

I am using python and pandas.
I have:
df1:
|index | ID |
| 1 | 34 |
| 2 | 35 |
| 3 | 36 |
df2:
|index | ID | name |
| 1 | 35 | Astri |
| 2 | 36 | Carlos |
| 3 | 34 | Xim |
So, I want something like this using isin():
df1:
|index | ID | name |
| 1 | 34 | Xim |
| 2 | 35 | Astri |
| 3 | 36 | Carlos |
The function I am using is below. I can tell if the ID of df1 exists in df2, but I am unable to assign the 'name' column of df2 to df1:
df1 = df1[df1.ID.isin(df2.ID)]
I don't want to have to iterate using a for loop and iterrows(), because I will apply this to a dataset of eight million records. Thank you!
This is merging two dataframes together; consider merge or join.
df1 = df1.merge(df2, how='left', on='ID')
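If you only need to pull the single name column across, an alternative sketch using map (assuming ID is unique in df2) avoids carrying any extra columns along:
import pandas as pd

df1 = pd.DataFrame({'ID': [34, 35, 36]})
df2 = pd.DataFrame({'ID': [35, 36, 34], 'name': ['Astri', 'Carlos', 'Xim']})

# build an ID -> name lookup and map it onto df1
df1['name'] = df1['ID'].map(df2.set_index('ID')['name'])
print(df1)
#    ID    name
# 0  34     Xim
# 1  35   Astri
# 2  36  Carlos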
