Merge dataframes with conditions with pandas python [duplicate] - python

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I want to merge two dataframes, but under certain conditions:
If the values of the key columns from dataframe 2 (Source1, Target1, Source2, Target2) occur in dataframe 1, I want to replace those rows with the data from dataframe 2, while keeping all columns from both dataframe 1 and dataframe 2 in the result.
My current problem is that when I concatenate, the rows from DF2 are only appended to those from DF1, and I end up with invalid data.
In short: match DF1 against DF2 and, where they intersect, overwrite the intersecting rows in DF1, but keep all columns from DF2 alongside those from DF1.
Thanks for your help.
DF1
DF2
What I get
What I need
frames = [DF1,DF2]
result = pd.concat(frames)
print(result)

Use merge:
out = pd.merge(DF1, DF2, how='left',
on=['Source 1', 'Target 1', 'Source 2', 'Target 2'])
Take some time to read Pandas Merging 101.
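As a minimal sketch of what the left merge above does, assuming small made-up frames: only the four key column names come from the answer, while the Weight and Label columns and all values are hypothetical stand-ins for the data shown in the question's images.

```python
import pandas as pd

# Hypothetical stand-ins for DF1 and DF2; only the key column names come from the answer.
DF1 = pd.DataFrame({
    "Source 1": ["a", "b", "c"],
    "Target 1": ["x", "y", "z"],
    "Source 2": ["a2", "b2", "c2"],
    "Target 2": ["x2", "y2", "z2"],
    "Weight": [1, 2, 3],       # hypothetical extra column in DF1
})
DF2 = pd.DataFrame({
    "Source 1": ["b"],
    "Target 1": ["y"],
    "Source 2": ["b2"],
    "Target 2": ["y2"],
    "Label": ["matched"],      # hypothetical extra column in DF2
})

keys = ["Source 1", "Target 1", "Source 2", "Target 2"]

# Left merge: keep every row of DF1 and attach DF2's columns where all four keys match.
out = pd.merge(DF1, DF2, how="left", on=keys)
print(out)
```

Rows of DF1 without a match in DF2 keep their own values and get NaN in the columns that come only from DF2.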

Related

How to do a merge with conditions in python [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have 2 dataframe df1 and df2.
df1 has 4000 records and df2 has 160 records.
I need to merge the 7th column of df2 with df1 based on Date and Time (which are common columns in both).
Condition:
If the date and time are the same in df1 and df2, a normal merge should happen.
If the date is the same but the time in df1 is 14:00 while df2 has 13:59 and then nothing until 14:03, the merge should use the 13:59 row (the latest time before 14:00).
I tried:
Extracting only the Date, Time and 7th column from df1.
Then I did a left merge with pd.merge:
pd.merge(df1, df2, on=['Date', 'Time'], how='left')
but it misses many values where the time does not match.
Even if the exact time is not available, I want the merge to use the latest time available before the required time.
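Try merge_asof, which matches each row of df1 with the last row of df2 at or before its key. It accepts only a single on key and has no how argument (it always behaves like a left join), so Date goes in by. Time must be a numeric, datetime or timedelta column, and both frames must be sorted by it:
pd.merge_asof(df1, df2, on='Time', by='Date', direction='backward')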
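A small sketch of the backward as-of match described in the question, under stated assumptions: the data is made up, Value stands in for the 7th column of df2, and Date and Time are assumed to be strings that get combined into one hypothetical DateTime column so that merge_asof has a single ordered key.

```python
import pandas as pd

# Hypothetical data; "Value" stands in for the 7th column of df2.
df1 = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-04"],
    "Time": ["13:59", "14:00"],
})
df2 = pd.DataFrame({
    "Date": ["2021-01-04", "2021-01-04"],
    "Time": ["13:59", "14:03"],
    "Value": [10, 20],
})

# merge_asof needs one sorted, ordered key, so build a single timestamp column.
for df in (df1, df2):
    df["DateTime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

left = df1.sort_values("DateTime")
right = df2.sort_values("DateTime")[["Date", "DateTime", "Value"]]

# direction="backward" picks the last df2 row at or before each df1 timestamp;
# by="Date" restricts candidates to rows with the same date.
result = pd.merge_asof(left, right, on="DateTime", by="Date", direction="backward")
print(result)
```

Here the 14:00 row of df1 picks up the 13:59 value from df2, because 14:03 lies after it.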

Pandas : Two 'isin', one condition [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I've been using pandas for some months and today I found something weird.
Let's say I have these two dataframes :
df1 = pd.DataFrame(data={'C1' : [1,1,1,2],'C2' : ['A','B','C','D']})
df2 = pd.DataFrame(data={'C1':[2,2,2],'C2':['A','B','C']})
What I want is: every pair {C1, C2} from df2 that also exists in df1.
This is what I wrote: df2[df2.C1.isin(df1.C1) & df2.C2.isin(df1.C2)]
The result I would like is an empty DataFrame, because in df1 the value 2 is not paired with 'A', 'B' or 'C', but what I get is df2. I tried df2[df2[["C1", "C2"]].isin(df1[["C1", "C2"]])] but it does not work if df2 has more columns (even if unused).
You can do it with an inner merge:
df2.merge(df1, how='inner', on=['C1', 'C2'])
Empty DataFrame
Columns: [C1, C2]
Index: []
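To make the difference concrete, here is a short sketch comparing the two approaches on the frames from the question. The extra column is a hypothetical addition to show that the merge still works when df2 carries unused columns. Dropping duplicates from df1 first is a precaution, so that repeated pairs in df1 cannot multiply rows of df2.

```python
import pandas as pd

df1 = pd.DataFrame({"C1": [1, 1, 1, 2], "C2": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"C1": [2, 2, 2], "C2": ["A", "B", "C"],
                    "extra": [10, 20, 30]})  # hypothetical unused column

# Two separate isin checks test each column independently, so every row passes.
independent = df2[df2.C1.isin(df1.C1) & df2.C2.isin(df1.C2)]

# Merging on both columns tests the (C1, C2) pair as a unit.
pairs = df1[["C1", "C2"]].drop_duplicates()
paired = df2.merge(pairs, how="inner", on=["C1", "C2"])

print(len(independent), len(paired))
```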

How to merge two Pandas DataFrames when two columns are the same [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 years ago.
I have these two dataframes:
   orderItemId       orderId     orderDate                  latestDeliveryDate
0  BFC0000332253518  2648507110  2019-11-10T21:08:30+01:00  2019-11-11T00:00:00+01:00
0  BFC0000332123047  2647717360  2019-11-10T15:42:39+01:00  2019-11-11T00:00:00+01:00
0  BFC0000332291194  2648712140  2019-11-10T22:24:56+01:00  2019-11-11T00:00:00+01:00

   orderItemId       orderId     shipmentId  shipmentReference   shipmentDate
0  BFC0000332253518  2648507110  689508122   081234500926730318  2019-11-11T00:10:06+01:00
1  BFC0000332123047  2647717360  689505054   081234500926572451  2019-11-10T23:55:38+01:00
2  BFC0000332291194  2648712140  689505045   081234500926710549  2019-11-10T23:55:37+01:00
How can I merge those together with Pandas merge? Because they have two columns that are the same. Can I use multiple on= values?
Yes, you can pass multiple on values. From your example above, I assume you want to merge on orderItemId and orderId.
Since the key columns have the same names in both frames, a single on list is enough:
final_df = pd.merge(df1, df2, how='inner', on=['orderItemId', 'orderId'])
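A runnable sketch of the two-key merge, using the first two rows of each table from the question and only a few of their columns. The validate argument is an optional extra that is not part of the original answer: it makes pandas raise an error if the key pair is not unique on both sides.

```python
import pandas as pd

# First two rows of each table from the question, trimmed to a few columns.
df1 = pd.DataFrame({
    "orderItemId": ["BFC0000332253518", "BFC0000332123047"],
    "orderId": [2648507110, 2647717360],
    "orderDate": ["2019-11-10T21:08:30+01:00", "2019-11-10T15:42:39+01:00"],
})
df2 = pd.DataFrame({
    "orderItemId": ["BFC0000332253518", "BFC0000332123047"],
    "orderId": [2648507110, 2647717360],
    "shipmentId": [689508122, 689505054],
})

# Both key columns must match for a row pair to be joined.
final_df = pd.merge(
    df1, df2,
    how="inner",
    on=["orderItemId", "orderId"],
    validate="one_to_one",  # optional guard, an addition to the original answer
)
print(final_df)
```

Because the keys are listed in on, each appears once in the result instead of being duplicated with _x and _y suffixes.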

Join two Dataframe table [duplicate]

This question already has answers here:
Merge two dataframes by index
(7 answers)
Pandas Merging 101
(8 answers)
Closed 5 years ago.
I have two dataframes:
df1
id A
1 wer
3 dfg
5 dfg
df2
id A
2 fgv
4 sdfsdf
I want to join these two dataframes into one that looks like this:
df3
id A
1 wer
2 fgv
3 dfg
...
df3 = df1.merge(df2,how='outer',sort=True)
There is a concat method in pandas that you can use.
df3 = pd.concat([df1, df2])
You can sort the index with:
df3 = df3.sort_index()
Or reset the index with:
df3 = df3.reset_index(drop=True)
I see you have an ellipsis (...) at the end of your df3 dataframe. If that means the dataframe continues, use the above; otherwise go for Jibril's answer.
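Putting the concat steps together on the data from the question gives the sketch below. It assumes id is an ordinary column rather than the index, which the question does not make clear, so it sorts with sort_values instead of sort_index.

```python
import pandas as pd

# Assumes id is an ordinary column, not the index.
df1 = pd.DataFrame({"id": [1, 3, 5], "A": ["wer", "dfg", "dfg"]})
df2 = pd.DataFrame({"id": [2, 4], "A": ["fgv", "sdfsdf"]})

df3 = (
    pd.concat([df1, df2])       # stack the rows of both frames
    .sort_values("id")          # interleave them by id
    .reset_index(drop=True)     # replace the duplicated index with 0..n-1
)
print(df3)
```

If id is actually the index of both frames, pd.concat([df1, df2]).sort_index() gives the same row order.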

Merge a list of dataframes to create one dataframe [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 4 years ago.
I have a list of 18 data frames:
dfList = [df1, df2, df3, df4, df5, df6.....df18]
All of the data frames have a common id column so it's easy to join them each together with pd.merge 2 at a time. Is there a way to join them all at once so that dfList comes back as a single dataframe?
I think you need concat, but first set the index of each DataFrame to the common column:
dfs = [df.set_index('id') for df in dfList]
print(pd.concat(dfs, axis=1))
If you need to join with merge instead:
from functools import reduce
df = reduce(lambda df1, df2: pd.merge(df1, df2, on='id'), dfList)
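The two options behave differently when the frames do not share exactly the same ids. The sketch below illustrates this with three small hypothetical frames in place of the 18 from the question; the column names a, b and c and all values are made up.

```python
from functools import reduce

import pandas as pd

# Three hypothetical frames standing in for the 18 in dfList.
dfList = [
    pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]}),
    pd.DataFrame({"id": [1, 2, 3], "b": [0.1, 0.2, 0.3]}),
    pd.DataFrame({"id": [2, 3, 4], "c": ["x", "y", "z"]}),
]

# Option 1: align on the id index and concatenate side by side.
# concat uses an outer join by default, so every id is kept.
wide = pd.concat([df.set_index("id") for df in dfList], axis=1)

# Option 2: fold pd.merge over the list.
# merge uses an inner join by default, so only ids present in all frames survive.
merged = reduce(lambda left, right: pd.merge(left, right, on="id"), dfList)

print(wide)
print(merged)
```

Passing how='outer' to pd.merge inside the lambda, or join='inner' to pd.concat, switches either option to the other behaviour.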
