This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two dataframes, df1 and df2.
df1 has 4000 records and df2 has 160 records.
I need to merge the 7th column of df2 into df1 based on Date and Time (columns common to both).
Condition:
If the date and time are the same in df1 and df2, then a normal merge should happen.
If the date is the same, but df1 has a time of 14:00 while df2 only has 13:59 followed by 14:03, then the merge should use the 13:59 row (the latest time before 14:00).
I tried:
Extracting only the Date, Time and 7th column from df1.
Then I did a left merge:
pd.merge(df1, df2, on=['Date', 'Time'], how='left')
but it misses many values where the times do not match exactly.
Even if the exact time is not available, I want the merge to use whatever time is available before the required time.
Try pd.merge_asof, which matches each row against the nearest key instead of requiring an exact match. Note that merge_asof has no how parameter, on must be a single ordered column (both frames sorted on it), and by handles the exact-match Date column:
pd.merge_asof(df1.sort_values('Time'), df2.sort_values('Time'), on='Time', by='Date', direction='backward')
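A minimal runnable sketch of that call. The column names and sample values below are invented to mirror the question; merge_asof also requires both frames to be sorted on the asof key:

```python
import pandas as pd

# Hypothetical data mirroring the question: df2 has 13:59 and 14:03,
# but no row at exactly 14:00.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-04"]),
    "Time": pd.to_timedelta(["13:30:00", "14:00:00"]),
    "Price": [100.0, 101.5],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-04"]),
    "Time": pd.to_timedelta(["13:59:00", "14:03:00"]),
    "Signal": [1, 2],
})

# merge_asof needs both frames sorted on the asof key ('Time').
df1 = df1.sort_values("Time")
df2 = df2.sort_values("Time")

# 'on' takes the single ordered key; 'by' enforces an exact match on Date.
# direction='backward' picks the latest df2 time at or before each df1 time,
# so 14:00 matches the 13:59 row.
out = pd.merge_asof(df1, df2, on="Time", by="Date", direction="backward")
print(out)
```

The 13:30 row gets no match (there is no earlier time in df2 for that date), while the 14:00 row picks up Signal 1 from the 13:59 row.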
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I want to merge two dataframes, but under certain conditions:
If the values of the columns from dataframe 2 (Source1, Target1, Source2, Target2) occur in dataframe 1, I want to replace them with the data from dataframe 2, while keeping all columns from both dataframes.
My current problem is that when I do a concatenation, the data from DF2 is simply appended to DF1, and I end up with invalid data.
In short: match DF1 with DF2 and, where there are intersections, overwrite the intersecting rows in DF1, but merge all columns from DF2 with those from DF1.
Thanks for your help
DF1
DF2
What I get
What I need
frames = [DF1,DF2]
result = pd.concat(frames)
print(result)
Use merge:
out = pd.merge(DF1, DF2, how='left',
               on=['Source1', 'Target1', 'Source2', 'Target2'])
Take a while to read Pandas Merging 101
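Since the actual DF1 and DF2 were posted as images, here is a hedged toy reconstruction (all values invented) showing how the left merge keys on all four columns at once:

```python
import pandas as pd

# Invented stand-in data; the real DF1/DF2 were shown as images in the post.
DF1 = pd.DataFrame({
    "Source1": ["A", "B"], "Target1": ["X", "Y"],
    "Source2": ["C", "D"], "Target2": ["Z", "W"],
    "Weight":  [1, 2],
})
DF2 = pd.DataFrame({
    "Source1": ["A"], "Target1": ["X"],
    "Source2": ["C"], "Target2": ["Z"],
    "Extra":   ["overwrite-me"],
})

# Rows of DF1 whose four key columns also appear in DF2 pick up DF2's
# extra columns; non-matching rows keep NaN for the DF2-only columns.
out = pd.merge(DF1, DF2, how="left",
               on=["Source1", "Target1", "Source2", "Target2"])
print(out)
```

Unlike concat, which stacks the rows of both frames, this keeps exactly one row per DF1 row and attaches DF2's data only where the keys intersect.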
I have df1 and df2, where df1 is a balanced panel of 20 stocks with daily datetime data. Because of missing days (weekends, holidays), I am mapping each available day to an integer counting the days I have (1-252). df2 is a two-column table that maps each date to its integer.
df2
date integer
2020-06-26, 1
2020-06-29, 2
2020-06-30, 3
2020-07-01, 4
2020-07-02, 5
...
2021-06-25, 252
I would like to map these dates to every asset I have in df1 for each date, therefore returning a single column of (0-252) repeated for each asset.
So far I have tried this:
df3 = (df1.merge(df2, left_on='date', right_on='integer'))
which returns an empty dataframe. I don't think I'm fully understanding the logic here.
Your merge is empty because left_on='date', right_on='integer' compares dates against integers, which never match. Assuming both df1 and df2 label the date column date, you can merge on it directly:
df3 = df1.merge(df2)
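A minimal sketch of what that merge does, using the question's mapping and an invented asset panel: each date in df1 picks up its integer from df2, repeated for every asset on that date.

```python
import pandas as pd

# df1: a small panel of assets with a shared 'date' column (invented values).
df1 = pd.DataFrame({
    "date":  ["2020-06-26", "2020-06-26", "2020-06-29", "2020-06-29"],
    "asset": ["AAA", "BBB", "AAA", "BBB"],
    "ret":   [0.01, -0.02, 0.005, 0.0],
})
# df2: the date -> integer mapping from the question.
df2 = pd.DataFrame({
    "date":    ["2020-06-26", "2020-06-29", "2020-06-30"],
    "integer": [1, 2, 3],
})

# With no arguments, merge joins on the common column ('date'),
# broadcasting each date's integer to every asset row for that date.
df3 = df1.merge(df2)
print(df3)
```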
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I've been using pandas for some months and today I found something weird.
Let's say I have these two dataframes :
df1 = pd.DataFrame(data={'C1' : [1,1,1,2],'C2' : ['A','B','C','D']})
df2 = pd.DataFrame(data={'C1':[2,2,2],'C2':['A','B','C']})
What I want is: from df2, every pair of {C1, C2} that exists in df1.
This is what I wrote: df2[df2.C1.isin(df1.C1) & df2.C2.isin(df1.C2)]
The result I would like is an empty DataFrame, because in df1 the value 2 is not linked with 'A', 'B' or 'C'; what I get instead is all of df2. I also tried df2[df2[["C1", "C2"]].isin(df1[["C1", "C2"]])], but it does not work if df2 has more columns (even unused ones).
You can do it with inner merge:
df2.merge(df1, how='inner', on=['C1', 'C2'])
Empty DataFrame
Columns: [C1, C2]
Index: []
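Reproducing the approach on the question's own data:

```python
import pandas as pd

df1 = pd.DataFrame(data={"C1": [1, 1, 1, 2], "C2": ["A", "B", "C", "D"]})
df2 = pd.DataFrame(data={"C1": [2, 2, 2], "C2": ["A", "B", "C"]})

# An inner merge keeps only rows whose (C1, C2) *pair* exists in both
# frames, unlike per-column isin(), which checks each column independently
# and therefore accepts (2, 'A') because 2 and 'A' each appear somewhere.
out = df2.merge(df1, how="inner", on=["C1", "C2"])
print(out)
```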
This question already has answers here:
Merge two dataframes by index
(7 answers)
Pandas Merging 101
(8 answers)
Closed 5 years ago.
I have two dataframe table :
df1
id A
1 wer
3 dfg
5 dfg
df2
id A
2 fgv
4 sdfsdf
I want to join these two dataframes into one that looks like this:
df3
id A
1 wer
2 fgv
3 dfg
...
df3 = df1.merge(df2,how='outer',sort=True)
There is a concat method in pandas that you can use:
df3 = pd.concat([df1, df2])
You can then sort by index with
df3 = df3.sort_index()
or reset the index with
df3 = df3.reset_index(drop=True)
I see you have an ellipsis (...) at the end of your df3 dataframe; if that means the dataframe continues, use the above, otherwise go for Jibril's answer.
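A runnable version of the concat approach on the question's data, sorting by the id column (slightly more robust than sort_index when the two frames share index labels):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 3, 5], "A": ["wer", "dfg", "dfg"]})
df2 = pd.DataFrame({"id": [2, 4], "A": ["fgv", "sdfsdf"]})

# Stack the two frames, interleave rows by 'id', and rebuild a clean
# 0..n-1 index so the result looks like one contiguous table.
df3 = pd.concat([df1, df2]).sort_values("id").reset_index(drop=True)
print(df3)
```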
I have two pandas DataFrames imported from Excel, df1 and df2. df1 holds the dates of replacement, with a column of Dates and a column of Notes (200 rows). df2 holds the dates when a check was performed (40 rows).
I would like to filter df1 (or generate a new table, df1'), so that all dates in df1 that differ by less than 5 days from a date in df2 are deleted from df1.
Since a check was performed, we can assume the component was not replaced within a margin of 10 days.
e.g.
df1
22/04/2017
23/04/2017
07/06/2017
20/08/2017
df2
21/04/2017
df1'
07/06/2017
20/08/2017
You can perform datetime subtraction with NumPy broadcasting and filter df1 accordingly.
df1
A
0 2017-04-22
1 2017-04-23
2 2017-07-06
3 2017-08-20
df2
A
0 2017-04-21
df1.A = pd.to_datetime(df1.A) # convert to datetime first
df2.A = pd.to_datetime(df2.A)
df1[((df1.values[:, None] - df2.values) / pd.Timedelta(days=1) > 5).all(1)]
A
2 2017-07-06
3 2017-08-20
For your data, this will generate 8000 elements on broadcasted subtraction, which certainly is manageable. Though note for much larger data, this results in a memory blowup (a pricey tradeoff for the high performance).
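If "differ less than 5 days" is meant in both directions, an absolute difference is needed (the expression above only drops dates that fall shortly *after* a check). A hedged variant on the question's data; the two-sided reading is an assumption:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": pd.to_datetime(
    ["2017-04-22", "2017-04-23", "2017-07-06", "2017-08-20"])})
df2 = pd.DataFrame({"A": pd.to_datetime(["2017-04-21"])})

# Broadcast to a (len(df1), len(df2)) matrix of day differences, take the
# absolute value, then keep rows at least 5 days away from *all* check dates.
diff_days = np.abs(
    (df1["A"].values[:, None] - df2["A"].values) / pd.Timedelta(days=1))
mask = (diff_days >= 5).all(axis=1)
print(df1[mask])
```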