Merge a list of dataframes to create one dataframe [duplicate] - python

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 4 years ago.
I have a list of 18 data frames:
dfList = [df1, df2, df3, df4, df5, df6.....df18]
All of the data frames have a common id column, so it's easy to join them with pd.merge two at a time. Is there a way to join them all at once so that dfList comes back as a single dataframe?

I think you need concat, but first set the index of each DataFrame to the common column:
dfs = [df.set_index('id') for df in dfList]
print(pd.concat(dfs, axis=1))
If you need to join with merge instead:
from functools import reduce
df = reduce(lambda df1, df2: pd.merge(df1, df2, on='id'), dfList)
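As a rough, self-contained sketch of the reduce approach, the three small frames below are hypothetical stand-ins for the entries of dfList. It assumes the non-key columns have distinct names; if they overlap, pandas adds _x/_y suffixes, which can collide on later merges.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the frames in dfList; each shares an "id" column.
df_a = pd.DataFrame({"id": [1, 2, 3], "a": [10, 20, 30]})
df_b = pd.DataFrame({"id": [1, 2, 3], "b": ["x", "y", "z"]})
df_c = pd.DataFrame({"id": [1, 2, 4], "c": [0.1, 0.2, 0.4]})
frames = [df_a, df_b, df_c]

# Fold the list pairwise: (df_a merge df_b) merge df_c.
# how="outer" keeps ids that are missing from some frames; the default
# how="inner" would keep only ids present in every frame.
merged = reduce(
    lambda left, right: pd.merge(left, right, on="id", how="outer"),
    frames,
)
print(merged)
```

Whether an outer or inner join is the right choice depends on whether every id is expected to appear in all 18 frames.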

Related

Merge dataframes with conditions with pandas python [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I want to merge two dataframes, but under certain conditions:
If the values of the columns from dataframe 2 (Source1,Target1,Source2,Target2) occur in dataframe 1, then I want to replace them with the data from dataframe 2, but merge them with all columns from dataframe 2 and all columns from dataframe 1.
My current problem is that when I concatenate, the data from DF2 is only appended to that from DF1, and I end up with invalid data.
In short: match DF1 with DF2 and, where they intersect, overwrite the intersection in DF1, but merge all columns from DF2 with those from DF1.
Thanks for your help
DF1
DF2
What I get
What I need
frames = [DF1,DF2]
result = pd.concat(frames)
print(result)
Use merge:
out = pd.merge(DF1, DF2, how='left',
               on=['Source 1', 'Target 1', 'Source 2', 'Target 2'])
Take some time to read Pandas Merging 101.
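Since the original DF1 and DF2 were posted as images, here is a small sketch with made-up data showing what the left merge does. The value_1 and value_2 columns are hypothetical, and indicator=True is added only to show which rows of DF1 found a match in DF2.

```python
import pandas as pd

keys = ["Source 1", "Target 1", "Source 2", "Target 2"]

# Hypothetical data; the real frames are not available in the post.
DF1 = pd.DataFrame({
    "Source 1": ["a", "b"], "Target 1": ["c", "d"],
    "Source 2": ["e", "f"], "Target 2": ["g", "h"],
    "value_1": [1, 2],
})
DF2 = pd.DataFrame({
    "Source 1": ["a"], "Target 1": ["c"],
    "Source 2": ["e"], "Target 2": ["g"],
    "value_2": [100],
})

# Every row of DF1 is kept; DF2 columns are filled in where all four keys match.
# The extra "_merge" column reports "both" or "left_only" for each row.
out = pd.merge(DF1, DF2, how="left", on=keys, indicator=True)
print(out)
```

Rows marked left_only get NaN in the DF2 columns, which may or may not be the desired "What I need" result.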

Create data frames in a for loop [duplicate]

This question already has answers here:
How to modify list entries during for loop?
(10 answers)
Can't modify list elements in a loop [duplicate]
(5 answers)
Change values in a list using a for loop (python)
(4 answers)
Closed 1 year ago.
I have a function that takes in a df --> modifies the df --> and returns back the modified df.
I have a list dfs containing 5 df - I want to loop over them so that each is modified by the function, something like this:
dfs = [df1, df2, df3, df4, df5] # df1 to df5 : valid DataFrames
for df in dfs:
    df = function(df)
When I do that the content of the list dfs is not changed, I just end up with a new variable called 'df' that contains the modified information of df5 (The last df in the list).
What am I doing wrong? Is there a way I can achieve this?
You assign the modified df back to the name df but that will not change the item in the list it represents. You need to store your modified local df back to your list:
dfs = [df1, df2, df3, df4, df5]
for idx, df in enumerate(dfs):
    dfs[idx] = function(df) # immediately store result in list
would solve your problem.
Full demo:
import pandas as pd
dfs = [pd.DataFrame({"t": [n]}) for n in range(1, 6)]
def function(df):
    df["t"] = df["t"] * 100
    return df
print(*dfs, "", sep="\n\n")
for idx, df in enumerate(dfs):
    dfs[idx] = function(df)
print(*dfs, sep="\n\n")
Output:
   t
0  1

   t
0  2

   t
0  3

   t
0  4

   t
0  5


     t
0  100

     t
0  200

     t
0  300

     t
0  400

     t
0  500
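One caveat about the demo: its function modifies the frame in place, so those frames would change even without the assignment back into the list. The sketch below uses a hypothetical helper, scaled, that returns a new DataFrame, which is the case where rebinding the loop variable silently loses the result. A list comprehension is shown as one possible alternative to enumerate.

```python
import pandas as pd

dfs = [pd.DataFrame({"t": [n]}) for n in range(1, 4)]

def scaled(df):
    # Hypothetical helper: returns a new DataFrame and leaves its input untouched.
    return df.assign(t=df["t"] * 100)

# Rebinding the loop variable only changes what the name `df` points to;
# the list still holds the original frames.
for df in dfs:
    df = scaled(df)
unchanged = [int(d["t"].iloc[0]) for d in dfs]

# Building a new list keeps every result.
dfs = [scaled(df) for df in dfs]
changed = [int(d["t"].iloc[0]) for d in dfs]

print(unchanged, changed)
```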

Pandas : Two 'isin', one condition [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I've been using pandas for some months and today I found something weird.
Let's say I have these two dataframes :
df1 = pd.DataFrame(data={'C1' : [1,1,1,2],'C2' : ['A','B','C','D']})
df2 = pd.DataFrame(data={'C1':[2,2,2],'C2':['A','B','C']})
What I want is: from df2, every pair {C1, C2} that exists in df1.
This is what I wrote: df2[df2.C1.isin(df1.C1) & df2.C2.isin(df1.C2)]
The result I would like is an empty DataFrame, because in df1 the value 2 is not linked with 'A', 'B' or 'C'; what I get instead is df2. I tried df2[df2[["C1","C2"]].isin(df1[["C1","C2"]])] but it does not work if df2 has more columns (even if unused).
You can do it with inner merge:
df2.merge(df1, how='inner', on=['C1', 'C2'])
Empty DataFrame
Columns: [C1, C2]
Index: []
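The original filter returns all of df2 because the two isin calls test each column independently rather than as a pair. As a possible alternative to the merge, the sketch below compares the pairs directly through a MultiIndex. It keeps any extra columns of df2 and does not duplicate rows when df1 contains repeated pairs; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"C1": [1, 1, 1, 2], "C2": ["A", "B", "C", "D"]})
df2 = pd.DataFrame({"C1": [2, 2, 2], "C2": ["A", "B", "C"]})

cols = ["C1", "C2"]

# Two independent tests: each column is checked on its own, so every row passes.
per_column = df2[df2["C1"].isin(df1["C1"]) & df2["C2"].isin(df1["C2"])]

# One test on the pair: build a MultiIndex from both columns and compare pairs.
pairs_in_df1 = pd.MultiIndex.from_frame(df2[cols]).isin(
    pd.MultiIndex.from_frame(df1[cols])
)
per_pair = df2[pairs_in_df1]

print(len(per_column), len(per_pair))
```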

How to merge two Pandas DataFrames when two columns are the same [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 years ago.
I have these two dataframes:
        orderItemId     orderId                  orderDate         latestDeliveryDate
0  BFC0000332253518  2648507110  2019-11-10T21:08:30+01:00  2019-11-11T00:00:00+01:00
0  BFC0000332123047  2647717360  2019-11-10T15:42:39+01:00  2019-11-11T00:00:00+01:00
0  BFC0000332291194  2648712140  2019-11-10T22:24:56+01:00  2019-11-11T00:00:00+01:00
        orderItemId     orderId  shipmentId   shipmentReference               shipmentDate
0  BFC0000332253518  2648507110   689508122  081234500926730318  2019-11-11T00:10:06+01:00
1  BFC0000332123047  2647717360   689505054  081234500926572451  2019-11-10T23:55:38+01:00
2  BFC0000332291194  2648712140   689505045  081234500926710549  2019-11-10T23:55:37+01:00
How can I merge these with pandas merge, given that they have two columns in common? Can I use multiple on= values?
Yes, you can use multiple on values. From your example above, I suppose you want to merge on orderItemId and orderId, right?
Just use:
final_df = pd.merge(df1, df2, how='inner', on=['orderItemId', 'orderId'])
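A minimal sketch of the two-key merge follows, using shortened, made-up rows in place of the frames from the question. The validate argument is an optional extra, not part of the answer above; it makes pandas raise an error if a key pair repeats on either side, which may help catch unexpected duplicates.

```python
import pandas as pd

# Hypothetical, shortened rows standing in for the two frames in the question.
orders = pd.DataFrame({
    "orderItemId": ["BFC1", "BFC2", "BFC3"],
    "orderId": [101, 102, 103],
    "orderDate": ["2019-11-10", "2019-11-10", "2019-11-10"],
})
shipments = pd.DataFrame({
    "orderItemId": ["BFC1", "BFC2", "BFC3"],
    "orderId": [101, 102, 103],
    "shipmentId": [9001, 9002, 9003],
})

final_df = pd.merge(
    orders,
    shipments,
    how="inner",
    on=["orderItemId", "orderId"],  # both columns must match for rows to pair up
    validate="one_to_one",  # raises MergeError if a key pair repeats on either side
)
print(final_df)
```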

Join two Dataframe table [duplicate]

This question already has answers here:
Merge two dataframes by index
(7 answers)
Pandas Merging 101
(8 answers)
Closed 5 years ago.
I have two dataframes:
df1
id A
1 wer
3 dfg
5 dfg
df2
id A
2 fgv
4 sdfsdf
I want to join these two dataframes into one that looks like this:
df3
id A
1 wer
2 fgv
3 dfg
...
df3 = df1.merge(df2,how='outer',sort=True)
pandas has a concat method that you can use:
df3 = pd.concat([df1, df2])
You can sort the index with:
df3 = df3.sort_index()
Or reset the index with:
df3 = df3.reset_index(drop=True)
I see you have an ellipsis (...) at the end of your df3 dataframe. If that means the dataframe continues, use the above; otherwise go for Jibril's answer.
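Putting the pieces together, here is a short sketch that assumes id is an ordinary column rather than the index. In that case, sorting by the id values instead of by the index gives the interleaved order shown in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 3, 5], "A": ["wer", "dfg", "dfg"]})
df2 = pd.DataFrame({"id": [2, 4], "A": ["fgv", "sdfsdf"]})

df3 = (
    pd.concat([df1, df2])        # stack the rows of both frames
    .sort_values("id")           # order by the id column, not by the old index
    .reset_index(drop=True)      # replace the repeated index labels with 0..n-1
)
print(df3)
```

If id is actually the index of both frames, sort_index() as shown above would be the equivalent step.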
