I am trying to merge three dataframes based on their row indices. However, the on parameter will not accept the index as an option. Is there a better way to merge the dataframes without having to write the row indices to each dataframe as a column?
from functools import reduce
import pandas as pd
dfs = [result_eu_SpeciesNameGenuine, result_ieu_SpeciesNameGenuine, result_cosine_SpeciesNameGenuine]
# fails: on= expects column name(s), and index is not a defined name here
df_final = reduce(lambda left,right: pd.merge(left,right,on=index), dfs)
df_final
Try pd.DataFrame.join; per the documentation, other can be a list of DataFrames:
dfs[0].join(dfs[1:])
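A minimal sketch of two index-based options, using small hypothetical frames in place of the result_* frames from the question. pd.merge aligns on the index when given left_index=True and right_index=True instead of on=, and join with a list works as long as the column names do not overlap (suffixes are not supported when joining a list).

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the three result_* frames in the question;
# they share the same row labels but have different column names.
dfs = [
    pd.DataFrame({"eu": [0.1, 0.2, 0.3]}, index=["s1", "s2", "s3"]),
    pd.DataFrame({"ieu": [1.1, 1.2, 1.3]}, index=["s1", "s2", "s3"]),
    pd.DataFrame({"cosine": [0.9, 0.8, 0.7]}, index=["s1", "s2", "s3"]),
]

# Option 1: merge on the index by passing left_index/right_index instead of on=
df_final = reduce(
    lambda left, right: pd.merge(left, right, left_index=True, right_index=True),
    dfs,
)

# Option 2: join aligns on the index by default and accepts a list of frames
df_joined = dfs[0].join(dfs[1:])
```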
I have an initial DataFrame D. I extract two DataFrames from it like this:
A = D[D.label == k]
B = D[D.label != k]
I want to combine A and B into one DataFrame. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.
DEPRECATED: DataFrame.append and Series.append were deprecated in v1.4.0 and removed in v2.0.0; on current pandas, use pd.concat instead.
On older pandas versions, you can use append:
df_merged = df1.append(df2, ignore_index=True)
To keep the original indexes, set ignore_index=False (the default).
Use pd.concat to join multiple dataframes:
df_merged = pd.concat([df1, df2], ignore_index=True, sort=False)
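Applied to the A/B split from the question, a sketch could look like this (D, its label column and k are hypothetical values). Leaving ignore_index at its default keeps D's original row labels, so sort_index() can restore D's original row order if that ever matters.

```python
import pandas as pd

# Hypothetical D with a "label" column; k is the label used to split it
D = pd.DataFrame({"label": [0, 1, 0, 1, 2], "value": [10, 20, 30, 40, 50]})
k = 1

A = D[D.label == k]
B = D[D.label != k]

# ignore_index=False (the default) keeps the row labels inherited from D,
# so sorting by index puts the rows back in D's original order
combined = pd.concat([A, B]).sort_index()
```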
Merge across rows:
df_row_merged = pd.concat([df_a, df_b], ignore_index=True)
Merge across columns:
df_col_merged = pd.concat([df_a, df_b], axis=1)
If you're working with big data and need to concatenate multiple datasets, calling concat many times can get performance-intensive.
If you don't want to create a new df each time, you can instead collect the frames in a list and call concat only once:
frames = [df_A, df_B] # Or perform operations on the DFs
result = pd.concat(frames)
This is pointed out in the pandas docs under "Concatenating objects", at the bottom of the section:
Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
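A small sketch of the difference, using hypothetical chunks. Concatenating inside the loop copies everything accumulated so far on every iteration, so the total copying grows roughly quadratically with the number of chunks; collecting the frames first and calling concat once copies the data a single time. Both patterns produce the same frame here.

```python
import pandas as pd

# Hypothetical chunks, e.g. pieces of a larger dataset processed one at a time
chunks = [pd.DataFrame({"x": [i, i + 1]}) for i in range(0, 10, 2)]

# Slow pattern: each concat copies all rows accumulated so far
slow = chunks[0]
for chunk in chunks[1:]:
    slow = pd.concat([slow, chunk], ignore_index=True)

# Faster pattern: build the list first, then concatenate once
frames = [chunk for chunk in chunks]  # or perform operations on the DFs here
fast = pd.concat(frames, ignore_index=True)
```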
If you want to update/replace the values of the first DataFrame df1 with the values of a second DataFrame df2, you can do it with the following steps.
Step 1: Set the index of the first DataFrame (df1). set_index returns a new DataFrame, so assign the result back:
df1 = df1.set_index('id')
Step 2: Set the index of the second DataFrame (df2):
df2 = df2.set_index('id')
Step 3: Update df1 in place:
df1.update(df2)
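A self-contained version of these steps, assuming hypothetical frames keyed by an id column. DataFrame.update aligns on both index and columns, modifies df1 in place (it returns None), and skips NaN values in df2, so missing values there do not overwrite existing data.

```python
import pandas as pd

# Hypothetical frames keyed by an "id" column
df1 = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "value": [200.0, None]})

# set_index returns a new DataFrame, so the result is assigned back
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# update aligns on index and columns and modifies df1 in place;
# the NaN for id 3 in df2 leaves df1's existing value untouched
df1.update(df2)
```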
To join 2 pandas dataframes by column, using their indices as the join key, you can do this:
both = a.join(b)
And if you want to join multiple DataFrames, Series, or a mixture of them, by their index, just put them in a list, e.g.:
everything = a.join([b, c, d])
See the pandas docs for DataFrame.join().
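A short sketch of two details of join worth knowing, with hypothetical frames a and b: by default it is a left join on the index, and when both frames have a column with the same name you must pass lsuffix/rsuffix, otherwise pandas raises a ValueError. how="outer" keeps row labels from either frame.

```python
import pandas as pd

# Hypothetical frames that share the label "r1" and a "score" column
a = pd.DataFrame({"score": [1, 2]}, index=["r1", "r2"])
b = pd.DataFrame({"score": [10, 30]}, index=["r1", "r3"])

# Default how="left": keeps a's row labels; overlapping names get suffixes
left = a.join(b, lsuffix="_a", rsuffix="_b")

# how="outer": keeps row labels that appear in either frame
outer = a.join(b, how="outer", lsuffix="_a", rsuffix="_b")
```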
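import pandas as pd

# excel_files (list of input paths) and excelAutoNamed (output path) are defined elsewhere
# collect excel content into a list of dataframes
data = []
for excel_file in excel_files:
    data.append(pd.read_excel(excel_file, engine="openpyxl"))
# concatenate dataframes horizontally (side by side, aligned on the row index)
df = pd.concat(data, axis=1)
# save combined data to excel
df.to_excel(excelAutoNamed, index=False)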
You can use the above when you need to append DataFrames horizontally.
Use this code to attach two Pandas Data Frames horizontally:
df3 = pd.concat([df1, df2],axis=1, ignore_index=True, sort=False)
You must specify the axis along which to concatenate the two frames (axis=1 places them side by side).
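A small sketch with hypothetical frames showing one side effect of the snippet above: with axis=1, ignore_index=True applies to the column labels, so the original column names are discarded and replaced by 0, 1, ... Drop ignore_index if you want to keep the names.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4]})

# axis=1 places the frames side by side, aligning rows on the index
side_by_side = pd.concat([df1, df2], axis=1)

# with axis=1, ignore_index=True renumbers the columns 0..n-1
renumbered = pd.concat([df1, df2], axis=1, ignore_index=True, sort=False)
```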
I was wondering if it is possible to check whether the two dataframes below are equivalent. They contain the same rows, but the first and third rows are swapped. Is there a way to check that these dataframes are the same regardless of row order? Thank you for any help!
You can use an outer merge with an indicator column and then look for rows that do not appear in both dataframes. If the result is empty, every row of one frame has a match in the other.
import pandas as pd
df_a = pd.DataFrame([['a','b','c'], ['c','d','e'], ['e','f','g']], columns=['col1','col2','col3'])
df_b = pd.DataFrame([['e','f','g'], ['c','d','e'], ['a','b','c']], columns=['col1','col2','col3'])
df_merged = pd.merge(df_a, df_b, on=df_a.columns.tolist(), how='outer', indicator='Exist')
print(df_merged[(df_merged['Exist'] != 'both')])
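Sort both DFs the same way (for example by all columns) and then compare them. Alternatively, iterate through the columns, sorting and comparing one column at a time; note that this only checks each column's values, not which values share a row, so it can report a match for frames that actually differ.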
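A minimal sketch of the sort-then-compare approach; the helper name same_rows_ignoring_order is hypothetical. Sorting by every column puts identical rows in identical positions, and reset_index(drop=True) discards the original labels so only row contents are compared. DataFrame.equals also checks dtypes, so frames whose values match but whose dtypes differ are reported as different.

```python
import pandas as pd


def same_rows_ignoring_order(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Return True if a and b hold the same rows, in any order."""
    if set(a.columns) != set(b.columns):
        return False
    cols = list(a.columns)
    # Sort by every column, then drop the old index so the original
    # row positions are not part of the comparison
    a_sorted = a.sort_values(by=cols).reset_index(drop=True)
    b_sorted = b[cols].sort_values(by=cols).reset_index(drop=True)
    return a_sorted.equals(b_sorted)


df_a = pd.DataFrame([['a', 'b', 'c'], ['c', 'd', 'e'], ['e', 'f', 'g']], columns=['col1', 'col2', 'col3'])
df_b = pd.DataFrame([['e', 'f', 'g'], ['c', 'd', 'e'], ['a', 'b', 'c']], columns=['col1', 'col2', 'col3'])
```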
I have a number of dataframes (100) in a list as:
frameList = [df1,df2,..,df100]
Each dataframe has the two columns DateTime, Temperature.
I want to intersect all the dataframes on the common DateTime column and get all their Temperature columns combined/merged into one big dataframe: Temperature from df1, Temperature from df2, Temperature from df3, .., Temperature from df100.
(pandas merge doesn't work as I'd have to compute multiple (99) pairwise intersections).
Use pd.concat, which works on a list of DataFrames or Series.
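pd.concat([df.set_index('DateTime') for df in frameList], axis=1, join='inner')
This is better than using pd.merge, as pd.merge copies the data every time it is executed, once per pair of frames, whereas pd.concat copies only once. However, pd.concat only aligns along an axis (the index or the columns), whereas pd.merge can also merge on one or more columns; that is why DateTime is moved into the index before concatenating.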
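A runnable sketch with three small hypothetical frames standing in for frameList. Passing the Temperature Series together with keys= gives each resulting column a distinct name (df0, df1, ...); the key names are illustrative. This assumes DateTime values are unique within each frame, since concat cannot align on a duplicated index.

```python
import pandas as pd

# Three small hypothetical frames standing in for the 100 in frameList
frameList = [
    pd.DataFrame({"DateTime": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
                  "Temperature": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"DateTime": pd.to_datetime(["2021-01-02", "2021-01-03"]),
                  "Temperature": [20.0, 30.0]}),
    pd.DataFrame({"DateTime": pd.to_datetime(["2021-01-03", "2021-01-02", "2021-01-04"]),
                  "Temperature": [300.0, 200.0, 400.0]}),
]

# concat with axis=1 aligns on the index, so DateTime becomes the index first;
# join="inner" keeps only timestamps present in every frame
combined = pd.concat(
    [df.set_index("DateTime")["Temperature"] for df in frameList],
    axis=1,
    join="inner",
    keys=[f"df{i}" for i in range(len(frameList))],
)
```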
You can try using functools.reduce in Python, something like this:
from functools import reduce
import pandas as pd
dfs = [df0, df1, df2, dfN]
df_final = reduce(lambda left, right: pd.merge(left, right, on='DateTime'), dfs)
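One caveat with this approach: every frame has a column called Temperature, so repeated merges produce clashing Temperature_x/Temperature_y suffixes, which recent pandas versions reject. A sketch that renames each Temperature column up front, using small hypothetical frames (the Temperature_0, Temperature_1, ... names are illustrative):

```python
from functools import reduce

import pandas as pd

# Three small hypothetical frames standing in for the 100 in frameList
frameList = [
    pd.DataFrame({"DateTime": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
                  "Temperature": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"DateTime": pd.to_datetime(["2021-01-02", "2021-01-03"]),
                  "Temperature": [20.0, 30.0]}),
    pd.DataFrame({"DateTime": pd.to_datetime(["2021-01-03", "2021-01-02", "2021-01-04"]),
                  "Temperature": [300.0, 200.0, 400.0]}),
]

# Give every Temperature column a unique name so merges don't add suffixes
renamed = [
    df.rename(columns={"Temperature": f"Temperature_{i}"})
    for i, df in enumerate(frameList)
]

# Inner merge on DateTime keeps only timestamps present in every frame
df_final = reduce(
    lambda left, right: pd.merge(left, right, on="DateTime", how="inner"),
    renamed,
)
```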
You could iterate over your list like this:
df_merge = frameList[0]
for df in frameList[1:]:
    df_merge = pd.merge(df_merge, df, on='DateTime', how='inner')
Is there a way to concat, join or merge dataframes based on both the index and columns? For example, suppose I have a list of dataframes and I want something like
df = pandas.fullConcat(dfList)
where df.index should be the union of the indices in dfList ('outer' join) and df.columns should also be the union of the columns in dfList. I think the concat, join and merge methods all join on either the index or the columns, but not both. I suppose a workaround is stack/unstack or reset_index? Am I missing something?
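I think you're going to have to reset the index and merge on it together with the shared column(s), where 'cols' is a placeholder for the shared column name(s). Use how='outer' to keep the union of both:
df = df1.reset_index().merge(df2.reset_index(), on=['index', 'cols'], how='outer').set_index('index')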
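A small runnable sketch of this approach with hypothetical frames that share column A and row label y. reset_index turns an unnamed index into a column called index, and the outer merge keeps the union of row labels and columns. This assumes both frames agree on shared cells (here, A for label y); if they disagree, the merge yields two rows with the same label instead of one.

```python
import pandas as pd

# Hypothetical frames: they share column "A" and row label "y",
# and each also has labels/columns the other lacks
df1 = pd.DataFrame({"A": [1, 2]}, index=["x", "y"])
df2 = pd.DataFrame({"A": [2, 3], "B": [5, 6]}, index=["y", "z"])

# Merge on the former index plus the shared column; how="outer" keeps
# row labels and columns from either frame
df = (
    df1.reset_index()
    .merge(df2.reset_index(), on=["index", "A"], how="outer")
    .set_index("index")
)
```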