Is there a way to concat, join or merge dataframes based on both the index and columns? For example, suppose I have a list of dataframes and I want something like
df = pandas.fullConcat(dfList)
where df.index should be the union of the indices in dfList ('outer' join) and df.columns should also be the union of the columns in dfList. I think all of the concat, join and merge methods just do a join on either the index or the columns. I suppose a workaround is stack/unstack or reset_index? Am I missing something?
I think you're going to have to reset the index:
df = df1.reset_index().merge(df2.reset_index(), on=['index','cols']).set_index('index')
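If resetting the index feels heavy, another option that might fit is chaining DataFrame.combine_first over the list, which takes the union of both the index and the columns. This is only a sketch: df_a, df_b and their values are made up for illustration, and it assumes that when two frames have a value in the same cell, the earlier frame in the list should win.

```python
from functools import reduce

import pandas as pd

# Hypothetical example frames with partly overlapping rows and columns
df_a = pd.DataFrame({'x': [1, 2]}, index=['r1', 'r2'])
df_b = pd.DataFrame({'x': [20, 30], 'y': [5, 6]}, index=['r2', 'r3'])
df_list = [df_a, df_b]

# combine_first aligns on the union of index and columns;
# values from the left frame take priority where both are present
full = reduce(lambda left, right: left.combine_first(right), df_list)
print(full)
```

Cells that exist in none of the frames come out as NaN, so integer columns may be upcast to float.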
Basically, I have two dataframes, the first one looks like this:
And the second one like this:
I want to get the columns "lat" and "lnt" of the second one and add to the first one only if the name of the city matches in both dataframes. I tried using pd.merge(), but it's creating new rows with duplicated values.
If possible, I would like to put a NaN in the rows that have no match at all, but I don't want to remove or add rows in the original dataframe.
The Pandas merge function defaults to an inner join. Since you're looking to merge in the columns of df2 to df1, you should use a left join. This will give you all the rows of df1, and the matching values from df2.
df3 = df1.merge(df2, on = 'city', how = 'left')
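One caveat: a left join can still add rows if a city appears more than once in df2, which may be where the duplicated rows come from. A minimal sketch of one way around that, assuming the repeated cities carry the same coordinates so that keeping the first occurrence is fine (the sample data and the population column are invented for illustration):

```python
import pandas as pd

# Hypothetical data; 'Lisbon' is deliberately repeated in df2
df1 = pd.DataFrame({'city': ['Lisbon', 'Porto', 'Faro'],
                    'population': [1, 2, 3]})
df2 = pd.DataFrame({'city': ['Lisbon', 'Lisbon', 'Porto'],
                    'lat': [38.7, 38.7, 41.1],
                    'lnt': [-9.1, -9.1, -8.6]})

# Keep only the columns to bring over, one row per city
coords = df2[['city', 'lat', 'lnt']].drop_duplicates(subset='city')

# validate='many_to_one' raises if 'city' is still not unique in coords
df3 = df1.merge(coords, on='city', how='left', validate='many_to_one')
print(df3)
```

Here 'Faro' has no match, so its lat and lnt are NaN, and df3 has exactly as many rows as df1.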
merged_df = df1.merge(df2, how = 'inner', on = ['City'])
I am trying to merge three dataframes based on their row indices. However, the on parameter will not take the index as an option. Is there a better way to merge the dataframes without having to write the row indices to each dataframe as a column?
from functools import reduce
dfs = [result_eu_SpeciesNameGenuine, result_ieu_SpeciesNameGenuine, result_cosine_SpeciesNameGenuine]
df_final = reduce(lambda left,right: pd.merge(left,right,on=index), dfs)
df_final
Try pd.DataFrame.join. Per the documentation, other can be a list of dataframes:
dfs[0].join(dfs[1:])
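For comparison, here is a small sketch of both routes: the reduce approach from the question with left_index/right_index in place of on, and the join call above. The three frames and their column names are stand-ins for the real ones, and the sketch assumes the column names do not overlap (as far as I know, join with a list does not accept suffixes).

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the three result frames, sharing one index
idx = ['sp1', 'sp2']
df_eu = pd.DataFrame({'eu': [0.1, 0.2]}, index=idx)
df_ieu = pd.DataFrame({'ieu': [0.3, 0.4]}, index=idx)
df_cos = pd.DataFrame({'cosine': [0.5, 0.6]}, index=idx)
dfs = [df_eu, df_ieu, df_cos]

# Route 1: pairwise merge on the row index (inner join by default)
merged = reduce(
    lambda left, right: pd.merge(left, right, left_index=True, right_index=True),
    dfs,
)

# Route 2: join the first frame with the rest (left join by default)
joined = dfs[0].join(dfs[1:])

print(merged)
print(joined)
```

The two defaults differ (inner for merge, left for join), so the results only agree when all frames share the same index, as they do here.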
Welcome, I have a simple question, to which I haven't found a solution.
I have two dataframes df1 and df2:
df1 contains several columns and a multiindex as year-month-week
df2 contains the multiindex year-week with only one column in the df.
I would like to create an inner join of df1 and df2, joining on 'year' and 'week'.
I have tried to do the following:
df1['newcol'] = df1.index.get_level_values(2).map(lambda x: df2.newcol[x])
This only joins on month (or year?). Is there any way to extend it so that the join uses both year and week?
Thanks in advance!
df1
df2
Eventually I solved it by removing the MultiIndex, doing a good old join on the two columns, and then recreating the MultiIndex at the end.
Here are the snippets:
# Move the index levels into ordinary columns so they can be used as merge keys
df = df.reset_index()
df2 = df2.reset_index()
# Make sure the key columns have the same integer dtype in both frames
df[['year', 'week']] = df[['year', 'week']].astype(int)
df2[['year', 'week']] = df2[['year', 'week']].astype(int)
# how='left' keeps every row of df; use how='inner' to keep only matching rows
result = pd.merge(df, df2, how='left', on=['year', 'week'])
# Rebuild the MultiIndex
result = result.set_index(['year', 'month', 'week', 'day'])
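It may also be possible to skip the reset_index round trip: DataFrame.join accepts index level names in on, provided the right frame's index is fully used and its levels are a subset of the left frame's. A minimal sketch under that assumption, with invented data and a hypothetical value column in df1:

```python
import pandas as pd

# Hypothetical frames mirroring the question:
# df1 is indexed by (year, month, week), df2 by (year, week)
df1 = pd.DataFrame(
    {'value': [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [(2020, 1, 1), (2020, 1, 2), (2020, 2, 5)],
        names=['year', 'month', 'week'],
    ),
)
df2 = pd.DataFrame(
    {'newcol': [10, 20]},
    index=pd.MultiIndex.from_tuples(
        [(2020, 1), (2020, 2)],
        names=['year', 'week'],
    ),
)

# Join df1's 'year' and 'week' index levels against df2's index
result = df1.join(df2, on=['year', 'week'], how='inner')
print(result)
```

The row for week 5 has no partner in df2, so the inner join drops it; how='left' would keep it with NaN in newcol.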
I am trying to merge two dataframes (call them DF1 & DF2) that basically look like the below. My goal is:
I want open/close/low/high to all come from DF1.
I want numEvents and Volume = DF1 + DF2.
In cases where DF2 has rows that don't exist in DF1, I want open/close/low/high to be NaN (so I can later backfill them), and numEvents and Volume to come from DF2 as is.
Any help is much appreciated!
Use pd.merge.
It should be an outer join, since you want data from both dataframes:
pd.merge(A, B, how='outer', on=<mutual_key>)
Use the left_on and right_on parameters of pd.merge() to choose the fields that you want to merge on.
DF1.merge(DF2, how='outer', left_on=<keys>, right_on=<keys>...)
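Note that an outer merge on its own only lines the rows up; it does not add numEvents and Volume together. One way to get the sums might be to treat the two groups of columns separately. This is a sketch only: it assumes both frames are indexed by the same key (a hypothetical time label here), and the sample values are invented.

```python
import pandas as pd

# Hypothetical frames indexed by the shared key; 't3' exists only in df2
df1 = pd.DataFrame(
    {'open': [1.0, 2.0], 'close': [1.5, 2.5],
     'numEvents': [3, 4], 'Volume': [100, 200]},
    index=pd.Index(['t1', 't2'], name='time'),
)
df2 = pd.DataFrame(
    {'open': [9.0, 9.0], 'close': [9.0, 9.0],
     'numEvents': [1, 2], 'Volume': [10, 20]},
    index=pd.Index(['t2', 't3'], name='time'),
)

price_cols = ['open', 'close']       # taken from df1 only
sum_cols = ['numEvents', 'Volume']   # df1 + df2

# Prices: df1's values on the union of both indexes, NaN where df1 has no row
all_rows = df1.index.union(df2.index)
prices = df1[price_cols].reindex(all_rows)

# Totals: element-wise sum, treating a row missing on one side as 0
totals = df1[sum_cols].add(df2[sum_cols], fill_value=0)

result = prices.join(totals)
print(result)
```

The same pattern extends to low and high by adding them to price_cols; the NaN prices for rows that only exist in df2 can then be backfilled afterwards.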
I'm working on a way to transform sequence/genotype data from a csv format to a genepop format.
I have two dataframes. df1 is empty; df1.index (rows = samples) is almost the same as df2.index, except that I inserted "POP" in several places (to mark the different populations). df2 holds the data, with loci as columns.
I want to insert the values from df2 into df1, keeping empty rows where df1.index = 'POP'.
I tried join, combine, combine_first and concat, but they all seem to take the rows that exist in both df's.
Is there a way to do this?
It sounds like you want an 'outer' join:
df1.join(df2, how='outer')
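One thing to watch: according to the documentation, an outer join sorts the resulting index, which could move the "POP" rows away from where they were inserted. If the row order of df1 matters, reindexing df2 onto df1.index might be closer to what is wanted. A minimal sketch, assuming the sample names in df2.index are unique (the sample names and the Locus1 column are made up):

```python
import pandas as pd

# Hypothetical layout: df1 is empty but carries the target row order,
# with 'POP' marker rows inserted between populations
df1 = pd.DataFrame(index=['POP', 's1', 's2', 'POP', 's3'])
df2 = pd.DataFrame({'Locus1': ['0101', '0102', '0202']},
                   index=['s1', 's2', 's3'])

# Lay df2's rows out in df1's order; labels missing from df2 ('POP') become NaN
result = df2.reindex(df1.index)
print(result)
```

The NaN values in the "POP" rows can be replaced with empty strings via result.fillna('') before writing the file, if that suits the genepop layout.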