Welcome, I have a simple question, to which I haven't found a solution.
I have two dataframes df1 and df2:
df1 contains several columns and a multiindex as year-month-week
df2 contains the multiindex year-week with only one column in the df.
I would like to create an inner join of df1 and df2, joining on 'year' and 'week'.
I have tried to do the following:
df1['newcol'] = df1.index.get_level_values(2).map(lambda x: df2.newcol[x])
This only joins on month (or year?). Is there any way to extend it so that the merge uses both 'year' and 'week'?
Thanks in advance!
df1
df2
Eventually I solved it by removing the multiindex, doing a good old join on the two columns, and then recreating the multiindex at the end.
Here are the snippets:
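import pandas as pd
# Move the index levels into ordinary columns so they can serve as merge keys
df = df.reset_index()
df2 = df2.reset_index()
# Make sure the key columns have the same integer dtype on both sides
df[['year', 'week']] = df[['year', 'week']].astype(int)
df2[['year', 'week']] = df2[['year', 'week']].astype(int)
# how='left' keeps every row of df; use how='inner' to keep only matching rows
result = pd.merge(df, df2, how='left', on=['year', 'week'])
# Recreate the multiindex
result = result.set_index(['year', 'month', 'week', 'day'])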
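For reference, here is a minimal self-contained sketch of the same reset/merge/re-index approach applied to the shapes described in the question (df1 indexed by year-month-week, df2 by year-week). The sample values and the column name `value` are made up for illustration; only `newcol` comes from the question.

```python
import pandas as pd

# Hypothetical sample data: df1 is indexed by (year, month, week)
df1 = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.MultiIndex.from_tuples(
        [(2020, 1, 1), (2020, 1, 2), (2020, 2, 5), (2021, 1, 1)],
        names=["year", "month", "week"],
    ),
)
# df2 is indexed by (year, week) and has a single column
df2 = pd.DataFrame(
    {"newcol": [1.5, 2.5, 3.5]},
    index=pd.MultiIndex.from_tuples(
        [(2020, 1), (2020, 2), (2021, 1)],
        names=["year", "week"],
    ),
)

# Turn the index levels into columns, inner-merge on both keys,
# then restore df1's original MultiIndex
result = (
    df1.reset_index()
    .merge(df2.reset_index(), on=["year", "week"], how="inner")
    .set_index(["year", "month", "week"])
)
print(result)
```

The row for (2020, 2, 5) has no (year, week) match in df2, so the inner join drops it.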
Related
How do I join together 4 DataFrames? The DataFrames are named df1, df2, df3, and df4.
They all have the same number of columns and I am trying to use the 'inner' join.
How would I modify this code to make it work for all four?
I tried using this code and it worked to combine two of them, but I could not figure out how to write it to work for all four DataFrames.
dfJoin = df1.join(df2,how='inner')
print(dfJoin)
You just have to chain together the joins.
dfJoin = (
    df1.join(df2, how="inner", on="common_column")
    .join(df3, how="inner", on="common_column")
    .join(df4, how="inner", on="common_column")
)
Or, if you have more than 4, just put them in a list df_list and iterate through it.
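One way to iterate through such a list is functools.reduce, sketched below. This assumes the DataFrames share an index to join on and have distinct column names; the sample data is invented for illustration.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames with partially overlapping indexes
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])
df2 = pd.DataFrame({"b": [4, 5]}, index=["x", "y"])
df3 = pd.DataFrame({"c": [6, 7, 8]}, index=["y", "z", "x"])
df4 = pd.DataFrame({"d": [9, 10]}, index=["y", "w"])

df_list = [df1, df2, df3, df4]

# Fold the list pairwise: ((df1 join df2) join df3) join df4
dfJoin = reduce(lambda left, right: left.join(right, how="inner"), df_list)
print(dfJoin)
```

Only the index label present in all four frames ("y") survives the inner joins.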
Joining or appending multiple DataFrames in one go can be done with pd.concat():
list_of_df = [df1, df2, df3, df4]
df = pd.concat(list_of_df, join="inner")
Your question does not state whether you want them merged column- or index-wise, but since you state that they have the same number of columns, I assume you wish to append said DataFrames. For this case the code above works. If you want to make a wide DataFrame, set the axis parameter to 1.
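A small sketch of the two variants, using made-up two-column frames: the default axis=0 appends rows, while axis=1 places the frames side by side.

```python
import pandas as pd

# Hypothetical frames with identical columns
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# Append rows; join="inner" keeps only the columns common to all frames
stacked = pd.concat([df1, df2], join="inner", ignore_index=True)

# Place frames side by side; join="inner" keeps only shared index labels
wide = pd.concat([df1, df2], axis=1, join="inner")

print(stacked.shape, wide.shape)
```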
I am trying to merge two dataframes (call them DF1 & DF2) that basically look like the below. My goal is:
I want open/close/low/high to all come from DF1.
I want numEvents and Volume = DF1 + DF2.
In cases where DF2 has rows that don't exist in DF1, I want open/close/low/high to be NaN (so I can later backfill them), and numEvents and Volume to come from DF2 as is.
Any help is much appreciated!
Use pd.merge.
It should be an outer join, since you want data from both DataFrames:
pd.merge(A, B, how='outer', on=<mutual_key>)
Use the left_on and right_on parameters of pd.merge() to choose the fields you want to merge on.
DF1.merge(DF2, how='outer', right_on=<keys>...)
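A possible way to get the exact result described in the question is an outer merge followed by summing the two count columns. This is only a sketch: the key column name `time`, the suffix `_df2`, and the sample numbers are assumptions, since the original data is not shown.

```python
import pandas as pd

# Hypothetical data; "time" stands in for whatever key the frames share
DF1 = pd.DataFrame({
    "time": [1, 2],
    "open": [10.0, 11.0], "close": [10.5, 11.5],
    "low": [9.5, 10.5], "high": [11.0, 12.0],
    "numEvents": [3, 4], "Volume": [100, 200],
})
DF2 = pd.DataFrame({
    "time": [2, 3],
    "open": [99.0, 99.0], "close": [99.0, 99.0],
    "low": [99.0, 99.0], "high": [99.0, 99.0],
    "numEvents": [1, 2], "Volume": [50, 70],
})

sum_cols = ["numEvents", "Volume"]

# Bring in only the key and the summable columns from DF2, so that
# open/close/low/high always come from DF1 (NaN for DF2-only rows)
merged = DF1.merge(
    DF2[["time"] + sum_cols], on="time", how="outer", suffixes=("", "_df2")
)

# Add the two sources, treating a missing side as 0
for col in sum_cols:
    merged[col] = merged[col].fillna(0) + merged[col + "_df2"].fillna(0)

merged = merged.drop(columns=[c + "_df2" for c in sum_cols]).set_index("time")
print(merged)
```

Rows that exist only in DF2 keep NaN in the price columns, which can then be backfilled.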
I'm using Python Pandas to merge two dataframe, like so:
new_df = pd.merge(df1, df2, 'inner', left_on='Zip_Code', right_on='Zip_Code_List')
However, I would like to do this ONLY where another column ('Business_Name') in df2 contains a certain value. How do I do this? So, something like, "When Business Name is Walmart, merge these two dataframes."
# You can filter df2 first and then merge.
pd.merge(df1, df2.query("Business_Name == 'Walmart'"), how='inner', left_on='Zip_Code', right_on='Zip_Code_List')
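A runnable sketch of the filter-then-merge idea, with invented sample rows (the `City` column is hypothetical). Boolean indexing is used here; it selects the same rows as the query() call.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df1 = pd.DataFrame({
    "Zip_Code": [10001, 20002, 30003],
    "City": ["A", "B", "C"],
})
df2 = pd.DataFrame({
    "Zip_Code_List": [10001, 20002, 30003],
    "Business_Name": ["Walmart", "Target", "Walmart"],
})

# Keep only the Walmart rows of df2, then merge as before
walmart = df2[df2["Business_Name"] == "Walmart"]
new_df = pd.merge(
    df1, walmart, how="inner", left_on="Zip_Code", right_on="Zip_Code_List"
)
print(new_df)
```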
Is there a way to concat, join or merge dataframes based on both the index and columns? For example, suppose I have a list of dataframes and I want something like
df = pandas.fullConcat(dfList)
where df.index should be the union of the indices in dfList ('outer' join) and df.columns should also be the union of the columns in dfList. I think all of the concat, join and merge methods just do a join on either the index or the columns. I suppose a workaround is stack/unstack or reset_index? Am I missing something?
I think you're going to have to reset the index:
df = df1.reset_index().merge(df2.reset_index(), on=['index','cols']).set_index('index')
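As an alternative sketch (not the approach given above), DataFrame.combine_first takes the union of both the index and the columns, so folding it over the list may give the behaviour asked for. Where two frames overlap, values from the earlier frame win. The sample frames are hypothetical.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames with partly overlapping index and different columns
dfA = pd.DataFrame({"x": [1.0, 2.0]}, index=["a", "b"])
dfB = pd.DataFrame({"y": [3.0, 4.0]}, index=["b", "c"])
dfList = [dfA, dfB]

# combine_first aligns on the union of rows and columns; missing cells stay NaN
df = reduce(lambda left, right: left.combine_first(right), dfList)
print(df)
```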
I'm working on a way to transform sequence/genotype data from a csv format to a genepop format.
I have two dataframes. df1 is empty; its index (rows = samples) is almost the same as df2.index, except that I inserted "POP" in several places (to specify the different populations). df2 holds the data, with Loci as columns.
I want to insert the values from df2 into df1, keeping empty rows where df1.index = 'POP'.
I tried join, combine, combine_first and concat, but they all seem to take the rows that exist in both df's.
Is there a way to do this?
It sounds like you want an 'outer' join:
df1.join(df2, how='outer')
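If the goal is to keep df1's row order, including the repeated "POP" rows, another option that might fit is to reindex df2 by df1's index. This is a sketch with made-up sample names and loci, and it assumes df2.index itself has no duplicate labels.

```python
import pandas as pd

# Hypothetical genotype data: samples as rows, loci as columns
df2 = pd.DataFrame(
    {"Locus1": ["0101", "0102", "0202"], "Locus2": ["0303", "0304", "0404"]},
    index=["s1", "s2", "s3"],
)
# Empty frame whose index has "POP" separators inserted
df1 = pd.DataFrame(index=["POP", "s1", "s2", "POP", "s3"])

# Rows are laid out in df1's order; labels missing from df2 ("POP") become NaN
filled = df2.reindex(df1.index)
print(filled)
```

The "POP" rows stay empty (NaN) and can be formatted as needed when writing the genepop file.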